Linux AX.25 stack now toxic for connected packet connections with Ubuntu 20.04 / 5.8.0-44-generic #50
Hello Everyone,

I wanted to check with the larger community to see if others are experiencing system crashes when making connected AX.25 sessions. I have confirmed that this is NOT an RFI thing: sending unconnected (UI) transmissions (beacons), small or large, is fine, and even initiating a connected session to a non-existent remote station callsign is OK with axcall, linpac, etc.

The issue is that once a valid AX.25 connection is established, I begin to receive data from the remote station and then, seemingly when my station is about to send an ACK packet, the machine locks hard. No segmentation fault, no kernel panic; the Gnome3 display stays up but the screen no longer updates, nothing appears in the logs, and the machine even stops answering pings from a different machine on the LAN. The machine is 100% crashed and this is 100% reproducible.

Is anyone else seeing this?

   $ uname -a
   Linux hampacket3 5.8.0-44-generic #50~20.04.1-Ubuntu SMP Wed Feb 10 21:07:30 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

--David
KI6ZHD
I am NOT seeing your described symptom on various (7) Raspberry Pis running the Linux AX.25 stack with Winlink, APRS & Chattervox at 1200 & 9600 baud. My hardware is mainly RPi 4's and sound cards using direwolf. I do NOT use Linpac or Gnome3.

The two kernels I am using from the Raspberry Pi Foundation (raspbian):

   Linux test119022321 5.4.79-v7l+ #1373 SMP Mon Nov 23 13:27:40 GMT 2020 armv7l GNU/Linux
   Linux pi400 5.10.11-v7l+ #1399 SMP Thu Jan 28 12:09:48 GMT 2021 armv7l GNU/Linux

What TNC device are you using?

/Basil N7NIX
Hello Basil,

The issue here is *connected* AX.25 sessions, so your Winlink connections would be impacted. APRS would not be impacted since it's unconnected traffic and still works fine.

> The two kernels I am using from the Raspberry Pi Foundation (raspbian):
> Linux test119022321 5.4.79-v7l+ #1373 SMP Mon Nov 23 13:27:40 GMT 2020 armv7l GNU/Linux
> Linux pi400 5.10.11-v7l+ #1399 SMP Thu Jan 28 12:09:48 GMT 2021 armv7l GNU/Linux

The way Canonical backports fixes into older kernels is difficult to track, and they aren't dating things in the changelog, but I see this (posted 2021-02-04):

--
  * Focal update: v5.4.55 upstream stable release (LP: #1890343)
    - AX.25: Fix out-of-bounds read in ax25_connect()
    - AX.25: Prevent out-of-bounds read in ax25_sendmsg()
    ...
    - AX.25: Prevent integer overflows in connect and sendmsg
--
  * Focal update: v5.4.44 upstream stable release (LP: #1881927)
    - ax25: fix setsockopt(SO_BINDTODEVICE)
--
On this system, it's a D710 in KISS mode connected to a Lenovo T470 (i7-7600U with 16GB RAM) running 64-bit Ubuntu 20.04. I'm 99.9% sure that if I switched away from a serial-attached hardware TNC to a software-based TNC like Direwolf, I would still see the issue. I've done more testing with reverting the kernel to older versions, but the ones I've tested so far still fail as well:

   5.8.0-44 : BAD
   5.8.0-43 : BAD
   5.8.0-41 : BAD
   ..
   5.8.0-36 : BAD
   ..
   5.4.0-66 : BAD
   ..
   5.4.0-42 : BAD

It's clear that my mistake is this: after Canonical pushes a new kernel version and I apply it, I should reboot and then test things to KNOW if the AX.25 stack has been impacted. I skipped a few reboot cycles as different kernels were installed, so I really don't know where to start. Even then, I'm thinking this issue might be more of a libc interface issue or something else, since going way back to 5.4.0-42, dated 2020-07-10, shows the same issue.

   root@hampacket3:/var/log/apt# dpkg -l | grep -e libc6 -e linux-libc
   ii  libc6:amd64           2.31-0ubuntu9.2  amd64  GNU C Library: Shared libraries
   ii  libc6-dbg:amd64       2.31-0ubuntu9.2  amd64  GNU C Library: detached debugging symbols
   ii  libc6-dev:amd64       2.31-0ubuntu9.2  amd64  GNU C Library: Development Libraries and Header Files
   ii  linux-libc-dev:amd64  5.4.0-66.74      amd64  Linux Kernel Headers for development

That libc6 was installed on "2021-01-28".

It's frustrating, as the machine just locks up and doesn't give any hint of where to start troubleshooting. Could be a 64-bit thing. Could be an SMP thing. Dunno,

--David
KI6ZHD
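Since the machine locks hard and leaves nothing in the logs, one way to keep track of which kernel was actually running during each test is to write a marker to disk before starting the connected session. The following is only a minimal sketch under stated assumptions: the function name `record_test_start` and the log path are hypothetical, not part of any AX.25 tool. If a START line is the last entry after a forced reboot, that kernel can be counted as BAD.

```python
import os
import platform
import time


def record_test_start(log_path, note=""):
    """Append the running kernel release and a timestamp to log_path.

    The line is flushed and fsync'ed before returning, so that it has
    the best chance of surviving a hard lockup during the test.
    """
    release = platform.uname().release  # same string as `uname -r`
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    line = f"{stamp} kernel={release} START {note}\n"
    with open(log_path, "a") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())  # ask the OS to push the data to disk now
    return line


if __name__ == "__main__":
    # Hypothetical log location; run this just before axcall, linpac, etc.
    print(record_test_start("ax25-kernel-tests.log", "connected session"), end="")
```

Running it once after every reboot into a different kernel builds up a small history that can be compared against the BAD list above.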
> The issue here is *connected* AX.25 session so your Winlink

Yes, I know that, but your subject line implies that you already know the problem is in the AX.25 stack, and that might not be (and probably is not) the case. I am using the native Linux AX.25 stack with Dire Wolf DEVELOPMENT version 1.7 A (Feb 15 2021) on 4 different systems and it is working fine.

> The way the Canonical backports fixes into older kernels is

The following is the commit summary from kernel.org for kernel 5.11.4 and ax.25 commits. I looked at each of these commits & none of them should cause your symptom.

   2020-11-20  rose: Fix Null pointer dereference in rose_send_frame()   Anmol Karn             1
   2020-07-23  AX.25: Prevent integer overflows in connect and sendmsg   Dan Carpenter          1
   2020-07-22  AX.25: Prevent out-of-bounds read in ax25_sendmsg()       Peilin Ye              1
   2020-07-22  AX.25: Fix out-of-bounds read in ax25_connect()           Peilin Ye              1
   2020-07-04  Documentation: networking: ax25: drop doubled word        Randy Dunlap           1
   2020-05-20  ax25: fix setsockopt(SO_BINDTODEVICE)                     Eric Dumazet           1
   2020-04-28  Docs: networking: convert ax25.txt to ReST                Mauro Carvalho Chehab  3
   2019-09-24  ax25: enforce CAP_NET_RAW for raw sockets                 Ori Nimron             1

I suggest swapping some big components in your system, including NOT using the internal D-710 TNC. Once you do that, you can test against the Direwolf user land ax.25 stack & the Linux kernel mode ax.25 stack.

Just did a successful connection test between two ARM machines using these kernels: 5.10.17-v7l+ and 5.4.79-v7l+.

Also, there was a massive Linux kernel code merge from the middle of December 2020 to the first week of Jan 2021 that took the kernel from version 5.4.83 to 5.10.11 and caused a few problems. Not sure what that means for Ubuntu.

/Basil N7NIX
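As a rough cross-check of the commit list against the oldest failing kernel reported in the thread (5.4.0-42, which David dates 2020-07-10), the listed commits can be filtered by date. This is only a sketch: the dates and summaries are copied from the list in this message, the helper name `commits_on_or_before` is made up for illustration, and it assumes the kernel.org dates are comparable to the Ubuntu kernel date, which may not hold given how Canonical backports fixes.

```python
from datetime import date

# (date, summary) pairs copied from the kernel.org commit list in this thread.
COMMITS = [
    (date(2020, 11, 20), "rose: Fix Null pointer dereference in rose_send_frame()"),
    (date(2020, 7, 23), "AX.25: Prevent integer overflows in connect and sendmsg"),
    (date(2020, 7, 22), "AX.25: Prevent out-of-bounds read in ax25_sendmsg()"),
    (date(2020, 7, 22), "AX.25: Fix out-of-bounds read in ax25_connect()"),
    (date(2020, 7, 4), "Documentation: networking: ax25: drop doubled word"),
    (date(2020, 5, 20), "ax25: fix setsockopt(SO_BINDTODEVICE)"),
    (date(2020, 4, 28), "Docs: networking: convert ax25.txt to ReST"),
    (date(2019, 9, 24), "ax25: enforce CAP_NET_RAW for raw sockets"),
]


def commits_on_or_before(commits, cutoff):
    """Return the commits whose date is not later than the cutoff date."""
    return [entry for entry in commits if entry[0] <= cutoff]


# Date given in the thread for the oldest kernel that still fails (5.4.0-42).
CUTOFF = date(2020, 7, 10)

for when, summary in commits_on_or_before(COMMITS, CUTOFF):
    print(when.isoformat(), summary)
```

Under those assumptions, only the four entries dated 2020-07-04 or earlier could be present in a kernel from 2020-07-10; the out-of-bounds and integer-overflow fixes from late July are later than that date.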