RPi Kernel Panic on Bookworm
Hi David,
I have a test Raspberry Pi 4 + 64-bit Bookworm setup (6.1.0-rpi7-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux) using the Linux AX.25 stack + VE7FET AX.25 libs/apps/tools + Direwolf 1.7 (with GPIO PTT) and a Syba USB sound device, but it's NOT connected to a radio to send RF traffic. That said, it's been running beacon for several *months* w/o any crashes:

Does your Pi have a radio input to hear traffic?

One thing I notice in Jon's most recent post is that they are using a 32-bit kernel (aka armv6l) but 64-bit binaries (Pat), where my setup is using the 64-bit kernel. That might be an important difference.

I didn't see Jon mention that he was using Pat; besides, I wouldn't think you could run a 64-bit binary on a 32-bit kernel/arch?

Thanks
Mike |
I think there is another status reply I made before this one, pending...
I moved to 64-bit Bookworm on the Pi-3, and set up the Pi-1 to monitor the serial interface on the Pi-3 so I can capture the crash.

Linux rms-gw3 6.1.0-rpi8-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.73-1+rpt1 (2024-01-25) aarch64 GNU/Linux

A new process appears in the panic string this time, but the behavior is similar. 17h 24m uptime, and we panic with this message (a clip). This time the panic string is kworker. I think that's a generic kernel task.

----------
[62681.000007] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[62681.006386] Modules linked in: mkiss ax25 cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep vc4 snd_soc_hdmi_codec drm_display_helper cec drm_dma_helper drm_kms_helper brcmfmac snd_soc_core binfmt_misc brcmutil hci_uart snd_compress cfg80211 btbcm snd_pcm_dmaengine fb_sys_fops raspberrypi_hwmon bcm2835_codec(C) syscopyarea sysfillrect cdc_acm bcm2835_v4l2(C) bcm2835_isp(C) bluetooth sysimgblt v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev ecdh_generic snd_bcm2835(C) ecc snd_pcm rfkill libaes raspberrypi_gpiomem snd_timer vc_sm_cma(C) snd mc uio_pdrv_genirq uio drm fuse dm_mod drm_panel_orientation_quirks backlight ip_tables x_tables ipv6 i2c_bcm2835
[62681.074035] CPU: 0 PID: 8218 Comm: kworker/u8:0 Tainted: G         C         6.1.0-rpi8-rpi-v8 #1  Debian 1:6.1.73-1+rpt1
[62681.085158] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
[62681.091075] Workqueue: events_unbound flush_to_ldisc
--------

I was not making any changes to the Pi-3 at the time, and only recognized the system had faulted when I attempted to check winlink mail via VHF from another system. It would not respond, but I could see lights on the TNC decoding my attempts to connect.

A few hours before, I was checking on system health and noticed this.

Feb 02 09:42:04 rms-gw3 rmsgw_aci[7546]: Channel Stats: 1 read, 1 active, 0 down, 1 updated, 0 errors   <---- last good update which matches timestamp on winlink status page
Feb 02 10:14:01 rms-gw3 rmsgw_aci[7616]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors   <----- things start to break down here
Feb 02 10:42:01 rms-gw3 rmsgw_aci[7664]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 10:54:39 rms-gw3 rmsgw_aci[7698]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 10:54:54 rms-gw3 rmsgw_aci[7721]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 10:55:01 rms-gw3 rmsgw_aci[7742]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 10:55:11 rms-gw3 rmsgw_aci[7775]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 10:55:52 rms-gw3 rmsgw_aci[7802]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 10:59:19 rms-gw3 rmsgw_aci[7835]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 11:14:01 rms-gw3 rmsgw_aci[7875]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 11:42:01 rms-gw3 rmsgw_aci[7976]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 12:14:01 rms-gw3 rmsgw_aci[8193]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors

Tracing the script that does ACI, I notice it is failing to detect a device from the netstat command and reporting a device is down.

root@rms-gw3:/usr/local/etc/rmsgw# netstat --protocol=ax25 -l
Active AX.25 sockets
Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
*          WA6BGS-10          LISTENING    000/000  0       0

In the device column, it should show ax0. Even though this was not displaying in the netstat command, the gateway was passing traffic between 9:42 and 12:14 before it crashed. I'm the only user, so it's very lightly used while I work out these bugs.

Now I restarted the Pi-3, and to try something different I did not give kissattach an IP address. It starts okay and passes traffic, but the ACI script fails because the rmschanstat script looks for an IP to determine the interface is up.

I'm running out of things to change or try besides abandoning the Pi.
32-bit Bullseye and Bookworm - same results with mostly similar crash times.
Pi-1 and Pi-3 - same results.
64-bit Bookworm on Pi-3 - same results so far.

I'm following the same build/config recipe each time. Nearly identical packages added from the repos, and the same git code pull for the rmsgw software. I put the xml and config files in the same place from the same source, and the gateway starts as expected each time.

64-bit Bookworm appears to be a 32/64-bit kernel (lscpu), but every app I'm running returns "ELF 64-bit LSB", including the rmsgw app I compiled from git source.

Any suggestions will be considered. I would like to compare config notes with others, especially if you are using a TNC.

-Jon |
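The Device-column symptom above can be checked mechanically. This is a minimal sketch, assuming the column layout shown in the netstat output in this post (Dest, Source, Device, State, ...). The function name ax25_listen_devices is made up for illustration; it is not part of rmsgw or ax25-tools, and it is not how rmschanstat itself does the check.

```sh
#!/bin/sh
# Read `netstat --protocol=ax25 -l` output on stdin and print the Device
# column of each LISTENING socket, or "MISSING" when that column is blank.
# Assumption: fields are whitespace-separated, so a blank Device column
# shifts LISTENING from field 4 to field 3.
ax25_listen_devices() {
    awk '
        $4 == "LISTENING" { print $3; next }     # Dest Source Device State
        $3 == "LISTENING" { print "MISSING" }    # Device column was empty
    '
}
```

Possible use on the gateway: `netstat --protocol=ax25 -l | ax25_listen_devices`. A result of MISSING while traffic still passes would match the state described in this post.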
Mike and David, all great input, and I didn't consider 32/64 for application stability. I chose 32 because if anyone in our club has a Pi-1 or Pi-2, they will be choosing the 32-bit release, and rmsgw is such a lightweight app and compiles from source... I started with the most compatible release.

The Pi-1 is a pilot project for our club, and the gateway and client radios are all within my house. I've told others to finish soldering their TNC kit and connect, but so far I'm the only one. I will attempt to monitor the number of ax25 packets that travel through the host. I am usually checking winmail twice a day, and don't always have a lot of mail to get/send. The quantity of packets on that interface is probably small, even for 1200 baud.

The Pi-1 crashed on its own today without the beacon app, after 1d 3h 47m uptime. Here is what it cried out when going down.

[100075.375841] CPU: 0 PID: 3001 Comm: netstat Tainted: G         C         6.1.0-rpi7-rpi-v6 #1  Raspbian 1:6.1.63-1+rpt1

Netstat, while not part of the ax25 tools, does query the ax25 interface statistics. You might be onto something with the ax25 packet count theory. I wasn't home when it crashed, so I did not run netstat. The rmsgw_aci is set to run on cron, and it does call the netstat command several times, one specifically for the ax25 protocol. See for yourself. The -o is the output log file.

strace -ormsgw_aci.strace-log -fvtTq -s1024 /usr/local/bin/rmsgw_aci

I'm going to downgrade the Pi-1 to Bullseye, patch it thoroughly, and drop this config on it again and include beacon running every 35 minutes. I'll re-image my Pi-3 and load a 64-bit Bookworm release and get it queued up for a second round of tests. The two Pis will swap roles for monitoring the serial console to catch the panic strings when the other goes crazy.

We're collecting quite a few variables to test here. This could take a while if crashes only happen every few days.
32 or 64 bit?
Bullseye or Bookworm?
How many ax25 packets until the system becomes unstable?
Beacon or no-beacon? |
Hi Jon,

"[62681.074035] CPU: 0 PID: 8218 Comm: kworker/u8:0 Tainted: G         C         6.1.0-rpi8-rpi-v8 #1  Debian 1:6.1.73-1+rpt1"

I'm not actually sure what to make of that; all of my crashes have been triggered by user-space commands that have something to do with the AX.25 stack. In your crash output, were there AX.25 functions named in the "Call Trace" section? I wonder if this crash might have been something different.

"Feb 02 10:14:01 rms-gw3 rmsgw_aci[7616]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors   <----- things start to break down here"

Sounds like you are pretty familiar with the rmschanstat and rmsgw_aci oddities. Mine shows my ax port down at the top of the hour, but up at the bottom of the hour. I don't put any faith in the logs it produces. But ...

"Tracing the script that does ACI, I notice it is failing to detect a device from the netstat command and reporting a device is down."

This is very strange indeed; I'll keep an eye on my box for similar behavior. I agree this explains the rmschanstat output, but it looks like you were still able to pass traffic when the Pi was in this state. This crash seems different ...

"I would like to compare config notes with others, especially if you are using a TNC."

Unfortunately, I'm using Direwolf on my builds, so it's not a direct comparison. However, I haven't seen anything to indicate that the type of TNC is relevant to the crash.

I see your other note, so I'll respond to some items there too.

Cheers
Mike |
Hi Jon,
"The Pi-1 crashed on its own today without the beacon app, after 1d 3h 47m uptime. Here is what it cried out when going down."

Thanks for sharing this; this is very interesting. It shows that the crash can move from one user-space app to another. It also proves the theory that we can't work around this by disabling beacons or turning off Pat; the crash will just be triggered by something else.

I agree that netstat is going to walk the kernel AX.25 memory structures, even if it's not directly related to the other ax25-tools. Regarding rmsgw_aci, if you know what time of day the Pi crashed, I bet it matches your rmsgw_aci schedule in cron.

"We're collecting quite a few variables to test here. This could take a while if crashes only happen every few days."

Here's my feedback. I'm all 64-bit and all Bookworm. I also moved beacon to the TNC (Direwolf), but, as I think this crash proves, beacon is just a red herring. So, on the packet question, I'm going to retract my earlier statement that a certain amount of traffic is required.

I started thinking about this last night and was disappointed that there wasn't a common way we could test out our builds. I did, however, remember something from the Direwolf user's guide. WB2OSZ used a set of test audio to tune the Direwolf algorithms, and that gave me an idea. I tracked down the audio and extracted an AIFF file. Then I built a loop to continuously play this test audio through the on-board Pi headphone jack. I connected the headphone jack back to a USB audio adapter and configured Direwolf to decode the audio back into the AX.25 stack.

I've had this setup running for about 2 days now. No crash thus far, but I've looped about 70,000 AX.25 packets (4.3 MByte), which is probably more data than my station would see in 6 months:

ax0: flags=67<UP,BROADCAST,RUNNING>  mtu 255
        ax25 MYCALL  txqueuelen 10  (AMPR AX.25)
        RX packets 70570  bytes 4573313 (4.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2659  bytes 103705 (101.2 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

Should you be so inclined, you could test against the same set of data, although your setup would be a bit different with the hardware TNC. I'll let you know if I end up getting a crash with this method; it's probably not worth starting a test of your own until I can demonstrate a crash.

Cheers
Mike |
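The suggestion above, that the crash time may line up with the rmsgw_aci cron schedule, can be tested with a little arithmetic. This is only a sketch: the function names are hypothetical, the boot time has to be recorded separately (for example by the Pi watching the serial console), and it assumes a timezone with a whole-hour UTC offset so that the minute of the hour is the same in local time and UTC. The slots 14 and 42 are the minutes at which rmsgw_aci appears in the logs earlier in this thread.

```sh
#!/bin/sh
# Minute of the hour (0-59) at which a panic occurred, given the boot time
# as a Unix epoch ($1) and the kernel log timestamp in seconds since boot ($2).
panic_minute() {
    pm_uptime=${2%%.*}                       # drop fractional seconds
    echo $(( ($1 + pm_uptime) % 3600 / 60 ))
}

# Succeeds when minute $1 falls on, or one minute after, any of the cron
# minutes listed in the remaining arguments.
near_cron_slot() {
    ncs_minute=$1
    shift
    for ncs_slot in "$@"; do
        if [ "$ncs_minute" -eq "$ncs_slot" ] ||
           [ "$ncs_minute" -eq $(( (ncs_slot + 1) % 60 )) ]; then
            return 0
        fi
    done
    return 1
}
```

With a recorded boot epoch in a variable such as `$boot_epoch` (an assumed name), `near_cron_slot "$(panic_minute "$boot_epoch" 62681.000007)" 14 42 && echo "matches cron"` would flag a panic that coincides with an ACI run.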
I'm following this recipe to build my system. I must be doing something consistently wrong across all builds, or I'm experiencing the same problem with my config.

I've ruled out power supply issues with a 5v supply that is extremely capable of doing the job. The babysitting Pi never suffers a hiccup, since it's not running any ax25 code.

Bullseye/Bookworm/32/64 from raspberrypi download, using the Lite version, no desktop.
Image the SD card, set up a first user, enable SSH, boot, go through the basic steps, set up timezone and en_us locale.
Patch until there are no more patches to apply.

This next list varies by one package name on Bullseye: python-pip-whl instead of python3-pip-whl.

apt install -y rsync build-essential autoconf dh-autoreconf \
    automake libtool git libasound2-dev libncurses5 \
    libncurses5-dev libncursesw5-dev libudev-dev \
    bc mg jed whois chrony ax25-apps \
    ax25-tools git libxml2 libxml2-dev xutils-dev build-essential \
    libax25-dev libx11-dev zlib1g-dev libncurses5-dev autoconf \
    autogen libtool cmake libgps-dev screen lm-sensors \
    python3-pip python3-pip-whl python3-requests

# create user/group that will run the ACI and RMSGW process
groupadd rmsgw
useradd -r -d /nonexistent -s /usr/sbin/nologin -g rmsgw rmsgw

git clone https://github.com/nwdigitalradio/rmsgw
chown -R rmsgw:rmsgw rmsgw
cd rmsgw
./autogen.sh
./configure
make -j`nproc --all`
make install
ln -s /usr/local/etc/rmsgw /etc
chown -R rmsgw:rmsgw /usr/local/etc/rmsgw
chown -R rmsgw:rmsgw /usr/local/bin/rms*

Populate the XML and CONF files with my data: banner, channels.xml, gateway.conf, sysop.xml.
Populate the ax25 files, axports and ax25d.conf, with the minimum entries (removed all the default stuff).

Start a screen session so I can detach, and run this as root:

/usr/sbin/kissattach /dev/ttyACM0 radio     <<-sometimes give it a 44 address or leave empty->>
/usr/sbin/ax25d
/usr/sbin/mheardd    <<-sometimes I forget to launch this->>
/usr/sbin/beacon -c WA6BGS -d "beacon" -t 35 radio "RMS Gate = WA6BGS-10"

Add crontab for rmsgw:

{
echo "# m h  dom mon dow   command"
echo "14,42 * * * * /usr/local/bin/rmsgw_aci > /dev/null 2>&1"
} | crontab -u rmsgw -

Everything works really great till about a day passes.

How does this compare to your non-crashy config?

4.3MB at 1200 baud... whoa. |
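To put "4.3MB at 1200 baud" in perspective, here is a rough lower bound on the channel time needed to carry the 4573313 received bytes reported earlier in the thread. It is a sketch with made-up function names, and it deliberately ignores framing overhead, bit stuffing, and key-up delays, so the real on-air time would be longer.

```sh
#!/bin/sh
# Minimum transfer time in whole seconds for $1 bytes at $2 bits per second.
airtime_seconds() {
    echo $(( $1 * 8 / $2 ))
}

# Format a number of seconds as hours and minutes.
fmt_hm() {
    echo "$(( $1 / 3600 ))h $(( $1 % 3600 / 60 ))m"
}

fmt_hm "$(airtime_seconds 4573313 1200)"    # prints "8h 28m"
```

So the looped test audio kept the 1200 baud channel busy for at least eight and a half hours out of roughly two days.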
Related?
/g/KM4ACK-Pi/topic/howto_patch_the_kernel_for/95904470?p=,,,20,0,0,0::recentpostdate/sticky,,,20,0,0,95904470,previd%3D0,nextid%3D1691495389076951633&previd=0&nextid=1691495389076951633

Cheers, de John ve1jot

PS: I have two ports going usually: HF on network105 at 300b, and a VHF 1200b port, ve1jot-7, going to the province's node stack. Uptime seemed infinite until I updated to the latest Bullseye; it pooched my entire system when I had a mem card failure, and now I have to rebuild, ugh! |
Also, there was a grant for working on the bugs in ax25 linux, quite a while ago:

"Grant: Fixing the Linux kernel AX.25
Date: December 2021
Amount: €179,690

Changes to the Linux kernel over the years have improved and modernized the kernel, but have also made existing AX.25 implementations incompatible and turned preexisting issues into bugs. This can make systems unpredictable or even unusable. Linux kernel development is complex, requiring deep specialized knowledge, and bugs are hard to trace. This may be one of the reasons, why the Linux kernel AX.25 stack is currently in such a bad state. This ARDC grant funds will allow the Deutscher Amateur Radio Club to hire software developers who can create a stable Linux AX.25 implementation and prevent Linux distributions from dropping pre-compiled AX.25 support. The fixed and functional Kernel-AX.25 stack will improve global amateur radio infrastructure. Professional kernel development can bring Linux AX.25 back to life." |
JJ, how far back must we go to get a Raspbian release where this isn't broken?
First, I want an install of ax25 that doesn't crash while I work on a custom kernel for Bookworm.

The kernel.org links seem to indicate this is a problem for ax25 on all Linux platforms, not just the Pi. Is that correct?

-Jon |
Ahh, well, there's been work done since kernel 5.15. I think it was kernel 6.2 I was trying, and that was for a different reason than crashing. There was an issue with axip/axudp links going dead, and I found that kernel 6.2 seemed to resolve this issue, but I'm not sure if any of the recent patches address all problems, or just a few, hi...

On 2024-02-03 1:53 a.m., Jon Bousselot KK6VLO wrote:
"JJ, how far back must we go to get a Raspbian release where this isn't broken?"

Correct. |
The bug description sounds more like a disconnect bug David was describing a few weeks ago. This doesn't look related to our kernel panic. Still, I was curious ...

I used apt-get source linux-image-$(uname -r) to pull the source for my kernel, which resulted in a linux-6.1.69 source tree. Down under net/ax25, I manually verified that 3 of the patches had been applied, and a 4th applied at include/net/ax25.h. Seems like this has been patched in the source version for my system.

The only thing I couldn't figure out is why apt loads 6.1.69 source when dmesg shows the running kernel is 6.1.63. I guess the distro maintainers have moved on from .63, but I haven't updated yet. I'm a bit reluctant to update since the bug is currently reproducible. Instead, I pulled the 6.1.63 source from kernel.org and directly verified the AX.25 patch, so I know these fixes are in my running kernel. The patch was made for a 5.15 kernel, and there are minor code differences with the 6.1.x kernels. However, it's very apparent the patch was applied to the running kernel version.

Cheers
Mike |
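On the 6.1.69 versus 6.1.63 puzzle: apt-get source normally fetches the newest source version available in the configured repositories rather than the one matching the installed package, which would be consistent with what is described here. The sketch below, with a made-up function name, strips a Debian-style version string such as the "1:6.1.63-1+rpt1" seen in the kernel banners in this thread down to its upstream part, so it can be compared with the name of a downloaded source tree.

```sh
#!/bin/sh
# Reduce a Debian-style package version to its upstream part,
# e.g. "1:6.1.63-1+rpt1" -> "6.1.63".
upstream_version() {
    uv=${1#*:}          # drop the epoch ("1:"), if present
    echo "${uv%-*}"     # drop the Debian revision after the last hyphen
}

upstream_version "1:6.1.63-1+rpt1"    # prints 6.1.63
```

If an exact match is wanted, apt-get source also accepts a pinned version in the form package=version, provided that version is still available in the repository.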
I'm going down this route, starting on the Pi-1:
- git the rpi-6.1.y kernel
- verify the patches referenced in this thread exist in the code tree (they do)
- compile
- eat lunch and then dinner
- install the kernel and see if it crashes 17 hours later.

For my Ubuntu x86_64 desktop, I also did the apt source for the current source tree:

apt source linux-image-unsigned-6.5.0-15-generic

In the downloaded tree, I notice the ax25 patches appear to be applied, so I wonder if desktop distros have these updates in place? Add that to my list of things to try.

If I'm seeing this correctly, the stable kernel tree for rpi does NOT have the ax25 patches applied. The ax25 patches do appear in 5.15.y and 6.1.y, the only two I checked. |
On Sat, Feb 3, 2024 at 11:29 AM, Jon Bousselot KK6VLO wrote:
"I'm going down this route, starting on the Pi-1"

Hi Jon,

Can you share why you think the patches aren't in the Rasp Pi OS kernel? I pulled the raspberrypi.com git repo and diff'ed against the kernel.org source; they came out the same. I see the patches as applied:

root@hammytest:/usr/src# diff kernel.org/linux-6.1.63/include/net/ax25.h raspberrypi.com/linux/include/net/ax25.h
root@hammytest:/usr/src# diff kernel.org/linux-6.1.63/net/ax25/af_ax25.c raspberrypi.com/linux/net/ax25/af_ax25.c
root@hammytest:/usr/src# diff kernel.org/linux-6.1.63/net/ax25/ax25_dev.c raspberrypi.com/linux/net/ax25/ax25_dev.c
root@hammytest:/usr/src# diff kernel.org/linux-6.1.63/net/ax25/ax25_subr.c raspberrypi.com/linux/net/ax25/ax25_subr.c

I'd be interested in your results, but I don't understand how the patch fits the crash behavior.

Thanks
Mike |
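The four comparisons above can be wrapped in a small helper so the same check is easy to repeat against other trees or branches. This is a sketch only; the function name is hypothetical, and it uses cmp -s rather than diff, so it reports whether each file is identical without showing the differences.

```sh
#!/bin/sh
# Compare the four AX.25 files named in this thread between two kernel
# source trees ($1 and $2). Prints one line per file and returns non-zero
# if any file differs or is missing.
compare_ax25_files() {
    caf_a=$1
    caf_b=$2
    caf_rc=0
    for caf_f in include/net/ax25.h net/ax25/af_ax25.c \
                 net/ax25/ax25_dev.c net/ax25/ax25_subr.c; do
        if cmp -s "$caf_a/$caf_f" "$caf_b/$caf_f"; then
            echo "same    $caf_f"
        else
            echo "DIFFERS $caf_f"
            caf_rc=1
        fi
    done
    return "$caf_rc"
}
```

Run from /usr/src as in the post, that would be `compare_ax25_files kernel.org/linux-6.1.63 raspberrypi.com/linux`. Identical files show that the two trees agree; whether a given patch is present still has to be confirmed by reading the code, as described earlier in the thread.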
If I'm looking at the right thing, the ax25 patches don't appear here, and this stable version is what I think Raspbian is running. Let me know if you learn otherwise, and where you found it.

Go down to line 62 and it doesn't match what I believe we both understand is the patched version.

What release did you pull from the Raspberry Pi git? I ran this:

git clone --depth=1 --branch rpi-6.1.y https://github.com/raspberrypi/linux |
Downloaded whatever version this brings (which looks like it is rpi-6.1.y):

git clone --depth=1

It does have the ax25 patch. Downloaded this:

git clone --depth=1 --branch=stable

And it does NOT have the patches.

Based on the links given in the KM4ACK-PI group, this seems like a reasonable path to gain stability. I'm looking for a reliable answer on where Raspbian gets their kernel source and what branch it is. If this fails, I'll be getting the 5.15 kernel. |
Have you tried running sudo rpi-update?
That brings in the latest kernel, not the stable version.
On Feb 3, 2024, at 13:02, Jon Bousselot KK6VLO <jon-bousselot@...> wrote:
"downloaded whatever version this brings (which looks like it is rpi-6.1.y)" |
I did not know this existed. Tons of warnings, upgrade first, read release notes later. Thanks for the info!

I did this on the Pi-3 and it took way less time than a custom compile. Now I have this version:

Linux rpi3-dev 6.1.74-v8+ #1725 SMP PREEMPT Mon Jan 22 13:35:32 GMT 2024 aarch64 GNU/Linux

Will load up rmsgw and see if we get more than 17 hours uptime. And I'm a bit puzzled as to what that program did. The boot commandline doesn't match the kernel. |
Here's the info from the RPI website as to what it does.
On Feb 3, 2024, at 17:03, Jon Bousselot KK6VLO <jon-bousselot@...> wrote:
"I did not know this existed. Tons of warnings, upgrade first, read release notes later. Thanks for the info!" |