Re: RPi Kernel Panic on Bookworm


 

Hi everyone,

I've been quiet on this topic recently as I haven't had much to report. I haven't had any crashes for over 12 days now, which seems to be related to disabling Pat in my environment. Please don't jump to conclusions here; this is a complex issue. As I've said in the past, Pat doesn't appear to be the cause of the crash, just the process that trips over the kernel garbage to trigger it.

Since I've had some time to think about this problem, here's what I've noticed:

 - Jon and I both run rmsgw. We both have crashes on the system running rmsgw.
 - The process that triggers the crash moves around (beacon, pat, netstat), but it is never rmsgw.
 - Jon and I both have few outside connections to rmsgw. My last outside connection was 10 days ago.
 - Jon has crashes within just a few hours; my crashes take days to weeks.
 - Jon frequently checks his own mail (possibly hourly, though I don't think he has said). I check mine infrequently (daily).

Based on these facts, my theory is this: rmsgw puts the kernel into some sort of bad state. This state is tripped over later by some unsuspecting process, causing the kernel crash.

To test this theory, I need to separate my RMS server from my RMS client. Unfortunately, I have only one radio, so I decided to take this test off-air. In my test setup, I built two Pis; let's call them RMS and PAT to distinguish their roles. Both Pis were built by imaging the SD card from PROD in the manner I've described previously.

Instead of connecting to a radio, I simply connected sound card to sound card using a pair of TRS cables (headphone to microphone in both directions). Direwolf was configured to match and PTT was disabled. RMS ran rmsgw from ax25d, and a shell loop on PAT checked my mail every 30 minutes using pat -s (send only, so as not to eat my actual inbox).
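For anyone wanting to repeat the test, a polling loop of this kind can be sketched as below. This is a Python stand-in, not the actual shell loop that ran on PAT. Only the -s flag and the 30-minute interval come from the description above; the connect target in PAT_COMMAND is a placeholder and the rest of the command line is an assumption, so substitute whatever you normally use to invoke pat.

```python
import subprocess
import time

# Assumption: placeholder command line. Only "-s" (send only) is taken from the post;
# the "connect" target below is hypothetical and must be replaced with your own.
PAT_COMMAND = ["pat", "-s", "connect", "ax25:///MYCALL-10"]


def poll_mail(command=PAT_COMMAND, interval_s=30 * 60, max_runs=None,
              run=subprocess.run, sleep=time.sleep):
    """Run `command` every `interval_s` seconds and return the exit codes seen.

    `max_runs=None` loops forever; `run` and `sleep` can be swapped out for testing.
    """
    codes = []
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            code = run(command).returncode
        except OSError:
            # The command could not be started at all (e.g. pat is not installed).
            code = None
        codes.append(code)
        # A timestamp per attempt makes it easy to see when connections began failing.
        print(time.strftime("%Y-%m-%d %H:%M:%S"), "exit code:", code)
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_s)
    return codes
```

Calling poll_mail() with no arguments runs indefinitely; passing max_runs gives a bounded run, which is handy for a dry run before leaving it overnight.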

I let this setup run overnight. In the morning, neither Pi had crashed, but I did notice that PAT was no longer able to connect to RMS. Tracing through the logs, it looks like connections started failing about 7 hours after I set up the test. Digging into the RMS Pi, I found the netstat condition that Jon first reported. Note that ax0 is missing from the Device column:

Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
*          MYCALL-10          LISTENING    000/000  0       0

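Since this condition shows up before the crash itself, it may be worth watching for. The sketch below flags LISTENING rows with an empty Device column in text captured from the netstat output above. It assumes a healthy row has seven whitespace-separated fields (with something like ax0 under Device) and the bad row has six; the function name is made up for illustration.

```python
def listeners_missing_device(netstat_text):
    """Return the Source callsigns of LISTENING rows that have no Device entry."""
    missing = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Healthy row: Dest Source Device State Vr/Vs Send-Q Recv-Q (7 fields).
        # In the bad row the Device field is absent, so there are only 6 fields
        # and the state ends up in the third position.
        if len(fields) == 6 and fields[2] == "LISTENING":
            missing.append(fields[1])
    return missing


if __name__ == "__main__":
    sample = (
        "Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q\n"
        "*          MYCALL-10          LISTENING    000/000  0       0\n"
    )
    print(listeners_missing_device(sample))
```

This only looks at LISTENING rows, since that is the only case shown here; connected sockets with multi-word states would need a different check.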
The kernel on RMS had been trashed, but the Pi was still operating. On the PAT Pi, netstat output looked normal. Realizing I still needed an AX.25 event to trigger the crash, I used axcall on RMS to generate some traffic. The RMS Pi immediately crashed, blaming axcall as it went down:

[61160.353159] CPU: 1 PID: 130380 Comm: axcall Tainted: G       WC      6.1.0-rpi7-rpi-vB #1 Debian 1:6.1.63-1+rpt1

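To keep track of which process gets blamed across repeated crashes, the Comm field can be pulled out of lines like the one above. This is a rough sketch that assumes the "CPU: ... PID: ... Comm: ..." layout shown here and a process name without spaces; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the "CPU: <n> PID: <n> Comm: <name>" part of a kernel crash line.
# Assumption: the process name contains no whitespace.
CRASH_LINE = re.compile(r"CPU:\s*(\d+)\s+PID:\s*(\d+)\s+Comm:\s*(\S+)")


def crash_processes(log_text):
    """Count how often each process name appears in crash lines of `log_text`."""
    return Counter(match.group(3) for match in CRASH_LINE.finditer(log_text))


if __name__ == "__main__":
    line = ("[61160.353159] CPU: 1 PID: 130380 Comm: axcall Tainted: G       WC      "
            "6.1.0-rpi7-rpi-vB #1 Debian 1:6.1.63-1+rpt1")
    print(crash_processes(line))
```

Feeding it the saved logs from several crashes gives a quick tally of the processes that were blamed.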
For me, this is great news: I have an off-air way to quickly show the problem. It also continues to show that the crash moves between processes, and it demonstrates that an unrelated event (here, axcall) can trigger it. Next steps are to reproduce the crash to confirm the test is repeatable. I'm also going to move RMS and PAT out of the RF environment (e.g. the other end of the house) to ensure there is no RFI element.

Cheers,
Mike
