Re: RPi Kernel Panic on Bookworm


 



Hello Mike,

Again, thank you for the detailed email; I think this all helps in tracking down the real issue here. I've been discussing this on the side with Bernard F6BVP, who maintains FPAC (node) and FBB (BBS) and uses the ROSE protocol heavily. He reported that he's "running three ROSE/FPAC nodes on a local network and I haven't observed any connections issues with Raspbian OS 64bit for a long time nor with Ubuntu (20.04)". He showed months of uptime with LOTS of connections, without either panics or orphaned AX.25 connections. One key point he mentioned is that he does NOT have any RF connections; it's all via AXUDP, and he also noted he's NOT using mkiss for linking the AXUDP to the kernel with kissattach.
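
For anyone not familiar with the distinction, the mkiss/kissattach path being contrasted here looks roughly like the sketch below. This is only an illustration; the pseudo-tty path (/tmp/kisstnc) and the axport name "radio" are typical Direwolf-style placeholders, not details taken from Bernard's or Mike's setups.

# Direwolf (run with its -p option) exposes a virtual KISS TNC on a
# pseudo-tty, normally symlinked to /tmp/kisstnc. kissattach binds that
# tty to a kernel AX.25 port defined in /etc/ax25/axports:
kissattach /tmp/kisstnc radio
# Bernard's FPAC/FBB setup instead carries AX.25 over AXUDP and never
# goes through mkiss/kissattach at all.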

--David




On 02/10/2024 12:27 PM, Michael Dunn wrote:

Hi everyone,

Been quiet on this topic recently as I haven't had much to report. I haven't had any crashes for over 12 days now, which seems to be related to disabling Pat in my environment. Please don't jump to conclusions here; this is a complex issue. As I've said in the past, Pat doesn't appear to be the cause of the crash, just the process that trips over the kernel garbage and triggers it.

Since I've had some time to think about this problem, here's what I've noticed:

  - Jon and I both run rmsgw. We both have crashes on the system running rmsgw.
  - The process that triggers the crash is mobile (beacon, pat, netstat), but is never rmsgw.
  - Jon and I both have few outside connections to rmsgw. My last outside connection was 10 days ago.
  - Jon has crashes in just a few hours; my crashes take days to weeks.
  - Jon frequently self-checks his mail (possibly hourly? but I don't think he stated). I self-check my mail infrequently (daily).

Based on these facts, my theory is this: rmsgw puts the kernel in some sort of bad state. That state is tripped over later by some unsuspecting process, causing the kernel crash.

To prove this out, I need to separate my RMS server from my RMS client. Unfortunately, I have only one radio, so I decided to take this test off-air. In my test setup, I built two Pis; let's call them RMS and PAT to distinguish their roles. The Pis were built by imaging the SD card from PROD in the manner I've described previously.

Instead of connecting to a radio, I simply connected sound card to sound card using a pair of TRS cables (headphone to microphone in both directions). Direwolf was configured to match and PTT was disabled. RMS ran rmsgw from ax25d, and a shell loop on PAT checked my mail every 30 minutes using pat -s (send only, so as not to eat my actual inbox).
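
The mail-check loop was just a few lines of shell, roughly along these lines (illustrative only; the ax25 connect URL and the exact flag placement will depend on your own pat config and axport, and MYCALL-10 is just the listener SSID from the netstat output further down):

#!/bin/sh
# Poll the RMS gateway every 30 minutes. The -s / --send-only flag keeps
# the session from downloading the real inbox during the test.
while true; do
    pat -s connect ax25:///MYCALL-10
    sleep 1800   # 30 minutes between connection attempts
done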

I let this setup run overnight. In the morning, neither Pi had crashed, but I did notice that PAT was no longer able to connect to RMS. Tracking through the logs, it looks like connections started failing about 7 hours after I set up the test. Digging into the RMS Pi, I found the netstat condition that Jon first reported. Note ax0 missing from the Device column:

Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
*          MYCALL-10          LISTENING    000/000  0       0
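
If you want to watch for this condition automatically, something like the following should flag it (a rough sketch, assuming net-tools netstat with AX.25 support and that the Device column simply goes empty as shown above; a healthy listener line has 7 fields, the orphaned one only 6):

# Print any AX.25 listener whose Device column has disappeared.
netstat --ax25 2>/dev/null | awk '/LISTENING/ && NF < 7 { print "orphaned AX.25 listener:", $0 }'
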
The kernel on RMS had been trashed, but the Pi was still operating. Checking the PAT Pi, netstat output looked normal. Realizing I still needed an AX.25 event to trigger the crash, I used axcall on RMS to generate some traffic. The RMS Pi immediately crashed, blaming axcall as it went down:

[61160.353159] CPU: 1 PID: 130380 Comm: axcall Tainted: G        WC         6.1.0-rpi7-rpi-v8 #1 Debian 1:6.1.63-1+rpt1
For me, this is great news. I have an off-air way to quickly show the problem. This also continues to show that the crash is mobile between processes and demonstrates an unrelated trigger event. Next steps are to reproduce the crash to ensure it is reliable. I'm also going to move RMS and PAT out of the RF environment (e.g. the other end of the house) to ensure there is no RFI element.
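
If anyone wants to poke at the trigger side themselves, the command involved is nothing special; roughly the following (the axport name "radio" and the destination call are placeholders, so substitute whatever is in your /etc/ax25/axports):

# Any outbound AX.25 activity on the trashed kernel appears to be enough
# to trip the panic; axcall takes an axport name and a destination call.
axcall radio N0CALL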

  Cheers
  Mike

