
Re: RPi Kernel Panic on Bookworm


 



Hello Mike,

   - I don't think your statement of "I thought this detail was significant as beacon and pat are the only processes that produce UI frames" is correct. Connectionless beacons use UI frames but Pat uses connected sessions.
I think we may have talked past each other on this one before, so to clarify, Pat uses connected mode sessions to retrieve mail AND can be configured to send beacons via UI. Below is how I had Pat configured (from Pat's config.json) to advertise peer to peer Winlink. I've removed this configuration for testing, but does that make sense why I was concerned with UI frames? Pat can send UI frames and, of course, use connected mode sessions.

  "ax25": {
    "port": "dw12",
    "beacon": {
      "every": 3600,
      "message": "Winlink P2P",
      "destination": "BEACON"
    }
  }

Ok.. I didn't know that Pat included its own built-in "beacon" function. Disabling that will hopefully help narrow things down.
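
To make the UI vs. connected mode distinction concrete, here is a rough Python sketch of how a connectionless beacon frame is laid out and how it can be told apart from a connected mode frame by its control byte. It is only an illustration under stated assumptions: "N0CALL" is a placeholder callsign, the command/response bits are left clear, and FCS and KISS framing are omitted. It is not how Pat or the kernel builds frames.

```python
UI_CONTROL = 0x03   # AX.25 UI frame control byte (poll/final bit clear)
PID_NO_L3 = 0xF0    # PID meaning "no layer 3 protocol"


def encode_address(callsign: str, ssid: int = 0, last: bool = False) -> bytes:
    """Encode one 7-byte AX.25 address field."""
    padded = callsign.upper().ljust(6)[:6]
    # Each callsign character is shifted left by one bit.
    shifted = bytes(ord(ch) << 1 for ch in padded)
    # SSID byte: reserved bits set (0x60), SSID in bits 1-4,
    # bit 0 set only on the final address in the header.
    ssid_byte = 0x60 | ((ssid & 0x0F) << 1) | (1 if last else 0)
    return shifted + bytes([ssid_byte])


def build_ui_frame(dest: str, src: str, info: bytes, src_ssid: int = 0) -> bytes:
    """Build a bare UI frame: destination, source, control, PID, info."""
    header = encode_address(dest) + encode_address(src, src_ssid, last=True)
    return header + bytes([UI_CONTROL, PID_NO_L3]) + info


def is_ui_frame(frame: bytes) -> bool:
    """Return True if the control byte marks this as a UI frame."""
    idx = 6
    # Walk the address fields until the end-of-address bit is found.
    while not frame[idx] & 0x01:
        idx += 7
    control = frame[idx + 1]
    # Ignore the poll/final bit (0x10) when comparing.
    return (control & 0xEF) == UI_CONTROL


# Placeholder callsign; the text and destination match the Pat config above.
frame = build_ui_frame("BEACON", "N0CALL", b"Winlink P2P")
print(frame.hex(), is_ui_frame(frame))
```

A connected mode session would instead start with a SABM frame (control byte 0x2F, or 0x3F with the poll bit set), which `is_ui_frame` rejects.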


   - What is the interval on your beacon message(s)? Are you only sending one beacon or multiple? Are you using a digi path on your beacons?
I have two beacons; one transmits every 30 minutes with node and RMS SSIDs. The other is the hourly Winlink P2P beacon you see above. I've de-configured /usr/sbin/beacon and Pat's beacons, and instead I'm now producing these beacons out of direwolf directly. No digipeater, just direct to local RF.

Got it.


As I'm sure you are aware, when you transmit through the AX.25 stack, you cannot "hear" that transmission.

No, the Linux "listen" program can print out TXed packets but you need to enable that feature with the "-a" option:

       "-a        Allow for the monitoring of outgoing frames as well as incoming ones."


If you could, chaos would ensue. This is a bit inconvenient in the scenario where you want to monitor your own services. As an example, if you ran your Winlink client (pat) on the same station as your Winlink server (rmsgw), those two could never talk. Such is my situation where pat/rmsgw run on the same Pi and it is my only packet station.

Ah.. ok, so the fact that you are also running rmsgw on this machine is news. It might not be a problem, but it's worth knowing since, as I mentioned before, there are known INCOMING "connected" mode issues with the Linux AX.25 stack, though they aren't kernel panic level issues.


I found two solutions to this dilemma; the first is to digi back to yourself. Of course that's a massive waste of RF and I didn't want to be "that guy". The other, more elegant solution, involves kissnetd and a pair of loopback ax ports (a.k.a. "the complexity"). You can probably see why I didn't want to muddy the waters with this before; sorry about that. I've had this loopback config set up for about 5 years now. Did I mention I have disabled this? Just making sure.

Ok.. for now, please keep things simple to figure out this issue.


To answer the question, I've normally used the loopback for Pat's connections, so checking my mail doesn't cross RF. Realizing the loopback complexity was a liability, I removed those parts last week and I've been using the digi back to myself option. The schedule runs just once a day at 9:15a. I've had very few inbound connections to my RMS server, but I've never been able to correlate an inbound connection with a crash.

Ok, this is the second piece of news. Pat is making an outbound connection via a local digi and back to your Raspberry Pi. Now to be clear, are you digipeating or NODEing out and back? I ask because when you NODE around, your SSID gets decremented by one. That nuance might matter here.
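
On the point about correlating connections with a crash, a small script can make that check less of an eyeball exercise. The Python sketch below lists which logged events fell within a window before each crash. Everything in it is hypothetical: the timestamps, the event labels, and the 5 minute window are made up for illustration and would need to come from your own logs (direwolf, rmsgw, Pat's schedule, the serial console).

```python
from datetime import datetime, timedelta


def events_before_crashes(crashes, events, window=timedelta(minutes=5)):
    """Map each crash time to the (time, label) events seen within `window` before it."""
    result = {}
    for crash in crashes:
        result[crash] = [
            (when, label)
            for when, label in events
            # Keep only events at or before the crash and inside the window.
            if timedelta(0) <= crash - when <= window
        ]
    return result


# Hypothetical sample data, not taken from any real log.
crashes = [datetime(2025, 1, 10, 9, 16, 30)]
events = [
    (datetime(2025, 1, 10, 9, 0, 0), "beacon"),
    (datetime(2025, 1, 10, 9, 15, 0), "pat connect"),
    (datetime(2025, 1, 10, 9, 30, 0), "beacon"),
]

for crash, nearby in events_before_crashes(crashes, events).items():
    print(crash, [label for _, label in nearby])
```

With the sample data, only the "pat connect" event falls inside the window before the crash. If real logs never show a consistent event type ahead of the crashes, that is useful information too.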


The kernel is 6.1.0-rpi7-rpi-v8. I do have a photo of the crash, but groups.io compresses the picture to such an extent that it is illegible. I posted the text of the crash (from the serial console) in a message a few weeks ago. Were you looking for something that might be on the screen that wouldn't be on the serial console?

Ok.. so it's the same output as before. Got it.


   - You mentioned that when you rebuilt the TEST machine to use a real sound device instead of a sound-loopback, it started to crash. To me, this could be RFI induced. How physically close is the radio to your Raspberry Pis? How much RF power is the radio transmitting at?
Drats, RFI! Don't worry, I'm not going to pull the "not in my shack" argument. RFI is always an option, but is notoriously hard to diagnose. To answer your questions, I have 3 Pis (PROD, TEST, and an unrelated Pi4) 3 to 4 feet away from the radio and my radio is set to "Mid" power (which is 10w in Kenwood speak).

That's a LOT of power for things sitting so close to each other. Can you put the radio into "EL" or Extra Low mode, which is 0.5w? That might help here. I would also argue that moving them farther apart and also onto different Z-planes (i.e. different elevations) might help if this is really an RFI issue. If it is RFI related, I would expect to see other errors like USB device drops, etc.
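
To put a number on that power change, here is a quick Python calculation of the difference between the two settings in decibels. It only uses the 10w and 0.5w figures mentioned above and says nothing about how much RF actually couples into the Pis or the cabling.

```python
import math


def power_change_db(old_watts: float, new_watts: float) -> float:
    """Return the change in transmit power in dB (negative means a reduction)."""
    return 10 * math.log10(new_watts / old_watts)


# "Mid" (10w) down to "EL" (0.5w), per the figures above.
change = power_change_db(10.0, 0.5)
print(f"{change:.1f} dB")  # about -13.0 dB, i.e. one twentieth of the power
```

A drop of roughly 13 dB is a meaningful difference for a test, so if the crashes continue at the same rate on EL, that would count against RFI as the cause.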


I mentioned the audio tap, so my data cable is your typical Mini-DIN6 on the radio end and a very atypical, butchered mess of ribbon cable, and Dupont/TRS connectors on the other (i.e. RFI playground). However, do think back just a moment to my loopback remarks, and understand that the crash has usually happened without any RF transmit (i.e. no RFI) while mail was being checked over the loopback.

Understood.


Since RFI is so difficult to identify, how about a differential test against RFI? As it happens, I have an idle Pi3 sitting here. This Pi, let's call it LEGACY, is the direct predecessor to PROD, so it has a fully operational packet configuration that was decommissioned a couple of months ago. More importantly though, it has 2 years of clean operational history (no crashes) in this environment just a few inches away from where TEST and PROD are right now. It would be trivial to swap LEGACY into where TEST sits today. In that scenario, if LEGACY were to crash, then it points the finger at RFI; if it doesn't crash then a kernel bug is likely to blame. Thoughts?

Swapping the SD cards around might help here, and I imagine that the "LEGACY" OS is using an older kernel that might not have these AX.25 issues. Did that LEGACY setup also have rmsgw and Pat running at the same time on it? In addition to this test, and since you have multiple Pis, you might consider splitting Pat and rmsgw apart onto different Pis. That might help isolate the issue as well.

--David
KI6ZHD
