Re: RPi Kernel Panic on Bookworm
Did you find anything in the OS system logs related to this issue? Maybe a kernel oops, etc?
Hi David,
No entries in the log files on disk; syslog simply can't get them written with the kernel panicked. However, I do get an Oops with every crash. Certainly on the serial console, but I also often get a wall from syslog:
Message from syslogd@hammy at Feb 25 13:03:30 ...
 kernel:[2412078.321381] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
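The number after "Oops:" can be unpacked a little. This is only a sketch, and it assumes a 64-bit Arm kernel where that value is the exception syndrome (ESR) with the usual field layout: exception class in bits 31:26, instruction-length bit in bit 25, and a fault status code in the low six bits. The function name decode_esr is made up for illustration.

```python
def decode_esr(esr: int) -> dict:
    """Split an Arm64 exception syndrome value into its main fields."""
    return {
        "ec": (esr >> 26) & 0x3F,    # exception class, bits 31:26
        "il": (esr >> 25) & 0x1,     # instruction length bit, bit 25
        "iss": esr & 0x1FFFFFF,      # instruction-specific syndrome, bits 24:0
        "dfsc": esr & 0x3F,          # fault status code (meaningful for data aborts)
    }


# Value copied from the syslog wall message above.
fields = decode_esr(0x0000000096000005)
print({name: hex(value) for name, value in fields.items()})
```

Under that assumption, class 0x25 is a data abort taken inside the kernel and status 0x05 is a level 1 translation fault, i.e. the kernel touched an address with no mapping. That would be consistent with a stale or NULL pointer, but on its own it does not say which one.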
Cheers
Mike
|
Re: RPi Kernel Panic on Bookworm
I'll try the test the way you describe it (run ax25d, comment out rmsgw), but I don't think I'll get the correct kernel state without rmsgw. Set up this way, there is no "LISTEN" entry, so the kernel likely doesn't have the correct kernel structures loaded.
So, I tried testing this way, but I messed up the test; I haven't fixed persistent device names for my USB audio, so the AX.25 stack was down for a couple of days before I noticed.
Instead of starting over, I took a peek at my PROD Pi, which has been healthy for over 3 weeks since turning off my mail check schedule. With 27 days of uptime, PROD finally encountered the "(null)" column in netstat output. PROD had been running rmsgw from ax25d during those 27 days and I've had about 6 connections to the gateway in that time.
I decided to stop and restart ax25d. The stop cleared out the netstat entries, as you would expect when the service isn't listening any more. However, when I tried to start ax25d again, the system immediately crashed. I think that proves the bad state is held in the kernel and is not resolved by releasing the listening socket.
Cheers
Mike
|
Re: RPi Kernel Panic on Bookworm
Hello Michael,
Did you find anything in the OS system logs related to this issue? Maybe a kernel oops, etc?
--David
KI6ZHD
On 02/20/2024 08:24 PM, Michael Dunn wrote:
|
Re: Seeking case compatible with Pi 5 + Pimoroni NVMe BASE
Hello Greg,
Geekworm makes several cases for the RPi-5:
Is this the one you're using? or this one:
Best regards, Larry WB6BBB
PS: Geekworm also makes many PCIe NVMe boards for the RPi-5:
On Wed, 07 Feb 2024 12:56:47 -0800, "Greg Sanders" <KE5DXA@...> wrote:
I got the Geekworm Rpi 5 case. They have the Active Cooler and NVMe hat for a SSD. All fit and work good together. They also have the 5A power supplies.
Greg KE5DXA
|
Re: RPi Kernel Panic on Bookworm
Something to try (in all your free time): let the system run nicely without rmsgw, then turn it on for a bit and if you catch the netstat output with an empty or null value, stop/start ax25d. Does it become stable again?
Hi Jon,
I think you make a very good point here; ax25d is a wrapper to rmsgw, much like inetd was a wrapper to telnet way back in the day. If I recall, the wrapper handles binding to the interface and listening for connections. When a connection happens, it forks and execs the process (e.g. rmsgw, telnet), leaving the child with a set of open file handles to manipulate the connection. Put another way, both ax25d and rmsgw interact with the kernel's ax25 stack, so either could be responsible.
I'll try the test the way you describe it (run ax25d, comment out rmsgw), but I don't think I'll get the correct kernel state without rmsgw. Set up this way, there is no "LISTEN" entry, so the kernel likely doesn't have the correct kernel structures loaded. Assuming nothing happens after a day, I'll modify the test, letting rmsgw run until the kernel state is triggered and try restarting ax25d to see what happens to netstat.
Cheers
Mike
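To make the wrapper pattern described above concrete, here is a minimal sketch. It is not ax25d's code: it uses a loopback TCP socket as a stand-in for an AX.25 socket, and the names serve_once and demo are invented for the example. The wrapper owns the bind and listen, and the program it starts only ever sees an already-connected descriptor.

```python
import socket
import subprocess
import sys


def serve_once(listener: socket.socket, argv: list[str]) -> int:
    """Accept one connection and run argv with it as stdin/stdout."""
    conn, _peer = listener.accept()
    with conn:
        # The child inherits the connected socket as descriptors 0 and 1,
        # much as an inetd-style wrapper hands a session to its program.
        child = subprocess.Popen(argv, stdin=conn.fileno(), stdout=conn.fileno())
        return child.wait()


def demo() -> bytes:
    """Loopback stand-in: the 'service' is a one-line child that prints hello."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # the wrapper does the bind ...
        listener.listen(1)                # ... and the listen
        client = socket.create_connection(listener.getsockname())
        serve_once(listener, [sys.executable, "-c", "print('hello')"])
        with client:
            data = b""
            while chunk := client.recv(1024):
                data += chunk
            return data


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that two processes touch the same kernel socket state: the listener stays with the wrapper while each session lives on in the child, which is why either side could plausibly be involved.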
|
Re: RPi Kernel Panic on Bookworm
Mike, excellent observation.
I will try this scenario on my Pi with bookworm, and replace the /usr/local/bin/rmsgw with a shell script that writes to a log file. Disconnect the vhf radio, disable the rmsgw_aci, and only have the kissattach, ax25d, and beacon process running.
And regularly check netstat for weird device entries. This would help verify your theory that rmsgw running on current kernels is a main problem.
Since moving down to buster, it's been crash-free for me. Two weeks plus.
Something to try (in all your free time): let the system run nicely without rmsgw, then turn it on for a bit and if you catch the netstat output with an empty or null value, stop/start ax25d. Does it become stable again?
|
Re: sdrtrunk with rsp1a on raspberry pi 4
Thank you all. The issue was with sdrtrunk. The current stable version did not work. The latest overnight release did.
--
Jay
WB2QQJ
|
Re: RPi Kernel Panic on Bookworm
I'm going to be away from the console of these test Pis for a few days, so it's a bit pointless to test a crash that only takes a few hours to happen. Instead, I'm going to remove rmsgw from ax25d and let the test cycle run while I'm away.
Just a quick update on this, I checked on the Pis I had left in this state (with rmsgw disabled) and found that both were healthy after 8 days. To close the loop on this and demonstrate that rmsgw was causing the crash, I re-enabled rmsgw in ax25d.conf and restarted the daemon. There were no other changes and PAT was able to connect to check mail in the next connection cycle. About 12 hours later, I checked and found RMS had a faulted netstat output:
Active AX.25 sockets
Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
*          MYCALL-10  (null)  LISTENING    000/000  0       0
I found it interesting that netstat printed "(null)" this time, instead of just an empty column. Not relevant, just interesting. After finding netstat like this, a quick axcall command crashed the Pi. I think this confirms that rmsgw is the culprit here.
Still, this might not be specific to rmsgw. I think I'm seeing this crop up with rmsgw because it is the only process that is actively accepting AX.25 connections on my Pi. I might see if I can use node to generate some traffic on the test box.
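For anyone who wants to watch for this state without eyeballing the table, a rough sketch of a checker follows. It assumes the column layout shown in the netstat output above (Dest, Source, Device, State, ...) and simply looks for LISTENING rows where the Device field is "(null)" or absent. The function name and the sample text are made up for illustration.

```python
def suspect_listeners(netstat_text: str) -> list[str]:
    """Return source callsigns of LISTENING rows with no usable Device."""
    suspects = []
    for line in netstat_text.splitlines():
        fields = line.split()
        if "LISTENING" not in fields:
            continue  # title, header, and non-listening rows
        state_at = fields.index("LISTENING")
        # Healthy row: Dest Source Device LISTENING ... (state at index 3).
        # With the device column empty, the state slides to index 2.
        device = fields[2] if state_at == 3 else ""
        if device in ("", "(null)"):
            suspects.append(fields[1])
    return suspects


SAMPLE = """Active AX.25 sockets
Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
*          MYCALL-10  (null)  LISTENING    000/000  0       0
"""

if __name__ == "__main__":
    print(suspect_listeners(SAMPLE))
```

Run from cron or a loop against saved netstat output, something like this could flag the bad state before the next AX.25 call trips over it.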
Cheers
Mike
|
Re: VarAC and ARDOP on Pi 5 + WINE?
More good tips, thanks. I’m inching my way closer to victory today. I hadn’t seen the step about the VB OCX, so I’ve done that now and things seem more stable.
Also, I noticed the UI in VarAC was fine until after I configured it to talk with my rig (Icom 7100). After a reboot, I then got the deadlock I posted here before.
I decided to try flrig for rig control. I have flrig running natively on the Pi5.
No more black screens and lockups.
In VarAC, I am receiving fine and correctly processed a few beacons from other stations.
However, when I try to beacon, VarAC crashes on PTT and leaves the rig on TX.
I’ll keep fiddling with it, but I’m thinking it’s either something with the audio devices in VARA HF or the rig control in flrig.
Closer, but not quite there yet.
Cheers,
Ken van Wyk Armata Scientia
On Feb 19, 2024, at 9:52 AM, Kelly K7MHI via groups.io <kellykeeton@...> wrote:
|
Re: sdrtrunk with rsp1a on raspberry pi 4
I do believe so, as it has all the basic drivers and libraries
On Feb 19, 2024, at 11:36, Jay Lijoi <lijoi@...> wrote:
|
Re: sdrtrunk with rsp1a on raspberry pi 4
Is the rtladr package required for an sdrplay rsp1a?
--
Jay
WB2QQJ
|
Re: VarAC and ARDOP on Pi 5 + WINE?
A new beta, 8.6, was released, which I loaded successfully yesterday on a Pi 5 (latest). After the WINE environment is set up with all the VB OCX files, just install VarAC with the installer. I just loaded a new Pi 5 over the weekend with wine9 and box 32/64, and the program ran successfully. I saw more stability; possibly the font issue is now handled, for example.
k
|
Re: VarAC and ARDOP on Pi 5 + WINE?
On Feb 9, 2024, at 5:54 PM, Kelly K7MHI via groups.io <kellykeeton@...> wrote:
Disable Linux mode in the VarAC.ini file. It seems to have issues finding the emoji font, I let Irad know the issue.
Thanks for that tip. I’m back from travel now and gave it a try. Sadly, same problem. I tried also turning off advanced mode in the INI file. Same thing.
Attached is a screenshot of what my VarAC screen looks like. It’s in a deadlock state, so no UI interactions at all.
Anyone else seeing the same sort of thing? (Raspberry Pi 5 running Raspberry OS with latest updates, including Box64, Box86, and WINE.)
FWIW, I’m running Winlink + VARA quite well in this configuration.
Suggestions welcomed. Thanks.
Cheers,
Ken (K0RvW)
|
Re: sdrtrunk with rsp1a on raspberry pi 4
Did you install the rtladr package? That has the drivers in it.
On Feb 18, 2024, at 12:16, Jay Lijoi <lijoi@...> wrote:
|
Re: Different /boot configuration file location depends on Raspberry Pi hardware or variant of OS?
Look at the dates of the 2 image files. Are they the same or different? I noticed the image I downloaded dated October or November 2023 had the symlinks. The latest image has the /boot/*.txt files with the disclaimer in them and are no longer symlinks. This is a direct copy of what Debian did. Why Debian changed it I have no clue. You could always delete the /boot files and create symlinks there to the /boot/firmware files.
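If you go the symlink route, it may be worth scripting it so only the placeholder files get replaced. The following is a cautious sketch, not a tested procedure: the function name relink_stub is invented, it assumes the placeholder contains the text "DO NOT EDIT THIS FILE", and it defaults to a dry run. It needs root to act on the real /boot.

```python
import os

STUB_MARKER = "DO NOT EDIT THIS FILE"  # assumed wording of the placeholder


def relink_stub(boot_dir: str, name: str, dry_run: bool = True) -> str:
    """Swap a 'moved' placeholder in boot_dir for a symlink into firmware/."""
    stub = os.path.join(boot_dir, name)
    real = os.path.join(boot_dir, "firmware", name)
    if os.path.islink(stub):
        return "already a symlink"
    if not os.path.isfile(stub):
        return "missing"
    if not os.path.isfile(real):
        return "no firmware copy, leaving alone"
    with open(stub, encoding="utf-8", errors="replace") as fh:
        if STUB_MARKER not in fh.read():
            return "not a stub, leaving alone"
    if dry_run:
        return "would relink"
    os.remove(stub)
    # Relative target, matching the older images: config.txt -> firmware/config.txt
    os.symlink(os.path.join("firmware", name), stub)
    return "relinked"


if __name__ == "__main__":
    for fname in ("cmdline.txt", "config.txt"):
        print(fname, relink_stub("/boot", fname))
```

Keep in mind that an OS update could put the placeholder files back, so this is a workaround for old tools rather than a fix.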
|
Re: Different /boot configuration file location depends on Raspberry Pi hardware or variant of OS?
Dave,
Checked RPi5 bookworm 64, RPi4 bookworm 64, CM4 bookworm 64 with essentially the same results except for the ls -la /boot/*.txt. RPi4 & CM4 /boot/*.txt reported issue.txt, but RPi5 did not. ls -la /boot/firmware/*.txt had identical results on all.
All systems up to date AFAIK.
Bill KC9XG
On 2/18/2024 1:26 PM, David Ranch wrote:
|
sdrtrunk with rsp1a on raspberry pi 4
Good afternoon,
I have been trying to get sdrtrunk to work with my rsp1a (sdrplay). The program runs but does not detect the rsp1a. Am I missing something? It says there is no tuner.
Any help would be greatly appreciated,
--
Jay
WB2QQJ
|
Different /boot configuration file location depends on Raspberry Pi hardware or variant of OS?
Hello Everyone,
As I continue to learn Raspberry Pi OS Bookworm (Debian 12), I'm finding a strange inconsistency and I think it either depends on:
   a) the hardware I'm using or the use of Raspberry Pi OS Lite (No GUI) vs Raspberry Pi OS Desktop (GUI)
   b) different versions of Rpi-OS images where they are making changes mid-release
I'm curious if anyone can confirm this on your setups:
   Raspberry Pi 4 with 4GB of RAM running Bookworm Lite (No GUI)
      --
      cat /etc/os-release
      --
      PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
      NAME="Debian GNU/Linux"
      VERSION_ID="12"
      VERSION="12 (bookworm)"
      VERSION_CODENAME=bookworm
      ID=debian
      HOME_URL=
      SUPPORT_URL=
      BUG_REPORT_URL=
      --
      Shows all config files in /boot are symlinks to /boot/firmware/. That makes things backwards compatible:
        --
        ls -la /boot/*.txt
        lrwxrwxrwx 1 root root 20 Oct  9 20:39 /boot/cmdline.txt -> firmware/cmdline.txt
        lrwxrwxrwx 1 root root 19 Oct  9 20:39 /boot/config.txt -> firmware/config.txt
        --
      Here is what is in the new /boot/firmware directory for config files
        --
        ls -la /boot/firmware/*.txt
        -rwxr-xr-x 1 root root  132 Oct  9 21:00 /boot/firmware/cmdline.txt
        -rwxr-xr-x 1 root root 1364 Oct 27 17:39 /boot/firmware/config.txt
        -rwxr-xr-x 1 root root  145 Oct  9 20:57 /boot/firmware/issue.txt
        --
Now compare that to a Raspberry Pi 5 with 8GB RAM running Bookworm Desktop (GUI enabled):
   $ cat /etc/os-release
   --
   PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
   NAME="Debian GNU/Linux"
   VERSION_ID="12"
   VERSION="12 (bookworm)"
   VERSION_CODENAME=bookworm
   ID=debian
   HOME_URL=
   SUPPORT_URL=
   BUG_REPORT_URL=
   --
   $ ls -la /boot/*.txt
   -rw-r--r-- 1 root root 92 Feb  5 18:49 /boot/cmdline.txt
   -rw-r--r-- 1 root root 91 Feb  5 18:49 /boot/config.txt
   lrwxrwxrwx 1 root root 18 Dec  4 21:04 /boot/issue.txt -> firmware/issue.txt
      These configuration files only contain text like:
         --
         DO NOT EDIT THIS FILE
         The file you are looking for has moved to /boot/firmware/config.txt
         --
   The new REAL location of these files is now ONLY in /boot/firmware.
   --
   $ ls -la /boot/firmware/*.txt
   -rwxr-xr-x 1 root root  132 Feb  5 18:57 /boot/firmware/cmdline.txt
   -rwxr-xr-x 1 root root 1315 Feb 14 12:45 /boot/firmware/config.txt
   -rwxr-xr-x 1 root root  145 Dec  4 21:04 /boot/firmware/issue.txt
   --
By NOT making /boot/cmdline.txt a symlink to /boot/firmware/cmdline.txt, this BREAKS backwards compatibility with various tools, documentation, etc.
I'm curious if anyone knows WHERE and WHY this is happening?
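In the meantime, scripts that have to run on both layouts could look the file up instead of hard-coding a path. A minimal sketch, assuming only the two layouts shown above; the helper name real_boot_file is made up for this example.

```python
from pathlib import Path


def real_boot_file(name: str, boot_dir: str = "/boot") -> Path:
    """Return the path that actually holds the settings for name."""
    legacy = Path(boot_dir) / name
    moved = Path(boot_dir) / "firmware" / name
    if legacy.is_symlink():
        return legacy.resolve()   # older image: follow the symlink
    if moved.is_file():
        return moved              # newer image: /boot/<name> is only a stub
    return legacy                 # no firmware directory: the old location


if __name__ == "__main__":
    for fname in ("cmdline.txt", "config.txt"):
        print(fname, "->", real_boot_file(fname))
```

Checking for the symlink first keeps the older images working unchanged, and preferring the firmware copy avoids editing the "DO NOT EDIT THIS FILE" placeholder by mistake.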
--David
KI6ZHD
|
Re: RPi Kernel Panic on Bookworm
I imagine you're getting confused by Pat's version numbering scheme for its infrastructure modules (wl2k-go), which include all kinds of stuff. Its 0.11.8 version is just one of many versions that looks similar to what the official AX.25 repo as well as what the VE7FET repo uses:
Linux's current AX.25 woes are not a problem in these user-space libraries and utilities. The issues are in the kernel itself.
--David
KI6ZHD
On 02/12/2024 11:29 PM, JJ wrote:
Hmmm..this shows ax25 version 0.11.8
newer? changes?
Just sorta stumbled upon this..haven't tried anything tho...
On 2024-02-12 12:23 a.m., David Ranch wrote:
Hello Mike,
Again, thank you for the detailed email and I think this all helps in tracking down the real issue here. I've been discussing this on the side with Bernard F6BVP who maintains FPAC (node) and FBB (BBS) and uses the ROSE protocol heavily. He reported that he's "running three ROSE/FPAC nodes on a local network and I haven't observed any connections issues with Raspbian OS 64bit for a long time nor with Ubuntu (20.04)". He showed months of uptime with LOTS of connections without either any panics or any orphaned AX.25 connections. One key point he mentioned is that he does NOT have any RF connections, it's all via AXUDP and he also noted he's NOT using mkiss for linking the AXUDP to the kernel with kissattach.
--David
On 02/10/2024 12:27 PM, Michael Dunn wrote:
Hi everyone,
Been quiet on this topic recently as I haven't had much to report. I've had a lack of any crashes for over 12 days now, which seems to be related to disabling Pat in my environment. Please don't jump to conclusions here; this is a complex issue. As I've said in the past, Pat doesn't appear to be the cause of the crash, just the process that trips over the kernel garbage to trigger it.
Since I've had some time to think about this problem, here's what I've noticed:
  - Jon and I both run rmsgw. We both have crashes on the system running rmsgw.
  - The process that triggers the crash is mobile (beacon, pat, netstat), but is never rmsgw.
  - Jon and I both have few outside connections to rmsgw. My last outside connection was 10 days ago.
  - Jon has crashes in just a few hours; my crashes take days to weeks.
  - Jon frequently self checks his mail (possibly hourly? but I don't think he stated). I self check my mail infrequently (daily).
Based on these facts, my theory is this. rmsgw puts the kernel in some sort of bad state. This state is tripped over later by some unsuspecting process, causing the kernel crash.
To prove this out, I need to separate my rms server from my rms client. Unfortunately, I have only one radio, so I decided to take this test off-air. In my test setup, I built 2 Pis, let's call them RMS and PAT to distinguish their roles. The Pis were built by imaging the SD card from PROD in the manner I've described previously.
Instead of connecting to a radio, I simply connected sound card to sound card using a pair of TRS cables (headphone to microphone in both directions). Direwolf was configured to match and PTT was disabled. RMS ran rmsgw from ax25d and a shell loop on PAT checked my mail every 30 minutes using pat -s (send only, so as not to eat my actual inbox).
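For anyone wanting to repeat the test, the periodic check could be scripted along these lines. This is a sketch of the idea rather than the loop that was actually used: the command to run is passed in by the caller (the exact pat invocation is not reproduced here), and the function name run_cycles is invented.

```python
import subprocess
import time


def run_cycles(cmd, cycles, interval_s, sleep=time.sleep):
    """Run cmd a fixed number of times, returning (timestamp, exit code) pairs."""
    results = []
    for i in range(cycles):
        done = subprocess.run(cmd, capture_output=True, text=True)
        results.append((time.strftime("%Y-%m-%d %H:%M:%S"), done.returncode))
        if i + 1 < cycles:
            sleep(interval_s)   # e.g. 1800 for a 30 minute cycle
    return results


if __name__ == "__main__":
    # "true" stands in for the real mail-check command.
    for stamp, code in run_cycles(["true"], cycles=2, interval_s=1):
        print(stamp, "exit", code)
```

Logging the exit code per cycle makes it easy to see afterwards when connections first started failing.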
I let this setup run overnight. In the morning, neither Pi had crashed, but I did notice that PAT was no longer able to connect to RMS. Tracking through the logs, it looks like about 7 hours after I set up the test, connections started failing. Digging into the RMS Pi, I found the netstat condition that Jon first reported. Note ax0 missing from the device column:
Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
*          MYCALL-10          LISTENING    000/000  0       0
The kernel on RMS had been trashed, but the Pi was still operating. Checking the PAT Pi, netstat output looked normal. Realizing I still needed an AX25 event to trigger the crash, I used axcall on RMS to generate some traffic. The RMS Pi immediately crashed, blaming axcall as it went down:
[61160.353159] CPU: 1 PID: 130380 Comm: axcall Tainted: G       WC         6.1.0-rpi7-rpi-vB #1 Debian 1:6.1.63-1+rpt1
For me, this is great news. I have an off-air way to quickly show the problem. This also continues to show that the crash is mobile between processes and demonstrates an unrelated trigger event. Next steps are to reproduce the crash to ensure it is reliable. I'm also going to move RMS and PAT out of the RF environment (e.g. the other end of the house) to ensure there is no RFI element.
Cheers
Mike
|
Re: RPi Kernel Panic on Bookworm
Hmmm..this shows ax25 version 0.11.8
newer? changes?
Just sorta stumbled upon this..haven't tried anything tho...
On 2024-02-12 12:23 a.m., David Ranch
wrote:
|