
RPi Kernel Panic on Bookworm


 

If I'm looking at the right thing, the ax25 patches don't appear here, and this stable version is what I think Raspbian is running. Let me know if you learn otherwise, and where you found it.


Go down to line 62, and it doesn't match what I believe we both understand to be the patched version.

What release did you pull from the Raspberry Pi git repo?
I ran this:
git clone --depth=1 --branch rpi-6.1.y https://github.com/raspberrypi/linux


 

On Sat, Feb 3, 2024 at 11:29 AM, Jon Bousselot KK6VLO wrote:
I'm going down this route, starting on the Pi-1

Hi Jon,

Can you share why you think the patches aren't in the Raspberry Pi OS kernel?

I pulled the raspberrypi.com git repo and diffed it against the kernel.org source; they came out the same. I see the patches as applied:

root@hammytest:/usr/src# diff kernel.org/linux-6.1.63/include/net/ax25.h raspberrypi.com/linux/include/net/ax25.h
root@hammytest:/usr/src# diff kernel.org/linux-6.1.63/net/ax25/af_ax25.c raspberrypi.com/linux/net/ax25/af_ax25.c
root@hammytest:/usr/src# diff kernel.org/linux-6.1.63/net/ax25/ax25_dev.c raspberrypi.com/linux/net/ax25/ax25_dev.c
root@hammytest:/usr/src# diff kernel.org/linux-6.1.63/net/ax25/ax25_subr.c raspberrypi.com/linux/net/ax25/ax25_subr.c
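For anyone who wants to repeat this check on other trees, the four `diff` commands above boil down to a byte-for-byte comparison of a handful of files. A minimal Python sketch, assuming both trees are unpacked side by side as in the example; the file list is copied from the commands above, and `differing_ax25_files` is a hypothetical helper name, not part of any existing tool:

```python
import filecmp
from pathlib import Path

# The four files compared with diff above.
AX25_FILES = [
    "include/net/ax25.h",
    "net/ax25/af_ax25.c",
    "net/ax25/ax25_dev.c",
    "net/ax25/ax25_subr.c",
]


def differing_ax25_files(tree_a, tree_b, files=AX25_FILES):
    """Return the relative paths whose contents differ (or are missing) between two kernel trees."""
    differing = []
    for rel in files:
        a, b = Path(tree_a, rel), Path(tree_b, rel)
        if not (a.is_file() and b.is_file()):
            differing.append(rel)  # a missing file counts as a difference
        elif not filecmp.cmp(a, b, shallow=False):  # compare contents, not just stat data
            differing.append(rel)
    return differing
```

An empty list from `differing_ax25_files("kernel.org/linux-6.1.63", "raspberrypi.com/linux")` corresponds to `diff` printing nothing for every file. Identical files only show that the two trees match each other; whether the AX.25 fixes are in either tree still has to be checked against the patches themselves.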

I'd be interested in your results, but I don't understand how the patch fits the crash behavior.

Thanks
Mike


 

I'm going down this route, starting on the Pi-1


git the rpi-6.1.y kernel
Verify the patches referenced in this thread exist in the code tree (they are)
compile
eat lunch and then dinner
install the kernel and see if it crashes 17 hours later.

On my Ubuntu x86_64 desktop, I also pulled the current source tree with
apt source linux-image-unsigned-6.5.0-15-generic
In the downloaded tree, the ax25 patches appear to be applied, so I wonder whether desktop distros already have these updates in place.
Add that to my list of things to try.


If I'm seeing this correctly, the stable kernel tree for rpi does NOT have the ax25 patches applied.
The ax25 patches do appear in 5.15.y and 6.1.y, the only two branches I checked.


 


Ahh, well, there's been work done since kernel 5.15. I think it was kernel 6.2 I was trying, and that was for a different reason than crashing...


The bug description sounds more like the disconnect bug David was describing a few weeks ago. This doesn't look related to our kernel panic. Still, I was curious...

I used apt-get source linux-image-$(uname -r) to pull the source for my kernel, which resulted in a linux-6.1.69 source tree. Under net/ax25, I manually verified that 3 of the patches had been applied, and a 4th at include/net/ax25.h. It seems this has been patched in the source version for my system.

The only thing I couldn't figure out is why apt loads the 6.1.69 source when dmesg shows the running kernel is 6.1.63. I guess the distro maintainers have moved on from .63, but I haven't updated yet. I'm a bit reluctant to update since the bug is currently reproducible. Instead, I pulled the 6.1.63 source from kernel.org and directly verified the AX.25 patch, so I know these fixes are in my running kernel. The patch was made for the 5.15 kernel, and there are minor code differences in the 6.1.x kernels. However, it's very apparent the patch was applied to the running kernel version.
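Part of the 6.1.0 / 6.1.63 / 6.1.69 confusion is that Raspberry Pi OS kernels report a generic release (`uname -r` gives `6.1.0-rpi7-rpi-v8`), while the upstream point release only shows up in the package version inside `uname -v` (e.g. `Debian 1:6.1.63-1+rpt1`). A Python sketch for comparing that against an unpacked source tree, assuming the `<epoch>:<x.y.z>-<revision>` package format seen in this thread and a standard top-level kernel `Makefile`; the function names are hypothetical:

```python
import os
import re
from pathlib import Path


def running_upstream_version(uname_version=None):
    """Extract x.y.z from a version string like '#1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24)'.

    Assumes the '<epoch>:<x.y.z>-<revision>' package format; returns None otherwise.
    """
    if uname_version is None:
        uname_version = os.uname().version  # same string as `uname -v`
    m = re.search(r"\b\d+:(\d+\.\d+\.\d+)-", uname_version)
    return m.group(1) if m else None


def source_tree_version(makefile_text):
    """Read VERSION, PATCHLEVEL and SUBLEVEL from a kernel top-level Makefile."""
    parts = []
    for key in ("VERSION", "PATCHLEVEL", "SUBLEVEL"):
        m = re.search(rf"^{key}\s*=\s*(\d+)\s*$", makefile_text, re.MULTILINE)
        if m is None:
            return None
        parts.append(m.group(1))
    return ".".join(parts)


def tree_matches_running_kernel(tree, uname_version=None):
    """True when the source tree at `tree` has the same x.y.z as the running kernel."""
    src = source_tree_version(Path(tree, "Makefile").read_text())
    return src is not None and src == running_upstream_version(uname_version)
```

On the system described above (running 6.1.63, apt source unpacked as linux-6.1.69), `tree_matches_running_kernel("linux-6.1.69")` would return False, which is exactly the mismatch that made pulling the 6.1.63 source from kernel.org necessary.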

Cheers
Mike


There was an issue with axip/axudp links going dead, and I found that kernel 6.2 seemed to resolve it, but I'm not sure whether any of the recent patches address all problems, or just a few hi...

On 2024-02-03 1:53 a.m., Jon Bousselot KK6VLO wrote:
JJ, how far back must we go to get a Raspbian release where this isn't broken?
First, to get an install of ax25 that doesn't crash while I work on a custom kernel for Bookworm.

The kernel.org links seem to indicate this is a problem for ax25 on all Linux platforms, not just the Pi. Is that correct?
-Jon
Correct.


 

Also, quite a while ago, there was a grant for working on the bugs in Linux AX.25:

"

Grant: Fixing the Linux kernel AX.25
Date: December 2021
Amount: €179,690
Changes to the Linux kernel over the years have improved and modernized the kernel, but have also made existing AX.25 implementations incompatible and turned preexisting issues into bugs. This can make systems unpredictable or even unusable. Linux kernel development is complex, requiring deep specialized knowledge, and bugs are hard to trace. This may be one of the reasons, why the Linux kernel AX.25 stack is currently in such a bad state.

This ARDC grant funds will allow the Deutscher Amateur Radio Club to hire software developers who can create a stable Linux AX.25 implementation and prevent Linux distributions from dropping pre-compiled AX.25 support. The fixed and functional Kernel-AX.25 stack will improve global amateur radio infrastructure. Professional kernel development can bring Linux AX.25 back to life."


 

Related?

/g/KM4ACK-Pi/topic/howto_patch_the_kernel_for/95904470?p=,,,20,0,0,0::recentpostdate/sticky,,,20,0,0,95904470,previd%3D0,nextid%3D1691495389076951633&previd=0&nextid=1691495389076951633


Cheers, de John ve1jot

PS: I usually have two ports going: HF on network105 at 300 baud, and a VHF 1200 baud port, ve1jot-7, going to the province's node stack. Uptime seemed infinite until I updated to the latest Bullseye; it pooched my entire system when I had a memory card failure, and now I have to rebuild, ugh!


 

I'm following this recipe to build my system. Either I'm doing something consistently wrong across all builds, or I'm experiencing the same problem with my config.
I've ruled out power supply issues with a 5 V supply that is more than capable of doing the job. The babysitting Pi never suffers a hiccup, since it's not running any ax25 code.

Bullseye/Bookworm, 32/64-bit, from the Raspberry Pi downloads, using the Lite version (no desktop).
Image the SD card, set up a first user, enable SSH, boot, go through the basic steps, and set up the timezone and en_US locale.
Patch until there are no more patches to apply. The next list differs by one package name on Bullseye: python-pip-whl instead of python3-pip-whl.

apt install -y rsync build-essential autoconf dh-autoreconf \
automake libtool git libasound2-dev libncurses5 \
libncurses5-dev libncursesw5-dev libudev-dev \
bc mg jed whois chrony ax25-apps \
ax25-tools git libxml2 libxml2-dev xutils-dev build-essential \
libax25-dev libx11-dev zlib1g-dev libncurses5-dev autoconf \
autogen libtool cmake libgps-dev screen lm-sensors \
python3-pip python3-pip-whl python3-requests

# create user/group that will run the ACI and RMSGW process
groupadd rmsgw
useradd -r -d /nonexistent -s /usr/sbin/nologin -g rmsgw rmsgw
git clone https://github.com/nwdigitalradio/rmsgw
chown -R rmsgw:rmsgw rmsgw
cd rmsgw
./autogen.sh
./configure
make -j`nproc --all`
make install
ln -s /usr/local/etc/rmsgw /etc
chown -R rmsgw:rmsgw /usr/local/etc/rmsgw
chown -R rmsgw:rmsgw /usr/local/bin/rms*

Populate the XML and CONF files with my data: banner, channels.xml, gateway.conf, sysop.xml.
Populate the ax25 files (axports, ax25d.conf) with the minimum entries (I removed all the default stuff).

Start a screen session so I can detach, and run this as root:
/usr/sbin/kissattach /dev/ttyACM0 radio     <<-sometimes give it a 44 address or leave empty->>
/usr/sbin/ax25d
/usr/sbin/mheardd    <<-sometimes I forget to launch this->>
/usr/sbin/beacon -c WA6BGS -d "beacon" -t 35 radio "RMS Gate = WA6BGS-10"


Add a crontab for rmsgw:
{
echo "# m h  dom mon dow   command"
echo "14,42 * * * * /usr/local/bin/rmsgw_aci > /dev/null 2>&1"
} | crontab -u rmsgw -



Everything works great until about a day passes.
How does this compare to your non-crashy config?

4.3 MB at 1200 baud... whoa.


 

Hi Jon,


The Pi-1 crashed on its own today without the beacon app, after 1d 3h 47m uptime. Here is what it cried out when going down.
[100075.375841] CPU: 0 PID: 3001 Comm: netstat Tainted: G         C         6.1.0-rpi7-rpi-v6 #1  Raspbian 1:6.1.63-1+rpt1

Thanks for sharing this; this is very interesting. It shows that the crash can move from one user space app to another. It also proves the theory that we can't work around this by disabling beacons or turning off Pat; the crash will just be triggered by something else.

I agree that netstat is going to walk the kernel AX.25 memory structures, even if it's not directly related to other ax25-tools. Regarding rmsgw_aci, if you know what time of day the Pi crashed, I bet it matches your rmsgw_aci schedule in cron.


We're collecting quite a few variables to test here. This could take a while if crashes only happen every few days.
32 or 64 bit?
Bullseye or Bookworm?
How many ax25 packets until the system becomes unstable?
Beacon or no-beacon?

Here's my feedback. I'm all 64-bit and all Bookworm. I also moved beacon to the TNC (Direwolf), but, as I think this crash proves, beacon is just a red herring.

So, on the packet question, I'm going to retract my earlier statement that a certain amount of traffic is required.

I started thinking about this last night and was disappointed that there wasn't a common way we could test our builds. I did, however, remember something from the Direwolf user's guide. WB2OSZ used a set of test audio to tune the Direwolf algorithms, and that gave me an idea. I tracked down that test audio and extracted an AIFF file. Then I built a loop to continuously play this test audio through the on-board Pi headphone jack. I connected the headphone jack back to a USB audio adapter and configured Direwolf to decode the audio back into the AX.25 stack.

I've had this setup running for about 2 days now. No crash thus far, but I've looped about 70,000 AX.25 packets (4.3 MiB), which is probably more data than my station would see in 6 months:

ax0: flags=67<UP,BROADCAST,RUNNING>  mtu 255
        ax25 MYCALL  txqueuelen 10  (AMPR AX.25)
        RX packets 70570  bytes 4573313 (4.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2659  bytes 103705 (101.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
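To put 4.3 MiB at 1200 baud in perspective, a back-of-the-envelope Python calculation from the counters above. It assumes 1200 baud AFSK carries 1200 bit/s and ignores HDLC flags, bit stuffing and TXDELAY, so real channel time would be somewhat higher:

```python
RX_PACKETS = 70570   # from the ifconfig output above
RX_BYTES = 4573313   # from the ifconfig output above
BAUD = 1200          # 1200 baud AFSK: one bit per baud


def airtime_hours(n_bytes, bits_per_second=BAUD):
    """Continuous transmit time needed to move n_bytes, ignoring framing overhead."""
    return n_bytes * 8 / bits_per_second / 3600


hours = airtime_hours(RX_BYTES)
avg_packet = RX_BYTES / RX_PACKETS
print(f"{hours:.1f} h of airtime, {avg_packet:.0f} bytes/packet on average")
```

That works out to roughly 8.5 hours of continuous transmission at about 65 bytes per packet, packed into about two days of looping, which is far busier than a quiet local channel.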

Should you be so inclined, you could test against the same set of data, although your setup would be a bit different with the hardware TNC. I'll let you know if I end up getting a crash with this method; it's probably not worth starting a test of your own until I can demonstrate a crash.

Cheers
Mike


 


Hi Jon,


[62681.074035] CPU: 0 PID: 8218 Comm: kworker/u8:0 Tainted: G         C         6.1.0-rpi8-rpi-v8 #1  Debian 1:6.1.73-1+rpt1

I'm not actually sure what to make of that; all of my crashes have been triggered by user-space commands that have something to do with the AX.25 stack. In your crash output, were there AX.25 functions named in the "Call Trace" section? I wonder if this crash might have been something different.
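One way to answer the "Call Trace" question from a captured serial console log is a small Python filter. This is a sketch, assuming the arm64 format the 64-bit Pi-3 prints: a `Call trace:` header followed by frames like `symbol+0xOFF/0xSIZE [module]`. 32-bit ARM kernels format backtraces differently, so the regular expression would need adjusting there. Treating the `ax25` and `mkiss` modules, plus any symbol starting with `ax25`, as "AX.25-related" is a judgment call:

```python
import re

# One stack frame as printed by arm64 kernels, e.g. "symbol+0x1c/0x80 [ax25]".
# The optional group captures the module name; a build ID may follow it inside the brackets.
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+))?")
AX25_MODULES = {"ax25", "mkiss"}  # modules treated as part of the AX.25 path


def call_trace_frames(console_text):
    """Return (symbol, module) pairs from the first call trace in a console log.

    module is None for frames in the core kernel image.
    """
    frames = []
    in_trace = False
    for line in console_text.splitlines():
        if not in_trace:
            # arm64 prints "Call trace:", x86 prints "Call Trace:".
            in_trace = "call trace:" in line.lower()
            continue
        m = FRAME_RE.search(line)
        if m is None:
            break  # first non-frame line (e.g. "Code: ...") ends the trace
        frames.append((m.group(1), m.group(2)))
    return frames


def ax25_frames(console_text):
    """Frames that look AX.25-related, by module name or symbol prefix."""
    return [
        (sym, mod)
        for sym, mod in call_trace_frames(console_text)
        if mod in AX25_MODULES or sym.startswith("ax25")
    ]
```

An empty result from `ax25_frames` on the kworker crash would support the idea that it was something different; a non-empty one would tie it back to the AX.25 code path.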



Feb 02 10:14:01 rms-gw3 rmsgw_aci[7616]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors  <----- things start to break down here


Sounds like you are pretty familiar with the rmschanstat and rmsgw_aci oddities. Mine shows my ax port down at the top of the hour, but up at the bottom of the hour. I don't put any faith in the logs it produces. But ...



Tracing the script that does ACI, I notice it is failing to detect a device from the netstat command and reporting a device is down.
root@rms-gw3:/usr/local/etc/rmsgw# netstat --protocol=ax25 -l
Active AX.25 sockets
Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
*          WA6BGS-10          LISTENING    000/000  0       0

This is very strange indeed; I'll keep an eye on my box for similar behavior. I agree this explains the rmschanstat output, but it looks like you were still able to pass traffic while the Pi was in this state. This crash seems different ...


I would like to compare config notes with others, especially if you are using a TNC.


Unfortunately, I'm using Direwolf on my builds, so it's not a direct comparison. However, I haven't seen anything to indicate that the type of TNC is relevant to the crash.

I see your other note, so I'll respond to some items there too.

Cheers
Mike


 

Mike and David, all great input, and I hadn't considered 32 vs. 64 bit for application stability. I chose 32-bit because anyone in our club with a Pi-1 or Pi-2 will be choosing the 32-bit release, and rmsgw is such a lightweight app and compiles from source... I started with the most compatible release.
The Pi-1 is a pilot project for our club, and the gateway and client radios are all within my house. I've told others to finish soldering their TNC kits and connect, but so far I'm the only one. I will attempt to monitor the number of ax25 packets that travel through the host. I usually check Winlink mail twice a day, and don't always have a lot of mail to get/send. The number of packets on that interface is probably small, even for 1200 baud.

The Pi-1 crashed on its own today without the beacon app, after 1d 3h 47m uptime. Here is what it cried out when going down.
[100075.375841] CPU: 0 PID: 3001 Comm: netstat Tainted: G         C         6.1.0-rpi7-rpi-v6 #1  Raspbian 1:6.1.63-1+rpt1

Netstat, while not part of the ax25 tools, does query the ax25 interface statistics. You might be onto something with the ax25 packet count theory. I wasn't home when it crashed, so I did not run netstat.
rmsgw_aci is set to run from cron, and it calls the netstat command several times, once specifically for the ax25 protocol. See for yourself; -o names the output log file:
strace -ormsgw_aci.strace-log -fvtTq -s1024 /usr/local/bin/rmsgw_aci
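Once that strace log exists, a quick way to list exactly which netstat invocations rmsgw_aci makes is to pull the successful `execve` calls out of it. A Python sketch, assuming the usual strace line shape `execve("/path", ["argv0", ...], ...) = 0`, optionally prefixed by a PID (`-f`) and timestamp (`-t`) and suffixed by a duration (`-T`). It does not reassemble calls that strace split into `<unfinished ...>` / `resumed` halves, and it cuts an argv list short if an argument contains `]`:

```python
import re

# execve("/usr/bin/netstat", ["netstat", "--protocol=ax25", "-l"], ...) = 0
EXECVE_RE = re.compile(r'execve\("([^"]*)", \[([^\]]*)\]')
ARG_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')     # one quoted argv element
SUCCESS_RE = re.compile(r"\)\s+=\s+0(?:\s|$)")  # "...) = 0", optionally followed by "<time>"


def executed_commands(strace_text, program=None):
    """Return the argv of each successful execve(), optionally only for one program name.

    Failed PATH lookups (= -1 ENOENT) are skipped; escape sequences in
    arguments are left as strace printed them.
    """
    commands = []
    for line in strace_text.splitlines():
        m = EXECVE_RE.search(line)
        if m is None or SUCCESS_RE.search(line) is None:
            continue
        path = m.group(1)
        if program is None or path.rsplit("/", 1)[-1] == program:
            commands.append(ARG_RE.findall(m.group(2)))
    return commands
```

For example, `executed_commands(open("rmsgw_aci.strace-log").read(), "netstat")` lists each netstat command line, which can then be run by hand to see what rmsgw_aci sees.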

I'm going to downgrade the Pi-1 to Bullseye, patch it thoroughly, drop this config on it again, and include beacon running every 35 minutes.

I'll re-image my Pi-3, load a 64-bit Bookworm release, and get it queued up for a second round of tests. The two Pis will swap roles monitoring each other's serial console, to catch the panic strings when the other goes crazy.

We're collecting quite a few variables to test here. This could take a while if crashes only happen every few days.
32 or 64 bit?
Bullseye or Bookworm?
How many ax25 packets until the system becomes unstable?
Beacon or no-beacon?


 

I think there is another status reply I made before this one, pending...

I moved to 64bit Bookworm on the Pi-3, and setup the Pi-1 to monitor the serial interface on Pi-3 so I can capture the crash.
Linux rms-gw3 6.1.0-rpi8-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.73-1+rpt1 (2024-01-25) aarch64 GNU/Linux

A new process appears in the panic string this time, but the behavior is similar.
After 17h 24m of uptime, we panic with this message (a clip). This time the process in the panic string is kworker; I think that's a generic kernel task.
----------
[62681.000007] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[62681.006386] Modules linked in: mkiss ax25 cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep vc4 snd_soc_hdmi_codec drm_display_helper cec drm_dma_helper drm_kms_helper brcmfmac snd_soc_core binfmt_misc brcmutil hci_uart snd_compress cfg80211 btbcm snd_pcm_dmaengine fb_sys_fops raspberrypi_hwmon bcm2835_codec(C) syscopyarea sysfillrect cdc_acm bcm2835_v4l2(C) bcm2835_isp(C) bluetooth sysimgblt v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev ecdh_generic snd_bcm2835(C) ecc snd_pcm rfkill libaes raspberrypi_gpiomem snd_timer vc_sm_cma(C) snd mc uio_pdrv_genirq uio drm fuse dm_mod drm_panel_orientation_quirks backlight ip_tables x_tables ipv6 i2c_bcm2835
[62681.074035] CPU: 0 PID: 8218 Comm: kworker/u8:0 Tainted: G         C         6.1.0-rpi8-rpi-v8 #1  Debian 1:6.1.73-1+rpt1
[62681.085158] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
[62681.091075] Workqueue: events_unbound flush_to_ldisc
--------

I was not making any changes to the Pi-3 at the time, and only recognized the system had faulted when I attempted to check Winlink mail via VHF from another system. It would not respond, but I could see lights on the TNC decoding my attempts to connect.

A few hours earlier, I was checking on system health and noticed this:
Feb 02 09:42:04 rms-gw3 rmsgw_aci[7546]: Channel Stats: 1 read, 1 active, 0 down, 1 updated, 0 errors   <---- last good update, which matches the timestamp on the Winlink status page
Feb 02 10:14:01 rms-gw3 rmsgw_aci[7616]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors  <----- things start to break down here
Feb 02 10:42:01 rms-gw3 rmsgw_aci[7664]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 10:54:39 rms-gw3 rmsgw_aci[7698]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 10:54:54 rms-gw3 rmsgw_aci[7721]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 10:55:01 rms-gw3 rmsgw_aci[7742]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 10:55:11 rms-gw3 rmsgw_aci[7775]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 10:55:52 rms-gw3 rmsgw_aci[7802]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 10:59:19 rms-gw3 rmsgw_aci[7835]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 11:14:01 rms-gw3 rmsgw_aci[7875]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 11:42:01 rms-gw3 rmsgw_aci[7976]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
Feb 02 12:14:01 rms-gw3 rmsgw_aci[8193]: Channel Stats: 1 read, 1 active, 1 down, 0 updated, 0 errors
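The journal lines above can be checked mechanically: when did `down` first go non-zero, and which runs came from the `14,42 * * * *` crontab entry in the build recipe rather than manual runs? A Python sketch, assuming the exact `Channel Stats:` wording shown above; counting a run as a cron run when it starts within the first 10 seconds of minute 14 or 42 is an arbitrary heuristic, not something rmsgw records:

```python
import re

STATS_RE = re.compile(
    r"^(\w{3} \d{2}) (\d{2}):(\d{2}):(\d{2}) \S+ rmsgw_aci\[\d+\]: "
    r"Channel Stats: (\d+) read, (\d+) active, (\d+) down, (\d+) updated, (\d+) errors"
)
CRON_MINUTES = {14, 42}  # from "14,42 * * * * /usr/local/bin/rmsgw_aci"


def parse_stats(lines):
    """Yield one dict per Channel Stats line: timestamp, down/updated counts, cron flag."""
    for line in lines:
        m = STATS_RE.match(line)
        if m is None:
            continue
        day, hh, mm, ss = m.group(1), int(m.group(2)), int(m.group(3)), int(m.group(4))
        yield {
            "time": f"{day} {hh:02d}:{mm:02d}:{ss:02d}",
            "down": int(m.group(7)),
            "updated": int(m.group(8)),
            "cron": mm in CRON_MINUTES and ss < 10,  # heuristic, see above
        }


def first_down(lines):
    """Timestamp of the first run reporting a down channel after a healthy run, else None."""
    healthy_seen = False
    for rec in parse_stats(lines):
        if rec["down"] == 0:
            healthy_seen = True
        elif healthy_seen:
            return rec["time"]
    return None
```

Run against the log above, this reports `Feb 02 10:14:01` as the first bad run and marks the 10:54 to 10:59 entries as manual runs rather than cron runs.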

Tracing the script that does ACI, I notice it is failing to detect a device from the netstat command and reporting a device is down.
root@rms-gw3:/usr/local/etc/rmsgw# netstat --protocol=ax25 -l
Active AX.25 sockets
Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
*          WA6BGS-10          LISTENING    000/000  0       0

In the device column, it should show ax0. Even though it was not displayed in the netstat output, the gateway was passing traffic between 9:42 and 12:14, before it crashed.
I'm the only user, so it's very lightly used while I work out these bugs.
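The empty Device column matters for anything that splits this netstat output on whitespace: with the device missing, `LISTENING` lands in the third field, where the device name is expected. A fixed-width parser that slices each row at the header's column positions keeps the columns straight. This is a Python sketch, assuming the rows stay aligned with the header as in the output above; it is not taken from rmschanstat, whose parsing may differ:

```python
import re


def parse_ax25_netstat(text):
    """Parse `netstat --protocol=ax25` output into dicts keyed by the header names.

    Columns are sliced at the header's start positions, so an empty
    Device field stays empty instead of shifting State to the left.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("Dest"):
            header = line
            break
    else:
        return []  # no header found
    cols = [(m.group(), m.start()) for m in re.finditer(r"\S+", header)]
    rows = []
    for line in lines[i + 1:]:
        if not line.strip():
            continue
        row = {}
        for j, (name, start) in enumerate(cols):
            end = cols[j + 1][1] if j + 1 < len(cols) else None
            row[name] = line[start:end].strip()
        rows.append(row)
    return rows


def sockets_without_device(rows):
    """Rows whose Device column is blank, like the LISTENING socket above."""
    return [r for r in rows if not r.get("Device")]
```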

I restarted the Pi-3 and, to try something different, did not give kissattach an IP address. It starts okay and passes traffic, but the ACI script fails because the rmschanstat script looks for an IP address to determine whether the interface is up.

I'm running out of things to change or try besides abandoning the Pi.
32 bit Bullseye and Bookworm - same results with mostly similar crash times.
Pi-1 and Pi-3 - same results.
64 bit Bookworm on Pi-3 - same results so far.

I'm following the same build/config recipe each time: nearly identical packages added from the repos, and the same git code pull for the rmsgw software.
I put the xml and config files in the same place from the same source, and the gateway starts as expected each time.

64-bit Bookworm appears to be a 32/64-bit kernel (per lscpu), but every app I'm running reports "ELF 64-bit LSB", including the rmsgw app I compiled from git source.
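The "ELF 64-bit LSB" text that `file` prints comes from the first bytes of the ELF header: byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects, and byte 5 (EI_DATA) is 1 for little-endian (LSB) and 2 for big-endian (MSB). That describes the binary only; lscpu's op-mode line describes what the CPU supports, and `uname -m` (e.g. `armv6l` vs `aarch64`) reports the running kernel's architecture. A minimal Python sketch that reads those two header bytes; the example path below is an assumption:

```python
ELF_MAGIC = b"\x7fELF"


def elf_class(path):
    """Return ('32-bit' | '64-bit', 'LSB' | 'MSB') for an ELF file, or None if it is not ELF.

    Reads only the start of e_ident: 4 magic bytes, then EI_CLASS (byte 4) and EI_DATA (byte 5).
    """
    with open(path, "rb") as f:
        ident = f.read(6)
    if len(ident) < 6 or ident[:4] != ELF_MAGIC:
        return None  # e.g. a shell script
    bits = {1: "32-bit", 2: "64-bit"}.get(ident[4], "unknown")
    order = {1: "LSB", 2: "MSB"}.get(ident[5], "unknown")
    return bits, order
```

For example, `elf_class("/usr/local/bin/rmsgw")` (assuming that is where the gateway binary was installed) returns a tuple for a compiled program and None for a script.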

Any suggestions will be considered.
I would like to compare config notes with others, especially if you are using a TNC.
-Jon



 

Hi David,

I have a test Raspberry Pi 4 + 64-bit Bookworm setup ( 6.1.0-rpi7-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux ) using the Linux AX.25 stack + VE7FET AX.25 libs/apps/tools + Direwolf 1.7 (with GPIO PTT) and a Syba USB sound device, but it's NOT connected to a radio to send RF traffic. That said, it's been running beacon for several *months* w/o any crashes:
Does your Pi have a radio input to hear traffic?

One thing I notice below Jon's most recent post is that they are using a 32-bit kernel (aka armv6l) but 64-bit binaries (Pat), whereas my setup is using the 64-bit kernel. That might be an important difference.
I didn't see Jon mention that he was using Pat; besides, I wouldn't think you could run a 64-bit binary on a 32-bit kernel/arch?

Thanks
Mike


 


I do have a serial console attached to the Pi-1
Sorry, I misread your message ... At least I didn't post an Amazon link to an adapter :) .

Some crash strings from this month.
Jan 25 19:22:40 wa6bgs-rms kernel: CPU: 0 PID: 976 Comm: beacon Tainted: G        WC         6.1.0-rpi7-rpi-v7 #1  Raspbian 1:6.1.63-1+rpt1
Jan 26 09:06:37 wa6bgs-rms kernel: CPU: 0 PID: 988 Comm: beacon Tainted: G         C         6.1.0-rpi7-rpi-v7 #1  Raspbian 1:6.1.63-1+rpt1
[208158.485745] CPU: 0 PID: 850 Comm: beacon Tainted: G         C         6.1.0-rpi7-rpi-v6 #1  Raspbian 1:6.1.63-1+rpt1
Cool, so between the two of us we have 4 Pis on 6.1.0 that crash. If you get a crash after you disable beacon, grab the crash string; I'm very curious to know what process it blames.
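With several crash strings collected, it helps to tabulate which process each one blames (the `Comm:` field), the taint letters, and the kernel build. A Python sketch, assuming the `CPU: N PID: N Comm: NAME Tainted: FLAGS RELEASE #N ...` layout of the lines quoted in this thread. In the taint field, G means only GPL-compatible modules are loaded, C means a staging driver is loaded (the `(C)` modules such as bcm2835_v4l2), and W means the kernel had already issued a warning:

```python
import re
from collections import Counter

# Layout of the panic lines quoted in this thread, e.g.
# "CPU: 0 PID: 976 Comm: beacon Tainted: G        WC         6.1.0-rpi7-rpi-v7 #1  Raspbian 1:6.1.63-1+rpt1"
CPU_LINE_RE = re.compile(
    r"CPU: (\d+) PID: (\d+) Comm: (\S+) "
    r"(?:Tainted: (.*?)|Not tainted) *(\d+\.\d+\.\d+\S*) #"
)
# Debian/Raspbian package version such as "1:6.1.63-1+rpt1".
PKG_RE = re.compile(r"\b\d+:\d+\.\d+\.\d+-\S+")


def parse_cpu_line(line):
    """Return the PID, process name, taint letters, kernel release and package version."""
    m = CPU_LINE_RE.search(line)
    if m is None:
        return None
    pkg = PKG_RE.search(line)
    return {
        "pid": int(m.group(2)),
        "comm": m.group(3),
        "taint": "".join(re.findall(r"[A-Z]", m.group(4) or "")),
        "release": m.group(5),
        "package": pkg.group() if pkg else None,
    }


def blamed_processes(lines):
    """Count how often each process name appears in panic 'CPU:' lines."""
    return Counter(p["comm"] for p in map(parse_cpu_line, lines) if p)
```

Feeding it every captured crash string gives a quick view of whether beacon really dominates, or whether netstat and kworker show up as often once beacon is disabled.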

And this happened when the system was only up for 3.5 hours, and I ran the beacon command manually.
root@rms-gw:~# beacon -c WA6BGS -d "beacon" -s radio "RMS Gate = WA6BGS-10"
It crashed instantly.
[12716.423664] CPU: 0 PID: 1217 Comm: beacon Tainted: G         C         6.1.0-rpi7-rpi-v6 #1  Raspbian 1:6.1.63-1+rpt1

This is when I was convinced beacon was causing the problem.
Going back one Raspbian release is possible, but not permanent.
I had the same thought about beacon, but was disappointed to see another process trigger the crash; reverting to an earlier release may be your only solid workaround right now.

So I heard you say that prior crashes happened every 1-2 days, and it looks like this crash happened after only 3.5 hours. My systems have gone 4-6 days between crashes. I have a theory that the crash happens after the kernel processes a certain amount of AX.25 traffic. It could explain why your Pi crashes faster than mine. My packet channel is pretty quiet; sometimes 2-3 minutes go by without a single transmission. Would you say your packet channel is busier than this? Maybe I should trend the packet count from ifconfig ...
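Trending the packet count doesn't require scraping ifconfig output: Linux exposes the same interface counters as files under `/sys/class/net/<iface>/statistics/`. A Python sketch that appends one timestamped CSV row per sample, assuming the AX.25 interface is `ax0` and that a one-minute interval is fine enough; the CSV file name and the helper names are hypothetical. Comparing the last rows logged before a crash would show how much traffic the kernel had handled:

```python
import csv
import time
from pathlib import Path

COUNTERS = ("rx_packets", "rx_bytes", "tx_packets", "tx_bytes")


def read_counters(iface="ax0", sysfs_root="/sys/class/net"):
    """Read the interface statistics that ifconfig also reports."""
    stats_dir = Path(sysfs_root, iface, "statistics")
    return {name: int((stats_dir / name).read_text()) for name in COUNTERS}


def append_sample(csv_path, iface="ax0", sysfs_root="/sys/class/net"):
    """Append one row (unix time, then the counters), writing a header for a new file."""
    sample = read_counters(iface, sysfs_root)
    path = Path(csv_path)
    new_file = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(("time",) + COUNTERS)
        writer.writerow([int(time.time())] + [sample[c] for c in COUNTERS])
    return sample


def log_forever(csv_path="ax0-counters.csv", iface="ax0", interval_s=60):
    """Sample every interval_s seconds until interrupted (e.g. inside a screen session)."""
    while True:
        append_sample(csv_path, iface)
        time.sleep(interval_s)
```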

Cheers
Mike


 


I have a test Raspberry Pi 4 + 64-bit Bookworm setup ( 6.1.0-rpi7-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux ) using the Linux AX.25 stack + VE7FET AX.25 libs/apps/tools + Direwolf 1.7 (with GPIO PTT) and a Syba USB sound device, but it's NOT connected to a radio to send RF traffic. That said, it's been running beacon for several *months* w/o any crashes:

   /usr/sbin/beacon -c KI6ZHD-8 -d beacon -t 60 vhfdrop KI6ZHD/k KI6ZHD-1/b SCLARA/n 44.128.0.1/ip : Linpac in Santa Clara


One thing I notice below Jon's most recent post is that they are using a 32-bit kernel (aka armv6l) but 64-bit binaries (Pat), whereas my setup is using the 64-bit kernel. That might be an important difference.

--David
KI6ZHD



On 01/31/2024 09:04 AM, Jon Bousselot KK6VLO wrote:

Crash club. I think I've overpaid my dues to this club over the years. I like it.
I'm currently testing the rmsgw without beacon running at all, to see if I can make it past two days. Telling the TNC to beacon should be easy, and that is next on my list. I think the final production deployment for our club gateway will be on an x86 system, a GMKTEK N5105, also using a TNC. The out-the-door price for that PC is really close to the cost of all the peripherals needed to deploy a Pi, and we have bigger plans for the system, not just RMSGW. For other club members who want the Pi solution at their home (like myself), I want to figure it out.

Here is my current kernel on the Pi-1: Linux rms-gw 6.1.0-rpi7-rpi-v6 #1 Raspbian 1:6.1.63-1+rpt1 (2023-11-24) armv6l GNU/Linux
And the kernel on the Pi-3, which has the same issue: Linux wa6bgs-rms 6.1.0-rpi7-rpi-v7 #1 SMP Raspbian 1:6.1.63-1+rpt1 (2023-11-24) armv7l GNU/Linux

I do have a serial console attached to the Pi-1 to catch kernel panic messages.
Here is the string I saw from prior crashes. I was only able to capture some because I had journalctl -af running in a shell. The serial console prints this automatically.

Some crash strings from this month.
Jan 25 19:22:40 wa6bgs-rms kernel: CPU: 0 PID: 976 Comm: beacon Tainted: G        WC         6.1.0-rpi7-rpi-v7 #1  Raspbian 1:6.1.63-1+rpt1
Jan 26 09:06:37 wa6bgs-rms kernel: CPU: 0 PID: 988 Comm: beacon Tainted: G         C         6.1.0-rpi7-rpi-v7 #1  Raspbian 1:6.1.63-1+rpt1
[208158.485745] CPU: 0 PID: 850 Comm: beacon Tainted: G         C         6.1.0-rpi7-rpi-v6 #1  Raspbian 1:6.1.63-1+rpt1

And this happened when the system was only up for 3.5 hours, and I ran the beacon command manually.
root@rms-gw:~# beacon -c WA6BGS -d "beacon" -s radio "RMS Gate = WA6BGS-10"
It crashed instantly.
[12716.423664] CPU: 0 PID: 1217 Comm: beacon Tainted: G         C         6.1.0-rpi7-rpi-v6 #1  Raspbian 1:6.1.63-1+rpt1

This is when I was convinced beacon was causing the problem.
Going back one Raspbian release is possible, but not permanent.


 


Hi Jon,

Very interesting, and welcome to the crash club :) . Could you share what kernel version you are on? I'm on 6.1.0-rpi7-rpi-v8. I also match on the ax25 package versions up to the "+" sign; since we are on different architectures, the package versions won't be an exact match. Same here for rmsgw: N7NIX, built from source.

If you can hook up a serial console, I'd recommend it. That would allow you to copy/paste the crash messages. Make sure you have a TTL-level serial adapter.

I would recommend moving the beacon to the TNC or just turning it off for a few days. ?You may find the system still crashes, which would be diagnostic. ?In my case a different process triggered the crash, so the serial console would be important.

Cheers
Mike


 

Hi David,

No, the Linux "listen" program can print out TXed packets but you need to enable that feature with the "-a" option:

    "-a        Allow for the monitoring of outgoing frames as well as incoming ones."
What I mean by this is that a client and a service sharing the same ax port will never hear each other. If rmsgw listens for -10 on ax0 and Pat tries to call -10 on ax0, the two will never connect. I built the loopback to solve this problem.

Ok, this is the second part of new news. Pat is making an outbound connection via a local digi and back to your Raspberry Pi. Now to be clear, are you digipeating or NODEing out and back? I ask because when you NODE around, your SSID gets decremented by one. That nuance might matter here.
So, nomenclature here to make sure we are on the same page: by digipeating you mean adding a digipeater to the initial connection, correct? Versus NODEing, where you make an initial connection to a NODE and make a second, in-band connection to the destination? In that context, definitely digipeating. In general form, the Pat connect alias looks like this: ax25://dw12/DIGI/MYRMS-10, where "dw12" is the ax port, DIGI is any digipeater, and MYRMS-10 is rmsgw on the Pi.
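The connect alias layout described here (`ax25://port/digipeater/target`) can be split mechanically, which helps when checking exactly which callsigns and SSIDs are in play on a given path. A Python sketch, assuming only the layout given above (port as the host part, zero or more digipeaters, then the target) and the usual `CALL-SSID` notation with SSIDs 0 to 15; it is not Pat's own parser, and the helper names are hypothetical:

```python
from urllib.parse import urlsplit


def split_callsign(token):
    """Split 'MYRMS-10' into ('MYRMS', 10); a bare callsign has SSID 0."""
    call, _, ssid = token.partition("-")
    n = int(ssid) if ssid else 0
    if not 0 <= n <= 15:
        raise ValueError(f"SSID out of range in {token!r}")
    return call.upper(), n


def parse_ax25_alias(url):
    """Return (port, [digipeaters], target) for an alias like ax25://dw12/DIGI/MYRMS-10."""
    parts = urlsplit(url)
    if parts.scheme != "ax25":
        raise ValueError("not an ax25:// URL")
    hops = [split_callsign(h) for h in parts.path.split("/") if h]
    if not hops:
        raise ValueError("no target callsign")
    return parts.netloc, hops[:-1], hops[-1]
```

For the alias above, `parse_ax25_alias("ax25://dw12/DIGI/MYRMS-10")` yields port `dw12`, one digipeater `("DIGI", 0)` and target `("MYRMS", 10)`.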

That's a LOT of power for only being so close to each other. Can you put them into "EL" or Extra Low mode, which is 0.5 W? That might help here. I would also argue that moving them farther apart and also onto different Z-planes, aka elevation, might help if this is really an RFI issue. If it is RFI related, I would expect to see other errors like USB device drops, etc.
I'm not sure what you mean here by "close to each other". Are you referring to the distance between the Pi and the radio, or are you talking about my RF partners? I actually think 10 W is pretty conservative, given my closest RF partners are 25-30 miles away. I did run this radio at 5 W for a few years, but noticed that distant partners were unreliable at that power setting. I would be glad to test with the power set to "low", but I don't have an "EL" option. Again, a single radio is being controlled by a single Pi (PROD). You used the plural "them" when referring to the "EL" setting, as if you thought I had 2 radios, but I don't.

The Z-plane confused me for a minute because I was thinking Cartesian coordinates, but you must be referring to an antenna coordinate system where Z runs up and down (parallel to gravity). In that coordinate system, the relationship between radio and Pi would be described as the radio at the origin and the Pi at x=0, y=0, z=-3, where units are in feet. To paint a word picture, the radio is mounted on the top of a baker's rack, with the Pi directly underneath, but 3 shelves down. If I moved the Pi, the best distance I could practically get would be x=-2, y=-8 and z=-5. It would take me a while to get the lengths of cable needed to make that happen.
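Putting numbers on that word picture (radio at the origin, units in feet, coordinates as given above), the straight-line separation between radio and Pi goes from 3 ft today to a bit under 10 ft in the best practical spot:

```python
import math

RADIO = (0, 0, 0)        # radio at the origin, units in feet
PI_NOW = (0, 0, -3)      # three shelves down, directly underneath
PI_MOVED = (-2, -8, -5)  # best practical relocation described above

print(f"now:   {math.dist(RADIO, PI_NOW):.1f} ft")
print(f"moved: {math.dist(RADIO, PI_MOVED):.1f} ft")
```

That is sqrt(93), about 9.6 ft, or a bit more than three times the current separation; whether that is enough to matter for RFI is what the test would show.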

Swapping the SD cards around might help here, and I imagine that the "LEGACY" OS is using an older kernel that might not have these AX.25 issues. Did that LEGACY setup also have rmsgw and Pat running at the same time? In addition to this test, and since you have multiple Pis, you might consider splitting Pat and rmsgw onto different Pis. That might help isolate the issue as well.
My thought was actually swapping the whole Pi (LEGACY for TEST). The value in the LEGACY unit is that it is a known quantity: working software, firmware and hardware in the RF environment. Besides, LEGACY is a Pi3, whereas TEST is a Pi4, so I don't think a swapped SD would boot. Correct, LEGACY is an older build, maybe Jessie or Buster, but definitely an older kernel. LEGACY has both rmsgw and Pat; it's nearly an identical configuration to PROD/TEST, just older software versions. However, LEGACY was using different sound hardware, a UDRC-II. I would pry the UDRC-II off the header and use the CM108 USB adapter from TEST.

As for splitting up rmsgw and Pat, I decided to just turn off Pat for a test cycle. If the Pi crashes, then we know that the kernel blaming Pat is a red herring. Since we need some traffic to trigger the crash, I've been axcall'ing to another node and disconnecting on occasion.

Thanks
Mike


 

Looking for answers on this specifically, I see others are having the same problem. Beacon seems to be in the panic string each time my Pi crashes. I'm using a Nino TNC and Raspbian Bookworm 32-bit, and can duplicate the crash on a Pi-3 and a Pi-1. I needed a serial console to capture the full crash content. If I launch beacon to fork into the background, sending a station beacon every 35 minutes, the system crashes after about 2 days. I can also sometimes get the system to crash if I invoke beacon to do a one-time send.

I'm trying to set up RMSGW for Winlink mail, using the distro-supplied packages for ax25-tools and ax25-apps, and the N7NIX source for rmsgw. This is a new install.
ax25-tools     0.0.10-rc5+git20190411+3595f87-6
ax25-apps      0.0.8-rc5+git20190411+0ff1383-5

Without beacon, it is fairly stable on Pi-1 and Pi-3.
Yes, the Pi-1 is kind of slow, but the TNC is doing the work. The TNC can also beacon on its own, an option I have on my list to try.