Hmm, this shows ax25 version 0.11.8.
Is that newer? What changed?
I just stumbled upon this and haven't tried anything yet.
On 2024-02-12 12:23 a.m., David Ranch wrote:
Hello Mike,
Again, thank you for the detailed email and I think this all helps
in tracking down the real issue here. I've been discussing this
on the side with Bernard F6BVP who maintains FPAC (node) and FBB
(BBS) and uses the ROSE protocol heavily. He reported that he's
"running three ROSE/FPAC nodes on a local network and I haven't
observed any connections issues with Raspbian OS 64bit for a long
time nor with Ubuntu (20.04)". He showed months of uptime with
LOTS of connections without either any panics or any orphaned
AX.25 connections. One key point he mentioned is that he does NOT
have any RF connections, it's all via AXUDP and he also noted he's
NOT using mkiss for linking the AXUDP to the kernel with
kissattach.
--David
On 02/10/2024 12:27 PM, Michael Dunn wrote:
Hi everyone,
Been quiet on this topic recently as I haven't had much to
report. I've had a lack of any crashes for over 12 days now,
which seems to be related to disabling Pat in my environment.
Please don't jump to conclusions here; this is a complex issue.
As I've said in the past, Pat doesn't appear to be the cause of
the crash, just the process that trips over the kernel garbage
to trigger it.
Since I've had some time to think about this problem, here's
what I've noticed:
  - Jon and I both run rmsgw. We both have crashes on the
    system running rmsgw.
  - The process that triggers the crash is mobile (beacon, pat,
    netstat), but is never rmsgw.
  - Jon and I both have few outside connections to rmsgw. My
    last outside connection was 10 days ago.
  - Jon has crashes in just a few hours; my crashes take days to
    weeks.
  - Jon frequently self checks his mail (possibly hourly? but I
    don't think he stated). I self check my mail infrequently
    (daily).
Based on these facts, my theory is this. rmsgw puts the kernel
in some sort of bad state. This state is tripped over later by
some unsuspecting process, causing the kernel crash.
To prove this out, I need to separate my rms server from my rms
client. Unfortunately, I have only one radio, so I decided to
take this test off-air. In my test setup, I built two Pis, let's
call them RMS and PAT to distinguish their roles. The Pis were
built by imaging the SD card from PROD in the manner I've
described previously.
Instead of connecting to a radio, I simply connected sound card
to sound card using a pair of TRS cables (headphone to
microphone in both directions). Direwolf was configured to
match and PTT was disabled. RMS ran rmsgw from ax25d and a
shell loop on PAT checked my mail every 30 minutes using pat -s
(send only, so as not to eat my actual inbox).
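For anyone who wants to repeat the test with a record of when connections start failing, here is a minimal sketch of a polling loop that timestamps each attempt. It is not the shell loop I used. The names run_check and poll are made up for illustration, the command is whatever pat invocation you already use, and it assumes that command exits non-zero when the connection fails, which I have not verified.

```python
import subprocess
import time
from datetime import datetime


def run_check(cmd, timeout=300):
    """Run one mail check and return (timestamp, ok).

    ok is True only if the command exits with status 0. Treating a
    non-zero exit as a failed connection is an assumption.
    """
    stamp = datetime.now().isoformat(timespec="seconds")
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
        ok = result.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing command counts as a failed check.
        ok = False
    return stamp, ok


def poll(cmd, interval_s, rounds):
    """Run cmd `rounds` times, sleeping interval_s seconds between runs."""
    results = []
    for i in range(rounds):
        results.append(run_check(cmd))
        if i + 1 < rounds:
            time.sleep(interval_s)
    return results
```

Calling poll(your_pat_command_as_a_list, 1800, 48) would cover 24 hours at 30 minute intervals, and the first entry with ok set to False marks roughly when things went bad.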
I let this setup run overnight. In the morning, neither Pi had
crashed, but I did notice that PAT was no longer able to connect
to RMS. Tracking through the logs, it looks like about 7 hours
after I set up the test, connections started failing. Digging
into the RMS Pi, I found the netstat condition that Jon first
reported. Note ax0 missing from the device column:
Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
*          MYCALL-10          LISTENING    000/000  0       0

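The condition above can be spotted automatically before anything crashes. The sketch below takes the text of the netstat AX.25 listing and reports any LISTENING row whose device column is blank. The function name is made up for illustration, it assumes the whitespace-separated layout shown above, and it only looks at LISTENING rows because that is the only case seen so far.

```python
def listeners_without_device(netstat_text):
    """Return source callsigns of LISTENING rows with no device.

    A healthy row splits into dest, source, device, state, ...
    When the device column is blank, the state moves up one position.
    """
    orphans = []
    for line in netstat_text.splitlines():
        fields = line.split()
        if "LISTENING" not in fields:
            continue  # header, blank line, or a socket in another state
        if fields.index("LISTENING") == 2:
            # dest, source, then state: the device column is empty
            orphans.append(fields[1])
    return orphans
```

Run against the output above it returns ['MYCALL-10']; against a healthy listing with ax0 in the device column it returns an empty list.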
The kernel on RMS had been trashed, but the Pi was still
operating. Checking the PAT Pi, netstat output looked normal.
Realizing I still needed an AX25 event to trigger the crash, I
used axcall on RMS to generate some traffic. The RMS Pi
immediately crashed, blaming axcall as it went down:
[61160.353159] CPU: 1 PID: 130380 Comm: axcall Tainted: G        WC         6.1.0-rpi7-rpi-vB #1 Debian 1:6.1.63-1+rpt1

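Since the blamed process keeps changing, it may help to tally which process names show up in saved kernel logs. This is a rough sketch that pulls the Comm field out of lines like the one above. The function name is made up, and it assumes the "CPU: n PID: n Comm: name" layout shown here. The same kind of line is also printed for plain kernel warnings, so not every match is necessarily a crash, and a process name containing spaces would be cut at the first space.

```python
import re
from collections import Counter

# Matches the "CPU: 1 PID: 130380 Comm: axcall" part of a kernel report.
REPORT_LINE = re.compile(r"CPU:\s*(\d+)\s+PID:\s*(\d+)\s+Comm:\s*(\S+)")


def blamed_processes(log_text):
    """Count how often each process name appears in kernel reports."""
    return Counter(m.group(3) for m in REPORT_LINE.finditer(log_text))
```

Feeding it the dmesg or journal text collected from several crashes gives a quick count per process, which should show whether rmsgw really never appears.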
For me, this is great news. I have an off-air way to quickly
show the problem. This also continues to show that the crash is
mobile between processes and demonstrates an unrelated trigger
event. Next steps are to reproduce the crash to ensure it is
reliable. I'm also going to move RMS and PAT out of the RF
environment (e.g. the other end of the house) to ensure there is
no RFI element.
Cheers
Mike