I imagine you're getting confused by Pat's version numbering
scheme for its infrastructure modules (wl2k-go), which include
all kinds of stuff. Its 0.11.8 version is just one of many
versions that look similar to what the official AX.25 repo and
the VE7FET repo use:
Linux's current AX.25 woes are not a problem in these user-space
libraries and utilities. The issues are in the kernel itself.
--David
KI6ZHD
On 02/12/2024 11:29 PM, JJ wrote:
Hmmm..this shows ax25 version 0.11.8
newer? changes?
Just sorta stumbled upon this..haven't tried anything tho...
On 2024-02-12 12:23 a.m., David Ranch
wrote:
Hello Mike,
Again, thank you for the detailed email and I think this all
helps in tracking down the real issue here. I've been
discussing this on the side with Bernard F6BVP who maintains
FPAC (node) and FBB (BBS) and uses the ROSE protocol heavily.
He reported that he's "running three ROSE/FPAC nodes on a local
network and I haven't observed any connections issues with
Raspbian OS 64bit for a long time nor with Ubuntu (20.04)". He
showed months of uptime with LOTS of connections without either
any panics or any orphaned AX.25 connections. One key point he
mentioned is that he does NOT have any RF connections, it's all
via AXUDP and he also noted he's NOT using mkiss for linking the
AXUDP to the kernel with kissattach.
--David
On 02/10/2024 12:27 PM, Michael
Dunn wrote:
Hi everyone,
I've been quiet on this topic recently as I haven't had much to
report. I haven't had any crashes for over 12 days now, which
seems to be related to disabling Pat in my environment. Please
don't jump to conclusions here; this is a complex issue. As
I've said in the past, Pat doesn't appear to be the cause of
the crash, just the process that trips over the kernel garbage
to trigger it.
Since I've had some time to think about this problem, here's
what I've noticed:
  - Jon and I both run rmsgw. We both have crashes on the
    system running rmsgw.
  - The process that triggers the crash is mobile (beacon,
    pat, netstat), but is never rmsgw.
  - Jon and I both have few outside connections to rmsgw. My
    last outside connection was 10 days ago.
  - Jon has crashes in just a few hours; my crashes take days
    to weeks.
  - Jon frequently self checks his mail (possibly hourly? but
    I don't think he stated). I self check my mail infrequently
    (daily).
Based on these facts, my theory is this: rmsgw puts the
kernel in some sort of bad state. This state is tripped over
later by some unsuspecting process, causing the kernel crash.
To prove this out, I need to separate my rms server from my
rms client. Unfortunately, I have only one radio, so I
decided to take this test off-air. In my test setup, I built
two Pis, let's call them RMS and PAT to distinguish their
roles. The Pis were built by imaging the SD card from PROD in
the manner I've described previously.
Instead of connecting to a radio, I simply connected sound
card to sound card using a pair of TRS cables (headphone to
microphone in both directions). Direwolf was configured to
match and PTT was disabled. RMS ran rmsgw from ax25d, and a
shell loop on PAT checked my mail every 30 minutes using pat
-s (send only, so as not to eat my actual inbox).
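For anyone who wants to repeat the test, the periodic send-only check described above could be sketched roughly as follows. This is not the loop Mike ran (he used a shell loop that isn't shown); it is a minimal Python equivalent. The names `build_check_command` and `run_checks` are made up for illustration, and the `connect` subcommand and the `ax25:///...` target URL are assumptions about how Pat was invoked, since the message only mentions `pat -s`.

```python
import subprocess
import time


def build_check_command(target_url):
    # "-s" is the send-only switch mentioned in the message; the
    # "connect <url>" part is an assumption about the invocation.
    return ["pat", "-s", "connect", target_url]


def run_checks(target_url, interval_s=30 * 60, max_runs=None,
               run=subprocess.run, sleep=time.sleep):
    """Run the send-only check repeatedly and return the exit codes.

    max_runs=None loops until interrupted; run and sleep are
    injectable so the loop can be exercised without a radio or Pat.
    """
    codes = []
    while max_runs is None or len(codes) < max_runs:
        result = run(build_check_command(target_url))
        codes.append(result.returncode)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        # A timestamped line per attempt makes it easy to see when
        # connections started failing.
        print(f"{stamp} pat exit code {result.returncode}")
        if max_runs is not None and len(codes) >= max_runs:
            break
        sleep(interval_s)
    return codes
```

Calling `run_checks("ax25:///MYCALL-10")` on the PAT Pi would then attempt a connection every 30 minutes until stopped; the callsign in that URL is a placeholder for whatever the RMS Pi listens on.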
I let this setup run overnight. In the morning, neither Pi
had crashed, but I did notice that PAT was no longer able to
connect to RMS. Tracking through the logs, it looks like
about 7 hours after I set up the test, connections started
failing. Digging into the RMS Pi, I found the netstat
condition that Jon first reported. Note ax0 missing from the
device column:
Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
*          MYCALL-10          LISTENING    000/000  0       0
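Since this condition shows up hours before any crash, it may be worth checking for it automatically. The sketch below is one possible way to do that, assuming the netstat text is whitespace-separated with the seven columns shown above, so that a row with a blank Device column splits into six fields instead of seven. The function name `sockets_missing_device` is hypothetical, and capturing the netstat output is left to the caller.

```python
def sockets_missing_device(netstat_text):
    """Return the Source callsigns of rows whose Device column is blank.

    Assumes the 7-column layout shown in the message:
    Dest Source Device State Vr/Vs Send-Q Recv-Q
    """
    missing = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip blank lines and the column header row.
        if not fields or fields[0] == "Dest":
            continue
        # A healthy row has 7 fields; with Device blank, splitting on
        # whitespace leaves only 6. Other lines (titles etc.) are ignored.
        if len(fields) == 6:
            missing.append(fields[1])
    return missing
```

Feeding it the two lines above would flag MYCALL-10, while the same row with ax0 in the Device column would not be flagged.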
The kernel on RMS had been trashed, but the Pi was still
operating. Checking the PAT Pi, netstat output looked normal.
Realizing I still needed an AX25 event to trigger the crash,
I used axcall on RMS to generate some traffic. The RMS Pi
immediately crashed, blaming axcall as it went down:
[61160.353159] CPU: 1 PID: 130380 Comm: axcall Tainted: G        WC         6.1.0-rpi7-rpi-vB #1 Debian 1:6.1.63-1+rpt1
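To compare which process gets blamed across several crashes, the Comm field can be pulled out of saved kernel logs. The following is a small sketch under the assumption that each crash report contains a line shaped like the one above; `OOPS_RE` and `crash_triggers` are hypothetical names, and the pattern has only been matched against that single sample.

```python
import re

# Matches "[<timestamp>] CPU: <n> PID: <n> Comm: <name>" as in the
# sample line; the timestamp may sit on its own line before "CPU:".
OOPS_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s*CPU:\s*(?P<cpu>\d+)\s+"
    r"PID:\s*(?P<pid>\d+)\s+Comm:\s*(?P<comm>\S+)"
)


def crash_triggers(log_text):
    """Return (timestamp, pid, comm) for every matching crash line."""
    return [
        (float(m["ts"]), int(m["pid"]), m["comm"])
        for m in OOPS_RE.finditer(log_text)
    ]
```

Run over a collection of saved logs, this would give a quick list of the processes that tripped the crash (beacon, pat, netstat, axcall and so on).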
For me, this is great news. I have an off-air way to quickly
show the problem. This also continues to show that the crash
is mobile between processes and demonstrates an unrelated
trigger event. Next steps are to reproduce the crash to
ensure it is reliable. I'm also going to move RMS and PAT out
of the RF environment (e.g. the other end of the house) to
ensure there is no RFI element.
Cheers
Mike