Hi There,
so close....
I have been working on getting g2_link running on AlmaLinux alongside DPlus G3 version 3.2, and am getting pretty close. I have g2_link and dtmf_reader converted to systemd services, and have them both running successfully (wahoo). However, it seems I am running into one last hurdle, and I am hoping someone with more recent Linux development experience can help me out. This is something I can tinker with in my spare time, but I don't have much spare time lately. If there are any C++ / Linux development gurus out there, I have no doubt they can figure it out much quicker than I can.
The issue I am running into is that the g2_lh program (which is executed frequently to build the dashboard) will sometimes crash, causing g2_link to crash as described below. I have set up the systemd service to restart g2_link when it crashes, but of course if there was a pre-established link, it drops and needs to be reestablished.
Here are the details of what is happening:
Sometimes, when g2_lh executes, it crashes and core-dumps with the following (from /var/messages):
Sep  5 09:30:02 vy1rds-gw2 kernel: g2_lh[126886]: segfault at 448001 ip 00007f18eb159755 sp 00007ffc613cd4a8 error 4 in libc.so.6[7f18eb028000+175000] likely on CPU 0 (core 0, socket 0)
Sep  5 09:30:02 vy1rds-gw2 kernel: Code: 0f b6 44 07 e0 29 c8 c5 f8 77 c3 66 2e 0f 1f 84 00 00 00 00 00 83 fa 01 76 7b 89 f8 09 f0 25 ff 0f 00 00 3d e0 0f 00 00 7f 2b <c5> fe 6f 16 c5 ed 74 17 c5 fd d7 c2 ff c0 c4 e2 68 f5 d0 0f 85 d2
Sep  5 09:30:02 vy1rds-gw2 systemd[1]: Started Process Core Dump (PID 127638/UID 0).
Sep  5 09:30:02 vy1rds-gw2 systemd-coredump[127639]: Resource limits disable core dumping for process 126886 (g2_lh).
Sep  5 09:30:02 vy1rds-gw2 systemd-coredump[127639]: Process 126886 (g2_lh) of user 0 dumped core.
Sep  5 09:30:02 vy1rds-gw2 systemd[1]: systemd-coredump@...: Deactivated successfully.
Shortly after, g2_link crashes (from /var/messages):
Sep  5 09:30:52 vy1rds-gw2 kernel: g2_link[126542]: segfault at ae15ae9a ip 00000000f7ca84a3 sp 00000000fffc2d0c error 4 in libstdc++.so.6.0.29[f7c78000+123000] likely on CPU 2 (core 0, socket 0)
Sep  5 09:30:52 vy1rds-gw2 kernel: Code: 66 90 66 90 66 90 90 f3 0f 1e fb 8b 54 24 04 8b 42 0c 85 c0 74 11 90 89 c2 8b 40 08 85 c0 75 f7 89 d0 c3 8d 74 26 00 8b 42 04 <3b> 50 0c 75 1b 8d b4 26 00 00 00 00 90 89 c2 8b 40 04 39 50 0c 74
Sep  5 09:30:52 vy1rds-gw2 systemd[1]: Started Process Core Dump (PID 127773/UID 0).
Sep  5 09:30:52 vy1rds-gw2 systemd-coredump[127774]: Resource limits disable core dumping for process 126542 (g2_link).
Sep  5 09:30:52 vy1rds-gw2 systemd-coredump[127774]: Process 126542 (g2_link) of user 0 dumped core.
Sep  5 09:30:52 vy1rds-gw2 systemd[1]: systemd-coredump@...: Deactivated successfully.
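For readers decoding the two kernel lines above: on x86, the number after "error" is the page-fault error code, and the bracketed pair after the library name is the start address and size of the mapping that contains the faulting instruction. The following is a minimal sketch of that arithmetic, not part of g2_link or g2_lh; the names FaultInfo, decode_error_code, offset_in_library and describe are made up for illustration.

```cpp
#include <cstdint>
#include <string>

// Decoded view of the x86 page-fault error code printed after "error" in a
// kernel segfault line. Only the three low bits are interpreted here.
struct FaultInfo {
    bool page_present;  // bit 0: 1 = protection violation, 0 = page not mapped
    bool write_access;  // bit 1: 1 = write, 0 = read
    bool user_mode;     // bit 2: 1 = fault happened in user mode
};

FaultInfo decode_error_code(unsigned code) {
    return FaultInfo{(code & 1u) != 0, (code & 2u) != 0, (code & 4u) != 0};
}

// Offset of the faulting instruction inside the mapping shown in brackets:
// instruction pointer minus the mapping start address.
std::uint64_t offset_in_library(std::uint64_t ip, std::uint64_t base) {
    return ip - base;
}

// Human-readable summary of the decoded bits.
std::string describe(const FaultInfo &f) {
    std::string s = f.user_mode ? "user-mode " : "kernel-mode ";
    s += f.write_access ? "write" : "read";
    s += f.page_present ? ", protection violation" : ", page not mapped";
    return s;
}
```

With the values from the log, "error 4" decodes to a user-mode read of an unmapped address in both crashes, and the instruction offsets come out as 0x131755 inside the libc.so.6 mapping and 0x304a3 inside the libstdc++ mapping. Turning those offsets into function names (for example with gdb) may also require the file offset of the mapping, which the log line does not show. The short addresses in the g2_link line suggest the binary is 32-bit, though the log alone does not confirm that.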
In /var/log/g2_link, this is the error that happens right before the g2_link segfault:
??????090524 at 09:30:52:call=1NFO     timeout, removing 10.0.0.2, users=0
It seems to me that g2_lh is crashing, which then has a cascading effect on g2_link. While I have the C code for g2_lh, which I can try to debug as time permits, I do not have the code for g2_link, so I can't look into that side.
It has been a long time since I analyzed core dumps, but I am willing to try. If anyone wants to see the core dumps, I will happily enable core dumping for these services and send them over for analysis.
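On enabling core dumps: the "Resource limits disable core dumping" lines most likely mean the crashing processes ran with a core-file size limit (RLIMIT_CORE) of zero. For a systemd service the usual route is the LimitCORE= setting in the unit file. For a program whose source is available (g2_lh here), the limit can also be inspected and raised from inside the process. A minimal sketch using the POSIX getrlimit/setrlimit calls follows; the two function names are hypothetical and this is not code from g2_lh.

```cpp
#include <sys/resource.h>

// Returns true if the soft core-file size limit is zero, which is the
// situation the "Resource limits disable core dumping" log line points to.
// If the limit cannot be read, this conservatively reports false.
bool core_dumps_disabled() {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return false;
    return lim.rlim_cur == 0;
}

// Raises the soft core limit to whatever the hard limit allows.
// An unprivileged process may always do this; raising the hard limit
// itself needs privileges and is not attempted here.
bool raise_core_limit_to_hard_max() {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return false;
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}
```

Calling raise_core_limit_to_hard_max() early in main() only helps if the hard limit is non-zero; when the service manager sets the hard limit to zero, the change has to be made in the service configuration instead.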
Rob, VY1RG
|
what is the source of your g2-link and what is the version
number?
rich
On 9/5/24 11:05 AM, Robert Gillis via
groups.io wrote:
|
i've not seen this before but that message is emitted from send_heartbeat() here:
static void send_heartbeat()
{
   inbound_type::iterator pos;
   inbound *inbound_ptr;
   bool removed = false;

   for (pos = inbound_list.begin(); pos != inbound_list.end(); pos++)
   {
      inbound_ptr = (inbound *)pos->second;
      sendto(ref_g2_sock,(char *)REF_ACK,3,0,
             (struct sockaddr *)&(inbound_ptr->sin),
             sizeof(struct sockaddr_in));

      if (inbound_ptr->countdown >= 0)
         inbound_ptr->countdown --;

      if (inbound_ptr->countdown < 0)
      {
         removed = true;
         traceit("call=%s timeout, removing %s, users=%d\n",
                 inbound_ptr->call,
                 pos->first.c_str(),
                 inbound_list.size() - 1);

         free(pos->second);
         pos->second = NULL;
         inbound_list.erase(pos);
      }
   }
   if (removed)
      print_status_file();
}
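One thing that stands out in the loop above, offered as a possible explanation rather than a confirmed diagnosis: if inbound_type is a std::map (an assumption, since its declaration is not shown), then inbound_list.erase(pos) invalidates pos, and the pos++ that the for loop performs next is undefined behavior. That would be consistent with a crash inside libstdc++ right after the "timeout, removing" trace line. A minimal sketch of the usual erase-safe pattern follows, using simplified stand-in types; the struct layout and the function name expire_inbound are hypothetical.

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical stand-ins for the real g2_link types, which are not shown
// in full in this thread.
struct inbound {
    int countdown;
};
typedef std::map<std::string, inbound *> inbound_type;

// Decrements every entry's countdown and removes the expired ones.
// Returns the number of entries removed.
int expire_inbound(inbound_type &inbound_list) {
    int removed = 0;
    inbound_type::iterator pos = inbound_list.begin();
    while (pos != inbound_list.end()) {
        inbound *inbound_ptr = pos->second;
        if (inbound_ptr->countdown >= 0)
            inbound_ptr->countdown--;

        if (inbound_ptr->countdown < 0) {
            std::free(inbound_ptr);
            // Since C++11, map::erase(iterator) returns the iterator that
            // follows the erased element, so the loop never touches the
            // erased node again.
            pos = inbound_list.erase(pos);
            ++removed;
        } else {
            ++pos;
        }
    }
    return removed;
}
```

The only structural change from the original loop is that the iterator is advanced either by the return value of erase() or by ++pos, never both. On a pre-C++11 compiler the equivalent idiom is inbound_list.erase(pos++).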
|
Hi Rich,
Based on the log output, I am using g2_link 4.00 binaries, which I sourced from . However.... I did find some time today to do some debugging. With a little help from my old friend gdb, I was able to find the condition that causes this segmentation fault in g2_lh, and I coded a simple handler for it (instead of letting g2_lh crash when the condition is met), which in turn also keeps g2_link up. The system has been steady ever since I applied my patch to g2_lh, and I even just had a short QSO with KA8SCP via my repeater / g2_link. While I still need to do some more debugging to understand why this condition is happening in the first place, I am happy with the patch I created for g2_lh.
I will go through the documentation that I created during this exercise tonight to clean it up a bit, and if time permits, I will send it out to this group for feedback / incorporation into other sources/official documentation.
Live dashboard of upgraded system here: . I still have a bit of customization to do on the registration page, etc., but it is fully operational.
Rob
|
Rob,
I'm just starting my g3 rework for g2-link and g2-lh on AlmaLinux 9.
I am preparing a server to go up on Methodist Mountain (10,713 ft) KD0QPG with the latest 9.4 and g3.20 for a trip there this weekend... before the snow gets too deep!
I have AL 9.4 installed and configured with my usual aids (webmin, postfix, syslog-mailer, etc).
Just got the g3.20 minutes ago and will load it in the next hours, then g2-lh and g2-link.... I'm thinking of relabeling them as g3-lh and g3-link so I don't have to try and deal with the centos history.
what do you think?
regards
rich
On 9/5/24 6:21 PM, Robert Gillis via
groups.io wrote:
|
Hi Rich,
My apologies for not replying sooner; I only discovered this email tonight (I filter my groups.io email to a separate folder, and I have been quite busy lately and haven't had a chance to go through them until now. Work has a way of getting in the way of my hobbies, it seems).
Have you already got this work done as you described? I think rebranding g2_link as g3_link is not a bad idea and would better reflect that it works with the g3 system. I believe in another email thread in the past you (or perhaps it was someone else) suggested that we advocate for creating a GitHub repository with the source code. I agree it would help with current and future development. While I understand the original developer of g2_link did not want the code shared publicly, which I can (and will) respect, I can't help but wonder if they would consent to having it on a private GitHub repository? Or perhaps their views have changed and they don't mind making it open source?
I have since cleaned up the typos in my documentation and made a couple of very minor tweaks for the work I have done. I was planning to re-package and update the file on my website along with re-sending the instructions (again, only minor edits) to this group, but I can't help but wonder if I should start a GitHub repository for the work I have done (with the intent of it becoming a future home for g2_link (er, g3_link?) development). I suspect there would be no concern uploading the code for g2_lh to this repository, as the code for it can be downloaded openly now (and it is compiled as part of the g2_link install). Regardless, I would want to track down the original developer of g2_lh to confirm.
Thanks,
Rob
|
I've been following this thread pretty much since its inception. We've got an RF issue on our system (so much RF from nearby transmitters on the mountain that it's effectively deaf) so I'm in no hurry. G3.2/Alma 9 is all fine and has been running since June on a SFF PC, just no RF fun. Sad, too, as the transmitter can be heard *every*where (it's connected to REF069C most of the time).
Plus, other things at the station are keeping me busy: this email is coming to you from our new Netgate/pfSense box that I'm configuring because we have fiber now and need to up our game, and the KiwiSDR is working now ... and we're probably going to do an APRS node ...
I would have built the g2_link stuff (needs a new name!) already but I really don't want to install the entire development suite on the gateway. It's bad enough that I made the mistake of installing the GUI stuff (I guess it's Wayland now and not X any more?) and I don't want to install tons of new packages that will be needed once.
Anyone done any cross compiling? Say from Alma 9 installed under WSL??? ORRRRRR ... maybe someone could spin up an RPM that has all the magic in it ... hint hint hint. :-)
Back to the new FW, etc, box ...
And thank you, all of you, who are working on this. Great project.
Peter
On Thu, Oct 24, 2024 at 10:42 PM Robert Gillis via groups.io <robert@...> wrote:
________________________________ From: [email protected] <[email protected]> on behalf of Rich Painter <painterengr@...> Sent: October 16, 2024 12:36 To: [email protected] <[email protected]> Subject: Re: [g2-link] g2_link + AlmaLinux - almost there! - challenge with g2_lh
________________________________ From: [email protected] <[email protected]> on behalf of Rich Painter <painterengr@...> Sent: September 5, 2024 12:59 To: [email protected] <[email protected]> Subject: Re: [g2-link] g2_link + AlmaLinux - almost there! - challenge with g2_lh
-- Peter Laws | VE[23]UWY / N5UWY | plaws0 gmail com | Travel by Train!
|