Hi Brendan,
A process monitor is what you're looking for. Some of them will also accept a heartbeat from the monitored program and restart it if the heartbeat is not received.
I did a quick search for "python process monitor" and came up with a Stack Overflow answer:
I personally use PM2 for Node.js and for any other executable or shell script I want to monitor (this is under Linux).
On Dec 14, 2023, at 2:44 PM, Brendan Lydon <blydon12@...> wrote:
Hi,
I have been running an algorithm from a pre-compiled main.exe, launched from a .sh file. After main.exe completes successfully at the end of the day, I run Python processes to get statistics and whatnot on my models and trades. What I have noticed is that, maybe once per day and at random, my algorithm will stall and not continue forward in my main while loop.
So, I took the liberty of adding a heartbeat to my main loop, and I am writing to a heartbeat.sts file in my repo every 10 seconds or so (still playing with the optimal interval). My bash script sits in a loop the entire time main.exe runs. It pulls the latest timestamp from heartbeat.sts, computes the time delta between the current time and the latest pulse from the program, and, if we have not seen a pulse in a while, terminates the current main.exe PID and starts a new one. The loop does this all day for as long as main.exe has not exited gracefully. If main.exe does exit gracefully (at 4 PM EST), the script breaks out of the loop and continues to my EOD statistics.
So, my question is: has anyone else built something like this "watchdog" to deal with stalls and reduce downtime? I have made a lot of adjustments to my system logs and to how the project initializes at startup to accommodate this, as well as filling data gaps, etc. I don't mind writing a solution to something and never having it execute.
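For anyone following along, the loop described above could be sketched in Python roughly as follows. This is only an illustration under assumptions, not the actual bash script: the names `heartbeat_age` and `supervise`, the `max_age`, `poll`, and `grace` parameters, and their default values are all hypothetical. It also uses the heartbeat file's modification time rather than parsing a timestamp written inside the file, and it stops supervising on any self-initiated exit, not only a graceful one.

```python
import os
import subprocess
import time


def heartbeat_age(path, now=None):
    """Seconds since the heartbeat file was last modified (inf if it is missing)."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path)
    except OSError:
        return float("inf")


def supervise(cmd, heartbeat_path, max_age=60.0, poll=5.0, grace=30.0):
    """Run cmd and restart it whenever its heartbeat goes stale.

    Returns the child's exit code once the child exits on its own.
    All thresholds are illustrative defaults, not recommended values.
    """
    while True:
        proc = subprocess.Popen(cmd)
        started = time.time()
        while True:
            code = proc.poll()
            if code is not None:
                # The child exited by itself: stop supervising, let the caller decide.
                return code
            # Give a freshly started child time to write its first pulse.
            in_grace = time.time() - started < grace
            if not in_grace and heartbeat_age(heartbeat_path) > max_age:
                proc.terminate()  # stalled: ask it to stop
                try:
                    proc.wait(timeout=10)
                except subprocess.TimeoutExpired:
                    proc.kill()  # it ignored the request: force it
                    proc.wait()
                break  # fall through to the outer loop and start a new child
            time.sleep(poll)
```

A caller would run something like `supervise(["./main.exe"], "heartbeat.sts")` and move on to the end-of-day statistics once it returns. The grace period is there so that a stale heartbeat file left over from the stalled run does not trigger an immediate second restart.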
So, I guess I wanted to see if anyone had thoughts on or experience with these types of scripts, where you are constantly keeping things up and running and never going offline. Side note: with all due respect, please refrain from responses like "Why is your program stalling? Are you doing sleeps? You need to rethink your design." Firstly, the answer to those first two questions is no, and sometimes things just stall. It's not a connection problem, it's a computer science problem. Also, I am very pleased with my current and final design...
Thanks