My experience with d6.0 under Hercules is that the system can stay up about two weeks before it will crash. The crash takes the form of a super-dump and is triggered by someone trying to sign into an already existing terminal line task.
So, if I sign off of that terminal and try to sign back in as any user, it will cause a super-dump. I never leave an instance up more than 10 days without reloading.
Sounds like Tom’s fixes were in code released after d6.0. Not sure if the issue he fixed is causing my issues.
On Sun, Feb 11, 2024 at 08:44 AM, Thomas Valerio wrote:
Douglas Wade wrote:
> Off topic, I know, but running MTS under Hercules and getting a week of uptime probably pretty much matches the real mainframe experience. At UBC the maximum uptime was something like 7 days, when the system shut down to clients for a total (weekly) file save followed by an IPL.
I am commenting on an ancient thread (from last April). I thought I had re-registered for this group when it switched to groups.io, but apparently I did not. I am really only responding to the comment made by Douglas Wade (and only for historical completeness), but if you want to see the entire thread, see the link at the very bottom.
tl;dr: There is no longer any time limit (that I am aware of) on how long MTS/UMMPS can run without experiencing a non-recoverable program interrupt while in supervisor state. Just to be clear though, this is with regard to weeks and/or months of uptime, not years and/or decades.
If I recall correctly, at U of M we did not routinely re-IPL after filesave every week, so in the early 1990s, when there was somewhat less development work going on, the system would generally stay up for multiple weeks at a time. There was, however, a bug that would cause a crash approximately 2,147,483,647 (2^31 - 1) milliseconds, or 24.85 days, after IPL, due to an uncaught fixed-point divide overflow in the supervisor intertask module. This bug also exists in the supervisor that was distributed as part of D6.0.
In September of 1993, I made a quick fix to the code and changed the variable that caused the fixed-point divide exception from a 32-bit signed integer to a 32-bit *unsigned* integer, which doubled the possible system uptime from 24.85 days to 49.71 days. After MTS was shut down for general use at U of M, it continued to be run on a couple of Flex-ES systems as well as on Hercules. With even less of a reason to reboot the system on any kind of regular schedule, in August of 2015 I coded a more permanent fix for the problem (I think; I haven't done any 2042 TOD clock rollover testing, and I doubt anyone else has either).
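For anyone who wants to check the two uptime figures quoted above, here is a minimal sketch of the arithmetic in Python. It assumes only what the post states, namely that the value involved counts milliseconds since IPL and is 32 bits wide. The function name and constants are hypothetical and chosen for illustration; this is not the supervisor code, which is not shown here.

```python
# Hypothetical illustration of the uptime limits; not taken from the MTS supervisor.
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in one day


def days_until_limit(bits: int, signed: bool) -> float:
    """Days until a millisecond counter of the given width reaches its maximum value."""
    max_value = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    return max_value / MS_PER_DAY


signed_days = days_until_limit(32, signed=True)     # limit with a signed 32-bit value
unsigned_days = days_until_limit(32, signed=False)  # limit with an unsigned 32-bit value

print(f"signed 32-bit:   {signed_days:.3f} days")    # 24.855
print(f"unsigned 32-bit: {unsigned_days:.3f} days")  # 49.710
```

The signed result, 24.855 days, matches the 24.85 quoted above once truncated to two decimal places, and the unsigned result is the 49.71 days given for the 1993 fix.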
So as of now (if the patch is applied) the supervisor should be able to run indefinitely. There is one remaining bug in MTS related to this, though: if a task that is started at IPL is left unused, and the system is then shut down after it has been up more than 49.71 days, or an attempt is made to use the task, the unused task will snark. Any freshly started tasks are unaffected.
Thomas Valerio
/g/H390-MTS/topic/which_hercules_version_for/98115982?p=,,,20,0,0,0::recentpostdate/sticky,,,20,2,0,98115982,previd%3D1697596044835478369,nextid%3D1672385293156727066&previd=1697596044835478369&nextid=1672385293156727066