MVS 3.8j (TK5) Crashes on Massive SORT
I've been playing with a SORT job to stress test my system. It sorts 99,999,999 random 80-byte records. The job works just fine if I run with 1 CPU. Run MVS with 2 CPUs and it typically locks up, clocks, and sometimes I have to clear SYS1.LOGREC after an IPL.

As far as I have read, MVS 3.8j ought to work fine with 2 CPUs, but I've tested this enough times to know that I can reliably reproduce the issue.

If anyone has any ideas and would like me to produce some debug data to sift through, please let me know. In the meantime, I'll run MVS with 1 CPU.

Hi Daniel,

The S/M Program runs in Problem Program mode (Key 8) and is not an Authorized Program, as it does not use any system services that require authorization. The program strictly follows the published APIs for Supervisor services and Data Management services. It does make extensive use of EXCP I/O to manage data flow to and from the Sort Work intermediate data sets. When sufficient storage is available for double buffering of the Sort Work intermediate data sets (as it would be in your case), the S/M Program will drive the processor and the DASD for maximum utilization. You may have seen in your sorting runs that the reported processor time is about 75% of the elapsed time, demonstrating excellent overlap of processor and I/O processing.

I suspect that, with such a high degree of processor and I/O concurrency, there is a bug in MVS 3.8 when it is running in multiprocessor mode, possibly in the area of scheduling concurrent SRBs given the high I/O activity from just one address space.

Enabling multiprocessor mode in MVS 3.8 is not going to improve the performance of the S/M Program, because it runs as a single task. The second processor would normally be idle (unless there was work available for another task), except when an I/O event completed and the second CPU was dispatched to run an SRB to post completion of the I/O event.

The fact that SYS1.LOGREC is filling up (with error recording?) indicates to me that the problem is within MVS 3.8 internals. IIRC, MVS 3.8 was not regarded as being particularly stable running in multiprocessor mode. It was the later PP extensions MVS/SE 1, MVS/SE 2 and MVS/SP that addressed many issues in this area.

Regards,
Tom

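To illustrate why double buffering lets one task keep the processor busy while I/O is in flight, here is a toy timing model in Python. It is only an illustration under stated assumptions, not the S/M Program's actual EXCP logic: one sort-work device (so reads are serialized), one task processing blocks in order, fixed per-block read and CPU times, and a buffer that can only be refilled after the block it held has been processed. The function name `elapsed_time` and its parameters are hypothetical. With one buffer, reading and processing alternate; with two, the read of the next block overlaps processing of the current one.

```python
def elapsed_time(n_blocks: int, io_time: float, cpu_time: float, buffers: int) -> float:
    """Toy model (hypothetical): elapsed time to read and process n_blocks blocks.

    Assumptions, for illustration only:
    - one device, so reads are serialized;
    - one task processes blocks strictly in order;
    - a buffer can be refilled only after the block it held was processed.
    """
    if buffers < 1:
        raise ValueError("need at least one buffer")
    read_end: list[float] = []
    cpu_end: list[float] = []
    for i in range(n_blocks):
        # The read of block i waits for the previous read (single device)
        # and for a free buffer (the block that last used it is processed).
        prev_read = read_end[i - 1] if i >= 1 else 0.0
        buffer_free = cpu_end[i - buffers] if i >= buffers else 0.0
        read_end.append(max(prev_read, buffer_free) + io_time)
        # Processing of block i waits for its data and for the previous block.
        prev_cpu = cpu_end[i - 1] if i >= 1 else 0.0
        cpu_end.append(max(read_end[i], prev_cpu) + cpu_time)
    return cpu_end[-1] if cpu_end else 0.0


if __name__ == "__main__":
    n, io, cpu = 1000, 2.0, 3.0  # arbitrary illustrative time units
    for bufs in (1, 2):
        total = elapsed_time(n, io, cpu, bufs)
        print(f"buffers={bufs}: elapsed={total:.0f}, cpu/elapsed={n * cpu / total:.1%}")
```

In this model a single task never uses more than one CPU, and the SRB work Tom mentions (posting I/O completion) is not modeled at all. The ~75% CPU-to-elapsed figure he cites comes from real device and channel behavior that this sketch does not attempt to reproduce.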
I will note that MVS 3.8 was the first release of OS/VS2 to support an attached processor or multiprocessor configuration, if I recall correctly.

Also, MVS running on Hercules today is probably running about 100 times faster than it ever ran on real hardware back when MVS 3.8 was "current" (e.g. on a 370/158 or 168 or similar), so it is certainly possible that there are as yet undiscovered timing-related issues ("bugs") in that old code.

Tread lightly in this area.

And it was designed for 2 processors. The next version was debugged for 3+ processors.
On Wed, Nov 20, 2024 at 9:36 PM Mark Waterbury via groups.io <mark.s.waterbury@...> wrote:
--
Mike A Schwab, Springfield IL USA
Where do Forest Rangers go to get away from it all?

I have always been suspicious of Hercules/Herculon in the area of multiprocessors. My rule of thumb is to run one CPU (the Gene Amdahl model). The whole interaction of MVS MP, Hercules support of MP, and how it interacts with the underlying hardware if it has multiple processors seems to me to be an unnecessary complication, and everything has to work right: no bugs in MVS MP support (ya sure), no bugs in Hercules and its bolted-on MP support, and no bugs in the underlying system (again - ya sure). I have run OS X and Linux and seen different types of strange problems pop up in both cases. Now throw in Apple Silicon with its performance and efficiency cores. It's too much to mess with, unless you have never spent a weekend in an MVS data center recovering a JES3 complex and want to see how that feels - I have (or used to).
The issues I have seen over the years seem to me to be an unnecessary complication. If I want to pump more work through, I just duplicate a UP and run either JES2 shared spool or JES3. Just my 2 cents' worth. From an old grey beard.

Yeah, I can see how you might be chary of testing the waters. MVS 3.8 was kinda iffy for MP/AP support, especially early on, but by the 1982-1983 timeframe it had gotten pretty solid, at least on the 370/158AP that was my first systems job. I haven't had a problem in Hercules attributable to emulating MPs on SMP hosts in a couple of decades, myself, and I've run a pretty wide range of code on it. FWIW, JES2 shared spool was problematic longer than MVS MP/AP support... I never ran JES3 myself, except at one shop I worked at in the late 80s. It always struck me as too foreign.
On Thu, Nov 21, 2024 at 12:29 PM Sterling Garwood via <slgarwood@...> wrote:
--
Jay Maynard

JES3 was a bit weird, but remember its heritage: West Coast ASP, etc., originally a 7094/7040-like system. HASP was a Houston child. My first systems programmer job was in a HASP shop; then I moved to a JES3 shop as they were converting to MVS. Definitely a grey hair creator!! HASP/JES2 was a true KISS tool: simple, did everything well, and an example of a clean, well-thought-out piece of code. Simpson and Crabtree should be proud. I enjoyed using JES3, but it was a bear to maintain and the whole shared tape pool concept never worked well IMHO. Maybe my view is colored by some local mods that clobbered the JES3 spool every so often, until one of the IBM guys said in effect “you can’t do that, you will cause spool destruction”. IBM loved ASP and JES3 since you basically had to buy another system to run the dispatch and control functions. We had, ISTR, three 370/168 systems in a triplex, then a couple of 3033s, then 3090s.

Later releases work well with multiple CPUs in Hercules (n-1).
On Thu, Nov 21, 2024 at 12:29 PM Sterling Garwood via groups.io <slgarwood@...> wrote:
--
Mike A Schwab, Springfield IL USA
Where do Forest Rangers go to get away from it all?

Since (until a month ago) my entire systems career was in Houston... And the shop I ran into JES3 at was Rockwell Shuttle Operations Corporation, a NASA/JSC contractor. I always thought it was borderline blasphemy that JSC, the site HASP was originally developed for, turned into a JES3 shop...
On Thu, Nov 21, 2024 at 1:53 PM Sterling Garwood via <slgarwood@...> wrote:
--
Jay Maynard