First-Hand: Measurement in Early Software
Submitted by Robert L. Patrick, Atascadero, CA
Some think that the architecture of major pieces of software was created through the combined intuition of a series of near geniuses cloistered in a conference room at the moment of creation. While this scenario may have occurred, an engineering approach yielded a much more successful product.
When the Card Programmed Calculator was popular in the late 1940s (700 systems manufactured), it was driven by the pace of its punched-card devices (100 cards per minute). When used for engineering work (its dominant use), it suffered whenever the program encountered a transcendental function that was expressed as an iteration. In that case, the feeding of cards would stop while the compute element ground away trying to resolve the iteration. Thus, for this class of work, it did not execute 100 commands/minute.
In 1949, Cecil Hastings of the Rand Corporation devised a series of polynomial curve fits for popular mathematical functions that allowed a couple of multiplies and a few adds to provide the required values without interrupting the pace of the card reader. This may have been the first instance of a software change correcting a hardware limitation.[1]
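Hastings' actual coefficients are in his published tables; as a rough modern illustration of the idea only, the sketch below fits a low-degree polynomial to one transcendental function and evaluates it with Horner's rule, so a handful of multiplies and adds replaces an open-ended iteration. The function, degree, and accuracy figure are my choices, not Hastings'.

```python
# Illustrative sketch only: a least-squares polynomial fit to a transcendental
# function, in the spirit of Hastings' curve fits (not his coefficients).
import numpy as np

# Fit sin(x) on [0, pi/2] with a degree-5 polynomial.
xs = np.linspace(0.0, np.pi / 2, 200)
coeffs = np.polyfit(xs, np.sin(xs), deg=5)   # highest power first

def sin_approx(x: float) -> float:
    """Evaluate the fitted polynomial with Horner's rule:
    five multiplies and five adds, no iteration."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

if __name__ == "__main__":
    worst = max(abs(sin_approx(x) - np.sin(x)) for x in xs)
    print(f"max abs error on [0, pi/2]: {worst:.2e}")  # small, roughly 1e-5 or better
```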
In 1953, the IBM 701 was delivered in kit form: several boxes of hardware and a few manuals. Computer sessions were scheduled as blocks of time, were seldom used efficiently, and there was always idle time between scheduled sessions. Backlogs of work to be programmed and tested competed with backlogs of production runs to be made. There was a shortage of computer time. There were only about two dozen commercial mainframes in the entire country.
Convair (Ft. Worth, TX) installed IBM 701 #7 (out of the 19 that were built). In addition to my work as an application programmer, I studied the work of industrial engineering pioneer Henry L. Gantt (1861-1919), the father of the time-and-motion study. I made some measurements, ran some throughput experiments, and had some ideas about more efficient computer operations. After I moved to General Motors Research (they installed 701 #17), I produced the conceptual design for a non-stop multi-user operating system as part of GMR's plan for the installation of an IBM 704. My design was presented at SHARE[2] and resulted in a joint operating system development project between GMR and North American Aviation. George Ryckman led the GM team and Owen Mock led the NAA team. When it was in full operation, the number of jobs run in an hour increased by a factor of 10.
To digress a moment: the number of jobs run in an hour is a function of the characteristics of the jobs in an individual shop. If a shop ran only trajectory calculations, as some of the early non-production computers did, you could start a calculation and it would run for hours (or until the unreliable equipment of the day failed). However, if you were supporting a collection of programmers, most of your day-shift jobs were short and consisted of an assembly/compile and a set of test data. A typical engineering-scientific shop would support its programming staff during the day shift and schedule heavy production at night. Compile (or assemble) and Go proved to be popular with all of our development programmers after they had achieved a clean compile.
From the measurements of work running on an IBM 701, I found several problems that needed to be addressed if we were to realize the full potential of the equipment we were renting. In 1953, the primary challenges were:
- The operational chaos that accompanied allowing each programmer full use of all facilities as his/her creativity might dictate. The answer was programming standards.
- The inability of a professional operator to help very much when every programmer kept his own counsel. The answer was formal operational documentation.
- The lost time between jobs when each job ran singly. The answer was to batch jobs and run an entire batch without human intervention.
- The lost time when a job crashed (due to program or machine error) and the programmer had not provided for restart. The answer was a restart standard for all long jobs.
- The difficulty of giving the programmer meaningful feedback when a job failed in execution, so the job would not crash again on subsequent runs. The answer was a core map available to the operating system so core could be printed in a way useful to the programmer.
- The mystery of mainframe activities to line management who needed information to measure service levels and to justify upgrades. The answer was management reports and advisory customer bills from machine-captured accounting data. (For this answer, Ryckman built a time-of-day clock which the machine could interrogate.)
- The disruption to the schedule when a programmer insisted on being present when his/her job was run. The answer was professional operators and programmer-present operation on second shift.
- The lower programming productivity that resulted from direct programmer job submittal. The answer was hourly desk-to-desk courier service.
- The lower productivity, for jobs in the later stages of checkout or in production, that resulted from being unable to combine job steps and/or process multiple sets of case data in a single run. The answer was control cards embedded in the job stream so complex runs could be set up at a programmer's desk and executed automatically by the system.
All of these features were offered by the GM-NAA operating system for the 704 in 1957. It should be clear that an Operating System was more than software. It included programming, documentation, and operational standards; job descriptions for both programmers and operators; a master schedule for routine work; information for computer center management; and, of course, operational software to tie all of this together.
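The mechanics of such a monitor are simple to picture. The fragment below is an illustrative sketch only (modern Python, not the original 704 code): it runs a batch of jobs back to back with no operator intervention, keeps going when a job crashes, and captures the per-job accounting data that Ryckman's time-of-day clock made possible. The control-card names and job mix are hypothetical, not the actual GM-NAA formats.

```python
# Sketch of a batch monitor: back-to-back execution plus accounting capture.
import time
from dataclasses import dataclass

@dataclass
class JobRecord:
    name: str
    seconds: float = 0.0
    status: str = "pending"

def run_batch(job_stream):
    """job_stream: list of (control_card, callable) pairs, e.g. ("$JOB name", body)."""
    accounting = []
    for card, body in job_stream:
        rec = JobRecord(name=card.split()[1])
        start = time.time()                 # stand-in for the time-of-day clock
        try:
            body()                          # compile-and-go, production step, etc.
            rec.status = "ok"
        except Exception:
            rec.status = "failed"           # note it and keep the batch moving
        rec.seconds = time.time() - start
        accounting.append(rec)
    return accounting

if __name__ == "__main__":
    stream = [("$JOB trajectory", lambda: sum(range(10**6))),
              ("$JOB payroll",    lambda: 1 / 0)]          # a deliberate crash
    for rec in run_batch(stream):
        print(f"{rec.name:12s} {rec.status:7s} {rec.seconds:.3f} s")  # advisory "bill"
```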
The management statistics allowed operational management to predict when the workload would exceed the machine capacity. If a system was going to get saturated, center management had to take action or (if temporary) take the heat. There were some loaded systems that ran late all week and caught up only on the weekends when no further jobs were being submitted.
The 704 had unbuffered I-O, so most card reading and printing was done offline using special machines that IBM had developed for their commercial customers. If system throughput was being reduced by tape speed, IBM offered a spectrum of tape drives with various speeds, the faster ones offering better performance at a higher price. With the early systems, watching the console lights with a stopwatch in hand would tell whether the central processing unit was waiting for I-O or vice versa.
In later IBM systems (709, 7090) the customer could choose faster printers, faster tapes, and larger/faster memories. In most cases the faster units could just be plugged in. In the case of larger memories, programming changes to the software (but usually not the applications programs) were required to exploit the larger memory and get additional throughput.
The OS software following the GM system was SOS (709) and IBSYS (7090). With all of these, the customer employed a big crew of system programmers as the software was hard to install, troubleshoot, and tune. It took knowledgeable system programmers to get optimum use from large mainframes.
The tape batch systems were efficient, but the batch concept was onerous to some who wanted priority service. A batch shop froze priorities when the input tape was made up. Once the tape was made, priorities were ignored and efficiency was the watchword. This was a serious limitation in some shops. In 1963 IBM announced a reliable disk drive which would hold the input stream and allow non-sequential job access.
Aerospace Corporation (El Segundo, CA) had a 7090 installed and was giving efficient batch service. But it also had some high-priority customers who could not get acceptable service without usurping the whole machine to support a space launch, which destroyed the production schedule.
At about this time IBM announced the 7040 computer. The 7040 was almost compatible with the 7090, was slower, was cheaper, and also ran a version of IBSYS. Pricewise, it looked as if we could get more throughput without spending more rental dollars by using the 7040 as a front-end machine. These discussions led to the development of the Direct Couple extension to IBSYS, and eventually to HASP (Houston Automatic Spooling Priority) and ASP (Attached Support Processor) for the S/360.
To estimate the throughput improvements we could get on our 7090, running our workload, if all buffers and device drivers were moved from the mainframe to the support computer, we measured our workload at the macro level. Richard Van Vranken was the lead systems programmer. When we needed statistics on each job, he and his systems programming team quickly modified IBSYS to produce overlap and interlock data. We analyzed these measurements, issued an engineering performance report,[3] and decided that an attached processor to handle the I-O would increase the number of jobs per day without increased rental, while improving our service levels. Further, we freed up high-performance core when we offloaded routine I-O processing onto the slower 7040 system.
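The arithmetic behind that decision can be sketched roughly. Assuming, hypothetically, that the per-job measurements give compute time, I-O time, and the fraction of I-O already overlapped with computation, moving the remaining unoverlapped I-O to the support computer leaves little more than compute time on the 7090. The numbers below are invented, not our measured workload.

```python
# Back-of-the-envelope throughput estimate with a made-up job mix.
jobs = [
    # (compute_min, io_min, fraction_of_io_overlapped)  -- hypothetical values
    (3.0, 4.0, 0.25),
    (1.0, 2.5, 0.10),
    (8.0, 3.0, 0.50),
]

# Today the mainframe holds each job for its compute time plus unoverlapped I-O.
before = sum(c + io * (1.0 - f) for c, io, f in jobs)
# With I-O offloaded to the front-end machine, only compute time remains.
after = sum(c for c, io, f in jobs)

print(f"mainframe minutes before: {before:.1f}")
print(f"mainframe minutes after:  {after:.1f}")
print(f"estimated throughput gain: {before / after:.2f}x more jobs per day")
```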
So we undertook an extension to the IBSYS software to put the input and output streams on a disk attached to the 7040. The system we ended up with had the following attributes:
- It first ran on a 7040-7090 coupled in a Jr.-Sr. configuration.
- To the application programmers, it was still standard IBSYS.
- Eventually it supported remote entry/output stations around the campus with card readers and 300 line/minute printers (limited Remote Job Entry).
- It queued incoming jobs on disk.
- It queued output on disk.
- Some production jobs were retained on disk and could be called for execution with just a control card.
- The machine micro-scheduled its work to respond to customer desires.
- Interleaving setup and non-setup jobs increased efficiency.
- Moving the I-O chore to a cheaper machine was economical.
In contrast with earlier systems, the scheduler in the Direct Couple at Aerospace extracted information from the control cards (JCL) as the job was entered and built a list of work to do. If the job had an ultra-high priority and the customer was waiting, it placed the job on top of the queue, did not start any more compute or print jobs, and ran that job next. Fortunately that did not happen often, because it disrupted the rhythm of the shop and introduced great inefficiencies. However, if numbers were needed to support an immediate space launch, the system would honor such requests.
When a job terminated on the mainframe, the next ready job was dispatched in priority order. In parallel, tapes were requested on a library console printer and assigned to drives. When the tapes were hung, the job was added to the ready queue. Thus short, no-tape jobs of lower priority could be run to maintain machine room efficiency while tapes were being extracted from the library.
Output for printing was queued on disk. When a printer was available, the top priority print job was assigned. Short unclassified printout could be printed remotely on request from a console next to each remote printer. Long print jobs and classified jobs were always printed at the center.
When the day shift went home, the operator at the master computer console informed the system, and the system scheduled differently during nights, holidays, and weekends. During the off-shift, the system essentially ignored external priorities and scheduled for maximum efficiency. The goal was to have all queues go empty at the same time, so the system could be shut down, turned over to IBM for maintenance, or used by the systems programmers for further development. Thus long print jobs were run first, and short jobs with little print were run last.
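The two scheduling regimes can be compressed into a short sketch. This is illustrative Python, not the Direct Couple code, and the job fields and numbers are invented: during the day the queue is served strictly by priority, with an ultra-high-priority job going straight to the head; during the off-shift, priorities are ignored and the longest print jobs are started first so the queues drain at about the same time.

```python
# Illustrative sketch of day-shift versus off-shift print scheduling.
from dataclasses import dataclass

@dataclass
class PrintJob:
    name: str
    priority: int          # smaller number = more urgent
    print_minutes: float   # estimated time on an 1100 line/minute printer

def next_job(queue, day_shift: bool) -> PrintJob:
    if day_shift:
        # Honor external priorities; an "ultra" (priority 0) job goes first.
        return min(queue, key=lambda j: j.priority)
    # Off-shift: ignore priorities and start the longest printing first,
    # so all queues go empty at about the same time.
    return max(queue, key=lambda j: j.print_minutes)

queue = [PrintJob("launch-support", 0, 5.0),
         PrintJob("stress-report", 5, 90.0),
         PrintJob("compile-listing", 3, 2.0)]

print(next_job(queue, day_shift=True).name)    # launch-support
print(next_job(queue, day_shift=False).name)   # stress-report
```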
After our new software was complete, we measured what we had achieved.[4] Then we looked for other inefficiencies we needed to attack. One was the line printer service. There were three 1100 line/minute printers attached. Sometimes stacks of finished output built up, even though there was an end-of-job sheet with 3" letters that identified the place to break the web of paper between customers. We sought ways to eliminate this ad hoc batching and make the task of minding a printer less arduous.
The high-speed chain printers were located near the supply of boxed paper, so the source was near the point of need. Each printer was fitted with a slide so finished output could be placed on the slide without the operator walking a step. Each slide led to a rubber conveyor belt (like a long supermarket checkout belt). This belt ran behind all three printers and deposited the output at a workstation, where classified jobs were held for pickup by the submitter or his cleared representative and unclassified output was placed in pigeonholes for pickup by friends or the submitter himself.
The print team and the tape team exchanged places every two hours to avoid physical fatigue. The tape drives were placed close to the tape vault door and a console printer told the tape crew what tapes to pull and where to hang them.
Later the 7090 was replaced by a 7094 (faster and completely compatible) and the 7040 was replaced by a 7044 (ditto). The computers at Aerospace ran this way until replaced by System/360s.
The Direct Couple software package traveled around the aerospace industry and was frequently modified to match its performance to the workload of each individual company. One 7044-7094 was so efficient that when it was replaced by a System/360 Model 40-65 combination, the new 360 configuration could not run 24 hours of work in a day. (OS/360 had to be tuned to the workload before the 7000 system could be retired.)
IBM made the direct couple software into a product (HASP) and offered it to 7090 customers. Later, on the 360s, the same package was called ASP (Attached Support Processor).
When IBM set out to design the senior operating system for their family of System/360 computers, they assembled a team of top-notch designers. In addition to their frequent transfers of personnel between the field and their design labs, IBM had a vast intelligence network that reported ideas from the field. As a result, OS/360 was an outstanding collection of the best software features available.
Tom Apple was part of the OS/360 development project. He was responsible for collecting statistics and performance information and then supplying it to the development teams. As a result, they could predict and repair performance problems before they reached the field. Their senior operating system shipped with the System/360 model 65s in 1965.
After the System/360s were in the field, there were enough software glitches to produce a steady stream of software changes. The process of producing these changes at the Lab was complex, time-consuming, and expensive. The process of applying these changes in the field was complex, time-consuming, and expensive. The work in IBM's development labs and the work at the customer site were both prone to error.
Many big shops ran production on a rigid schedule with manufacturing or accounting processes dependent on timely outcome. The system programmers in these shops collected any production jobs and their data that always seemed to cause trouble (perhaps because they taxed the computing system so much), and used them as a validation suite.
After a hardware reconfiguration or a significant software change, this test suite would be run. After a successful run the accounting statistics would be checked to verify that all job steps were executed and that the execution times were as expected. Then the print files just produced would be automatically compared with a master output data set to be sure all the computations had been done correctly.
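In modern terms this was a regression test of the whole installation. The sketch below (Python; the job names, step counts, timings, tolerance, and file paths are all invented) shows the three checks in the order described: every expected job step ran, the execution times are close to the recorded norms, and the new print files match the master copies.

```python
# Illustrative sketch of the post-change validation run.
import filecmp

def validate(accounting, expected, master_dir, new_dir, tolerance=0.20):
    """accounting: {job: (steps_run, seconds)}; expected: {job: (steps, seconds, listing)}."""
    problems = []
    for job, (exp_steps, exp_seconds, listing) in expected.items():
        steps, seconds = accounting.get(job, (0, 0.0))
        if steps != exp_steps:
            problems.append(f"{job}: ran {steps} steps, expected {exp_steps}")
        elif abs(seconds - exp_seconds) > tolerance * exp_seconds:
            problems.append(f"{job}: took {seconds:.0f}s, expected about {exp_seconds:.0f}s")
        elif not filecmp.cmp(f"{new_dir}/{listing}", f"{master_dir}/{listing}", shallow=False):
            problems.append(f"{job}: print file differs from master output")
    return problems

# Example usage with made-up accounting data and hypothetical directories:
accounting = {"payroll-weekly": (4, 310.0)}
expected   = {"payroll-weekly": (4, 300.0, "payroll-weekly.prt")}
# for msg in validate(accounting, expected, "masters", "latest"):
#     print(msg)
```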
In 1968, when we were finishing up the development of IMS/360 (Information Management System, a big data base/network system) at the Space Division of Rockwell (Downey, CA), we needed to know how many transactions per hour a fully loaded system could handle. IMS/360 was new, and there was no relevant experience to use as a guide.
Since Space Division had a large computer center which contained two S/360-65s, we designed a measurement experiment to produce the information we needed for planning purposes. We gathered a series of our test cases and wrote a driver so one Mod65 could send these transactions to the test machine in a realistic manner.
The other Mod65 was configured for production with our newly developed software. Further, its files were loaded with real data so the file organization and sizes were realistic. Then we connected the channels of the two machines, simulated the network with one machine, and drove the machine under test. The results showed our early software could handle 4000 transactions per hour. Then we wrote another report.[5]
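The driver-and-target idea translates readily into modern terms. As a hedged sketch only (Python threads standing in for the channel-to-channel connection between the two Mod65s; the transaction count and per-transaction timing are invented), one thread plays the driver machine feeding a stream of test transactions while the other plays the system under test, and the elapsed time yields a transactions-per-hour figure.

```python
# Illustrative sketch of a driver/target throughput experiment; the queue
# stands in for the channel-to-channel connection between the two machines.
import queue
import threading
import time

transactions = queue.Queue()
DONE = object()
processed = 0

def driver(n_transactions: int) -> None:
    """Plays the driver machine: feeds a stream of test transactions."""
    for i in range(n_transactions):
        transactions.put(f"txn-{i}")
    transactions.put(DONE)

def system_under_test() -> None:
    """Plays the machine under test: services each transaction and counts it."""
    global processed
    while True:
        txn = transactions.get()
        if txn is DONE:
            break
        time.sleep(0.001)          # stand-in for real transaction processing
        processed += 1

start = time.time()
threads = [threading.Thread(target=driver, args=(500,)),
           threading.Thread(target=system_under_test)]
for t in threads: t.start()
for t in threads: t.join()

elapsed = time.time() - start
print(f"{processed} transactions in {elapsed:.1f} s "
      f"~ {processed / elapsed * 3600:.0f} transactions/hour")
```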
References
- [1] Cecil Hastings, Rand Corporation; see also "Annals of the History of Computing", Vol. 2, No. 3, July 1980 (Gruenberger).
- [2] Rand Corporation paper P-7316, "General Motors/North American Monitor for the IBM 704 Computer"; also in National Computer Conference Proceedings, Chicago, June 1987.
- [3] "Performance of a Scientific 7090", Aerospace Corporation, September 1963.
- [4] "Performance of a Scientific 7090 Computer System", Aerospace Corporation, March 1966.
- [5] "Performance of a S/360", Rockwell Space Division, September 1968.
About the Author
Robert L. Patrick was an independent freelance consultant for 33 years. He specialized in applications systems and operations management. He was also a designer/architect/team member on six major software systems: an operating system for the IBM 704, 1956; a compiler for the H-800, 1959; the Direct Couple for the 7040-7090, 1963; System/360, 1964; IMS/360, 1968; and a custom DB system at Rand, 1968. During his career he worked for 121 different organizations in the U.S. and Europe. Patrick holds a BS in Mechanical Engineering from the University of Nevada, 1951.