Nerd Story – The Mystery of the Data Cell Drive
Daughter Kelly (that’s not her in the picture) gave me a one gigabyte USB Swiss Army Knife for Christmas. Of course one gig doesn’t take all that much space so it has blades, too. All this memory in a little package reminds me of the mystery of the IBM 2321 data cell drive, my first major nerdish consulting assignment.
This Rube Goldberg device was the state of the art of mass storage in the late 1960s. It could store a then-amazing 400 megabytes of information. The one I ran into was in use in a large department store in Brooklyn as part of their just-installed in-house credit system written for them by IBM.
Data was recorded on magnetic strips which hung on hangers in a tub. When a particular record was needed, the tub rotated with a series of thumps until the correct strip was positioned under some mechanical fingers. The fingers picked up the strip and wrapped it around a drum. Then a read/write head descended to hover over the drum and access the data. Finally, the strip was removed from the drum and put back onto its hanger. To encourage the strip to fall straight back into its slot, a breeze blew past it. On dry days when the strips had a high static charge, they often jammed.
IBM had replaced the early data cell drive in use at the department store a number of times and installed more humidifier capacity in the computer room. The mechanics appeared to be working for the moment, but there was still a terrible problem, and this is what I had been hired to find.
Every once in a while, several times per day, credit card authorizing would slow to a snail’s pace; sometimes it would restart; sometimes it wouldn’t and all computers would have to be rebooted. In those days when a credit card was presented to a cashier, she (it was always she then) called downstairs to the authorization room on a phone at the cash register. She gave the account number and the amount of the purchase to a clerk (also always a woman) at a 2260 monitor attached to the one of the two 360/40 computers which was attached to the data cell; the clerk then relayed the credit acceptance or denial to the cashier.
So, if the system slowed down, lines grew at the cash registers. If the lines got too long, customers left. To keep customers happy, the credit amount approvable without authorization was raised to reduce the number of calls; but store management estimated that, even in those days without cell phones, every credit card crook in NYC would be on his or her way to the store within about half an hour if the limit was high or authorization was suspended.
Oh, yeah, and this was only August. Holiday shopping season coming up; every indication that more volume meant more episodes of slow or non-existent credit processing. I accepted the assignment.
I pored over the listings of the computer programs, found lots of bugs but nothing that accounted for this strange behavior. I printed memory dumps of the 360/40s stopped at the time of occurrence. Everything seemed normal.
Now it’s September and my confidence is fading but I’m not showing it. I take to having lunch next to the data cell, still hoping to trap the problem out in the open. One day in the middle of a pastrami on rye (best in the world in Brooklyn) I notice a pattern in the thump-thump-thump of the tub when correlated with the women’s voices from the authorization room next door: five thumps, pause, five thumps, pause, repeated, and the longer this pattern repeats, the more talking there is from the women.
So when do the women talk? Answer: they talk when they can’t work. They can’t work when the authorization process slows down or stops. But why only in a five thump pattern?
The answer is one only a nerd can love. There are two computers accessing the same data cell drive. In order to make good use of expensive computing power in the days before Moore’s Law made CPU power super-abundant, these machines would start a slow external operation like moving a data cell to the proper position for a pick – thump, thump, thump – and then go back to computing something else until the data cell got in position. When the proper strip had been picked, the controller of the data cell would “interrupt” the computer to get the specific read/write instruction.
Meanwhile, of course, the other computer might ask for something. It would get back an error from the controller saying that the data cell was busy with another request. Fine, the other computer would say, interrupt me when it’s free, and it would busy itself with other work.
Now the proper strip is plucked. Both computers get interrupted by the friendly controller. Turns out that there are a few fewer instructions to be executed by the second computer, the one that was waiting for the device to be available, than there are by the computer that made the original request. So the second computer now sends the tub thumping towards the strip it wants and the first computer gets an error message saying that the data cell is busy. Do you smell a loop?
Turns out that this condition only leads to significant delays in the symmetrical case when the strips the two computers want are halfway around the tub from each other; that’s why five thumps repeated is fatal. In other cases, there is fairly quick recovery.
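For fellow nerds who would rather see the loop than smell it, here is a toy replay of the race in Python. It is only a sketch under loud assumptions, not the 360/40 code or the real controller protocol: the ten tub positions, the one-way rotation, and every name in it (`SLOTS`, `thumps`, `simulate`, `waiter_wins`) are made up for illustration. It shows the retry loop and the thump pattern only; it does not try to explain why the lopsided cases recovered quickly.

```python
SLOTS = 10  # assumed number of tub positions; the story does not give one


def thumps(start: int, target: int, slots: int = SLOTS) -> int:
    """Thumps needed to turn the tub from start to target (assumes one-way rotation)."""
    return (target - start) % slots


def simulate(strip_a, strip_b, rounds, waiter_wins=True, slots=SLOTS):
    """Replay up to `rounds` seeks; return (thumps per seek, computers that got their read)."""
    wanted = {"A": strip_a, "B": strip_b}
    position = 0                 # arbitrary starting position of the tub
    seeker, waiter = "A", "B"    # A asks first; B is told "busy" and waits
    thump_log, done = [], []
    for _ in range(rounds):
        if len(done) == 2:
            break
        # The controller turns the tub for whichever computer owns the seek.
        thump_log.append(thumps(position, wanted[seeker], slots))
        position = wanted[seeker]
        # Both computers are now interrupted. If the waiter's shorter code
        # path wins, its new seek lands before the seeker's read is issued.
        if not waiter_wins:
            done.append(seeker)  # the read goes through before the next seek
        seeker, waiter = waiter, seeker
    return thump_log, done


if __name__ == "__main__":
    print(simulate(2, 7, 6))                     # strips halfway apart, race lost every time
    print(simulate(2, 7, 6, waiter_wins=False))  # same strips, no race
```

With the strips at positions 2 and 7, the first call returns `([2, 5, 5, 5, 5, 5], [])`: after the opening seek it is five thumps, pause, five thumps, pause, and nobody ever reads a record. With `waiter_wins=False` the same request pair finishes in two seeks, `([2, 5], ['A', 'B'])`.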
As usual, the fix was easy when the problem was understood. Don’t understand why Mary says I have no sense of rhythm.