How Is Critical 'Life or Death' Software Tested?
Ideally, it's so good that it barely needs to be in the first place.
Image: Steve Jurvetson/Flickr
Your average scripter likely isn't writing a whole lot of proofs or going through the rigors of formal program verification, generally. Which is fine because your average scripter also isn't writing software for jet airliners or nuclear power plants or robotic surgeons. But somebody is—and the odds are pretty good that your life has been in their hands very recently. How do you know they're not a complete hack?
Well, you don't really. Which prompts the question: How is this sort of code tested?
It was a short blog post written by Gene Spafford, a professor of computer science at Purdue University, that inspired this particular asking of the question. It relays an anecdote about the origins of fly-by-wire cockpits—in which aircraft flight surfaces and whatnot are manipulated electronically rather than directly—in the late-1980s. We're used to it now, but back then it was a huge leap: direct control was being taken away from pilots and put into the hands of software designed to communicate pilot commands to mechanical actuators. A computer could now crash a plane.
That fact put the people behind those computers into a hell of a position.
In the late 1980s, around the time the Airbus A340 was introduced (1991), those of us working in software engineering/safety used to exchange a (probably apocryphal) story. The story was about how the fly-by-wire avionics software on major commercial airliners was tested. According to the story, Airbus engineers employed the latest and greatest formal methods, and provided model checking and formal proofs of all of their avionics code. Meanwhile, according to the story, Boeing performed extensive design review and testing, and made all their software engineers fly on the first test flights. The general upshot of the story was that most of us (it seemed) felt more comfortable flying on Boeing aircraft. (It would be interesting to see if that would still be the majority opinion in the software engineering community.)
So, Boeing's programmers had this additional motivation of putting their lives in the hands of their own software on day-one. There would be no room for hotfixes at 30,000 feet—it just had to work. Which do you feel more comfortable with: the taking one's own medicine approach to software verification, or a bunch of formal proofs and simulations?
I'd prefer both really, but the question of how this stuff is tested is interesting in itself. A 2011 Stack Exchange thread provides a lot of good insight into the process and its evolution. For one thing, the Boeing approach is going out of style or has mostly gone out of style, according to SE poster Uri Dekel (handle: Uri), a Google software engineer.
"There's a serious move towards formal verification rather than random functional testing," he writes. "Government agencies like NASA and some defense organizations are spending more and more on these technologies. They're still a PITA [pain in the ass] for the average programmer, but they're often more effective at testing critical systems."
Scott Whitlock, a software engineer who maintains the home automation-focused software library FluentDwelling, goes deeper, explaining the design requirements for safety relays, which are systems in industrial applications designed to monitor equipment warning signals and shut down the implicated equipment if necessary. They include, according to Whitlock, "Two redundant processors, typically sourced from different vendors, based on different designs. The code running on each processor has to be developed by two teams working in isolated conditions. The output of both processors has to agree or else the safety relay faults."
"Now once you've designed your safety system, the logic you write to control the machine itself can be very seat-of-your-pants," Whitelock adds. "Programmers frequently crash machines causing thousands of dollars of damage, but at least nobody's going to be injured."
How about manned spacecraft? In terms of criticality, a spacecraft is like an airliner crossed with a life-support system crossed with the hardest math problems in aeronautics.
"This software never crashes," wrote Fast Company's Charles Fishman in "They Write the Right Stuff." "It never needs to be re-booted. This software is bug-free. It is perfect, as perfect as human beings have achieved. Consider these stats: the last three versions of the program—each 420,000 lines long-had just one error each. The last 11 versions of this software had a total of 17 errors. Commercial programs of equivalent complexity would have 5,000 errors."
How do NASA's programmers pull that off? Well, mostly because they have to. The smallest bug will put billions of dollars and lives at risk. All in the brightest of public spotlights.
"Before every flight, Ted Keller, the senior technical manager of the on-board shuttle group, flies to Florida where he signs a document certifying that the software will not endanger the shuttle," Fishman explains (the article predates the shuttle's demise, but still). "If Keller can't go, a formal line of succession dictates who can sign in his place."
As for the code itself, its perfection came as the result of basically the opposite of every trope normally assigned to "coder." Creativity in the shuttle group was discouraged; shifts were nine-to-five; code hotshots and superstars were not tolerated; over half of the team consisted of women; debugging barely existed because mistakes were the rarest of occurrences. Programming was the product not of coders and engineers, but of the Process.
"The process is so pervasive, it gets the blame for any error," Fishman wrote, "if there is a flaw in the software, there must be something wrong with the way its being written, something that can be corrected." The software is perfect with no margin of error.
The development group is supposed to deliver completely error-free code, so perfect that the testers find no flaws at all. The testing group is supposed to pummel away at the code with flight scenarios and simulations that reveal as many flaws as possible. The result is what Tom Peterson calls "a friendly adversarial relationship." "They're in competition for who's going to find the errors," says Keller. "Sometimes they fight like cats and dogs. The developers want to catch all their own errors. The verifiers get mad, 'Hey, give it up! You're taking away from our time to test the software!'"
So, what do you think, future astronaut? The programmer that's flown into orbit under control of their own software, or the programmer reared in NASA's shuttle group?