Slashdot Log In
Space Shuttle Software: Not For Hacks
from the nothing-too-much dept.
As someone who's done more than his share of late-nighters, it was an interesting view into the mission-critical environment. Maybe there are a few software firms out there that would rather spend some of their money on better processes rather than technical support engineers. Maybe a little more market research and a little less marketing, too. A good read."
These guys are "pretty thorough" the way Vlad the Impaler was "a little unbalanced." Still, you have to wonder how they can claim single-digit errors among thousands of lines of code, but I guess the proof is in the rocket-powered pudding. And lucky for them, their target platform was recently upgraded.
Can you imagine just how simple those things are.. (Score:4)
and how slowly they are being developed? I don't mean that it's a bad thing -- it's good that Shuttle program allows them to do it at reasonable pace and with reasonable requirements, but if everyone else wasn't under constant pressure, and if everyone's else software wasn't a victim of feature bloat, dealing with poorly documented and even worse implemented protocols, and never-ending stream of bullshit coming from the management, everyone else would write robust software, too. Well, not really everyone -- some "programmers" wouldn't be able to do anything because they have no skill, no education or are plain dumb, but reasonably geeky and educated programmer can pull something like that in ideal conditions -- and those guys _are_ working in ideal conditions.
The importance of documentation (Score:5)
If you have enough to do it poorly ..... (Score:3)
The problem with this arguement is that while many companies think that they can't afford to do it, what they really can't do is afford NOT to do it. Software is becoming more complex - it's the nature of the beast. For the most part, design is not; we are all still using procedures that were brought into being in 'dawn of computer age', with the exception of higher order languages and more focus on OO.
You are correct in that it may be expensive, THE FIRST TIME. This is called a 'learning curve' and the cost is amortized over the number of times you use this technique. You may also say that the process itself is expensive but that is incorrect, or at least only partially correct. The process allows errors to be caught EARLY, which reduces cost. Please don't tell me that you believe a code-compile-fix routing can catch these sorts of errors as early as a well thought out design.
Also, rigourous design allows for flexibility - this may sound contradictory but consider the use of design patterns. They are NOT things that can just be thrown into the code ad hoc; they require thought and intelligence. A good upfront design means the ability to use these tools. Consequently, use of these design patterns allows for a certain level of flexibility in statisfying the lower to medium level nasty customer requests, and certainly helps on the more egregious ones. Does a code now, look later approach allow this? (if you think so, I have this bridge I'd like to sell you ...)
In short, yes, using these techniques is expensive. But they also produce code that cuts development time (i.e., no stuck in debug/extra request phase for 2 years) and once people get used to the process, the extra cost/load is minimal.
Not just space shuttles. (Score:4)
Re:The importance of documentation (Score:5)
Likewise, i worked for a while on the signalling system for the Jubilee Line Extension for the London Underground.
Totally documentation driven. First there was the CRS (Customer Requirements Spec). - this then transformed via an SRS (Systems Requirement Spec.) into the FRS (Functional...) and the NFRS (Non-functional...). From these we had Software Design specs, Module Design Specs, Object Design Specs, Boundary Layer Design specs. in all there were around 4000 specification documents for the project, often at issue numbers well into the teens.
What really made the difference though, was not so much the existence of documentation, as the absolute insistence on traceability - every memeber function of every class in the whole system could be traced back to the Customer Requirement Spec, and every Requirement could be traced to its implementation. This meant - no chrome: everything in the spec was p[rovided, and nothing was provided that wasn't in the spec.
Also worth noting that: the whole thing was in ADA95. The compiler was very carefully chosen. Coding standards were tight, and tightly enforced - function point analysis was king - anything with more that 7 function points was OUT, simple as that. Every change to anything, however small, required an inspection meeting before and after implementation, with specialits from every part of the system which could be impacted, plus one of the two people with a general overview. Then there were the two independent test teams and the validation team.
Ye Gods it got tedious, no denying that. But in a situation where lives depended on good software...
Now I probably apply only a tiny fraction of what I learned, but when I decide to ignore part of the methodology, at least I know I'm ignoring it. And I'm aware of what I'm missing.
In short - learn about the safety-critical approach. Ditch most of it as excess baggage by all means - it's often simply not justifiable. But be aware of the choices you're making.
TomV
Higher Quality != Higher Cost/Time! (Score:3)
But then the Japanese came along with a radical new idea: if there are defective parts coming down the line, then we should figure out why they were created defectively in the first place and fix that. Then the number of defective parts at the end of the line would be less, thus you would need *fewer* inspectors and *less* time at the end of the assembly line. (Ironically, this principle came from an American named Edward Deming; unfortunately American companies were too successful during his lifetime for them to take him seriously :-) So the Japanese were able to build cheaper cars quicker than the Americans while actually having higher quality.
I think that's very analogous to the current argument. Under the current system of coding, you basically hack together something that sorta works, and then use sophisticated debuggers/development tools to figure out which parts are buggy. Using that system, it's true that higher quality requires more cost and time.
But I think the point of this article is that that is the wrong way to approach programming. First, figure out why defective code gets written in the first place (be it poor client specifications, poor management, poor documentation, whatever) and then fix those processes, and you'll turn out quality code without having to spend any more time or money!
As a practical example, I first learned C under a CompSci Ph.D. who was a quality fanatic. In order to teach me to code properly, he would give me projects and then not allow me to use a debugger. Nothing at all. Zilch. Nada. The only thing I was allowed was to place print statements within my program wherever I wanted to see what was going on. As a result, I spend *a lot* of my time planning my code out, and reviewing it over and over again before even compiling it, because I knew that if there were bugs in it, I couldn't just fire up a debugger and take a look.
And secondly, if there were bugs, I couldn't just trace through the entire program or create a watch list of every variable. I had to study the bug and understand it, look at the code and figure out where the bug most likely was, and then use selective print statements to look at the most suspicious stuff. That way, when I encountered bugs, I'd be forced to actually understand what the bug was and then analyze my code to figure out where that error most likely was.
If this sounds like a programming class from hell, believe me, it was incredible! I couldn't believe how much of my code worked the first time it compiled. And when there were bugs, I actually fixed the underlying flaw in the logic rather than just applying a temporary patch. What's more, since the rest of my program was well planned and documented, there were no "hidden" effects: if I found a bug, I knew exactly which parts of the program it affected, and perhaps more importantly, *how* it affected those parts. Thus they were very easy to fix.
Believe it or not, it took me less time to program this way than using debuggers, and the resulting code was much more stable and understood.
If you look at commercial software these days, it's not uncommon for the debugging period to take longer than the actual coding. In other words, there are more quality inspectors than there are assembly workers, and the time the code spends in inspection stations is longer than it spends being produced. It's tough to say that this is the "efficient" method of programming...
If you want to see where this is heading, just turn once again to the car industry: once American companies got their asses kicked by the Japanese, they adopted their techniques, and Surprise! Cars now come out of their factories with higher quality, in less time, and at less cost (adjusted for inflation and new features :-). Who would've believed it? :-)
"No Pizza" is good (Score:3)
That's because pizza-and-coke all-nighters are a direct byproduct of poor planning, either by the engineer implementing the code, the architect creating the design (if there even is such a person) or the person making the engineer's schedule. And the result is usually hastily written, incompletely tested software that is typical of most product offerings for use on the desktop.
The process of authoring mission critical, man rated software is so far removed from the ad hoc, informal, duct-tape-it-together approach that most programmers use that no direct comparison can be made. I've seen both ends of the software development spectrum and they each have their uses. You can't launch a shuttle with a bunch of last minute kernel patches and some stuff that was written the night before the launch date. But you can't compete in the commercial software marketplace with code that takes 2 or 3 years to specify, design, implement, test, and integrate, either.
Stand in awe of the people who have the skill and discipline to write software of this quality. Learn what you can from their process and try and use the lessons they've learned. Their stuff doesn't break, because when it does, people die. If O/S developers had that same attitude about their code, we'd never see blue screens of death, kernel panics, or any of the other flakiness we tolerate on our desktop machines.
Somewhat bogus article (Score:5)
Here's NASA's own history [nasa.gov] on bugs in that software:
- So, despite the well-planned and well-manned verification effort, software bugs exist. Part of the reason is the complexity of the real-time system, and part is because, as one IBM manager said, "we didn't do it up front enough," the "it" being thinking through the program logic and verification schemes. Aware that effort expended at the early part of a project on quality would be much cheaper and simpler than trying to put quality in toward the end, IBM and NASA tried to do much more at the beginning of the Shuttle software development than in any previous effort, but it still was not enough to ensure perfection.
Read the NASA history. They had a 200-page known-bug list in 1983, although they did fix most of them during the long downtime after the Challenger explosion.The Shuttle's user interface is awful. The thing has hex keyboards!. Some astronaut comments include
This project should not be held up as a great example of software engineering. Even NASA doesn't think it is.
Re:Flight Software (Score:3)
http://www.ksc.nasa.gov/mirrors/images/images/p
--Jim
Re:"No Pizza" is good (Score:3)
If your company took the time to write very stable, near-bug-free code, they'd take so long doing it they'd go out of business - their competitor would get the business with a flakey but shipping product and by the time you turned up with your perfect product, everyone would be locked into their stuff (and most likely would have been using it for a couple of years).
Noone wants to write buggy code, we all try to do our best; logical & clear design, defensive programming & good documentation give a good base. Peer review and experience (been there, don't want to do that again) help a lot too. Just writing the comments first (saying what you're *going* to do before doing it) helps.
Another problem is that writing bug free apps on (say) windows is almost pointless as the app will still fall over when some bit of buggy OS/windows API code falls over. Things have to be stable and bug-free from the hardware upwards to give an impression of stability to the user - the problem is, the average user can't tell the difference (and couldn't give a toss) whether the app or the os fell over, it's just "my WP crashed and I lost my work".
Welcome to the real world. Software can be flakey because it was written to be useful before the hardware went out of date - not exactly a problem with the shuttle. You can spend ages hand-crafting efficient code to be overtaken by crap code on a faster CPU. Blame the chip companies for moving so quickly
Hugo
Re:Interesting stuff (Score:5)
I work in the Flight Software (FSW) Verification group in Houston.
The shuttle FSW code is written in something called HAL/S. This stands for High-level Assembly Language / Shuttle. The language was designed to read like mathematics is written. Superscripts like vector bars are actually displayed on the line above, subscripts like indices are displayed on the line below. Vectors and matrices can be operated on naturally, without looping.
We are the only ones with a compiler, because we wrote it ourselves.
Here's a sample:
EXAMPLE:
PROGRAM;
DECLARE A(12) SCALAR;
DECLAREB ARRAY(12) INTEGER INITIAL(0);
DECLARE SCALE ARRAY(3) CONSTANT(0.013, 0.026,0.013);
DECLARE BIAS SCALAR INITIAL(57.296);
DO FOR TEMPORARY I = 0 TO 9 BY 3;
DO FOR TEMPORARY J = 1 TO 3;
A =B SCALE + BIAS;
I+J I+J J
END;
END;
CLOSE EXAMPLE;
I couldn't get the subscripts to line up, but you get the idea.
Unintentionally humorous quote in article (Score:3)
I can see many Dilbert-fans wondering if that is a bug or a feature.
If they ever release this to the public (Score:4)
1 Space Shuttle Endeavor
1 Launch Pad
1 Houston Mission Control Station
4 Astronauts
I can understand why they want no hacks (Score:3)
#Shuttle Waste Dump
#
#I dunno WHY this works, but it does!
What's fun in software development? (Score:4)
I can almost hear the moans from the pizza-and-coke crowd whem they read this: "Where's the fun? Where's the creativity?". But they're under the mistaken assumption that putting lines of code into the editor is the only fun thing about developing software.
IMHO, software development is full of fun activities. What about analysis and design? In my experience, that's where the creativity really comes into play. Just talking to the customer, understanding the problem and making a working design is really difficult, and hence rewarding when you pull it off.
And what about the process itself? Software development is a young dicipline, where individuals and small groups really can make an impact. Nobody really knows how to make good software. Maybe you'll be the one to find out? As the man says, in the shuttle software group, people use their creativity on improving the process.
And last, but not least, I bet those guys have a really good feeling when they talk to the customer after delivery. Not like some people I know, who just hide. ;)
If you can't see the fun of these other activities, maybe you shouldn't be working in this field...
this only works for few projects (Score:3)
- half a million LOC (that's small)
- under development for 20 years
- new requirements are avoided at all cost
So it is a small, long lived project with nearly unlimited budget. No wonder they can afford to have such a process in place. But now realistically, how long does it take to set up such a project from scratch. How about having a customer who does not know what he wants. How about deadlines of less than 10 years from now.
I honestly believe that this way of delivering software is optimal for nothing else but long lived, multi billion dollar projects. In any other case you'll end up with something that is delivered years to late, indeed matches the requirements of 10 years ago and is close to useless.
Unfortunately many software companies are in a situation where they can't afford to wait for perfect software. Take mobile phones as an example. Typically these things become obsolete within half a year after introduction. The software process is what determines time to market. Speed is everything. If you can deliver the software one month earlier, you can sell the phone one month longer.
Of course testing, requirementspecs and software designs are usefull for any project but it's usually not feasible to do it properly.
Re:Seems almost like ISO... (Score:4)
I'm working on a large NASA project now. I have determined that the purpose of this project is not to produce a working software system, but rather to produce a wall full of loose-leaf binders of incomprehensible documentation that no one will ever refer to again.
The process says we must have code reviews - great! But instead of being an analysis of the logic of my code, it turns into a check against the local code formatting standards - "You can't declare two variables with one declaration, use int a; int b; instead of int a,b;" (yes, that's an actual standard around here) instead of "Hey, if foo is true and bar is negative, you're going to dereference a garbage pointer here!"
The forms are observed, but the meaning is forgotten, like Christians going to church on Sunday then cutting people off and flipping them the bird on the drive home.
"Process" won't save us. Which doesn't mean that a certain amount of it can't help, but there is no silver bullet. [virtualschool.edu]
Flight Software (Score:5)
I can also tell you that NASA avoids having to make unnecessary changes to the FSW. For example, the new "glass cockpit" recently discussed here on Slashdot: when these upgrades were designed, they chose to design the interface to the new display modules to exactly mimic the interface to the old intruments. In other words, they are true plug-and-play replacements; one significant reason for this was so the flight software didn't have to be modified.
Likewise, people often ask why the shuttle continues to use such antiquated General Purpose Computers: slow, 16-bit machines designed back in the seventies. There are many reasons, but a big reason is that new hardware would almost certainly require massive changes to the flight software. And rewriting and recertifying all that software would be a huge task. The current FSW works reliably; if it ain't broke...
Huzzah! As I type, we just launched Atlantis. Go, baby, go!
--Jim
THAT is how to write code (Score:5)
The good thing about the way software is written here is that the requirements are written down and sorted out before they even do the planning. How many prgrammers, groups, firms etc. can say that. I will admit, though, that a major problem is changing requirements. Something that just happen in the same way for NASA. It might just be better if people decided to wait a bit before jumping in to the programming. They'll save themselves more time and money in the long run.
Re:Formal Methods are the key. (Score:3)
Another formal system originated in France is the Methode B, that consists in progressively refining logical statements that apply to the desired behaviour of your program (like assert() you put before and after the body of a function) into the implementation of the behaviour :
http://estas1.inrets.fr:8001/ESTAS/BUG/WWW/BUGh
An academic formal methods team that checks the Ariane 5 software:
http://pauillac.inria.fr/para/eng.htm
http://pauillac.inria.fr/para/eng.htm
Safety is cool... (Score:4)
I never quite understand why it is an act of macho bravado to work all night and live off pizza. It indicates two things 1) A badly run project and 2) poor maintainability in the code.
In one of my previous incarnations I worked on display systems for Air Traffic Control, where the quality level was also very high, where the performance requirements were exacting and the specifications precise.
Some would think that this means simple and boring... Of course not. Having to display a track from reception at the Radar to the display in 1/10th of a second isn't easy by any stretch of the imagination, and to do it so it works 100% of the time means you have to understand the problem properly rather than coding and patching.
If only more projects worked like that then there would be a lot less bugs in the world.