
Space Shuttle Software: Not For Hacks 178

Jeff Evarts writes: "This article in Fast Company talks about the process the Shuttle Group uses to make software. At first it seems too predictable: a very cool project, but no hacks, no pizza-and-coke all-nighters, etc. Then, however, it goes on to talk about why: they have an informed customer, they talk to that customer until they have a very clear idea of what is wanted, they have a budget focused on prevention, and they focus on fixing the process rather than blaming the individual.

As someone who's done more than his share of late-nighters, I found it an interesting view into the mission-critical environment. Maybe there are a few software firms out there that would rather spend some of their money on better processes than on technical support engineers. Maybe a little more market research and a little less marketing, too. A good read."

These guys are "pretty thorough" the way Vlad the Impaler was "a little unbalanced." Still, you have to wonder how they can claim single-digit errors among thousands of lines of code, but I guess the proof is in the rocket-powered pudding. And lucky for them, their target platform was recently upgraded.

  • While the caliber of this group seems unbeatable, it's too bad NASA doesn't apply this rigid development model to its unmanned spacecraft. I still don't understand how a difference in units (English vs. metric) managed to go undetected!

    The only thing I could think of after hearing that such an error caused a multimillion-dollar craft to crash was IDIOTS - any scientist should be using SI units today.
  • Would the next generation programmers write in "logic language" instead of C++?

    Most likely not - but automatic verification of programs using logical constructs is a big growth area.

    You can test a program with a huge range of inputs and get a clean run, but that does not mean the program is 100% reliable - you can almost never exercise every possible input and timing. You must prove the program is correct if you want to be sure it is good enough for circumstances such as shuttle or aeroplane flight.

    With all the complexities of semaphore control in parallel computing, you really have to make sure a program enters and leaves critical sections at the correct times, without anything else that has been designated mutually exclusive running at the same time (see the sketch at the end of this comment).

    Many experts believe that some Airbus crashes were caused by incorrect verification.

    On a single processor machine, this is much easier, but how many space shuttles do you know of that only have one CPU!

    Have a look at some of the links at Dr. Mark Ryan's [bham.ac.uk] page (University of Birmingham) for some more info.
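
    To make the "critical section" point concrete, here's a minimal sketch in C with POSIX threads - purely an illustration, nothing like the actual flight software:

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static long shared_counter = 0;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);   /* enter the critical section */
            shared_counter++;            /* the only code allowed to touch shared state */
            pthread_mutex_unlock(&lock); /* leave the critical section */
        }
        return 0;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, 0, worker, 0);
        pthread_create(&b, 0, worker, 0);
        pthread_join(a, 0);
        pthread_join(b, 0);
        printf("%ld\n", shared_counter);   /* always 200000 with the lock in place */
        return 0;
    }

    A verifier's job is to prove that every path which takes the lock also releases it, and that nothing touches shared_counter outside it; testing alone only shows the program happened to work on the runs you tried.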

  • That's the point of SEI Level 5 - "Continuous Process Improvement."

    I've worked on programs assessed at Levels 3 and 4, and supposedly the folks I work with now are Level 5 (I know they made 4, but I'm not sure the certification for 5 is finished). I grind my teeth sometimes at the layers of process we have to wade through to get things done - but every six months or so, they make changes to (hopefully) make it better.

    The SEI's not just working with software [cmu.edu] - they're developing models for System Engineering and Integrated Product Development, as well as Personal and Team [cmu.edu] software process models for small and independent-minded folk. Your tax dollars at work!

  • The main reason Space Shuttle-level reliability is not a priority in the software industry in general is that the whole focus of the industry has become the quick buck, the rush to the IPO, the dazzling of the user with endless "features" that have minimal utility. The classic example was Windows 3.1. It was colorful, had lots of features - and barely worked.

    The marketroids who set timetables for software projects are another problem. Most of them think any arbitrarily complex piece of software can be designed, implemented and tested in about 3 weeks and get impatient when this doesn't happen. In the shuttle program the engineers are in charge and they determine the timetables.

    Yeah, I'm a bitter, angry little coder...

  • Well, everything I could say about time requirements and budgets has been said a thousand times already... so I'll just get to what annoyed me about the whole article:
    suits.

    Seriously. Why is wearing a suit such a huge thing in the business world? I can understand if you're a lawyer and need to impress people with your multi-thousand dollar clothing, or an executive who deals with customers and must appease the customers' sense of what's proper in an executive.. but other than that.. WHY?

    It's been proven and re-proven that people are more productive in an environment where they're comfortable. In this particular case the idea seems to be to make your coders as uncomfortable as possible so they can think of nothing but getting the code perfect the first time so they can go home.. but most places (as has been mentioned repeatedly) aren't like that. So why is it that a guy who works in a cubicle and never gets closer to customers than a middle-manager who is in charge of a supervisor who is in charge of the customer service department has to show up to work in a tie?

    And the worst part is that the business world seems to think people enjoy this. Sure, it's nice to look good.. if you've got a $2,000 suit you're going to want to wear it on occasion.. but how many of us can honestly say that we feel more productive in it?
    Dreamweaver
  • To be this good, the on-board shuttle group has to be very different -- the antithesis of the up-all-night, pizza-and-roller-hockey software coders who have captured the public imagination. To be this good, the on-board shuttle group has to be very ordinary -- indistinguishable from any focused, disciplined, and methodically managed creative enterprise. (from the main article)

    When I meet programmers who think that they are cool and tough, I tell them to read Bravo Two Zero [amazon.com] by Andy McNab. It's the true story of an SAS (British army special forces) unit that operated behind the lines during the Gulf War. Here in the UK, the SAS is revered by most guys in the way that Navy SEALS are in the US. The book has a lot to teach about programming.

    Many people seem to think that special forces troops are so good that they can just be handed a task, left to get it done, and that they will deal with whatever problems arise. Wrong. According to McNab, the True Motto(tm) of the SAS is "check and plan". For example, before approaching an Iraqi military vehicle, they would rehearse opening the vehicle's door: which way the handle turns, whether the handle has to first be pushed in or pulled out, whether the door swings open or slides back, how much force needs to be used, etc. etc. etc. Every little detail is checked like this. And there are backup plans.

    Now read the first sentence of the previous paragraph, but substitute "top software programmers" for "special forces troops". You can see my point. Truly good special forces/programmers/professionals all have some things in common: they are focused, disciplined, and methodical. And they don't feel a need to prove how good they are by taking unnecessary chances.

    The main article also notes that programming teams such as those used for the Space Shuttle seem good at drawing in women. This is hardly surprising. Women naturally like men who are justifiably confident about what they do.

    How well did the eight-man SAS unit perform? They were surrounded by Iraqis, who had armored vehicles. Three were killed. The other five retreated: over 85 km (>2 marathons) in one night with 100 kg (220 lb) of equipment each. About 250 Iraqis were killed along the way, and thousands more were terrorized.

    Sara Chan

  • Heh - we don't ALL use 'em. I, too, am an SAIC software person, and my previous assignment was on a 15-person development effort - I was brought in as a lead developer, in part to provide experience with SW process as this project tried to work its way up to SEI Level 3. It didn't make it (the gov't and the prime got the program canceled - LONG story), but the value of the processes we used was established in the minds of everyone involved.

    Our Telcordia subsidiary (formerly BellCore, half of what was once AT&T Bell Labs) is one of those Level 5 organizations - we're all learning from them.

  • The NASA Space shuttle project is generally described as being at CMM Level 5 (Software Engineering Institute of Carnegie Mellon). The CMM is basically a system to ensure software quality.
    The software fits the budget, is what the client actually requested, etc.
    Many major companies/consultancies try to aim for CMM Level 3, and most defence contracts require it.
    It makes the achievements of the NASA Shuttle program seem all the more impressive.

    It doesn't necessarily fulfill the hacker development model; however, it does try to ensure software quality.

  • It makes the priority clear. Unlike most corporate work, in which the fear of unspoken criteria is always yielding random results, the shuttle program is sure of its priorities. Of course, it means nothing to someone who signs such a thing without caring. I'm sure someone would love to point out that it hasn't stopped the shuttle itself from having problems. But still, the answer is: the black magic is that it makes the priority clearly communicated and acknowledged as communicated.

    pyrrho
  • I haven't either, but it does spark a thought. This is, apparently, a VERY good way of doing things. Why aren't other companies (who don't have NASA contracts) doing this? From what I've heard, the Love bug wouldn't have worked if there hadn't been huge 'undocumented features' in Microsoft Everything big enough to launch the shuttle through. Pointing out that intelligent quality control can be done is a good thing.
  • You have a specification for what the kernel is supposed to do ... don't you? That document tells you about environment, inputs, expected outputs, performance, and a bunch of other stuff. So write a test program that lives on top of the kernel and inflicts a bunch of specific tests on it, whatever is suggested by the spec. And since you have access to the code you're testing, you can even write nasty devious tests that look for errors at edge conditions.

    Part of one SW development process I've worked successfully with has QA engineers designing the test plan, based on the spec, while the SW engineers write the code. When the code's done, you implement and run the test plan.

    If a change is made to part of the code, it ought to be reviewed, and the QA engineer should be present. He can then make some new tests that look specifically at the effects of the change. (And run at least a representative sample of the standard tests.)

    My experience was at the application level, on a multimedia authoring and playback system. I'd be tempted to apply similar processes to OS kernel development and testing.

    Scenario testing -- what you described -- can find bugs the formal tests didn't, for a hundred users can be more devious than one QA engineer. But you can't rely on it to find bugs at the early stage; it's too random and undirected.
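
    As a toy example of the sort of directed, spec-driven edge-condition test I mean (the function and its little spec here are made up):

    #include <assert.h>
    #include <limits.h>

    /* Spec (hypothetical): clamp(v, lo, hi) returns lo if v < lo, hi if v > hi,
       and v otherwise. Precondition: lo <= hi. */
    static int clamp(int v, int lo, int hi)
    {
        if (v < lo) return lo;
        if (v > hi) return hi;
        return v;
    }

    int main(void)
    {
        /* Nominal cases straight from the spec. */
        assert(clamp(5, 0, 10) == 5);
        assert(clamp(-1, 0, 10) == 0);
        assert(clamp(11, 0, 10) == 10);

        /* Devious edge conditions: boundaries, extremes, degenerate ranges. */
        assert(clamp(0, 0, 10) == 0);
        assert(clamp(10, 0, 10) == 10);
        assert(clamp(INT_MIN, 0, 10) == 0);
        assert(clamp(INT_MAX, 0, 10) == 10);
        assert(clamp(7, 7, 7) == 7);
        return 0;
    }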

    --Timberwoof

  • hehe, well looks like you are sick of the "process system".
    I'm sick of this particular process, partly because I've worked under better ones (or perhaps, less bad ones; I've seen good bits and pieces here and there but they've all had serious flaws). It's not the major factor why I'm leaving in a few weeks (I'm partnering up with a friend who's starting a web development company) but it certainly helped.
    It makes sense because if all staff in your organization feel the same way as you, the process will simply not exist...because no one would be implementing it!
    Not all staff are equal - in any hierarchy, each individual rises to his or her own level of incompetence and stays there. The managers who determine the process have the authority, and the rank-and-file coders are supposed to shut up and follow it.
    The point is your organization has processes in place to ensure STANDARDS are met and the final product is fit for mission critical systems.
    The goal is to ensure that the final product is fit. But when that is forgotten, the processes and standards that are actually practiced (whether or not they agree with those in the policies and procedures manual that no one ever reads) will not support software quality in an efficient manner.

    What's needed is a "meta-process", a process to develop the software process and keep it directed towards the goal. I would suggest that a democratic meta-process, where developers themselves work together to evolve the procedures they will use, would work better than decrees from clueless management.

    Religions have a process too, you know. It calls for the 10 Commandments to be followed.
    Well, that's one set of religions. Others - such as Zen Buddhism - would say that such rules, or "process", are things to ultimately be transcended. The enlightened person, the sage or bodhisattva, does not refrain from killing based on some religious law; he simply acts. The practice of these religions is designed to help lead ordinary people to that state of enlightenment.

    Perhaps that should be the goal of software development practices, as well - to help lead ordinary programmers into that state where they are enlightened enough to be simply incapable of producing flawed software.

  • Don't forget why the Ariane 5 rocket blew up in 1996 [ufl.edu]: a conversion error caused a software shutdown that led to the self-destruct of the rocket.

    "The internal SRI software exception was caused during execution of a data conversion from a 64-bit floating-point number to a 16-bit signed integer value. The value of the floating-point number was greater than what could be represented by a 16-bit signed integer. The result was an operand error. The data conversion instructions (in Ada code) were not protected from causing operand errors, although other conversions of comparable variables in the same place in the code were protected."

    What was the estimate? About $8,000,000,000 in uninsured losses, including 10 years of work for the scientists with satellites on board.

    I wonder how many other maiden voyages have started off so poorly - other than the Titanic, that is.
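
    For anyone wondering what "protected from causing operand errors" means in practice, here's a rough sketch in C (the actual code was in Ada; this is only an illustration):

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Convert a 64-bit float to a 16-bit signed integer, refusing values that
       cannot be represented instead of letting the conversion trap or wrap. */
    static bool to_int16_checked(double value, int16_t *out)
    {
        if (value < INT16_MIN || value > INT16_MAX)
            return false;               /* caller decides how to degrade gracefully */
        *out = (int16_t)value;
        return true;
    }

    int main(void)
    {
        int16_t v;
        printf("%d\n", to_int16_checked(32000.0, &v));   /* 1: fits */
        printf("%d\n", to_int16_checked(64000.0, &v));   /* 0: rejected, no overflow */
        return 0;
    }

    From what I remember of the inquiry report, the unguarded variable had been judged safe to leave unchecked based on Ariane 4 trajectory analysis - and Ariane 5 flew a different trajectory.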

  • and how slowly they are being developed? I don't mean that it's a bad thing -- it's good that the Shuttle program allows them to do it at a reasonable pace and with reasonable requirements. But if everyone else wasn't under constant pressure, and if everyone else's software wasn't a victim of feature bloat, of dealing with poorly documented and even worse implemented protocols, and of a never-ending stream of bullshit coming from management, everyone else would write robust software, too. Well, not really everyone -- some "programmers" wouldn't be able to do anything because they have no skill, no education, or are plain dumb, but a reasonably geeky and educated programmer can pull off something like that in ideal conditions -- and those guys _are_ working in ideal conditions.

  • I think the point of that exercise is to promote a sense of well defined accountability and confidence, up and down the management chain. Sure, in theory, the project manager should be ultimately accountable. But all too often she can, post facto, dodge responsibility for failure by (accurately) claiming that other project stakeholders failed to provide their inputs to the project correctly. In Mr. Keller's case, he would not sign the certificate if he felt that failure was a possibility, for any reason. This also gives the decision makers a well defined "emergency brake" that perhaps could have averted a *Challenger* like disaster, where some line managers said STOP, while some higher-ups said GO!
  • >Likewise, people often ask why the shuttle continues to use such antiquated General Purpose
    >Computers: slow, 16-bit machines designed back in the seventies. There are many reasons, but a big
    >reason is that new hardware would almost certainly require massive changes to the flight software. And
    >rewriting and recertifying all that software would be a huge task. The current FSW works reliably; if
    >it ain't broke...

    Actually, AFAIK, the main reason is that old 386s are tested, tested and, once more, tested for space use. With newer processors, there are too many unknowns to risk a space shuttle. The line widths in modern processors are so small that background radiation is beginning to cause problems in space without proper shielding. Probably they are testing 486s and Pentiums right now, but it'll be another ten years before they're ready for extensive space use.
  • by Glimmer_Man ( 126147 ) on Friday May 19, 2000 @01:57AM (#1061983)
    I worked on some mission-critical/life-critical stuff about 2 years ago. It was aircraft related, and since it was basically carrying the data which made the plane fly, it was critical by any definition. The process we followed was absolutely document-driven. User specs were examined, questions asked, and the user asked to add definition and clarification over several iterations of the document. Then the software requirements and so on followed, each document with quite a bit of iteration. Eventually we found that typically documentation and design would take 50% of the project, testing would take about 30 to 35%, and the actual implementation hardly took any time at all.

    Now in the commercial world, I find that the process is VASTLY different. Implementation has started shortly after user specs have hit the desk, before design or documentation has begun. As a result, the system we currently have is very patchy in places. Its mission is a lot less critical, but the bugs slow us down tremendously. The bugs are due to the process: it is requirements driven, not documentation driven. And it seems that the current system I'm working on has about the same complexity as the one I used to work on. Even though we are supposed to be pushing it all out the door faster, the bugs are slowing us to the point where we have approximately the same rate of progress as the mission-critical project!! Lesson: if you do it by the documentation, you will push it out faster and cleaner (and more bug-free!!!)
  • I don't know a software company that wouldn't implement such a strategy to ensure that their software was perfect, if they had the budget to do so. As with all things of this nature it comes down to the money vs. quality contest. The better the quality, the more it costs to produce, but unfortunately it's not an even rise up the scale. It may cost you 2X$ to improve quality by 50%, but it might cost you another 4X$ to get the next increase in quality of only 25%. Even the article points out that the Shuttle software is the most expensive in the world, and it still runs on old computers. Give me the same scale of budget/time and I'll give you a Windows operating system that a fanatical Linux user would be hard pressed to complain about. Or, even better, I'll use the funds to set up an open source group to make Linux as versatile and useful across the board, from beginners to the "Linux gurus".
  • I don't work with the FSW people, so I'm not sure about the details of their work flow, but I think it's safe to say that new code goes through several readings, probably both at the pseudo-code and code levels.

    Schedule is driven by the planned date for launch, and worked backward from there. For example, if you're going to launch a mission at date L, then the crew begins training at L minus X months, which means that the software has to be ready for the SMS at L minus Y months, which means you have to begin design at L minus Z months, etc. I'm not sure what X, Y, Z and related time deltas are, but I believe they probably start planning at least a couple of years in advance.

    --Jim
  • Or this:

    /* O2 systems monitor
    clean up later -
    too drunk right now */

    - eddy the lip

  • Sort out that closing italics tag! The front page article only has the first paragraph, and the second paragraph has the </i> tag. All the headlines are italicised!

    My god! Where's all my karma going?
  • However, dropping to your knees and worshipping the brilliant scientist-programmer who wrote the core code your company's business depends on will not make you millions of dollars.

    That code still needs to be tested against specifications -- even if the specs are written afterwards -- and (re)engineered so that it can be maintained and expanded as new versions and applications demand. Trust me, it's better to write the code in a comprehensible and maintainable way from the start.

    If you have a genius who won't work within the programming *organization*'s process, you're sunk. If your genius sees the process as liberating, freeing his mind to create really good stuff ... then pay him lots of money and stock options.

    --

  • Maybe you could release a free 'Light' version HAL/ER, High-level Assembly Language / Estes-Rocket for the rest of us.

    Please, think of the balsa wood and cardboard tubes. For their sake, please don't release such a dangerous tool!

  • Thanks for your ideas.

    My idea of computer logic was the following: one of my friends is studying computer engineering (in the Netherlands). He once showed me some of his scratch work. It was a difficult program, several factors involved, etc.

    But it fit into one simple "logic" line!

    On the other hand, another "simple" programme took almost 3 long lines of "logical formulas".

    What I meant is that it would be nice to write programmes in this language, and let the computer do its thing writing the code.

    Sorry if it sounded too stupid.

  • I agree that people usually work better (at least at coding) when dressed comfortably.

    However, maybe I missed it, but I didn't see anywhere in the article that it mentioned that they wore suits. It said "moderately dressy ... neat but nothing flashy, certainly nothing grungy". Still sounds like it could fit the "comfortable" range to me.

    Of the photographs I could make out, one group of people was wearing jackets, but the other group didn't even have ties. Probably what separates management from the grunts would be my guess.
  • According to this story there were problems with the software on the Jubilee Line extension.

    "It was called Moving Block Signaling

    Oh, there were certainly problems, especially with the MBP (Moving Block Processor). It was a truly great idea, no question. The idea was, roughly, this: since the dawn of railways, signalling has been on the 'Fixed Block' system - divide the railway into chunks, and only allow one train per chunk. The MBP idea is that if the trains have a map of the system, and monitor their own speed, and the condition of their brakes, and the diameters of their wheels to the nearest 100th of a millimeter, and a whole bunch more data, plus information about the other trains on the network, then the MBP can work out the safe braking distance (LMA, Limit of Movement Authority, including several extra metres for safety), the upshot being that you can then cram a lot more trains per mile of track, driving themselves more safely than humans ever could.

    The old system would allow up to 12 trains per hour, MBP could potentially do 36, if you could get people on and off the trains fast enough.
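
    Purely to give a flavour of the core calculation (a toy sketch with made-up numbers - nothing like the real MBP code, which also has to worry about gradients, brake degradation and a lot more):

    #include <stdio.h>

    /* Toy Limit of Movement Authority: stopping distance v^2 / (2a) plus a
       fixed safety margin, capped by the gap to the tail of the train ahead. */
    static double toy_lma_metres(double speed_mps, double brake_decel_mps2,
                                 double gap_to_next_train_m)
    {
        double stopping = (speed_mps * speed_mps) / (2.0 * brake_decel_mps2);
        double margin = 50.0;                      /* extra metres for safety */
        double authority = gap_to_next_train_m - stopping - margin;
        return authority > 0.0 ? authority : 0.0;  /* 0 means: brake now */
    }

    int main(void)
    {
        /* 20 m/s (72 km/h), modest braking, train 600 m ahead -> 350 m authority. */
        printf("authority: %.1f m\n", toy_lma_metres(20.0, 1.0, 600.0));
        return 0;
    }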

    The project went so far over budget the whole firm looked like going tits up, and after losing 25 million pounds on this project alone, it was all rather scaled down.

    But to be honest, what really blew up everyone's plans was that, when the project started, it was meant to be delivered for 2003. Then the Major Government decided to have the Millennium Dome and furthermore decided that the Jubilee Line Extension would be the preferred (i.e. the only convenient) way of getting there. Bingo - the project delivery date suddenly moved forward by three years with no possibility of compromise. Now by early 1999, we'd run simulator tests - first two fake trains, then a real train and a sim - and were about due to get to the Two-Real-Trains test. However, at this point, a) London Underground needed pretty much constant access to the tracks, and b) we still needed to get our Safety Case. That could easily take another year.

    So yes, they got Colour Light Signals (which, apart from losing the MBP benefits described above, also means you need to think about stuff like braking distances vs. Line-Of-Sight).

    The trains have most of the equipment - Automatic Train Operation, Automatic Train Protection, the Common Logical Environment (effectively an operating system for railways) - and once the MBP's finished it could be fairly easily retrofitted. It's still being developed for the Madrid Metro.

    Basically, if our client (London Underground) hadn't had their timescales rewritten for them by the Govt, I believe we would have delivered the most advanced railway in the world, and the repeat business would have made not millions but billions.

    Shame, really. A lot of very good people did a lot of very good work on MBP.

    TomV

  • A better link for the B Method:

    http://archive.comlab.ox.ac.uk/formal-methods/b.html
  • A ladder diagram is used to represent basic logic circuits. Basically you have two lines going down the sides of your drawing, representing a voltage, and you draw little circuits across to make rungs. A straight line across would be a short circuit.

    For example:
    |--Switch1---lightbulb1---|
    |--Switch2-/

    This represents two switches in parallel, so lightbulb1 will get juice if either Switch is on. So this is the equivalent of OR.

    |--Switch1--Switch2--Light1---|
    This is AND.

    You can add new rungs and include relays, so that a switch3 could be a relay driven off of lightbulb1. By cascading with relays, you can have states, which can represent steps in a process. Switches can be sensors and lightbulbs can be actuators, so you can build a very simple circuit that can control a multi-step process with safety conditions, such as "only activate the forge if there is a blank in place (detected by a proximity sensor), and the temperature is within certain limits (sensors), and previous steps were completed successfully, and the operator's hands are safely out of the way holding down switches 8 and 9." Instead of wiring all this up as actual circuits, you can connect all of the sensors and actuators to the PLC. That allows you to store your programs, it simplifies the wiring, and you don't need to use actual relays, timers, etc. (You'll still use some relays of course if you need the low voltage coming out of the PLC to activate heavy equipment.)
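
    Expressed as the plain boolean logic a PLC evaluates on every scan cycle, the forge interlock above might look roughly like this (hypothetical names, just a sketch):

    #include <stdbool.h>
    #include <stdio.h>

    /* One scan cycle's worth of inputs, as read from the PLC's input image. */
    struct forge_inputs {
        bool blank_in_place;       /* proximity sensor */
        bool temp_in_range;        /* temperature within limits */
        bool previous_steps_done;  /* latched from earlier rungs */
        bool operator_switch_8;    /* two-hand control: both switches held */
        bool operator_switch_9;
    };

    /* The "rung": energise the forge output only when every condition holds. */
    static bool forge_output(const struct forge_inputs *in)
    {
        return in->blank_in_place && in->temp_in_range && in->previous_steps_done
            && in->operator_switch_8 && in->operator_switch_9;
    }

    int main(void)
    {
        struct forge_inputs now = { true, true, true, true, false }; /* one hand off switch 9 */
        printf("forge on: %d\n", forge_output(&now));                /* prints 0 */
        return 0;
    }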

    Simple do-it-yourself application: You could connect all your home lighting, along with motion sensors and switches to a PLC, and set up any number of different logical relationships. So a single switch could be "home/away" which could control a large number of lights throughout the house. A single "movie lighting" switch could turn off certain lights, turn others on, dim a few more, turn off the dishwasher, and set a timer to go back to normal in two hours in case you fall asleep.

    I don't have one, but I think the cheapest models are probably under $100. They never crash, they can run for years, they're extremely reliable, easy to use, and cheap. If you can program a VCR, then you can program a PLC. Unfortunately, that rules it out as a product for the home market.

    "What I cannot create, I do not understand."

  • ...that we could arrange for a situation where the requirements are all fixed and locked down, and documented, before any coding begins? In industry jobs, I've never seen a project that wasn't having some marketing group force "critical" changes the whole time something was being written.

    You get what you pay for, and take the time for. These days, most people and companies seem quite willing to settle for "bad, buggy programs now" rather than "better programs, later". Of course, without organization (also common), it's possible to wait and get nothing later, too. Process is expensive in terms of people involved and time, but it's a lot cheaper in the long run than the alternative. :)

    Open-source projects actually follow this - every successful open project I've seen has a definite hierarchy of people managing patches and controlling what winds up in the latest sub-point build, and making key architectural decisions so nothing derails them. Oh - and there's no one who'll fire you if marketing's last-minute changes aren't rushed through. :)

  • I can almost hear the moans from the pizza-and-coke crowd when they read this: "Where's the fun? Where's the creativity?". But they're under the mistaken assumption that putting lines of code into the editor is the only fun thing about developing software.
    Typing code is not what the job is about (despite what people seem to think). We're in the business of doing cool things for people. The creativity and ideas that flow from the (very smart) people around me are what drive me.
    Just sitting there typing code is a bit dull compared to human interaction...

    "The reason I was speeding is.....
  • I think there are folks at NASA who are not satisfied with the SYSTEMS engineering done. When the SOFTWARE engineers had the old Apollo hex keypad (as an example) dictated to them as a system requirement, I would say that the software engineering that followed was still pretty impressive. The project was, and still is, an impressive job of software engineering.
  • ... you have more than enough to do it well.

    The problem with this argument is that while many companies think they can't afford to do it, what they really can't afford is NOT to do it. Software is becoming more complex - it's the nature of the beast. For the most part, design is not; we are all still using procedures that were brought into being in the 'dawn of the computer age', with the exception of higher order languages and more focus on OO.

    You are correct in that it may be expensive, THE FIRST TIME. This is called a 'learning curve' and the cost is amortized over the number of times you use this technique. You may also say that the process itself is expensive, but that is incorrect, or at least only partially correct. The process allows errors to be caught EARLY, which reduces cost. Please don't tell me that you believe a code-compile-fix routine can catch these sorts of errors as early as a well thought out design.

    Also, rigorous design allows for flexibility - this may sound contradictory, but consider the use of design patterns. They are NOT things that can just be thrown into the code ad hoc; they require thought and intelligence. A good upfront design means the ability to use these tools. Consequently, use of these design patterns allows for a certain level of flexibility in satisfying the lower to medium level nasty customer requests, and certainly helps on the more egregious ones. Does a code now, look later approach allow this? (If you think so, I have this bridge I'd like to sell you ...)

    In short, yes, using these techniques is expensive. But they also produce code that cuts development time (i.e., no stuck in debug/extra request phase for 2 years) and once people get used to the process, the extra cost/load is minimal.

  • I seem to remember seeing this article before, and since the only place I read anything interesting anymore is as a result of hearing about it here... ;)
  • by Anonymous Coward
    their process ensures it will be. The vast majority of software development is performed in an environment where individual "heroes" are the primary reason projects succeed. The Space Shuttle Onboard Software processes will seem to almost all of us to be "common sense", but how many of us work in a place where management mandates these things to ensure quality? Their environment is "ideal" because they have made it so. Unfortunately, many managers' (and too many developers', also) attitudes can be described as "get it done", and it shows!

    They were rated CMM level 5 in 1988 - one of the first organizations anywhere rated at that level of software process maturity. Another good description of their processes (and how they created them) is in the book "The Capability Maturity Model - Guidelines for Improving the Software Process" (ISBN 0-201-54664-7) in Chapter 6, "A High-Maturity Example: Space Shuttle Onboard Software".

    As far as making software error-free, a quote from the book will help illustrate the difference in attitude they have (it's talking about a graph). "These data include failures occurring during NASA's testing, during use on flight simulators, during flight, or during any use by other contractors. Any behavior of the software that deviates from the requirements in any way, however benign, constitutes a failure. Contrast this level of commitment with the cavalier attitude toward users in most warranties offered by vendors of personal computer software."

    The best place to find more about the CMM is their web site at http://www.sei.cmu.edu/ [cmu.edu]
  • I flew the F-15E for 4 years, and it was common to have to reset a system because of some sort of glitch. Whether the glitch was hardware or software based, I didn't really care. If a system stopped working reliably or failed outright, it was time to troubleshoot. That usually meant first a software reset, a hardware reset, and in the worst case (but still common) a complete power down/wait 30 seconds/power up cycle.

    2-3 times per flight is more than I usually experienced, but I think I had to reset at least one system on 50% or more of my flights. That's quite a bit more than 1 every 500 hours. Some aircraft were better than others too... One jet required its radar to be reset every 15-20 minutes. That problem was eventually traced down to a wiring harness connector...

    In addition, there were and still are known software problems in that aircraft. The known ones usually have some sort of workaround (if the heads up display freezes, cycle power on the display processor, stuff like that), but the occasional random crashes or glitches (like occasionally the plane will suddenly think it's flying 100,000 ft below the ground) have no known cause and the only fix is to reset something until the jet behaves itself again.

    My last point is that the flight control software in the F-15E is designed to go offline if the aircraft exceeds certain parameters. In that case, the flight controls must be manually reset in one of four ways. There is a quick reset switch, a "hard" reset switch for pitch, roll, and yaw, we can cycle power for those systems, and worst case we can pull and reset the circuit breakers for the flight control system components.

    The funny thing is, it works only because the rest of the design is very robust. Most systems have some sort of backup, and the plane flies just fine without any electrical power at all. Once the software problems are known, they're dealt with as simply one more environmental factor until they're fixed. The fix may take over a year, but they are usually fixed eventually.

  • by Anonymous Coward
    Before every flight, Ted Keller, the senior technical manager of the on-board shuttle group, flies to Florida where he signs a document certifying that the software will not endanger the shuttle.

    Is this supposed to be black magic or something? If something bad is bound to happen, it will happen regardless of how many "certificates" and such were signed.

    Or maybe it's about transferring responsibility?

    Maybe Mr. Keller could sign a certificate that aliens will contact us next Wednesday?
  • They are going to use old Pentiums (no MMX) with Win95 on the new space station.
  • by Anonymous Coward
    coz sure as heck, the kernel developers have lost the plot.
  • I think your problem here is that you still subscribe to the fallacy that "Code like Hell" programming is faster than doing things properly.

    It isn't.

    Many organisations are starting to find this out and are moving to proper professional engineering practices that improve reliability, increase schedule predictability and, more importantly, reduce costs.

    A couple of hundred years ago people built houses & bridges the way we build software - work until it's done. These days we have architects and project managers who build houses faster, more reliably and ON BUDGET.

    This is the way the wind's blowing. It's a lot less heroic but it's the future.
  • by BigStink ( 99218 ) on Friday May 19, 2000 @01:00AM (#1062006)
    It's not just space shuttle code that needs extreme reliability. The embedded systems in civilian aircraft are not interrupt-driven, because of the reliability issues associated with interrupt-driven code - interrupts make the software too hard to debug thoroughly (because there are so many combinations and timings of input signals to test), make faults difficult to replicate, and have the potential to go wrong on a spurious set of input signals. This sort of problem doesn't really matter too much in a home or corporate computing environment, but it would be a major disaster if a plane carrying a few hundred people were to crash into a city with a population of a few million, just because of a software error. These things need 100.00 per cent reliability, so obviously software hacks are frowned upon.
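
    For anyone who hasn't seen the style, here's a minimal sketch of the polled, fixed-cycle alternative to interrupts (all the I/O is stubbed and hypothetical):

    #include <stdbool.h>
    #include <stdio.h>

    /* Stubbed I/O for the sketch; a real system would read hardware registers. */
    static bool read_sensor_frame(double *out) { *out = 42.0; return true; }
    static void update_actuators(double command) { (void)command; }
    static void wait_for_next_cycle(void) { /* sleep until the next 10 ms frame */ }

    int main(void)
    {
        double last_good = 0.0;

        /* Fixed-rate loop: every input is sampled at a known time, in a known
           order, so there are no interrupt timings or re-entrancy paths to test. */
        for (int cycle = 0; cycle < 3; cycle++) {     /* forever, on real hardware */
            double sample;
            if (read_sensor_frame(&sample)) {
                last_good = sample;                   /* accept new data */
            }
            update_actuators(last_good);              /* otherwise hold last good value */
            wait_for_next_cycle();
        }
        printf("last sample: %.1f\n", last_good);
        return 0;
    }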
  • Some manager I had said: if you want to be successful, find a success and see how it was made.

    The obvious candidate would be Bill Joy's TCP/IP implementation. Everyone runs it:

    1. BSD's always used it

    2. SYS V incorporated it - thus it flowed to most commercial Unixes

    3. Linux borrowed heavily from it (recall that Regents of the University of California boot message?)

    4. If the TCP/IP fingerprint of WIN2000 is any indication, they borrowed it too.

    And it works right every single time you use it. So, what process made it? A single genius. All the cool process in the world won't make up for the fact that the single requirement for great software is a great designer/programmer. The required process is simple - whatever that person requires to let their genius loose.

    The only way to circumvent this requirement is to do what NASA does and spend probably literally hundreds of $ per line of code.

  • I also work down the hall from some of the folks in this article, and I know quite a few of them from college (Co Cyclones). Anyways, I thought I would mention this project from United Space Alliance's Dual Program [usa-dual.com]. USA [unitedspacealliance.com] is a joint venture of Lockheed and Boeing that took over Shuttle Operations a little while back (the group mentioned in this article is part of USA and has been for about a year or so). The Dual program is a USA/academic partnership for research in space operations. The project that I thought you might be interested in is the development of a space shuttle flight computer emulator for Linux, described here [usa-dual.com].

    On another note, the group that I work in (Flight Design and Dynamics) may start looking into moving from our IBM/AIX platform to a Linux platform. Penguins in space! I guess that is a bit offtopic, but oh well.

  • Reliability obviously gets a big premium when "crash" is not a metaphor.
  • Hmm. Talk to the client until you fully understand the problem. What a concept! No doubt this will make some fast and bulletproof code. Now if only they could teach their engineers to convert units correctly....
  • Ever hear of Boo.com [boo.com]?

    ;-?
  • ... any scientist should be using SI units today.

    That might be true of the scientists and engineers, but not necessarily of the contractors or of other government agencies.

    --
  • Every time I read a history of a programme and find a line "completely re-wrote the code", I begin having second thoughts about how good the programme really is.

    Have you ever programmed a half-way complex system yourself? Re-writing it from scratch is often the best thing that you can do - the more often, the better. In fact, there are software engineering models that officially choose to re-write their code often. This is called "prototype-based SE".

    The reason is that while you write the code, you invariably notice that some decisions you made earlier were wrong, but they affected the design so deeply that changing it would be more work than rewriting it from scratch. The alternative is to live with the design flaws; most commercial projects do that because they don't have the time to re-write their code.

  • by TomV ( 138637 ) on Friday May 19, 2000 @02:25AM (#1062014)
    I worked on some mission-critical/life-critical stuff about 2 years ago. It was aircraft related, [...] The process we followed was absolutely document-driven.

    Likewise, I worked for a while on the signalling system for the Jubilee Line Extension for the London Underground.

    Totally documentation driven. First there was the CRS (Customer Requirements Spec); this then transformed via an SRS (Systems Requirements Spec) into the FRS (Functional...) and the NFRS (Non-functional...). From these we had Software Design Specs, Module Design Specs, Object Design Specs, Boundary Layer Design Specs. In all there were around 4000 specification documents for the project, often at issue numbers well into the teens.

    What really made the difference, though, was not so much the existence of documentation as the absolute insistence on traceability - every member function of every class in the whole system could be traced back to the Customer Requirements Spec, and every requirement could be traced to its implementation. This meant no chrome: everything in the spec was provided, and nothing was provided that wasn't in the spec.

    Also worth noting: the whole thing was in Ada 95. The compiler was very carefully chosen. Coding standards were tight, and tightly enforced - function point analysis was king - anything with more than 7 function points was OUT, simple as that. Every change to anything, however small, required an inspection meeting before and after implementation, with specialists from every part of the system which could be impacted, plus one of the two people with a general overview. Then there were the two independent test teams and the validation team.

    Ye Gods it got tedious, no denying that. But in a situation where lives depended on good software...

    Now I probably apply only a tiny fraction of what I learned, but when I decide to ignore part of the methodology, at least I know I'm ignoring it. And I'm aware of what I'm missing.

    In short - learn about the safety-critical approach. Ditch most of it as excess baggage by all means - it's often simply not justifiable. But be aware of the choices you're making.

    TomV

  • Well, the point is really this: There is a point beyond which making the software more stable is so much more work that it's simply not worth it. Where this point is depends, of course, largely on what the consequences of failure are. Obviously, if multi-million-dollar equipment is at stake, it is worth being extremely thorough.
  • Unreasonable deadlines and too few programmers are usually the reason for pulling all-nighters, it seems to me. Other environments where that kind of thing isn't necessary can be found in the vicinity of banks and insurance companies, so look there if you want relaxed programming jobs.
  • So, commercial software is lousy because we're all stupid, and choose not to use good development practices.

    Bullshit.

    NASA didn't just have a solid process, they had MONEY. They BOUGHT that quality, by hiring an order of magnitude more testers than you'd find in the commercial world. By budgeting several years of development time rather than weeks or months. By reducing the number of lines of code that any one developer is responsible for.

    There's a lot to learn from a highly structured development process like NASA's. But don't kid yourself that the quality they produced is simply because they 'had the right process' or had better management.

  • by severian ( 95505 ) on Friday May 19, 2000 @07:57AM (#1062020)
    One of the assertions that seems to keep coming up is that higher quality code (i.e. more stable, predictable, etc.) always means more expense or time to create. That's not necessarily true. To take an example from the car industry: in the 60's/70's American car makers made cars by building them on the assembly line, and then having "quality inspectors" at the end of the line who would check for defective parts which would then get fixed. Using this model, it was always assumed that achieving higher quality naturally meant higher costs (you would have to spend more to hire more inspectors, and you'd have to replace more parts), and longer time (adding new checkpoints in the line would increase the time to manufacture a car).

    But then the Japanese came along with a radical new idea: if there are defective parts coming down the line, then we should figure out why they were created defectively in the first place and fix that. Then the number of defective parts at the end of the line would be smaller, thus you would need *fewer* inspectors and *less* time at the end of the assembly line. (Ironically, this principle came from an American named W. Edwards Deming; unfortunately American companies were too successful during his lifetime for them to take him seriously :-) So the Japanese were able to build cheaper cars quicker than the Americans while actually having higher quality.

    I think that's very analogous to the current argument. Under the current system of coding, you basically hack together something that sorta works, and then use sophisticated debuggers/development tools to figure out which parts are buggy. Using that system, it's true that higher quality requires more cost and time.

    But I think the point of this article is that that is the wrong way to approach programming. First, figure out why defective code gets written in the first place (be it poor client specifications, poor management, poor documentation, whatever) and then fix those processes, and you'll turn out quality code without having to spend any more time or money!

    As a practical example, I first learned C under a CompSci Ph.D. who was a quality fanatic. In order to teach me to code properly, he would give me projects and then not allow me to use a debugger. Nothing at all. Zilch. Nada. The only thing I was allowed was to place print statements within my program wherever I wanted to see what was going on. As a result, I spent *a lot* of my time planning my code out, and reviewing it over and over again before even compiling it, because I knew that if there were bugs in it, I couldn't just fire up a debugger and take a look.

    And secondly, if there were bugs, I couldn't just trace through the entire program or create a watch list of every variable. I had to study the bug and understand it, look at the code and figure out where the bug most likely was, and then use selective print statements to look at the most suspicious stuff. That way, when I encountered bugs, I'd be forced to actually understand what the bug was and then analyze my code to figure out where that error most likely was.

    If this sounds like a programming class from hell, believe me, it was incredible! I couldn't believe how much of my code worked the first time it compiled. And when there were bugs, I actually fixed the underlying flaw in the logic rather than just applying a temporary patch. What's more, since the rest of my program was well planned and documented, there were no "hidden" effects: if I found a bug, I knew exactly which parts of the program it affected, and perhaps more importantly, *how* it affected those parts. Thus they were very easy to fix.

    Believe it or not, it took me less time to program this way than using debuggers, and the resulting code was much more stable and understood.

    If you look at commercial software these days, it's not uncommon for the debugging period to take longer than the actual coding. In other words, there are more quality inspectors than there are assembly workers, and the time the code spends in inspection stations is longer than it spends being produced. It's tough to say that this is the "efficient" method of programming...

    If you want to see where this is heading, just turn once again to the car industry: once American companies got their asses kicked by the Japanese, they adopted their techniques, and Surprise! Cars now come out of their factories with higher quality, in less time, and at less cost (adjusted for inflation and new features :-). Who would've believed it? :-)

  • No one wants to write buggy code...

    Well, mister know-it-all...how do you go about getting really obnoxious amounts of money out of the customer?

  • Even if you consider a project management approach, often you will wind up rewriting the code from scratch anyhow. From personal experience I can tell you that relying too heavily on user input in the proposal/design phase can cause a project to completely lose focus. Many rewrites are the result of "feature creep" associated with pandering to the user's every whim. The most successful projects I have seen started with a narrowly defined, strict set of goals. Even at that, the trend seems to be at least one major rewrite by the time the software reaches its third version. Code reuse is highly overrated.
  • by cshotton ( 46965 ) on Friday May 19, 2000 @02:49AM (#1062028) Homepage
    ...no pizza-and-coke all-nighters...

    That's because pizza-and-coke all-nighters are a direct byproduct of poor planning, either by the engineer implementing the code, the architect creating the design (if there even is such a person) or the person making the engineer's schedule. And the result is usually hastily written, incompletely tested software that is typical of most product offerings for use on the desktop.

    The process of authoring mission-critical, man-rated software is so far removed from the ad hoc, informal, duct-tape-it-together approach that most programmers use that no direct comparison can be made. I've seen both ends of the software development spectrum and they each have their uses. You can't launch a shuttle with a bunch of last minute kernel patches and some stuff that was written the night before the launch date. But you can't compete in the commercial software marketplace with code that takes 2 or 3 years to specify, design, implement, test, and integrate, either.

    Stand in awe of the people who have the skill and discipline to write software of this quality. Learn what you can from their process and try and use the lessons they've learned. Their stuff doesn't break, because when it does, people die. If O/S developers had that same attitude about their code, we'd never see blue screens of death, kernel panics, or any of the other flakiness we tolerate on our desktop machines.

  • US military test pilots aren't stupid people. Most of them have advanced degrees in aeronautics or aeronautical engineering -- at the insistence of the military or aerospace firm they work for.

    I suspect that, upon seeing the "computer restart" button, the test pilot evaluating the aircraft would start asking a series of questions:

    1. What is the failure rate of the computers; i.e., how often will that button have to be pressed?

    2. What is the time elapsed between the computer failing and the computer being operational again, including the reaction time of the pilot or weapons officer? Assume that the pilot and weapons officer are already a) flying the plane, b) lining up on target, c) watching for SAM sites, ground fire, enemy aircraft, and d) coordinating with friendly aircraft.

    3. How does the computer controlled, fly-by-wire system function during the timeframe covered in question 2? Will it fly steady (given that many modern fighter airframes are inherently unstable in flight, and rely on active computer control)? Will I have any control over the plane until it restarts?

    4. If this happens in a dogfight, what are the chances of recovery and survival?

    Or not... In truth, I suspect the first few questions would really be something like "You're kidding me, right? Do you think I'm crazy? Would you be willing to fly this deathtrap?"

  • Every time I read a history of a programme and find a line "completely re-wrote the code", I begin having second thoughts about how good the programme really is.

    There were several occasions last year where a co-worker and I ended up trashing pages and pages of code to re-write it with the same functionality, but modular - and it ended up being smaller in some cases.

    My company used consultants who wrote terrible code. Let's use this example... there is a program that calculates the date x days ago. The consultant's program went and tried to calculate leap years and all of that. Our program that replaced it used system library calls for the date, and then simply subtracted the proper number of seconds. Other ones were hardcoded scripts to run SQL on our database; we replaced those with a Perl script that took the SQL as a parameter.
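
    Something along these lines is what I mean - a sketch, not the actual replacement script, and it deliberately ignores daylight-saving subtleties, which didn't matter for a "days ago" report:

    #include <stdio.h>
    #include <time.h>

    /* Print the date "days" days ago using the system clock and library,
       instead of hand-rolling leap-year arithmetic. */
    int main(void)
    {
        int days = 30;                  /* hard-coded here; would come from input in practice */
        time_t then = time(NULL) - (time_t)days * 24 * 60 * 60;
        struct tm *t = localtime(&then);

        char buf[32];
        strftime(buf, sizeof buf, "%Y-%m-%d", t);
        printf("%d days ago: %s\n", days, buf);
        return 0;
    }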

    So there are times when a re-write is better than maintaining the code. I guess the biggest case in point is Mozilla versus Navigator. Basically I agree that if projects were planned and used software engineering principles, we would most likely end up with good products. Granted, game programs seem to be done best when they're a hack... but how many times have you seen long-term maintenance of games?
  • by Animats ( 122034 ) on Friday May 19, 2000 @08:16AM (#1062035) Homepage
    That article has been around for a while. It paints an excessively rosy picture of the Space Shuttle flight control software.

    Here's NASA's own history [nasa.gov] on bugs in that software:

    • So, despite the well-planned and well-manned verification effort, software bugs exist. Part of the reason is the complexity of the real-time system, and part is because, as one IBM manager said, "we didn't do it up front enough," the "it" being thinking through the program logic and verification schemes. Aware that effort expended at the early part of a project on quality would be much cheaper and simpler than trying to put quality in toward the end, IBM and NASA tried to do much more at the beginning of the Shuttle software development than in any previous effort, but it still was not enough to ensure perfection.
    Read the NASA history. They had a 200-page known-bug list in 1983, although they did fix most of them during the long downtime after the Challenger explosion.

    The Shuttle's user interface is awful. The thing has hex keyboards! Some astronaut comments include:

    • "What we have in the Shuttle is a disaster. We are not making computers do what we want" -- John Young, Chief Astronaut, 1980s
    • "We end up working for the computer, rather than the computer working for us." -- Frank Hughes, NASA flight trainer
    • "crew interfaces...more confusing and complex than I thought they would be" -- John Aaron, NASA interface designer
    • "(the) 13,000 keystrokes used in a week-long lunar mission are matched by a Shuttle crew in a 58-hour flight" -- NASA history

    This project should not be held up as a great example of software engineering. Even NASA doesn't think it is.

  • Solidifying a contract like that works when the client actually knows what they want. More often they have absolutely no clue of what they want/need, and require the programmer to help them along that stage as well.

    With these types of clients (and I've dealt with many) taking the proper long stage of design and discussion doesn't work at all. The client immediately changes their tune after seeing initial results. Not so much to add features, but because the features they actually requested were not the ones they needed, or didn't work within their business practices.

    Doug
  • If you want to see where this is heading, just turn once again to the car industry: once American companies got their asses kicked by the Japanese, they adopted their techniques, and Surprise! Cars now come out of their factories with higher quality, in less time, and at less cost (adjusted for inflation and new features :-)

    A good book on this (from 1986-8, so it leaves off when the US auto industry was at pretty much the nadir of its decline) is David Halberstam's The Reckoning [fatbrain.com]... I'd go into further detail, but you have to read the book. It goes into Ford & Nissan overall, but it's very rich with both history and personality (particularly Mr. K of Datsun 240Z fame) and an excellent read.

    There are definitely some lessons to learn, particularly regarding American hubris during fat economic times..

    Your Working Boy,
  • 'Course if they started writing space shuttle code like that, it would be "Goodbye World"...
  • Consider these stats: the last three versions of the program -- each 420,000 lines long -- had just one error each. The last 11 versions of this software had a total of 17 errors. Commercial programs of equivalent complexity would have 5,000 errors.

    This software is the work of 260 women and men...

    Commercial programs of equivalent complexity would have been written by 7 or 8 people.
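    To put those figures on a common scale (my arithmetic, not the article's): one error in 420,000 lines is roughly 0.002 errors per thousand lines, while 5,000 errors in a comparable code base is roughly 12 per thousand lines. That's a gap of over three orders of magnitude, achieved with about 30 times the headcount.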

  • If more projects worked like that, there would also be a lot less software in the world. Say goodbye to whatever you're running to watch Slashdot; you couldn't afford it. (You also couldn't afford the hardware to run it on, because faultless software is of little utility without faultless hardware.)

    I would suggest that if every software project were SEI-5, there would be no Internet and people would be doing papers on typewriters.
  • My third programming job was my first experience with software engineering. I'd had 4 years of experience at two other jobs -- one where I wrote code for an InterLibrary Loan book lending database, and one where I worked on an e-commerce package. There was not a thing at either place that would qualify as a spec, and there was no process in place for engineering. I didn't know anyone who used specs. I assumed that this was something that was taught by computer science professors, but wasn't actually practiced by anyone.

    Then I got a job at the Waterford Institute. Their process probably wasn't as tight as the space shuttle group's, but there WAS a process, and there were specs. Nice specs. Nearly pseudo-code.

    We were programming educational activities for kids learning math. Activities were created by design teams consisting of an educator, an artist, a tech writer, and a programmer. The tech writer would document everything that went on at the meetings and distill it into a spec. The design team would meet regularly over a period of several months, refining the spec until it was solid.

    The spec described the various states of the software. When a user did something, the software changed state and responded accordingly. I'd never seen software described this way, but it made a big impression on me, and it made things easy to write and debug. (A toy sketch in that style is at the end of this comment.)

    ('Course, the platform we were writing for was Java, which kept changing, and in-house developers were writing our own object library, which kept changing too, so your code would work one day and then wouldn't the next, so everything wasn't perfect. But hey. I was impressed with the specs :)
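    For anyone who hasn't seen a spec written in terms of states, the flavour is roughly the sketch below. It's a made-up fragment in C (our actual platform was Java), with invented states and events, just to show the shape:

        /* Toy state machine in the style of a state-based spec.
         * States, events, and transitions are invented for illustration. */
        #include <stdio.h>

        typedef enum { WAITING_FOR_ANSWER, SHOWING_HINT, FINISHED } State;
        typedef enum { EV_CORRECT, EV_WRONG, EV_HINT_DONE } Event;

        static State step(State s, Event e)
        {
            switch (s) {
            case WAITING_FOR_ANSWER:
                if (e == EV_CORRECT) return FINISHED;
                if (e == EV_WRONG)   return SHOWING_HINT;
                break;
            case SHOWING_HINT:
                if (e == EV_HINT_DONE) return WAITING_FOR_ANSWER;
                break;
            case FINISHED:
                break;
            }
            return s;   /* event not covered by the spec: stay in the same state */
        }

        int main(void)
        {
            State s = WAITING_FOR_ANSWER;
            s = step(s, EV_WRONG);       /* -> SHOWING_HINT */
            s = step(s, EV_HINT_DONE);   /* -> WAITING_FOR_ANSWER */
            s = step(s, EV_CORRECT);     /* -> FINISHED */
            printf("final state = %d\n", (int)s);
            return 0;
        }

    The nice property is that every (state, event) pair is either in the spec or explicitly a no-op, so reviewing and debugging reduce to checking a table.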
  • I saw this article a while back linked from here. Incredibly cool stuff . . . the part about "blueprinting software" and "how we design software in the future" was especially cool. It makes one aspire to code to a higher standard.

    That said, something I was curious about that the article didn't answer, and that I don't see mentioned here yet-- what language is all of this done in? Ada would be my guess, or is there something even better than that?
  • by kzinti ( 9651 ) on Friday May 19, 2000 @03:27AM (#1062068) Homepage Journal
    Want to know what a Shuttle GPC looks like? Check out [nasa.gov]
    http://www.ksc.nasa.gov/mirrors/images/images/pao/STS39/10064134.htm.

    --Jim
  • by altman ( 2944 ) on Friday May 19, 2000 @03:27AM (#1062069) Homepage
    The problem is, in the commercial world the product is driven by tight deadlines and getting the product out before you get eaten alive by your competitors (who are also doing the same thing).

    If your company took the time to write very stable, near-bug-free code, they'd take so long doing it they'd go out of business - their competitor would get the business with a flakey but shipping product and by the time you turned up with your perfect product, everyone would be locked into their stuff (and most likely would have been using it for a couple of years).

    No one wants to write buggy code; we all try to do our best. Logical and clear design, defensive programming, and good documentation give a good base. Peer review and experience (been there, don't want to do that again) help a lot too. Just writing the comments first (saying what you're *going* to do before doing it) helps.

    Another problem is that writing bug-free apps on (say) Windows is almost pointless, as the app will still fall over when some bit of buggy OS/Windows API code falls over. Things have to be stable and bug-free from the hardware upwards to give an impression of stability to the user. The problem is, the average user can't tell the difference (and couldn't give a toss) whether the app or the OS fell over; it's just "my WP crashed and I lost my work".

    Welcome to the real world. Software can be flakey because it was written to be useful before the hardware went out of date - not exactly a problem with the shuttle. You can spend ages hand-crafting efficient code to be overtaken by crap code on a faster CPU. Blame the chip companies for moving so quickly :)

    Hugo
  • I work for this project, and the bulk of the flight software application stuff is written in HAL/S, while the system software (OS) is written in both HAL/S and assembly. HAL/S is a language developed specifically for real-time shuttle operations back in the mid-70's and looks a lot like Pascal. It is currently maintained by a company called AverStar specifically for this project.
  • I thought they were the only group to achieve SEI Level 5. If not, then who else has? I'd love to go and correct one of my lecturers.

    When the Capability Maturity Model for Software [cmu.edu] was published by the SEI [cmu.edu], there was only one ML-5 organization; at the time they were known as the IBM Onboard Shuttle group. Thankfully, times are changing.

    According to the SEI's 1999 survey [cmu.edu], 61 organizations reported a Maturity Level of 4 or 5. Of those, 40 were Level 4 groups and 21 were Level 5. The survey goes on to mention that as of 15-Feb-2000, some 71 organizations reported that they were Level 4 or 5. Those that gave their consent are listed in Appendix A [cmu.edu].

  • I think the last US fighter not to rely on computer controls was probably the F-15. Being inherently unstable is a feature... not a bug. Wasn't it a software flaw that caused the prototypes of both the F-22 and the Saab Gripen to crash on landing? Although the F-22 was a walk-away crash and fire... the Gripen was a bit more spectacular, if I remember it right.

    The B-1B has seven of the GPCs that the Shuttle has, so it couldn't fly at all either. The F/A-18E has a number of PowerPC-based flight control computers... the F/A-18E is the first US fighter to use Cat-5 Ethernet to connect the computers together instead of obscure military cabling... at least that's what I read.

    IMHO the biggest problem with the F-16 is the fact that it has a single engine. If you look, single-engine jets crash more than twice as often as twin-engine jets. A single point of failure will get you every time.

  • by cheeto ( 128748 ) on Friday May 19, 2000 @04:31AM (#1062079) Homepage

    I work in the Flight Software (FSW) Verification group in Houston.

    The shuttle FSW code is written in something called HAL/S. This stands for High-level Assembly Language / Shuttle. The language was designed to read like mathematics is written. Superscripts like vector bars are actually displayed on the line above, subscripts like indices are displayed on the line below. Vectors and matrices can be operated on naturally, without looping.

    We are the only ones with a compiler, because we wrote it ourselves.

    Here's a sample:

    EXAMPLE:
    PROGRAM;
       DECLARE A ARRAY(12) SCALAR;
       DECLARE B ARRAY(12) INTEGER INITIAL(0);
       DECLARE SCALE ARRAY(3) CONSTANT(0.013, 0.026, 0.013);
       DECLARE BIAS SCALAR INITIAL(57.296);
       DO FOR TEMPORARY I = 0 TO 9 BY 3;
          DO FOR TEMPORARY J = 1 TO 3;
             A     = B     SCALE + BIAS;
              I+J     I+J       J
          END;
       END;
    CLOSE EXAMPLE;

    I couldn't get the subscripts to line up, but you get the idea.

  • I work for SAIC and we use the same processes (SEI) in our software development. Our clients include banks, airlines, brokerages, the IRS, etc. We made >$5 billion last year alone doing this. It costs a bundle to set it up initially and requires a ton of training to make sure people do it right but the result is outstanding software and very, very few all-nighters.
  • The software is written in a language called HAL/S (High-Level Aerospace Language Shuttle). It was originally developed by Intermetrics.

    The Shuttle was flying before Ada had been developed.

  • That much is clear: the more you know about the customer's wishes, the better software you'd write. But how many programmers really study computer logic, software engineering, project management?

    Not as many as should have.

    Every time I read the history of a program and find a line like "completely re-wrote the code", I start having second thoughts about how good the program really is.

    With the ever-faster-growing complexity of programs, it becomes more and more difficult for humans, even aided by computers, to keep track of a project. But if you teach everyone how the computer logic works, programming would become only about writing the necessary simple code (ha! hackers, get this!).

    Would the next generation of programmers write in a "logic language" instead of C++? Who knows, but IMHO it would make programs more robust and even better.

  • Bill Pate, who's worked on the space flight software over the last 22 years, says the group understands the stakes: "If the software isn't perfect, some of the people we go to meetings with might die."

    I can see many Dilbert-fans wondering if that is a bug or a feature.

  • System Requirements:

    1 Space Shuttle Endeavour
    1 Launch Pad
    1 Houston Mission Control Station
    4 Astronauts

  • I had the time.
    I had the patience.

    Well, this is cool. It proves that you can't write perfect software*. However, you can come close.
    If only everybody would do it this way, not just some cool company.
    This probably even produces better software than the "open source" way. OpenBSD is the only open software project that comes close; it really is kind of sad. People need to relax to do it right. Down with stress!

    Well, if you meet someone who works at some dot-com (there are quite a lot of them here in Stockholm), they are always really, really stressed. That might impress the stock market, but not really anyone else... That is the reason everybody talks about "When will the bubble burst?", and I can tell you this:

    The "bubble" (which consists of overstressed people) will burst very soon. The more relaxed people will take it easy.

    * Well you can, but Hello World! isn't really THAT
    complex.
  • This software is bug-free. It is perfect, as perfect as human beings have achieved. Consider these stats: the last three versions of the program -- each 420,000 lines long -- had just one error each. The last 11 versions of this software had a total of 17 errors. Commercial programs of equivalent complexity would have 5,000 errors.

    How can they be sure it's bug-free? If the last 14 versions had 20 errors, did they think it was bug-free each time, only to find more bugs? At 500k lines of code you can't prove it all mathematically, and human checkers are... well, human.

    One way to measure how many bugs your code has is to purposefully introduce a bug and tell people to find it. Then you count how many new bugs they found along with the bug you introduced, and scale that by the lines of code you have. But this technique won't work if you only have 1 or 2 bugs that people are actively looking for in the first place. So, my question is: how can they be sure it is bug-free?
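    For what the seeding trick is worth, the usual capture-recapture arithmetic (my numbers, purely illustrative) goes like this: seed 10 known bugs; if reviewers rediscover 8 of them and also turn up 4 genuine ones, the estimated detection rate is 8/10 = 0.8, so the estimated number of genuine bugs is 4 / 0.8 = 5, i.e. roughly one still unfound. As noted above, with only one or two real defects in half a million lines, the error bars on that estimate swamp the answer.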
  • by Ikari Gendou ( 93109 ) on Friday May 19, 2000 @01:14AM (#1062095)
    So people don't see lines like this in the code:

    #Shuttle Waste Dump
    #
    #I dunno WHY this works, but it does!
  • by Harald74 ( 40901 ) on Friday May 19, 2000 @03:46AM (#1062100) Homepage Journal

    I can almost hear the moans from the pizza-and-coke crowd when they read this: "Where's the fun? Where's the creativity?". But they're under the mistaken assumption that putting lines of code into the editor is the only fun thing about developing software.

    IMHO, software development is full of fun activities. What about analysis and design? In my experience, that's where the creativity really comes into play. Just talking to the customer, understanding the problem and making a working design is really difficult, and hence rewarding when you pull it off.

    And what about the process itself? Software development is a young discipline, where individuals and small groups really can make an impact. Nobody really knows how to make good software. Maybe you'll be the one to find out? As the man says, in the shuttle software group, people use their creativity on improving the process.

    And last, but not least, I bet those guys have a really good feeling when they talk to the customer after delivery. Not like some people I know, who just hide. ;)

    If you can't see the fun of these other activities, maybe you shouldn't be working in this field...

  • One way of increasing the reliability of software is to use n-version programming, whereby you implement several versions of the software, written by different people, and then create a voter system that constantly compares the outputs of each version and forwards the consensus result (a toy sketch follows). Even if none of the programs agree, the voter "knows" that something is amiss and can alert the pilot/engineer/whatever. I'm doing my PhD on this, and I know that NASA has implemented quite a few n-version systems, as well as the more tried and trusted multiple-redundant hardware. I heard somewhere that the space shuttle code costs the equivalent of $100,000 a line (feel free to tell me I'm wrong if you know the 'true' figure), so it might be worth considering. Certainly a number of prominent academics reckon that you can get a 45:1 improvement in a software system by implementing 3 channels as opposed to a single good system. Blah, anyway, that's my $.02 worth.
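    A toy illustration of the voting idea, in C (my own sketch; real flight systems vote on many signals with timing windows and reasonableness checks, not one scalar):

        /* 2-out-of-3 majority voter over three independently written channels.
         * The channel functions are stand-ins for the N versions. */
        #include <stdio.h>

        static double channel_a(double x) { return 2.0 * x; }
        static double channel_b(double x) { return x + x; }
        static double channel_c(double x) { return 2.0 * x + 0.5; }  /* faulty version */

        /* Returns 1 and writes the consensus if at least two channels agree
         * within a tolerance; returns 0 to signal "no consensus -- raise an alarm". */
        static int vote(double a, double b, double c, double tol, double *out)
        {
            if (a - b <= tol && b - a <= tol) { *out = (a + b) / 2.0; return 1; }
            if (a - c <= tol && c - a <= tol) { *out = (a + c) / 2.0; return 1; }
            if (b - c <= tol && c - b <= tol) { *out = (b + c) / 2.0; return 1; }
            return 0;
        }

        int main(void)
        {
            double x = 10.0, result;
            if (vote(channel_a(x), channel_b(x), channel_c(x), 1e-6, &result))
                printf("consensus output: %f\n", result);
            else
                printf("channels disagree: alert the operator\n");
            return 0;
        }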
  • I was wondering about the same thing! But while Ada certainly is superior to many other languages, it's not a panacea -- the exploded Ariane 5 rocket mentioned in the article failed because of an unhandled exception in some Ada code which was simply carried over from the Ariane 4 control system without checking it against the new requirements. At least that's what they told us in the first lecture of our software engineering class ;-)
  • by jilles ( 20976 ) on Friday May 19, 2000 @04:45AM (#1062105) Homepage
    Let's see,
    - half a million LOC (that's small)
    - under development for 20 years
    - new requirements are avoided at all cost

    So it is a small, long-lived project with a nearly unlimited budget. No wonder they can afford to have such a process in place. But now, realistically, how long does it take to set up such a project from scratch? How about having a customer who does not know what he wants? How about deadlines of less than 10 years from now?

    I honestly believe that this way of delivering software is optimal for nothing but long-lived, multi-billion-dollar projects. In any other case you'll end up with something that is delivered years too late, matches the requirements of 10 years ago, and is close to useless.

    Unfortunately many software companies are in a situation where they can't afford to wait for perfect software. Take mobile phones as an example. Typically these things become obsolete within half a year after introduction. The software process is what determines time to market. Speed is everything. If you can deliver the software one month earlier, you can sell the phone one month longer.

    Of course testing, requirement specs and software designs are useful for any project, but it's usually not feasible to do it properly.
  • A correctness proof at the high-level-language level isn't even enough on a single-processor machine. Optimizations in compilation use techniques that must be accounted for in some very special cases (like some uses of shared memory, function calling conventions, and row-major vs. column-major problems).

    The only way to know for certain is to either code directly in bits or be (extremely) intimate with the compiler and linker. At that point, a proof will be correct.

    The way the shuttle group seems to work is that you'd better have a damn good reason to write/alter/delete/modify/worship a line of code. This will catch the vast majority (~99.95%, going by their error rates versus standard commercial software) of reasonable errors.

    They identified the weakest link in the chain of software engineering and have fortified it quite well.

  • The problem was that the software was reused from an earlier Ariane launch vehicle without rechecking the requirements and assumptions to see if they were still valid. The flight dynamics of the Ariane-5 were different enough to trigger an overflow in the software. Sort of like taking the engine control software from a Toyota and dropping it into a Porsche.
  • I've heard that fighter aircraft have inherently unreliable software

    If that's so, it's an interesting illustration of the overall system's requirements imposing lower quality standards on components of that system.

    To wit: the article (I presume; haven't read it, but have read similar ones on the same topic) discusses the importance of achieving a 100% quality rate on a given chunk of software.

    Now, that software is merely one component in a much larger system.

    Actually, these larger systems nest "outwards". I.e. the shuttle itself is a larger system than the software it contains, but so is NASA a larger system than the shuttle; so is the US government larger than NASA; so is the USA larger than the government; so is the planet's population larger than the USA; etc.

    In this case, there are specific reasons I can suggest account for the 100% quality requirement that might otherwise go unnoticed:

    • Failure resulting in death of participants, and especially of non-participants (humans), is not an option.

      However, failure resulting in not launching, not even building it in the first place, especially not building it within some timeframe, is an option. That is, failure of the "commitment to quality" approach to actually deliver the component on a "timely" basis is an acceptable option.

    • The world generally will admire a program such as the space shuttle less if it crashes and burns frequently, killing/maiming people and destroying equipment, than if it succeeds on the extremely rare occasions on which it is tried -- perhaps even less than if it never happened in the first place.

    • A delay in a shuttle launch costs, overall, far less than the cumulative risks of premature shuttle launches. (Challenger demonstrated that.)

    (Yes, there's some overlap there, but these are subtly different points, that might apply independently in other projects. E.g. a not-publicly-visible project might have no risk of embarrassment should it fail in one way vs. another, but have a huge risk of $$$ loss.)

    Compare these elements to fighter aircraft, where the software is part of a somewhat different set of larger systems:

    • The deaths of participants and non-participants are expected by most everyone involved with this sort of system and the activities around which it revolves.

      On the contrary, the sorts of failures that result from failing to launch a fighter plane, or never having designed it in the first place, are generally not so well-tolerated.

    • The world will likely fear a non-existent fighter plane, even one that has 100% success in its flight-control software (doesn't require rebooting) but is launched extremely rarely (it's hard to build) or too late, far less than it will a large fleet of existing, dangerous fighters that have even a 10% "kill" rate of its pilots per year.

    • A delay in a fighter-plane deployment can literally cause lost wars. In that sense, the loss of pilots due to poor design is a calculated positive compared to the loss of a nation's (and/or its peoples') freedom.

    Of course, I'm making pretty much everything up, above, so don't bother arguing details or interpretations with me -- I have no idea whether they're correct or not.

    But, they're probably correct enough to illustrate why it's probably okay for us to be using highly buggy computers on a poorly designed (for the way it's being used now, anyway) Internet rather than, as another post on this thread put it, using typewriters and plain paper.

    Not that there aren't wonderful advantages to deploying 100% correct software components in a large-scale, much-buggier system! "Creeping quality" is not a bad thing at all, since it allows people working on the system to worry less about various portions of it as they try to debug it.

    But, the effort to deploy such perfect components may well outweigh the utility of doing so, overall, given the pertinent timeframe.

    In particular, when trying to deploy such a perfect component in a large, buggy system, it can be hard figuring out which component can be made so "perfect" and still be useful in that (presumably speedily-evolving) system by the time it's ready!

    So maybe it's appropriate to view almost everything we deal with on the Internet as a very early alpha-stage prototype after all. ;-)

  • everything revolves around the 'process'. The result is determined by the process.
    The problem is that often the process becomes primary, and the reasons behind it get lost.

    I'm working on a large NASA project now. I have determined that the purpose of this project is not to produce a working software system, but rather to produce a wall full of loose-leaf binders of incomprehensible documentation that no one will ever refer to again.

    The process says we must have code reviews - great! But instead of being an analysis of the logic of my code, it turns into a check against the local code formatting standards - "You can't declare two variables with one declaration, use int a; int b; instead of int a,b;" (yes, that's an actual standard around here) instead of "Hey, if foo is true and bar is negative, you're going to dereference a garbage pointer here!" (A concrete fragment of that second kind of defect is at the end of this comment.)

    The forms are observed, but the meaning is forgotten, like Christians going to church on Sunday then cutting people off and flipping them the bird on the drive home.

    "Process" won't save us. Which doesn't mean that a certain amount of it can't help, but there is no silver bullet. [virtualschool.edu]

  • So the way that Microsoft Flight Simulator keeps crashing is actually a feature?
  • by kzinti ( 9651 ) on Friday May 19, 2000 @01:16AM (#1062116) Homepage Journal
    I happen to work just down the hall from the guys who maintain and upgrade the shuttle Flight Software (FSW), and I can tell you they have a rigorous design, inspection, and test sequence that they go through before they fly new or modified code. The story around here (which I have no reason to doubt) is that the FSW team was one of the first SEI level-5 certified shops in the nation.

    I can also tell you that NASA avoids making unnecessary changes to the FSW. For example, the new "glass cockpit" recently discussed here on Slashdot: when these upgrades were designed, they chose to design the interface to the new display modules to exactly mimic the interface to the old instruments. In other words, they are true plug-and-play replacements; one significant reason for this was so the flight software didn't have to be modified.

    Likewise, people often ask why the shuttle continues to use such antiquated General Purpose Computers: slow, 16-bit machines designed back in the seventies. There are many reasons, but a big reason is that new hardware would almost certainly require massive changes to the flight software. And rewriting and recertifying all that software would be a huge task. The current FSW works reliably; if it ain't broke...

    Huzzah! As I type, we just launched Atlantis. Go, baby, go!

    --Jim
  • Of course, not everyone [voodooextreme.com] can even write a perfect Hello World implementation. *Sigh*
  • by Anonymous Coward
    Time and again we hear about the requirements for 100% reliability. But most of us are simply paying lip-service to this idea. Formal Methods and techniques which can PROVE a program is BUG FREE have been around since the late 60s, but hardly anyone is using them.

    If everyone would simply use VDM/Z or Larch/CLU for all their development work, it would be much easier for us to prove our software is correct, and then all bugs would be a thing of the past.

    It really is that simple. Don't these people remember what they were taught at college ?

  • Notice that in the article, as with ISO (the International Standards Organization), everything revolves around the "process". The result is determined by the process. I used to work for a company that had a documented process for everything... from software development right through to filling out your wage timesheet! I think the important thing to note is that it all depends on the "culture" and type of the organization. If people accept this style of operation then it's great. For an organization that has to write software where lives are directly at stake, there must be a "process" to ensure the software is written perfectly (and tested).

    I have come across fellow workers who absolutely hate this type of practice... well, they're probably best suited for development of systems that aren't life-critical.
  • "C is a great, if complicated language. It's simple, yet can get complicated very easily..."

    It's complicated.

    It's simple.

    It's complicated again.

    The article gets worse from there.
    --
  • by dnnrly ( 120163 ) on Friday May 19, 2000 @01:22AM (#1062121)
    Some of my most successful programs (read: they actually worked, or thereabouts) came about because I was in a funny mood and decided to actually plan them out. From what I hear about the real world, some (but by no means all or even most) programmers look down on clients just because they don't know much about programming. They assume that just because they have a certain expertise over others, they somehow know more than them in general.
    The good thing about the way software is written here is that the requirements are written down and sorted out before they even do the planning. How many programmers, groups, firms etc. can say that? I will admit, though, that a major problem is changing requirements, something that just doesn't happen in the same way for NASA. It might just be better if people decided to wait a bit before jumping into the programming. They'll save themselves more time and money in the long run.
  • Well my professor at college asked this question:

    "Would you rather fly on an airplane with software that has been proven to be correct, or on an airplane with software that has been rigorously tested through actual flight time?"

    I think the answer is clear.
  • by El Cabri ( 13930 ) on Friday May 19, 2000 @05:28AM (#1062126) Journal
    After the Ariane 5 maiden flight failure in '96, the software was tested by an academic lab in France using heavily mathematical formal methods. The arithmetic exception that caused the $1b explosion was proved to be possible, along with several other "dangerous" operations. Formal methods are now taken much more seriously, and the incident is invariably recounted as motivation to students in majors that relate to the mathematical aspects of programming.

    Another formal system originating in France is the Methode B, which consists of progressively refining logical statements that describe the desired behaviour of your program (like the assert() you put before and after the body of a function) into an implementation of that behaviour (a crude assert()-style sketch closes this comment):

    http://estas1.inrets.fr:8001/ESTAS/BUG/WWW/BUGhome/

    An academic formal methods team that checks the Ariane 5 software:

    http://pauillac.inria.fr/para/eng.htm

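    To give a crude, executable flavour of that pre/post-condition idea in C (this is only the assert() analogy, not the B method itself; the function and the 46340 bound are my own, the bound being the largest int whose square fits in 32 bits):

        /* Pre/post-conditions written as executable assertions. The refinement
         * obligation is to show the body always satisfies statements like these. */
        #include <assert.h>
        #include <stdio.h>

        /* Spec: for 0 <= n <= 46340, return n squared (no 32-bit overflow). */
        static int square(int n)
        {
            assert(n >= 0 && n <= 46340);          /* precondition */
            int result = n * n;
            assert(n == 0 || result / n == n);     /* postcondition: result == n*n */
            return result;
        }

        int main(void)
        {
            printf("%d\n", square(12));            /* prints 144 */
            return 0;
        }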
  • by MosesJones ( 55544 ) on Friday May 19, 2000 @01:34AM (#1062141) Homepage

    I never quite understand why it is an act of macho bravado to work all night and live off pizza. It indicates two things: 1) a badly run project, and 2) poor maintainability in the code.

    In one of my previous incarnations I worked on display systems for Air Traffic Control, where the quality level was also very high, where the performance requirements were exacting and the specifications precise.

    Some would think that this means simple and boring... Of course not. Getting a track from reception at the radar to the display in 1/10th of a second isn't easy by any stretch of the imagination, and doing it so it works 100% of the time means you have to understand the problem properly rather than coding and patching.

    If only more projects worked like that then there would be a lot less bugs in the world.
  • All too often it's a fickle client that causes a program to become a mess... Every couple of weeks they want a new feature. Then they get the first revision in their hands and they want something completely different. It's not that the programmer never gave them the time of day to figure out what they want; it's just that they are not engineers, like those at NASA, who can write a tight spec on what is _needed_ as opposed to what their own whimsical mind thinks would be cool - err, I meant "productive".
  • Congratz,

    Something I've always wondered about is at what point you figure you have done enough planning and start work on the actual project. What is their time-line dependent on: 100% error-free pseudo-code before anything actually gets implemented, or a set number of readings?

    I think a lot of the reason most companies don't go through this extensive pre-stage process is that they fear the project will get lost in a black hole of redesign and double-checks.

    Also, where can one find the Software Engineering Institute (SEI) specs?
