I'm not sure how familiar you are with safety-critical software and systems (you see it all the time in aviation), but there's actually a pretty well-defined process for the entire thing. I'll make a really poor attempt at summing it up:
- A hazard analysis is performed on the system by various engineers (and occasionally even a 3rd party is brought in for peer review). There are a multitude of ways to go about it, but eventually you end up with a long list of ways the product could fail, with a probability and severity assigned to each failure case (the first sketch below this list shows what one entry might look like).
- After this analysis, everyone comes up with ways to mitigate each of the risks. Removing the risk entirely is preferred, followed by passive safety mitigations, then active mitigations, then monitoring with alarms. Probabilities and severities are updated accordingly.
- Software is then analyzed in a similar way, except that no probability numbers are assigned. Mitigation steps for software range from self-checks (a common example is to read a sensor on a 0-5V scale, then read a separate sensor through a separate function that measures the same thing on a 5-0V scale - the second sketch below shows the idea) to having multiple CPUs of different manufacture running the same code in lockstep and checking each other on the fly. Which methods get picked depends on the hazard analysis and on the severity assigned to each risk.
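
To make the hazard list concrete, here's a rough sketch in C of what one entry might look like. Everything here (the IDs, the probability/severity scales, the risk-index formula) is invented for illustration; real projects use whatever scheme their standard dictates:

```c
#include <stdio.h>

/* Illustrative scales only -- not from any particular standard. */
typedef enum { PROB_FREQUENT = 5, PROB_PROBABLE = 4, PROB_OCCASIONAL = 3,
               PROB_REMOTE = 2, PROB_IMPROBABLE = 1 } probability_t;

typedef enum { SEV_CATASTROPHIC = 4, SEV_CRITICAL = 3,
               SEV_MARGINAL = 2, SEV_NEGLIGIBLE = 1 } severity_t;

/* One entry in the hazard log. */
typedef struct {
    const char   *id;          /* e.g. "HAZ-017" (made-up ID) */
    const char   *description; /* one way the product could fail */
    probability_t probability; /* estimated likelihood before mitigation */
    severity_t    severity;    /* worst credible outcome */
} hazard_t;

/* A common convention: risk index = probability x severity, and
 * anything above some threshold must be mitigated. */
int risk_index(const hazard_t *h) {
    return (int)h->probability * (int)h->severity;
}

int main(void) {
    hazard_t h = { "HAZ-017",
                   "Throttle sensor reads full scale on open circuit",
                   PROB_REMOTE, SEV_CATASTROPHIC };
    printf("%s: risk index %d\n", h.id, risk_index(&h));
    return 0;
}
```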
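And here's a minimal sketch of the inverted-sensor self-check. The two ADC reads are stand-ins (on real hardware they'd each talk to a physically separate sensor), and the tolerance value is made up:

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#define FULL_SCALE_MV 5000
#define TOLERANCE_MV  100   /* allowed disagreement between the two channels */

/* Stand-ins for the hardware reads: one sensor is wired so its output
 * rises 0-5V with the measured quantity, the redundant one so it falls
 * 5-0V. The names and stub values are assumptions for illustration. */
static int adc_read_direct(void)   { return 2400; }
static int adc_read_inverted(void) { return FULL_SCALE_MV - 2400; }

/* Cross-check the two channels. Returns true and stores the agreed
 * reading, or false if the channels disagree (wiring fault, stuck ADC,
 * a bug that reads the wrong channel...), in which case the caller must
 * treat the sensor as failed and move to a safe state. */
static bool read_sensor_checked(int *out_mv) {
    int direct   = adc_read_direct();
    int inverted = adc_read_inverted();

    /* The channels should mirror each other: direct + inverted ~= 5V. */
    if (abs(direct + inverted - FULL_SCALE_MV) > TOLERANCE_MV)
        return false;   /* disagreement: trust neither reading */

    *out_mv = direct;
    return true;
}

int main(void) {
    int mv;
    if (read_sensor_checked(&mv))
        printf("sensor ok: %d mV\n", mv);
    else
        printf("channel disagreement: entering safe state\n");
    return 0;
}
```

The point of inverting the second channel is that a whole class of common-mode failures (a stuck ADC, a swapped wire, a copy-paste bug that reads the same channel twice) shows up as a disagreement instead of two identical wrong answers.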
Then, in order to be safety-certified, you need to show documentation that all of those previous steps were followed, as well as a software process in which:
- There's a clear set of requirements that are traceable to the hazard analysis
- Every line of code is traceable back to those requirements (the sketch after this list shows one possible tagging convention)
- There's a set of test cases that are traceable back to the lines of code and the appropriate hazard analysis/requirement
- There's documentation showing that all of these test cases have been run (sometimes a 3rd party is brought in to verify this)
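
A toy example of what that traceability can look like down at the code level. The requirement, the IDs, and the comment-tag convention are all made up for illustration; real projects usually manage this in a traceability matrix with dedicated tooling rather than bare comments:

```c
#include <assert.h>

/* REQ-042 (traces to HAZ-017): if the throttle reading exceeds
 * THROTTLE_LIMIT_MV, the controller shall command zero output.
 * The requirement/hazard IDs here are hypothetical. */
#define THROTTLE_LIMIT_MV 4500

/* Implements REQ-042. */
int throttle_command(int reading_mv) {
    if (reading_mv > THROTTLE_LIMIT_MV)
        return 0;   /* REQ-042: clamp to zero on over-range */
    return reading_mv;
}

/* TEST-042-A: verifies REQ-042 (and hence HAZ-017's mitigation). */
static void test_overrange_clamps_to_zero(void) {
    assert(throttle_command(THROTTLE_LIMIT_MV + 1) == 0);
    assert(throttle_command(THROTTLE_LIMIT_MV) == THROTTLE_LIMIT_MV);
}

int main(void) {
    test_overrange_clamps_to_zero();
    return 0;
}
```

With tags like these, an auditor can start from a hazard, follow it to a requirement, then to the lines of code and the test cases, and finally to the test-run records.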
Then after all of that is finished, the project managers look at the final risk analysis and sign off on it. They're the ones ultimately responsible if it fails. In the event that it does fail, they have a stack of paperwork about a mile high to go back through and trace how the failure occurred (note: this is the opposite of what Toyota had during the whole unintended-acceleration thing). The idea is that in the unlikely event your software fails and kills someone, you can prove in a court of law that appropriate measures were taken to assess and account for every foreseeable risk.