I've heard and read of enough problems with restoring complex transactional data structures that I can imagine this situation is far more complex than many believe.
What I'd love to see is a full post-mortem, a lesser version of what the NTSB does for airplane crashes.
Google's been doing some of this for their (too frequent) outages, but the reports are very high level -- typically something about a system reconfig overloading a router. The Cloud user base needs a far greater level of error exposure.
It took openness and in-depth analysis to make air travel safe.
The Cloud won't be safe until we learn the same kind of lessons, and apply what we learn in new and improved systems.