I actually wasn't assuming incompetence. The hallmark of many SysAdmins is being understaffed, overworked and underpaid, and thus not having the resources to properly test all backup and redundant systems.
Consultants and contractors in the area of System Administration get let go if anything like this ever happens. This is why they charge a little bit more.
Whatever happened, it failed. A good lesson for next time. I don't know the exact cause, but it is safe to say there were too many eggs in one basket. Multiple geographically diverse load-balanced DNS servers? Why was there an overheating problem in the first place? Only one air conditioner?
Wikipedia has had a few failures, not all of them their fault. In 2006, Cogent pulled a block of IP addresses that were leased to Wikipedia.
It is impossible to enjoy idling thoroughly unless one has plenty of work to do. -- Jerome Klapka Jerome