I work with and build them all the time. Mind you, I really no longer think you can get any complex service to 5+ 9's without the application being part of the solution. Reliability is not a bolt-on thing; it's baked into the design.
The story is laughable: faulty fans, buggy firmware on the Nexus 3Ks. Those are TOR switches; they should be extremely easy to replace and always used in redundant setups. They probably got suckered into vPC and similar. Guess what: I don't care what they say, all stacks share a single failure domain. Don't get me wrong, they are great, but you need at least A+B stacks. These guys cited cable issues. It really sounds like PHBs trying to blame the vendor because they picked the lowest bid, not the right one, and failed to test every failure mode they could come up with before going into production. 7 9's work is hard, and you're never going to just bolt it onto somebody else's design.
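
To put rough numbers on the shared failure domain point, here is a back-of-envelope sketch in Python. The availability figures (0.999 per switch, 0.9999 for whatever the stack shares) are made up for illustration, and the math assumes failures are independent, which real outages often are not. Five 9's works out to roughly 5.3 minutes of downtime a year, seven 9's to about 3 seconds.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines (e.g. 5 -> 99.999%)."""
    return 10.0 ** -nines * SECONDS_PER_YEAR


def series(*availabilities: float) -> float:
    """Everything must be up: availabilities multiply."""
    up = 1.0
    for a in availabilities:
        up *= a
    return up


def parallel(*availabilities: float) -> float:
    """Independent redundant units: down only when every unit is down."""
    down = 1.0
    for a in availabilities:
        down *= 1.0 - a
    return 1.0 - down


# Hypothetical numbers, chosen only to show the shape of the problem.
switch = 0.999          # one TOR switch on its own
shared_domain = 0.9999  # whatever both members of a stack share (software, control plane)

# One stack: two redundant switches, but both sit behind the shared domain.
single_stack = series(shared_domain, parallel(switch, switch))

# A+B: two fully separate stacks, assumed to fail independently.
ab_stacks = parallel(single_stack, single_stack)

if __name__ == "__main__":
    print(f"5 nines: {downtime_seconds_per_year(5):.1f} s/year")
    print(f"7 nines: {downtime_seconds_per_year(7):.2f} s/year")
    print(f"single stack: {single_stack:.7f}")
    print(f"A+B stacks:   {ab_stacks:.10f}")
```

The single stack can never be better than the thing its members share, no matter how many switches you put in it. Only the separate A+B pair gets past that ceiling, and only as long as the two sides really are independent.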