Re: Downtime
The coward might laugh at your storage cluster, but I'm laughing too, because I've heard this song before.
Every time I see another one of these, I'm reminded why I run standalone replicas, with replication handled right up at the application level and integrity checks to ensure that a failure in one place can't wipe out data anywhere else.
http://blog.fastmail.com/2014/...
People are right to laugh when a single bad disk can take your site offline for hours because the storage cluster software screwed up. I don't use Heartbeat any more, because we found it was LESS reliable than our servers: we had more downtime from Heartbeat screwing up than from the failures it was meant to handle. Clusters and SPOF SANs fall right into the same basket in my mind: a single place where everything breaks.
I feel for your ops team, but like the others, I hope they learn the single-point-of-failure lesson from this.