Now, if you recall what happened with AWS in April, they had a low-bandwidth management network that all of a sudden had all primary EBS API traffic shunted to it. This was caused by a human flipping a network switch when they shouldn't have. A failure like this doesn't happen often, has few, if any, diagnosable features, is not well-defined enough to have a proper workflow attached to it, and needs human engineers to correct. This is an example of a complex, large-scale problem.
I wonder when this army of automated-problem-fixing engines will encounter a corner case its masters never considered and how it will react.
I give the ops guys at Facebook a lot of credit for managing such a gigantic workload with just a (relatively) few very smart people. Amazon also has a lot of smart people who have been working on EBS (in one form or another) since before Facebook was founded. These systems just interact in unpredictable ways when they get out of their comfort zone.
Systems so complicated they require self-managing management systems are going to have some interesting failure modes, to say the least.
The PCI-SIG release is here. The electromechanical specification is also due to be released shortly. The new release doubles the signalling rate from 2.5Gbps to 5Gbps per lane. The upshot: a x16 connector can transfer data at up to around 16GBps, counting both directions together.
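For anyone who wants to check the 16GBps figure, here is a rough back-of-the-envelope sketch. It assumes the 8b/10b line encoding used by PCI Express 1.x and 2.0 (not mentioned above) and treats the quoted Gbps numbers as raw per-lane rates. The function name `pcie_bandwidth_gbytes` is made up purely for illustration.

```python
# Back-of-the-envelope PCI Express bandwidth estimate (illustrative sketch).
# Assumption: 8b/10b encoding, i.e. 8 payload bits for every 10 bits on the wire.


def pcie_bandwidth_gbytes(raw_gbps_per_lane: float, lanes: int) -> float:
    """Approximate usable bandwidth in GB/s for ONE direction of a link."""
    usable_gbps_per_lane = raw_gbps_per_lane * 8 / 10  # strip 8b/10b overhead
    return usable_gbps_per_lane * lanes / 8            # convert bits to bytes


gen1_x16 = pcie_bandwidth_gbytes(2.5, 16)  # old signalling rate
gen2_x16 = pcie_bandwidth_gbytes(5.0, 16)  # new signalling rate

print(f"2.5Gbps x16: {gen1_x16:.1f} GB/s per direction, {2 * gen1_x16:.1f} GB/s both ways")
print(f"5Gbps   x16: {gen2_x16:.1f} GB/s per direction, {2 * gen2_x16:.1f} GB/s both ways")
```

Under those assumptions the new rate works out to 8 GB/s in each direction, so the "around 16GBps" number only shows up when both directions are added together.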
"The companion PCI Express Card Electromechanical 2.0 specification is currently at revision 0.9, having completed its 60-day member review. The PCI-SIG anticipates that this specification too will be released in the near future."
People are always available for work in the past tense.