Comment Re:Complexity arising from simplicity (Score 1) 74
Now, if you recall what happened with AWS in April, they had a low-bandwidth management network that all of a sudden had all primary EBS API traffic shunted to it. This was caused by a human flipping a network switch when they shouldn't have. Something like this is not something that happens all the time, has little, if any diagnosable features, is not well-defined to have a proper workflow attached to it, and needs human engineers to correct. This is an example of a complex, large-scale problem.
I wonder when this army of automated-problem-fixing engines will encounter a corner case its masters never considered and how it will react.
I give the ops guys at Facebook a lot of credit for managing such a gigantic workload with just a (relatively) few, very smart, people. Amazon also has a lot of smart people who have been working on EBS (in one form or another) since before Facebook was founded. These systems just interact in unpredictable ways when they get out of their comfort zone.
Systems so complicated they require self-managing management systems are going to have some interesting failure modes, to say the least.