Comment ROC, Slashdot, and Education (Score 1) 224
As a professor, I can't help but think that some Slashdot responses are disguised pleas for help. Therefore, let me offer some guidance:
* The factor of 10,000 performance improvement in 20 years is not the focus of the article, but if you are interested in where it came from, please see a book. On page 3 of Computer Architecture: A Quantitative Approach, 3rd edition (http://www.amazon.com/exec/obidos/ASIN/1558605967/), Figure 1.1 shows a performance improvement of a factor of 1.58 per year between 1984 and 2001 using a few generations of the SPEC benchmarks. That is a factor of about 2400. If we add 3 more years at the same rate, we get a factor of roughly 9600. QED.
* Backup is one of the 3Rs of system administrator undo that we are pursuing, but it is not all of them. The 3Rs are Rewind, Repair, and Replay. Backup gives us Rewind, but not Repair or Replay. It is also different from ACID transactions, which operate at a very low level of the system. We are interested in undo of higher level "verbs" that correspond to high-level user actions. If you want to learn about our undo ideas before you need to reply, see http://roc.cs.berkeley.edu/papers/sigops-ew2002-undo.pdf.
* TMR stands for Triple Modular Redundancy, which is an effective but expensive technique to protect against hardware failures. If hardware failures were the leading problem, then TMR would be the path to follow. But hardware errors are responsible for only 15% of the outages, as those who have read the Scientific American article already know. TMR and systems like HP's (nee Tandem's) NonStop do not address operator error.
* We are focused on Internet-style applications that are considerably above the operating system, but the problems we have documented about operators being a major source of outages apply to all systems, including Linux systems. To see some hard-to-find data on the causes of failures before you reply, please see http://roc.cs.berkeley.edu/papers/usits03.pdf.
* We agree that the telephone industry did many fine things to make communication dependable, and that there is much to learn and emulate from them. If computers were as reliable as telephony, we could be much prouder of our field.
* Our focus is on Internet services, so the cost of ownership is probably higher for such servers than for PCs. I wouldn't be surprised, however, if multiplying a typical white-collar hourly pay rate by the average number of hours that one spends administering a PC gave similar results.
* For those of you who were not using computers in 1983, that is the era of open source UNIX software (BSD) on 32-bit computers (VAX). Sound familiar? Punched cards had been passé for quite a while in 1983.
* For those wanting to read something with more technical depth, see http://roc.cs.berkeley.edu/papers/ROC_TR02-1175.pdf. For the Slashdot readers who only have time for a quick overview, see the Scientific American article (www.sciam.com/article.cfm?chanID=sa006&articleID=000DAA41-3B4E-1EB7-BDC0809EC588EEDF). For those who only have time to read Slashdot, may God protect you on your journey towards technical obsolescence.
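For anyone who would rather check the compound-growth arithmetic in the first point than take it on faith, here is a minimal sketch. The 1.58x-per-year rate and the 1984 to 2001 span are the figures quoted above; the function name and layout are only illustrative assumptions, not anything from the book.

```python
# Compound-growth check of the "factor of 10,000 in 20 years" figure.
# RATE and the 1984-2001 span are the numbers quoted in the comment;
# compound_factor is a hypothetical helper written for this illustration.

def compound_factor(rate_per_year: float, years: int) -> float:
    """Total improvement after `years` of growth at `rate_per_year`."""
    return rate_per_year ** years

RATE = 1.58                      # yearly improvement factor
YEARS_MEASURED = 2001 - 1984     # 17 years covered by the figure
YEARS_EXTENDED = YEARS_MEASURED + 3

factor_17 = compound_factor(RATE, YEARS_MEASURED)
factor_20 = compound_factor(RATE, YEARS_EXTENDED)

print(f"{YEARS_MEASURED} years at {RATE}x/year: {factor_17:,.0f}x")
print(f"{YEARS_EXTENDED} years at {RATE}x/year: {factor_20:,.0f}x")
```

Carried through without rounding the intermediate 2400, the 20-year figure comes out somewhat below 9600, but it is still on the order of 10,000.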