Re: Do I get at least a pair of rubber gloves?
Don't want to reduce your smug, but we're doing just this: restart the failed component's services on other hardware, then service the failed resource on a non-critical timeframe. For the most part, the small shop with a half-dozen server boxes doesn't give a damn about cooling costs or this level of service. If they do, they're likely going to someone else to satisfy that requirement, not doing it in house.
I've got a stack of servers in my datacenter that are allocatable on demand. Any unused server blade is a potential spare. If a production blade tips over with a CPU fault, memory error, or similar crash, an automated process moves its personality (FC WWNs, MACs, boot and data volumes, etc.) to another blade and powers that blade on. Since the OS and apps live on the SAN, both VMs and dedicated server hardware can be abstracted away from the actual services they provide.
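To make the "move the personality to a spare" idea concrete, here is a minimal sketch of that failover step. It is not the product described here; the Personality and Blade classes, the fail_over function, and all WWN/MAC/LUN values are hypothetical, and powering on is just a flag standing in for whatever the real management layer does. The key point it shows is that the server's identity is data that can be reassigned, so any idle healthy blade can become the replacement.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Personality:
    """Identity that makes a blade 'be' a given server (hypothetical model)."""
    name: str
    wwns: list[str]       # Fibre Channel World Wide Names the SAN zoning expects
    macs: list[str]       # network MAC addresses
    boot_volume: str      # SAN LUN holding the OS
    data_volumes: list[str] = field(default_factory=list)


@dataclass
class Blade:
    slot: str
    healthy: bool = True
    powered_on: bool = False
    personality: Optional[Personality] = None


def fail_over(failed: Blade, pool: list[Blade]) -> Blade:
    """Move the failed blade's personality to an idle, healthy blade and power it on."""
    if failed.personality is None:
        raise ValueError(f"blade {failed.slot} has no personality to move")
    # Any unused, healthy blade is a candidate spare; take the first one found.
    spare = next(
        (b for b in pool if b is not failed and b.healthy and b.personality is None),
        None,
    )
    if spare is None:
        raise RuntimeError("no spare blade available")
    # Take the faulted blade out of service first, so the same WWNs/MACs
    # are never live on two blades at once.
    failed.powered_on = False
    failed.healthy = False
    spare.personality, failed.personality = failed.personality, None
    # Stand-in for powering on; a real system would boot from the SAN boot volume.
    spare.powered_on = True
    return spare


# Usage: blade A1 (running db01) faults, and idle blade A2 takes over its identity.
pool = [
    Blade("A1", powered_on=True, personality=Personality(
        "db01", wwns=["50:06:01:60:3b:20:19:a1"], macs=["00:25:b5:00:00:01"],
        boot_volume="lun-boot-db01", data_volumes=["lun-data-db01"])),
    Blade("A2"),
    Blade("A3"),
]
new_home = fail_over(pool[0], pool)
print(new_home.slot, new_home.personality.name)  # A2 db01
```

Picking the first idle blade is the simplest policy; a real allocator might also match hardware class or chassis location before choosing a spare.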
This is a product my company is selling to the market at large right now, and I designed it. Any of our IaaS customers can take advantage of the redundancy and fault tolerance built into the system, even the six-server small IT shop.
Even so, a small IT organization can now easily virtualize and provide some level of HA services in hypervisor clusters. It's just not that hard anymore. Take the handful of servers you're running now, replace them with an equal number of nodes in a VMM cluster, and go to town. If any of those systems fails, shift its load to the other nodes and effect repairs.
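For a feel of what "shift the load to the other nodes" means, here is a minimal sketch of one possible placement policy: restart each VM from the failed node on whichever surviving node is currently least loaded, biggest VMs first. This is an assumption for illustration, not how any particular hypervisor's HA feature actually decides placement; the evacuate function, node and VM names, and load numbers are all hypothetical.

```python
import heapq


def evacuate(placement: dict[str, list[str]], vm_load: dict[str, int],
             failed: str) -> dict[str, list[str]]:
    """Reassign the failed node's VMs to surviving nodes, least-loaded node first.

    placement maps node name -> VMs running on it; vm_load gives each VM's
    relative load. Returns a new placement that omits the failed node.
    """
    survivors = {n: list(vms) for n, vms in placement.items() if n != failed}
    if not survivors:
        raise RuntimeError("no surviving nodes to take the load")
    # Min-heap of (current load, node name) for the surviving nodes.
    heap = [(sum(vm_load[v] for v in vms), n) for n, vms in survivors.items()]
    heapq.heapify(heap)
    # Place the biggest VMs first so large ones are less likely to pile onto one node.
    for vm in sorted(placement.get(failed, []), key=lambda v: vm_load[v], reverse=True):
        load, node = heapq.heappop(heap)
        survivors[node].append(vm)
        heapq.heappush(heap, (load + vm_load[vm], node))
    return survivors


# Usage: node1 dies; its VMs are spread across node2 and node3.
placement = {"node1": ["web1", "db1"], "node2": ["web2"], "node3": ["mail"]}
vm_load = {"web1": 2, "db1": 4, "web2": 2, "mail": 1}
new = evacuate(placement, vm_load, "node1")
print(new)  # {'node2': ['web2', 'web1'], 'node3': ['mail', 'db1']}
```

The sketch assumes the survivors have enough headroom; with an equal number of nodes replacing the old servers, that headroom is exactly what you size the cluster for.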