The key to achieving high uptime ...

... is actually quite simple: You keep your hands off the systems. Period.

In detail: you plan, install and _test_ your setup before it enters production. You make sure it can survive whatever errors and incidents you throw at it. You then work out how much downtime the SLA allows, and together with the customer you divide that budget into equal-sized maintenance windows. And then you adhere to these windows! No manager should ever be allowed to demand downtime out of band. Period. In between, you keep your involvement with the systems to a minimum and plan your activities for the next scheduled maintenance window.
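A minimal sketch of that arithmetic, assuming the SLA is stated as an availability percentage measured over a 365-day year. The function names, and the 99.95% / four-window figures in the note below, are hypothetical illustrations, not taken from any particular contract.

```python
from datetime import timedelta

# Assumption: the SLA is measured over a 365-day year. Change PERIOD if
# your contract measures availability per month or per quarter instead.
PERIOD = timedelta(days=365)


def downtime_budget(availability_pct: float, period: timedelta = PERIOD) -> timedelta:
    """Total downtime the SLA allows over `period` (hypothetical helper)."""
    return period * (1 - availability_pct / 100)


def window_length(availability_pct: float, n_windows: int,
                  period: timedelta = PERIOD) -> timedelta:
    """Length of each of `n_windows` equal-sized maintenance windows."""
    if n_windows <= 0:
        raise ValueError("need at least one maintenance window")
    return downtime_budget(availability_pct, period) / n_windows


if __name__ == "__main__":
    print(downtime_budget(99.95))     # whole yearly budget
    print(window_length(99.95, 4))    # one of four quarterly windows
```

With a hypothetical 99.95% SLA the yearly budget is about 4 hours 23 minutes, so four quarterly windows come out at roughly 66 minutes each. Everything you want to change has to fit inside those windows.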

And of course you only deploy stable, tried and tested versions of software and operating systems. Even though your OS supports online capacity expansion, you really shouldn't use that capability unless you absolutely have to. Instead, you plan ahead in your capacity management procedure and add capacity during the maintenance windows. And you do not test and rehearse failures in production! It only introduces risk, and besides, you have already tested and documented them. As you haven't changed the configuration, there is no need to test again.

So, in essence: common sense will easily yield 99.9%. Careful planning and execution will yield 99.99%. The really hard part is 99.999%... /zensonic
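To put those nines into numbers, here is a small sketch that converts each availability level into allowed downtime per year. It assumes a 365-day year and ignores leap years; the helper name is hypothetical.

```python
# Yearly downtime allowed at each "nines" level.
# Assumption: a 365-day year (leap years ignored).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_seconds(availability_pct: float) -> float:
    """Seconds of downtime per year permitted at `availability_pct`."""
    return SECONDS_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_seconds(pct) / 60
        print(f"{pct}% -> {minutes:.1f} minutes of downtime per year")
```

That works out to roughly 8.8 hours, 53 minutes and 5 minutes per year respectively, which is why the last nine is the hard one: a single unplanned reboot can use up the whole year's budget.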

GPS? On the ocean floor?

Trying to solve one problem at a time ;-)

This problem was about a device buried on the ocean floor that took 2+ years to recover. Its 'I am here' distress signal consists of 30 days' worth of pings, and just hearing them requires lowering a probe far down into the water.

But you are right, if it is bolted to the airframe, then a big flotation device is required.

