Uptime Realities in the Internet World 357
schnurble writes: "My former boss has written an interesting article on the realities of uptime in the Internet World. It poses the idea that four and five nines of reliability are too expensive to be realistic, especially in the post dot-bomb economy. It's an interesting read, especially if you answer to an 800lb gorilla for outages and uptime issues."
Uptime (Score:5, Funny)
Re:Uptime (Score:3, Funny)
Re:Uptime (Score:4, Informative)
I'd also say impractical. 5 nines is 99.999% availability, i.e. you can be down for 1 second in every 100,000 seconds (about 27.77 hours). That works out to roughly 6 seconds of downtime per week.
Even if all that week's downtime came at once, six seconds is short enough that most users would just hit refresh and never even notice. Besides which, most web servers are taken down for maintenance tasks - upgrading software or disks, etc. Chances are even restarting the web server would eat more time than your maximum weekly downtime.
Given that over the course of a month (the billing period on most ISP lines) you only have about 26 seconds of allowable downtime, it's very unlikely that the ISP will be able to meet that target. Pretty much *any* fault would take longer than that to fix, so any company offering a refund if the SLA isn't met is just asking for trouble.
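A quick sanity check of the parent's arithmetic - a minimal sketch assuming a 7-day week and a 30-day billing month:

```python
# Downtime budget implied by 99.999% ("five nines") availability.
UNAVAILABILITY = 1 - 0.99999              # 1e-5, i.e. 1 second per 100,000

week_seconds = 7 * 24 * 3600              # 604,800 s
month_seconds = 30 * 24 * 3600            # 2,592,000 s (30-day billing month)

weekly_budget = week_seconds * UNAVAILABILITY    # allowed downtime per week
monthly_budget = month_seconds * UNAVAILABILITY  # allowed downtime per month

print(f"five nines allows {weekly_budget:.1f} s/week, {monthly_budget:.1f} s/month")
```

That's about 6 seconds a week and 26 seconds a month - a single Apache restart can blow the whole budget.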
Server vs Service (Score:3, Informative)
You are not making the distinction between "server uptime" and "service uptime". When people talk about 99.something% uptime, they are usually referring to "service uptime". With proper hardware (redundancy etc.) you can reboot servers, change disks, memory and even routers, and it won't cost you even 1 second of "service downtime".
Re:Uptime (Score:3, Funny)
You could always try the Google Mirror [alltooflat.com]
Re:Uptime (Score:2, Funny)
Re:Uptime (Score:2)
Should be retitled: (Score:2, Redundant)
Uptime Realities in the Slashdot-linked World
Customers want it, but don't understand it (Score:5, Insightful)
How many engineers out there have heard marketing / sales say 'it has to be always available' and priced out an infrastructure accordingly?
Even now I'm working with a customer who wants a compromise between price and availability - but it still "needs" five nines.
Availability is infrastructure plus process. You need the supporting processes to go along with the hardware - maintenance schedules, change management (well, FCAPS in general), etc. It's not just a big box.
Re:Customers want it, but don't understand it (Score:5, Funny)
$999.99
Problem solved.
Management want it, but does it understand it (Score:2)
Re:Management want it, but does it understand it (Score:3, Informative)
I really mean that - Good Luck.
Re:Management want it, but does it understand it (Score:3, Interesting)
Re:Management want it, but does it understand it (Score:2)
Six Sigma is a maximum of 3.4 defects per million, so converting to uptime:
Uptime percent = 100 * (1 - 3.4*10^-6) = 99.99966%
After we take off the literal filter, I'd have to say that was a pretty funny comment. Just hoping to add a little connection between Six Sigma and five nines.
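For what it's worth, the grandparent's conversion checks out:

```python
# Six Sigma's 3.4 defects per million opportunities, expressed as uptime percent.
defects_per_million = 3.4
defect_rate = defects_per_million / 1_000_000
uptime_percent = 100 * (1 - defect_rate)
print(f"{uptime_percent:.5f}%")   # 99.99966%
```

So Six Sigma sits between five nines (99.999%) and six nines (99.9999%) of "availability".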
Re:Customers want it, but don't understand it (Score:3, Interesting)
Re:Customers want it, but don't understand it (Score:4, Insightful)
If you're a business, your money is far better spent improving the user experience than on buying redundant everything, building the support infrastructure, and incurring the extra overhead of the tedious, careful processes needed to obtain 5 nines (and 4, and even to a degree 3 nines).
If your site sucks and no one visits, it doesn't really matter whether it's down. Work on building something reasonably reliable that is very compelling to your users; that's money much better spent...
Close, but it depends (Score:5, Interesting)
When we launched, 3 days of downtime a month was considered okay. It was considered a better choice than spending an extra $5k on hardware for redundancy. Well, when the site broke $40k/month, we immediately decided that that was no good and invested in the redundancy.
The site has had a few 15 minute outages over the past 6 months, and a 1 day outage over a holiday weekend (not a big deal). However, if the site doubles in revenue again, downtime is becoming less acceptable, and we'll drop $10k to avoid it.
If your site sucks and no one visits, downtime doesn't matter. If you are making lots of money, downtime does matter. $10k on hardware is worth it if the downtime would cost you $25k.
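The tradeoff above is just expected-loss arithmetic. A rough sketch - the $40k/month and three-days-down figures are from this comment, and the assumption that revenue is lost uniformly during an outage is mine:

```python
def annual_downtime_cost(monthly_revenue, downtime_hours_per_month):
    """Revenue lost per year to a recurring monthly outage, assuming sales
    arrive uniformly over a 30-day month (a simplification: in reality some
    buyers just come back later)."""
    hourly_revenue = monthly_revenue / (30 * 24)
    return hourly_revenue * downtime_hours_per_month * 12


# The situation above: $40k/month revenue, up to 3 days (72 h) down a month.
loss = annual_downtime_cost(40_000, 72)
print(f"${loss:,.0f}/year at risk")   # $48,000/year -- an easy call vs. $5k of hardware
```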
Alex
Re:Close, but it depends (Score:5, Funny)
codesta.com uptime (Score:3, Funny)
Re:Customers want it, but don't understand it (Score:5, Informative)
One word to clients... "Outsource"
Maintaining backend infrastructure under a 5 9's service level agreement really is prohibitively expensive for all but the largest businesses - especially if they are not a tech company.
The level of engineering that goes into providing true 5 9's service is extraordinary. Also, some military contracts actually require 6 9's!! (Let alone completely separate networks for classified data.)
I'm actually in the design phase of a data center which requires 5 9's (so we can take on those who decide to outsource). Redundant generators, redundant UPS, redundant routers, redundant HVAC, two separate cable runs from different sides of the building, two connections to the power grid, etc., etc....
And that's just the physical infrastructure! Now you need to develop or integrate the software to cover every aspect of your operations - anything from cable tagging to ticketing systems to emergency procedures. After you build all the infrastructure, take that price and double it: that's how much you will spend developing all of those operating procedures. At that point, go get ISO certified - you've already gone above all the requirements.
If I had to take a guess at a physical cost, $250-300 a square foot seems pretty close (around here anyway). And that only gets cheaper if you are looking at a facility greater than about 10000 sq. ft.
Unless of course, only marketing has those 5 9's!
my boss.... (Score:5, Funny)
Re:my boss.... (Score:2)
But now that his email is posted on the front page of slashdot, maybe they'll just split the difference between being fired and getting a raise.
-schussat
No Grudge? (Score:3, Funny)
Nice, and you go after your ex-boss by getting his article slashdotted!
In my dept... (Score:4, Funny)
The boss didn't go for it, though. :(
9 9s (Score:5, Funny)
Our web server does about 4 9's, which is a downtime of about 8 hours a year, I think. I really suck at math though. I mean it... I'm so bad at math I have no idea if that's right. I said "well, there's 8544 hours in a year, so 8 divided by that is 0.0009, so that's about 4 9s." I think. 8 hours of downtime isn't that bad. I think the next step up from 8 hours of downtime is essentially those megacorps that have redundant systems, and sirens go off and people die when their server goes down for under a second. In fact, I think if their server actually went down for more than a second, some sort of structural damage to the building hosting it is the only likely scenario. Course, that's closer to 7 9s. I can't figure out how long any of the other 9s are cause I only knew what our average downtime is, and could do the math that way only. Wow, it's really hot in here.
Could someone with an 8th grade math education please post the amounts of downtime 1 through 9 9s are, please?!
Re:9 9s (Score:5, Informative)
2 nines: 99% availability, or 88 hours of downtime per year
3 nines: 99.9% availability, or 9 hours of downtime per year
4 nines: 99.99% availability, or 53 minutes of downtime per year
5 nines: 99.999% availability, or 315 seconds of downtime per year
6 nines: 99.9999% availability, or 32 seconds of downtime per year
7 nines: 99.99999% availability, or 3 seconds of downtime per year
Beyond that, it doesn't much matter.
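The parent's table is easy to regenerate - a quick sketch using a 365-day year and rounding the same way:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000


def availability_label(n):
    """'99%', '99.9%', '99.99%', ... for n nines."""
    nines = "9" * n
    return nines[:2] + ("." + nines[2:] if n > 2 else "") + "%"


def yearly_downtime_seconds(n):
    """Seconds of downtime per year permitted by n nines of availability."""
    return SECONDS_PER_YEAR * 10 ** -n


for n in range(2, 8):
    d = yearly_downtime_seconds(n)
    if d >= 3600:
        human = f"{d / 3600:.0f} hours"
    elif d >= 60:
        human = f"{d / 60:.0f} minutes"
    else:
        human = f"{d:.0f} seconds"
    print(f"{n} nines ({availability_label(n)}): {human} of downtime per year")
```

(Five nines shows up here as "5 minutes"; the parent's 315 seconds is the same figure.)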
Re:9 9s (Score:3, Interesting)
Well, beyond "7 nines" you would effectively be talking about 100% reliability. So you start with contingency plans for a terrorist attack on one data center at the same moment as a quake under another data center. Now you're in the realm of needing your own redundant power plants, and probably network infrastructure that does not really exist yet.
So in reality, a guarantee of "9 nines" - effectively ZERO downtime for the life of the product - really would be specified in terms of compensation, not technology. In other words, you'd be stating what the client will receive when (not if) the uptime guarantee is not met.
Re:9 9s (Score:5, Informative)
9's ---- time per year
1        876 hours
2        87 hours
3        8 hours
4        52 minutes
5        5 minutes
6        31 seconds
7        3 seconds
8        0.3 seconds
9        ... you get the idea
Re:9 9s (Score:4, Funny)
365 days * 24 hours/day * 60 minutes/hour = 525600 minutes/year.
Re:9 9s (Score:2, Interesting)
IIRC, and it's been a number of years, the overall goal was about 50 minutes of outage per line per year (a little less than three nines). Different failure modes were allocated different parts of that total. Components like the wires, that only took a single line out of service, were allocated the lion's share. Switch components were allocated smaller amounts, depending on how many lines would be out of service. Total system failure on a switch was allocated about 4.5 minutes per year (five nines).
No switching system ever actually made that grade. Probably the ones that came closest were the old electromechanical "steppers". Many small steppers in small towns ran completely unattended, and maintenance consisted of someone driving out once a month to make sure the building was still there and to polish some relay contacts.
All of the computer-controlled switches had dual synchronized processors (i.e., each one executing the same op codes at the same time) and duplex memory, with a bunch of extra hardware to detect faults. The single most common cause of total system failure was when a fault had occurred, the system was running "simplex", and a tech pulled a card from the active rather than the failed processor.
Re:9 9s (Score:4, Funny)
Hmmm...
Enough nines of reliability and you can probably claim that network latency is responsible for the slow response a client is experiencing. :)
The server can go down and be rebooted before the client thinks something is really wrong!
In other words... (Score:2, Insightful)
My ISP (Ameritech) seems to think so, considering my DSL connection and their promptness to "Get ahold of me within 24 hours..."
Bleh
Everyone with competent sysadmins on rock solid *nix systems raise your hands...
Re:In other words... (Score:2)
Win2k my friend.
And who supported it that whole time?
Me, the web application developer.
Sure, we _could_ have paid for a 'rock solid *nix system' and a couple of admins to go with, but my raises over the past couple of years sure would have looked dismal.
It's called TCO. Sometimes, in some cases, *nix isn't necessarily better - or at least there's nothing wrong with Windows IF you RTFM.
Guess you never did! You should try it sometime before slamming WinServer users.
Oh, and we never got nailed by Nimda or Code Red or any of the others either.
Re:In other words... (Score:2, Interesting)
I wish we only had 5mil hits/day.. One web server takes 18mil req/day.. We have bunches of 'em out there.
http://voy37.voyeurweb.com/1.stats.html [voyeurweb.com].
Did I mention we're a Linux shop?
Cost of reliability (Score:3, Interesting)
To get higher reliability you have to design for it; if you only require lower reliability, designing for more would be considered overdesign.
I don't think high reliability is "too expensive". I think sometimes people ask for more than they need.
Take phone system reliability: 911 should be highly reliable, while long distance across the world can get by being significantly less so.
The main hospital server system should have high reliability - better than 99.99% - because it is important and worth it.
The fundraising server, or the like, could be a bit less reliable - high 90s.
Demanding high reliability for unimportant applications isn't worth it, and is just lazy design.
Re:Cost of reliability (Score:3, Interesting)
Unfortunatly.... (Score:5, Funny)
Must hate his ex-boss (Score:5, Funny)
99.999% perfection (Score:4, Insightful)
Re:Then the *system* is not five nines! (Score:4, Funny)
As long as it's parked on the ground during those five minutes, it's no problem.
slashdotted, so I'll blather anyhow... (Score:2)
But when people reach the server over the Internet, they're used to interruptions - there are so many links in between, some of which periodically become overwhelmed with traffic, that no one could tell the difference between two nines and five nines on your server itself. So sales & product information sites don't need more reliability than you can readily afford. They do need high capacity.
And if it's your blog about your navel lint - no one's looking at your uptime but you...
Simple (Score:5, Funny)
Five nines uptime is cheap and easy. It all boils down to where you put the decimal point.
Re:Simple (Score:2)
Oi! You act like a manager! (Score:3, Informative)
THAT is the goal. It's called redundancy. You will *not* meet any reliability milestones on a single server or network link. It's an obtainable goal, but it does cost money depending on your architecture.
Re:Oi! You act like a manager! (Score:4, Insightful)
And this is why 5 9s is foolish. Sure, you're redundant behind the pipe, but if you lose the pipe you can't blame your datacenter when you charged a customer for uninterrupted service. Technically, if their modem disconnects them for a few hours, you've broken the contract.
Besides, who needs it? If Yahoo is unreachable from my desk, I wait and reconnect. It doesn't matter whether the downtime was my fault or theirs... the effect on my user experience was the same. Any services I might have used, or products purchased, I will use or purchase at a later time. After all, I don't refrain from buying shoes just because the mall is closed!
Re:Oi! You act like a manager! (Score:3, Interesting)
Having a DR plan and being reliable go hand in hand for the most part, however under normal day-to-day business conditions, servers need to be upgraded and things unplugged. You don't switch your entire infrastructure over to a DR site to upgrade your apache web server!! It is for this reason you have redundancy on the network and server level leading out to the Internet (or wherever your customer base resides).
Disasters, on the other hand, do not happen everyday. They happen once a year, maybe.... sometimes once every 2 years. If you live in an area more prone to disasters (like southern California), you may need an alternate site located on the east coast.... but, that is the cost of doing business.
Also, having 5-9's of uptime does NOT mean being accessible to everyone in the world at any time no matter what. Having 5-9's of uptime means that your organization has successfully kept its applications and services available to the Internet. How is it my company's fault if you don't plug your modem into the wall? It's not, so to say that our "reliability" decreases because an end user is a moron is a stupid statement.
I'd love to read it... (Score:2)
not economically possible? (Score:2, Insightful)
Total bullshit... let's see - a Windows machine *requires* a reboot every time you apply a patch; a reboot on a large machine takes, I dunno, 10 minutes if you've got a lot of crap. A security update turns up about twice a week or so... that puts you at ~99.8% MAXIMUM.
And even if you don't buy my numbers, five nines of uptime means you only get ~6 seconds of downtime each week.
Yeah... sure... not if you want to patch up that Internet Explorer / IIS so your system doesn't die from DoS attacks, hackers, or worms!
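Checking the numbers above - a minimal sketch assuming two ~10-minute patch reboots per week, as stated:

```python
# Best-case uptime for a single box that reboots for patches twice a week.
week_minutes = 7 * 24 * 60            # 10,080 minutes in a week
reboot_minutes = 2 * 10               # two security updates/week, ~10 min each

max_uptime = 100 * (1 - reboot_minutes / week_minutes)
print(f"best case: {max_uptime:.2f}%")          # best case: 99.80%

# Versus the five-nines budget of ~6 seconds per week:
five_nines_weekly = 7 * 24 * 3600 * 1e-5
print(f"five nines allows {five_nines_weekly:.1f} s/week")
```

So routine patching alone caps a single box at roughly three nines, two orders of magnitude short of the five-nines budget.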
Re:not economically possible? (Score:2)
Never go to work again! (Score:2, Insightful)
The problem I used to have is that I'm not a morning person, so being available as an admin before 7am is tough - but now I can admin my network while trapped in rush hour traffic. =] Reboot servers, telnet into devices, stop/start services, add users, manage DNS... the list goes on and on.
Uptime can be maintained without even having to leave the comfort of your easy chair. If you're an admin you should check this product out.
SonicAdmin by sonicmobility
(http://www.sonicmobility)
For those bad at math: (Score:2)
4 9s = 99.99% uptime = ~53 minutes of downtime a year
5 9s = 99.999% uptime = ~5 minutes of downtime a year
9 9s = 99.9999999% uptime = ~32 milliseconds of downtime a year
I call bullsh*t on anything that claims to have 9 9s reliability. That's 3 seconds every HUNDRED years.
Speaking of uptimes... (Score:2)
Boss Slashdotted! (Score:4, Funny)
What better way to get back at your former boss than slashdotting him or his company's server back to the medieval ages?
Follow that up with multiple Google queries about the boss's info, credit cards, SSN, etc.
To cut things short, by the end of the week:
The boss's boss realizes the server crashes were due to Boss, fires his ass on the spot.
The wife realizes that the new unexplained credit card charges from "Suzy's Parlor" were not exactly the cafe next door. Gives him the boot as well.
You evil man..you!
Does anyone else find it ironic... (Score:2, Funny)
Hilarious (Score:2)
His server is toast!
I don't really agree here... (Score:3, Interesting)
Heartbeat/Mon/Fake/Coda/Linux/IPVS for the High Availability, failover from DS1->DS2, each on different backbone nodes.
Mirrored systems in different geographic locations:
Firewall
IPVS Gateway
Apache->Weblogic bridge (Apache vhosts with ssl)
Apache->Zope bridge (Apache vhosts with ssl)
Zope->Zeo setup for content management.
SAN drive array for Oracle, running on two E4500s
This system isn't really that expensive, just the costs of hardware and my salary for setting them up.
Re:I don't really agree here... (Score:5, Funny)
Personal Experience (Score:2)
hmm (Score:2)
You really want to see someone go berserk over downtime, try running a MUD...
Re:hmm (Score:2)
eBay.
The slashdotted site (Score:2)
800lb gorilla of eBay (Score:2)
Please. Let's not talk so badly about eBay. Do you know how many people have been crushed under their CIO's foot?
Rephrase the question (Score:3, Informative)
Remember that downtime is related not only to the reliability of each piece of equipment but to the number of pieces of equipment. 99.99% uptime sounds good - less than an hour of downtime a year, right? Scale that to a 500-server farm and it's an hour and ten minutes or so of cumulative downtime a day, every single day of the year including weekends and holidays (OK, we'll give you one day off in leap years). This has boggled a few salescritters who don't grasp scale.
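The scaling arithmetic above, spelled out (500 boxes, four nines each):

```python
# Expected cumulative downtime across a farm: per-box unavailability adds up.
servers = 500
per_box_unavailability = 1 - 0.9999   # four nines per server

expected_down_hours_per_day = servers * per_box_unavailability * 24
print(f"{expected_down_hours_per_day:.1f} server-hours of downtime per day")
```

About 1.2 server-hours a day - an hour and change of something, somewhere, being broken every single day.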
New Uptime Server (Score:2, Informative)
The URL is http://uptimes.wonko.com/
A GNU/Linux box was number one the last time I looked, with a NetBSD box coming in second.
They only have 1 "nine" now ... (Score:2)
In Germany, 5 nines is bad. (Score:5, Funny)
"Nien."
"Are ve up yet?"
"Nien."
"How about NOW?"
"Nien."
"Vill ve be comink up soon?"
"Nien."
"Vill ve be up next veek?"
"Nien."
Nien? (Score:3, Informative)
Full Text - Page 1 (Score:5, Informative)
Pagers going off. Phones ringing. People shouting fragments of conversations over the tops of cubicles. Groups of people huddled around monitors. Others dashing up and down the hallways, sticking their heads into office doors for just a moment, then scampering along to the next doorway. You are frantically talking on your cell phone, silencing your pager, and yelling into the speakerphone on your desk while typing on two different keyboards attached to three different monitors.
Sound familiar? It's a classic case of the dreaded 'downtime' disease, a terrible ailment where none of your systems work and for reasons you can't always understand. Of course, it typically strikes at the most inopportune moments - the launch of a major product upgrade, or right after announcing your partnerships with 5 of the Fortune 100.
Nobody wants downtime. It's a terrible thing that always involves blood, sweat, tears, and inevitably, a loss of money. This is why when you talk to the upper management of any company with a strategic online initiative you'll be told that the IT group has the highest goals, and that downtime is considered to be an anathema to be stamped out vigorously.
Unfortunately, when you talk to the company's IT manager you commonly hear a different story; the resources to back-up the company's lofty online goals are hard to come by. In fact, with the down swing of the last couple years, combined with the fact that IT isn't, at least directly, a revenue generating entity, IT budgets are being reduced while uptime performance levels are expected to be the same. This can just lead to a death march of extremely over-worked IT personnel, and longer, more numerous, occurrences of system downtime. These goals need to be re-evaluated.
Genesis of the 'Five Nines'
We've all heard the mantra of 'five nines', or 99.999% reliability. Somewhere in the depths of the Internet's 'big bang', when systems were slow and cranky, reliability became a major selling point of why one company's system was 'better' than the competition.
First, people talked about being 'two nines' or 99% reliable. Then someone else would top that, and make their product seem better, claiming 'three nines' (99.9%). Not long after that came 'four nines' (99.99%) and then, near the peak of the dot com era, came 'five nines'.
The herd mentality left no room in which to pitch for investment without the 'five nines' claim. "After all," it was thought, "if everyone else is saying they can provide 'five nines', I'd have to pretend I didn't know what I was doing if I didn't say I could match everyone else's claim."
'Five nines' isn't impossible. It's merely impractical and unnecessary in the world of the Internet. A shocking statement, perhaps, but a truism nonetheless.
We're not talking about launching people into space (which, by the way, is unfortunately done under 'three nines'), or working with nuclear power plants. We're working within the reference of online systems providing services to users both on and off the Internet - nobody dies from a system failure.
The Greasy Steel Bar
Think of uptime as a chin-up bar coated in grease. The higher the reliability desired, the thicker the coating of grease. It's clearly tougher to hang on at a higher standard of reliability.
What's not so obvious, but very important, is that the higher the uptime target, the worse one does if unprepared. An IT department capable of three nines, faced with a bar that's five-nines slippery, won't even manage the three nines it is capable of.
Re:Nothing is THAT Important (Score:2, Insightful)
No, thanks.
Re:Nothing is THAT Important (Score:2)
No, thanks.
Point taken. Still, IMO reasonable reliability in the software and hardware industry is ridiculously expensive. I guess it wouldn't be too bad if one were willing to trade off performance (speed) for reliability, rather than requiring both speed and reliability.
I'd be happy to get consistent two nine or better reliability from my ISP!
That's sad.
Re:Nothing is THAT Important (Score:3, Interesting)
Not to sound like a suit, but it's really about total cost of ownership. For example, software RAID comes with most modern operating systems, but you still need to power down the server to remove and replace a failed drive. However, if you make the upfront investment in a hardware RAID controller with hot-swap capability, you save time and reduce tech support calls, saving money in the long run. If you're offering commercial services (as an ISP or whatever), you start to develop a reputation for reliability that will earn you more customers over time.
Re:Nothing is THAT Important (Score:4, Insightful)
Re:If the ailerons are not available (Score:2)
Now, over a year, bunch it up over every day, and you get 29 seconds. That's scary, but if you meet 99.99999% uptime, you're probably not going to bunch all your downtime together in one incident.
Although I'd say the article doesn't look like it was written for _really_ critical stuff like this.
But it's scary when you have to argue with your boss about whether you should spend 2 weeks figuring out why your server crashes after 2 weeks of uptime.
Re:If the ailerons are not available (Score:4, Interesting)
My particular area of development was the actual display software which was provided data from the DPC systems. Each of the six displays (2-pilot, 2-copilot, 2-EICAS in the console) received multi-cast data from each of the DPCs and then fed data back to the DPCs on the display's status. The DPCs would then automagically evaluate if the displays were functioning properly and switch primary functions away from a malfunctioning display to a functioning display if error conditions were detected.
The PFD (primary flight display) is the pilot's most important display, as it shows airspeed, artificial horizon, TCAS warnings, altitude and a few other things. The ND (navigation display) is the inner screen on both the pilot's and co-pilot's sides, and if the PFD experiences error conditions, the DPCs switch the PFD to the ND and the ND to one of the EICAS (engine indicators, etc.) displays.
All very interesting stuff
Ahh
Re:If the ailerons are not available (Score:3, Interesting)
It's instructive to read about the United Flight 232 incident a few years back. The #2 engine of a DC-10 exploded in flight (at around 30,000 feet) and severed ALL the hydraulic systems and their backups. Without rudder, ailerons, elevator, spoilers, flaps, or one of the three engines, the pilots set the plane up for a forced landing. And about 200 of the 300 passengers on board survived.
Of course, certain bugs can be really bad. I was down at Boeing Field once last year when somebody attempted to take off in a light plane that had just been serviced. Unfortunately the mechanic hooked up the ailerons backwards, so that when the pilot attempted to correct for a crosswind on takeoff, he promptly rolled and landed on top of another plane in the parking area. (Sounds like inadequate preflight action by the pilot on that one, since he appears to have missed the "control surfaces free and correct" item on his pre-takeoff checklist, but no injuries to the best of my knowledge.)
Note that I'm hardly going to argue that flight-control software shouldn't be damn good. But... it's overstating your case to assume that downtime or error necessarily means a plane is going to fall out of the sky.
Re:Nothing is THAT Important (Score:4, Insightful)
Re:Nothing is THAT Important (Score:2, Informative)
Sorry, this should read 50% availability.
Yes, I know there may be arguments that scheduled maintenance won't count. But even without counting it, jetliner availability never reaches 99.999%.
About two years ago I had a long flight. I sat in the plane, but the plane didn't leave the gate for about an hour. Then the pilot spoke to us, asking our patience for another hour; there was a problem with the oil pressure and the mechanics were looking at it. After that hour he told us the oil filter was defective and had to be changed. And after another hour he asked us to leave the plane - there wasn't a new oil filter available at the airport and they had to get one from another airport. After five hours we finally got clearance and took off.
That's five hours of unavailability. If this were the only unplanned outage for the plane at all, and it averaged 99.999% availability, that implies a lifetime of 500,000 hours - about 60 years - without any further problem with the plane (including outer conditions like weather, or grounding due to Sep 11 et al.).
So planes on average are unavailable much more often than 0.001% of their operating time. The average delay at Frankfurt Airport (FRA) is currently 15 minutes; if we assume every plane lands at FRA about once per day, that's an average outage of about 1% - a thousand times the five-nines allowance.
Re:Nothing is THAT Important (Score:5, Funny)
The same thing happened to me once, on a little puddlejumper from Dallas or Houston to Austin, I think it was.
Anywho, the pilot revs up the engine, then throttles it back down. Fine - brakes and throttle work. Throttles back up, releases the brakes, and off we go screaming down the runway.
Then the plane slows down, and stops.
Pilot comes on the intercom and says, 'Um, folks, you may have noticed we didn't take off. A warning light has come on in the cockpit, and we don't know why. Until we do, we're going to stay right here.'
Now, that's not the bad part. The heat and humidity, and a plane full of sweaty, smelly passengers, isn't the bad part either.
No, the bad part was the pair of off-duty pilots in the seats next to me who started discussing, in loving detail, every thing that could possibly be wrong.
Re:Nothing is THAT Important (Score:3, Informative)
Five-nine reliability in the airline industry would mean that we'd see a major commercial jetliner crash about every other day.
At first I didn't believe you.
According to this [airlineslounge.com] page, there were 10 fatal accidents in 18 million flights in 1998. That is actually a little better than six nines. Five nines would mean 180 crashes, or almost exactly one every other day.
I'm really glad I checked before spouting off. :-) Did you know that stat or did you pull it out of the air?
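For anyone else who'd rather check than spout off - the arithmetic from the 1998 figures above:

```python
# 10 fatal accidents in 18 million flights (1998).
accidents, flights = 10, 18_000_000

reliability = 1 - accidents / flights
print(f"observed: {reliability:.7%}")    # just over six nines

# What five-nines "reliability" would have meant for the same traffic:
five_nines_crashes = flights * 1e-5
print(f"five nines -> {five_nines_crashes:.0f} crashes, about one every other day")
```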
Re:Nothing is THAT Important (Score:2)
Re:Nothing is THAT Important (Score:3, Funny)
I wonder if the internet will glow in the dark after the nuclear power plant's webserver melts down? Hey, maybe that is where Trolls come from.
Re:Nothing is THAT Important (Score:2)
And nothing contributes to reliability more than hiring, training and retaining high quality operators and engineers.
Re:Nothing is THAT Important (Score:2, Interesting)
Re:Nothing is THAT Important (Score:3, Funny)
Re:Nothing is THAT Important (Score:2)
How could it get that crowded? Jpeg files are very small!
Re:Nothing is THAT Important (Score:3, Insightful)
Some things ARE that important, most things aren't.
Re:Nothing is THAT Important (Score:2, Funny)
Out for a pleasant evening troll? (Score:2)
Re:Nothing is THAT Important (Score:2)
I've worked on systems where a failure results in a hearing before an investigative board.
Re:Nothing is THAT Important (Score:3, Insightful)
One thing I saw again and again during the
And that's the point the guy seems to be making: people are spending millions of dollars where they only need to spend a tenth that, to build systems you could run a trading floor with.
Re:Nothing is THAT Important (Score:3, Insightful)
Some of the factors that preceded the recent collision between a Tu-154 and a DHL Boeing 757 were:
- The traffic warning system was in its scheduled 10-minute maintenance window - the controller got no warnings
- Phone lines to dispatch were busy - German dispatch was not able to reach Swiss dispatch to tell them about the dangerous situation...
This is an example that cost a lot of lives...
(Another tragic circumstance was that the pilots of the Tu-154 gave priority to dispatch commands instead of the commands of the collision avoidance system...)
Re:Netcraft have the final word on this (Score:3, Interesting)
I know there are some projects/sites that will let people submit uptimes, sent from cron jobs or agents to a server, which then stores the uptime data there. Of course, that doesn't mean you can't just generate junk data (ie: a 999-day uptime with 2934 users).
Re:Netcraft have the final word on this (Score:3, Funny)
I'm sure it isn't too difficult to keep them running - just make sure the power is on and the network cable is plugged in.
Re: Netcraft have the final word on this (Score:3, Insightful)
> Too Bad that a lot of the servers on the top 50 uptime list still have the default page that apache provides. I'm sure it isn't too difficult to keep them running - just make sure the power is on and the network cable is plugged in.
Historically, some very popular and widely sold operating systems couldn't even do that much.
Linux uptime stats wraparound at 497 days (Score:2)
See the Netcraft FAQ at http://uptime.netcraft.com/up/accuracy.html#cycle [netcraft.com]
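For the curious, the 497-day figure falls straight out of the counter width: uptime was kept in a 32-bit jiffies counter ticking at HZ=100, the standard timer rate for x86 kernels of that era (the HZ value is background knowledge, not something from the Netcraft page):

```python
HZ = 100                      # timer interrupts per second (classic x86 kernels)
max_jiffies = 2 ** 32         # 32-bit counter rolls over here

wrap_seconds = max_jiffies / HZ
wrap_days = wrap_seconds / 86400
print(f"counter wraps after {wrap_days:.1f} days")   # counter wraps after 497.1 days
```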
Re:Netcraft have the final word on this (Score:2)
Netcraft saying that the boxes with the longest times are BSD only implies there is most likely some kind of relationship between BSD and long uptimes; it does not imply that BSD is responsible for those uptimes.
It could be that the class of administrators who like BSD happen to have administrative practices that preclude rebooting often.
It could be that for some reason BSD is only used in very static configurations where the kinds of activities that would cause you to want to reboot are simply not done.
It could be anything.
Re:4 or 5 nines? (Score:2)
100% uptime is virtually impossible, so the holy grail is as close as possible--99.999%
Re:4 or 5 nines? (Score:2, Informative)
Re:4 or 5 nines? (Score:2)
There are questions about what gets counted when figuring reliability. For one thing, almost no one would count a slashdotting or a DoS attack against their uptime, but from the user's viewpoint the server is nevertheless down. Also, how do you count "scheduled downtime", such as rebooting NT servers after installing security patches, or unplugging the boxes to move them around when it's time to expand the system?
A news server with a worldwide audience has no "penalty free" time slots. So either you settle for a lower uptime goal, or you need redundant servers configured so that even major upgrades can be done by unplugging just one at a time while the others keep running. OTOH, for the company database server, downtime during working hours is far more serious than downtime for the web server, so if it's a big company you do need redundant servers with automatic switchover. But in most cases there are times late at night or on weekends when no one cares if you shut them _all_ down at once - which certainly makes the upgrades easier.
So anyway, one person's "5 nines" may look like a lot less to someone else. E.g. a server vendor may claim that because only one in a million of their servers is broken at any given time their reliability is 6 nines. Your single server may never break at all - but once a week you take it off-line for ten minutes to load the newest security patches, so to anyone who wanted to keep working for those ten minutes you are only at 3 nines.
So the upshot is ... (Score:2, Funny)