LiveJournal Servers Go Down 596
Wind writes "According to any journal hosted on LiveJournal.com, Internap, the data center that hosts LiveJournal, has suffered a critical power failure, leaving all of LiveJournal and its content temporarily offline and requiring the revival of 100+ servers. Perhaps Six Apart wasn't quite prepared for the responsibilities of a website of this size? Updated information is posted here."
slashdot has repeated 503 errors, (Score:5, Insightful)
What a cock (Score:5, Insightful)
Perhaps shit happens, and a blog service doesn't warrant the necessary investment to survive whatever caused this outage?
Was that really called for? (Score:3, Insightful)
Ok, I understand that you don't like Six Apart; I'm no fan of their new licensing scheme either. However, I really doubt that SixApart has any control over any power failures that might occur at Internap.
Re:slashdot has repeated 503 errors, (Score:5, Insightful)
Slashdot has semi-major problems almost every day. 503 errors, "nothing for you to see here" annoyances, and a search engine that goes down more than a Thai hooker.
Re:Disclaimer: I am Not an Electrical Engineer (Score:2, Insightful)
Six Apart is hosting them already? (Score:4, Insightful)
Re:./ed !!!! Server Reboot Time? (Score:5, Insightful)
But we intentionally don't have databases come back up on boot because if there was a blip, we want to do an integrity check first. (We run InnoDB, so it's ACID, but we're paranoid.)
We have clusters of 2 identical databases in separate cabinets, separate switches, separate Internap power feeds... so normally losing one database in each cluster doesn't matter: the other one gets used. But when we lose every single database, in all clusters, all at once... that's the time to be paranoid and double check stuff.
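For anyone trying to picture the policy described above, here is a minimal sketch. It is not LiveJournal's actual code: the names `Database`, `Cluster`, `power_loss`, and `bring_back` are hypothetical, and the integrity check is passed in as a placeholder callable. It only shows the idea that a cluster keeps serving while one member is healthy, and that after a total power loss nothing comes back until it has been checked.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Database:
    name: str
    up: bool = True
    verified: bool = True  # set to False after an unclean shutdown


@dataclass
class Cluster:
    members: List[Database]

    def pick(self) -> Optional[Database]:
        """Return the first member that is up and verified, or None."""
        for db in self.members:
            if db.up and db.verified:
                return db
        return None


def power_loss(cluster: Cluster) -> None:
    """Simulate every member of the cluster losing power at once."""
    for db in cluster.members:
        db.up = False
        db.verified = False


def bring_back(db: Database, integrity_check: Callable[[Database], bool]) -> bool:
    """Start a database only if the supplied integrity check passes."""
    db.verified = bool(integrity_check(db))
    db.up = db.verified
    return db.up


if __name__ == "__main__":
    cluster = Cluster([Database("db-a"), Database("db-b")])
    cluster.members[0].up = False       # losing one member is harmless
    print(cluster.pick().name)          # the surviving member is used
    power_loss(cluster)                 # losing both is the paranoid case
    print(cluster.pick())               # nothing is served until checked
    bring_back(cluster.members[0], lambda db: True)  # placeholder check
    print(cluster.pick().name)
```

The point of the design is that automatic failover covers the common case, while the rare "everything died at once" case deliberately needs a manual or scripted verification step before any database is trusted again.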
Where's my irony stick? (Score:5, Insightful)
Re:What a cock (Score:2, Insightful)
THIS doesn't reflect poorly on them. Their licensing scheme for Movable Type does.
No... (Score:5, Insightful)
What does Six Apart have to do with Internap? LiveJournal has been using - and wanting to switch from - Internap for a long time.
Re:Elsewhere (Score:1, Insightful)
Not related to Six Apart (Score:3, Insightful)
From the article write-up (and reflecting the thoughts of quite a few of the comments I just read):
I'd love to know what makes you think this has anything to do with Six Apart. The very first line at http://www.livejournal.com states:
They've been with Internap for years, predating Six Apart's takeover. Unless LJ staff is lying, the fault here sounds like it lies entirely with Internap.
And as far as I can tell, Six Apart didn't ditch the LJ team when they bought them out, so you probably have the exact same people working on bringing the site back up now as you would have if Six Apart had never got involved.
Re:Update (Score:1, Insightful)
Re:Update (Score:1, Insightful)
How do I know this? Outages are common on LJ....
bigger explination (Score:5, Insightful)
That being said, LJ's servers are back up now, but they're making sure that the databases are all in sync -- LiveJournal has one of the most massive distributed MySQL clusters in existence along with a complete caching system.
They need to make sure that the database is all synchronized before bringing it back up -- chances are they're going to rebuild the cache too. If they didn't, the initial strain on the DB servers would probably bring the site down again.
This does, however, bring up some questions about LiveJournal's network infrastructure. Danga (the creators of LJ, recently purchased by Six Apart) are heavy users of Perl and MySQL. Needless to say, they have made numerous contributions to both projects and have developed an innovative memory caching system for Linux.
The questions raised, however, come from Perl and MySQL. Both are questionable in terms of scalability. Although I'm not qualified to comment on this, I believe that the general consensus is that MySQL is one of the least efficient databases today. LiveJournal has 100+ servers. I honestly don't think that a system the size of LiveJournal should require a server cluster that big. It seems that they are trying to solve their performance/reliability problems by blindly throwing hardware at them.
Of course, I love LiveJournal. It's simple, easy to use, and is a great tool for building communities. Just as it is simple, it can also be incredibly nerdy (there's actually a command prompt!). They're also completely open source.
Hopefully, Six Apart can make their network infrastructure more 'professional' while still maintaining the community spirit that has made it so successful.
I call bull on all this (Score:3, Insightful)
Re:bigger explination (Score:5, Insightful)
Sure, MySQL has its flaws -- some of them pretty big -- but we can work around them.
As for the "not needing a server cluster that big" -- do you have any clue how much data we push in an average day? We maintain so many DB clusters to improve reliability, and we maintain so many web nodes because we push a screaming shitload of traffic.
Re:./ed !!!! (Score:1, Insightful)
It all depends on their sample representing the internet as a whole, whatever that is.
Re:What a cock (Score:2, Insightful)
Re:Internap is *down*? (Score:4, Insightful)
That being said, I think you didn't quite understand what I was trying to say. I really don't care whether they have "plenty of backup power", "plenty of generator capacity" and "top-of-the-line big datacenter grade stuff" (which really sounds more like a collection of buzzwords than anything else, anyway). If a wiring fault (of whatever kind) can bring down the entire UPS system as well as the "generator capacity behind that" and all other safeguards they supposedly had in place as well, then it's just worthless and a waste of money - a UPS is supposed to be an *uninterruptible* power supply.
And while I admit that it's not possible to guard against *all* problems, saying that the colo facility is "one of the most solid in the state" and supposedly can't be taken offline by something "short of a direct strike from a comet" is just silly when a "wiring failure" can bring down the whole thing, and even more so when it's not the first time that has happened.
Really, this just stinks of an attitude that's all too prevalent in parts of the IT industry - just piecing together the components of a reliable system won't necessarily give you one, and if you can't build one properly, then don't go advertising that you have one. Don't you think the fact that the LJ people are now planning to buy their own UPS equipment to use on top of the facility's should tell you something?
Oh, and regarding six nines of uptime - I don't think you actually realize how little downtime that would actually allow. It's about 30 seconds per year, and LiveJournal has been down for at least 16 hours, which corresponds to an uptime of about 99.8% - only two nines left. They probably (hopefully!) won't fall down to one, but things are bad enough as it is, and I, at least, fully blame Internap for that (and, again, I'm a paying user on LJ, so I reserve the right to do just that. ^_~)
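The arithmetic behind those figures can be checked with a short sketch. It assumes a 365-day year and measures availability over a single year; the function names are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000, ignoring leap years


def allowed_downtime_seconds(nines: int) -> float:
    """Yearly downtime budget for N nines of availability (6 means 99.9999%)."""
    return SECONDS_PER_YEAR * 10 ** (-nines)


def availability_percent(downtime_hours: float) -> float:
    """Availability over one year, in percent, given total hours of downtime."""
    return 100.0 * (1 - downtime_hours * 3600 / SECONDS_PER_YEAR)


if __name__ == "__main__":
    print(f"six nines allows about {allowed_downtime_seconds(6):.1f} s per year")
    print(f"16 hours down gives about {availability_percent(16):.2f}% uptime")
```

Six nines works out to roughly 31.5 seconds per year, and a 16-hour outage leaves roughly 99.82% for the year, which matches the "about 30 seconds" and "about 99.8%" quoted above.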
Re:Value of Livejournal - "Open Source Philosophy" (Score:3, Insightful)
Oh yes. If I ever feel the need to post any of those quiz-things I make good use of the <lj-cut> tag. So if anyone on my Friends list (or a random person finding my Journal) doesn't want to see the results they don't have to.
Actually, one of the more useful LJ features I know of is one that allows you to screen out images over a set size from your Friends list. You then need to view the entry in question to see the image, which is good for your bandwidth and/or narrow page layout.
Re:What a cock (Score:3, Insightful)
Then they'd either need multi-gigabit bandwidth between the two co-los (which would probably cost for a week what they make per year), or they'd have to make separate, semi-independent communities. Google's servers don't stay in sync - you get different results according to which servers you hit, which isn't something you can do with "live" journals.