Slashdot Back Online
Posted by
CmdrTaco
on Sun Jun 24, '01 06:35 PM
from the well-that-was-a-mess dept.
I'm still not exactly clued in as to why we're back online, but hey, we are. Sometime Saturday morning our Cisco router melted down. Ordinarily this would only be the end of the world, but none of our qualified personnel were available to fix it, thus triggering the end of several nearby worlds as well. Props to Yazz, KurtG and Scott from Cisco for managing to help get us back online. We'll post more when we know it.
This discussion has been archived.
No new comments can be posted.
346 comments
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Re:Original Story
(Score:4)
Jeff and Rob are not easy to work with. Best wishes to my replacement.
- Anne Tomlinson
Re:Original Story: Who are you?
(Score:4)(http://cmdrtaco.net/ | Last Journal: Thursday June 15, @02:11PM)
Re:I went Outside!!!!
(Score:4)(http://www.pfstuff.com/)
Re:Exodus
(Score:5)(http://www.parrhesia.com)
Exodus
(Score:5)(http://www.dcc.vu/)
We use Exodus, and they provide us with two separate ethernet feeds, down separate cable runs, from two separate routers in different parts of the internal NOC. No need for any routers at all; we have separate endpoint hardware on each feed and just do a rough load balance across the feeds with round robin DNS.
The recommended (by Exodus) alternative is to have a pair of peered routers which actively load balance across the feeds at the IP level, and back each other up if one goes down. I didn't do this because we're a startup and I didn't want to pay for an extra pair of routers.
Either of the above will ensure that there is no single point of failure (SPF) on the front end. This is referred to as a dual-homed configuration. Exodus' WAN will ensure there is no SPF further out; making your own equipment cluster and software fault tolerant is your problem.
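For anyone who wants to see the first option in concrete terms, here is a minimal Python sketch of rotating across two feeds. Everything in it is an assumption for illustration: the feed names, the addresses (taken from the RFC 5737 documentation range), and the `RoundRobinFeeds` class are hypothetical and are not anything Exodus or a DNS server provides. Note also that plain round robin DNS only rotates the records; skipping a feed that has been marked down, as done here, is an extra step you would have to arrange yourself (for example by pulling the dead record).

```python
# Hypothetical addresses, one per ethernet feed (RFC 5737 documentation range).
FEEDS = {
    "feed-a": "192.0.2.10",
    "feed-b": "192.0.2.20",
}


class RoundRobinFeeds:
    """Hand out feed addresses in rotation, skipping feeds marked as down."""

    def __init__(self, feeds):
        self._feeds = list(feeds.items())  # keeps the dict's insertion order
        self._next = 0                     # index of the next feed to try
        self._down = set()                 # names of feeds currently marked down

    def mark_down(self, name):
        self._down.add(name)

    def mark_up(self, name):
        self._down.discard(name)

    def pick(self):
        # Try each feed at most once per call, starting where we left off.
        for _ in range(len(self._feeds)):
            name, address = self._feeds[self._next]
            self._next = (self._next + 1) % len(self._feeds)
            if name not in self._down:
                return address
        raise RuntimeError("no feed available")


if __name__ == "__main__":
    balancer = RoundRobinFeeds(FEEDS)
    print([balancer.pick() for _ in range(4)])  # alternates between the feeds
    balancer.mark_down("feed-a")
    print([balancer.pick() for _ in range(2)])  # only feed-b is handed out
```

With both feeds up, traffic alternates; with one marked down, everything lands on the survivor, which is the "no single point of failure on the front end" property in miniature. It says nothing about the hardware behind each feed, which remains your problem.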
It sounds like Slashdot is running with a single-homed connection, and that the router which failed is their own kit in their own rack. $$ permitting, they could have either (a) done a proper dual-homed setup, as per one of the above, or (b) had a spare router sitting in the rack and leased Exodus' managed hardware monitoring service, which would have meant Exodus techs switching it out when it failed.
I don't know what Slashdot's budget for hosting is, but we are a much smaller company than Andover and dual-homed service is not exactly killing our budget. I would conservatively assume that bandwidth is Slashdot's biggest expense.
You cannot throw pies at the co-lo provider for your own failure to have a robust setup and make proper use of the facilities they offer.
Very.
(Score:4)(http://inoshiro.com/)
I do all system administration -- DNS, mail, etc., whereas the VA owned sites all share the same pool of cool admins (like Yaz, Alliecat, etc).
Rusty and I are happy with our current colocation service (vhosting [vhosting.com]). We've never, ever had problems of connectivity (only of perl/admin error).
--
Original Story
(Score:5)
I'm still not exactly clued in as to why we're back online, but hey, we are. Sometime Saturday morning our Cisco router melted down. Ordinarily this would only be the end of the world, but none of our qualified personnel were available to fix it, thus triggering the end of several nearby worlds as well. And when our qualified personnel arrived, we discovered that she wasn't actually as qualified as we had hoped. Then she quit, thus terminating 3 local star systems. Hemos or I will update this story as soon as we know what the hell happened. But apparently creds go to Kurt Grey and Cisco tech support. Hopefully we'll have more info soon.
Re:Story keeps changing?
(Score:5)(http://www.acc.umu.se/~maswan/)
All the versions I've seen personally in chronological order:
The one from another comment:
Re:your cisco?
(Score:5)(http://alanjstr.blogspot.com/)
- by Robin "Roblimo" Miller - On Saturday, June 23, the primary controller in the router that controls access to all OSDN servers hosted at the Exodus facility in Waltham, MA, suffered a catastrophic failure. The sites affected were Slashdot, freshmeat, NewsForge, and Mediabuilder, among others. The secondary controller did not automatically take over as it should have. It did not work when activated manually, either. The first Cisco support people contacted professed to be "amazed" at the situation, saying it was the first time they had seen a failure of this kind. OSDN and Cisco people, working through Saturday night, were unable to cure the problem. Sunday afternoon, OSDN employee Kurt Gray and Cisco rep Scott, working by telephone, were stepping through the router's configuration and, says Kurt, as they worked to undo other changes that had been made, "on one reset everything came back." OSDN network operations were already in the process of rebuilding the company's network to eliminate the router as a potential single point of failure. As of 7 p.m. US EDT most of the sites were available at least part of the time, but full service was not yet restored. There may still be slowdowns or intermittent failures until a permanent fix is made. We'll have a more complete story within a few days. Right now, OSDN network operations staff members are too busy working to talk.
Interesting (NOT!)
(Score:5)(http://www.cityofhope.org/microseq)
IIRC, they are using a Linux box for their load balancer. It was their router that got fried, which is a completely different beast. Heavy duty routers remain specialized boxes, and Linux hasn't really made serious inroads into that market yet.
Not funny at all when you get the facts straight. The serious problems that MS had were with their DNS servers, which were running Windows, not their routers. IIRC the DNS servers were later cracked, too, which was rightly seen as an indication of poor security. When Microsoft's own products don't stand up to the use they're being put to, and Microsoft then has to use *BSD based systems to get working again, that's very different from when a Linux site has its non-Linux hardware melt down (and the description did make it sound like a hardware, not software, problem).
I called my ISP!
(Score:5)(http://ellem.is-a-geek.org:5280/...html | Last Journal: Monday June 26, @08:07AM)
ISP : Sa-lash dot?
Me : Dude slashdot.org!
ISP : www.
Me : No no no... listen 64.28.67.150
ISP : Uh... www
Me : Damnit I'm down can't you see I'm down?
ISP : We're like up and stuff. Is this a Macintosh?
Me : I am calling my lawyer! I'll sue you blind!
ISP : Uh I have to get my supervisor.
Me : -click-
---
Re:Original Story: Who are you?
(Score:4)(http://slashdot.org/)
As a perfectionist, you'll of course want to know that you use "me" when the reference comes after the verb. "Neither of those phrases sounds like Jeff or me".
Also, given that you're a perfectionist, you'll be appalled to hear that someone has been using the "CmdrTaco" identity to post poorly spelled, ungrammatical crap all over Slashdot for the last three years. That same person has always tried to justify himself by whining that he "doesn't care about that stuff" and "doesn't want to be too fussy". This may or may not be the person who wrote the notoriously buggy first release of Slash, and said that it was "close enough".
But of course that couldn't be you
Because you're a perfectionist.
Ha, ha, ha. I think I'm going to have to give up satire.
*ahem*
(Score:5)
if that was a post about a guy, and he was thought to be less than qualified, would you be posting this?
sexism goes both ways, assuming someone isn't incompetent due to their gender is just as stupid as assuming they are
equal rights for incompetent people dammit! *L*
and yes, I am a chick
on a personal note, maybe that post was taken down due to its rudeness, rather than the sex of the person involved...
my 2 cents...
Re:What really happened
(Score:5)(http://slashdot.org/)
CiscoChick: Hi Rob. It's that time again. I came by to check on your equipment.
Rob: Equipment!?! Okay. Just give me a minute to get my pants off.
CiscoChick: No, no! I meant your Cisco router. I'm here for a scheduled routine preventative maintenance checkup.
Rob: Oh! That equipment.
CiscoChick: Yeah, the router. But when I'm finished, I could check out any other hard ware you have around. <wink>
Rob: Okay. Just let me know when you're ready.
later.....
CiscoChick: Okay, Rob. I'm done checking the Cisco router.
Rob: Okay. Cool.
CiscoChick: Wow! Look at that equipment!
Rob: Yeah.
CiscoChick: I mean, it's so small!
Rob: Yeah, it's the latest new thing in miniaturization.
CiscoChick: Okay, well.... Let's not focus on the size. What is the uptime on that thing? Does it go down very often, like Windows?
Rob: Ummm... Have you ever done it in a co-location cage?
CiscoChick: No, but there's a first time for everything.
3 minutes later...
Rob: Ahhhhh! I needed that.
CiscoChick: Oh, no! What's happening!
Rob: Eeeeeeiiiiiiieeeee!!!! The router is melting!
--
"Linux is a cancer" -- Steve Ballmer, CEO Microsoft.
Hmmm.
(Score:4)(http://sacha.free.net.ph/ | Last Journal: Sunday March 17, @12:30PM)
When I couldn't get my Slashdot, I assumed the worst. High-profile hijacking. Aliens beaming up the OSDN headquarters. Servers sneakily migrated to Windows, which then promptly crashed.
Kidding aside, I'm glad Slashdot is back up.
I went Outside!!!!
(Score:5)(Last Journal: Sunday July 04, @02:52AM)
When I stepped outside it looked like everything was being generated by 500,000,000 GeForce3s!!! The trees looked REAL!! It must have been at least 1,600,000,000 x 1,240,000,000!!! I couldn't even see any jaggies! Talk about anti-aliasing!!
After spending 2 days outside due to lack of Slashdot it's hard to come back to my Power Mac 6100/60 with a 14" monitor at 640x480. I wish I had reality's 3D card...
And it seems Slashdot has slashdotted itself. How did that happen??
--Volrath50
Re:Original Story: Who are you?
(Score:5)(Last Journal: Sunday July 04, @02:52AM)
I mean who is this CmdrTaco guy and how the heck did he get UID #1???
It's obvious that this isn't the real Rob Malda, we all know that Slashdot editors NEVER post at Slashdot....
--Volrath50
Re:Interesting
(Score:4)(http://slashdot.org/)
That would be a pretty dumb comment to make in this context since the router that went down was on the premises of the customer whose sites went down.
I mean, if you step on the modem in your house, you lose your link to the internet, but that doesn't mean you've identified an Achilles' heel in the internet's infrastructure - "Aha! This single modem controls access to the ENTIRE INTERNET! By stepping on it I have rendered the whole network inaccessible to EVERYONE in my house!!"
Seems that should be pretty obvious.
Translation
(Score:5)(Last Journal: Saturday December 27, @02:15AM)
Congratulations...
(Score:5)(http://www.radiokaos.com/)
Gathering data from your May 2nd demographic evaluation, I'm thinking that you nearly lost five percent of your readers in the space of forty-five minutes.
Yours in disappointment,