The Slashdot DDoS: What Happened?
from the from-the-horses-mouth dept.
What follows is more-or-less Pat "BSD-Pat" Lynch's account of the DDoS... Pat is our super 31337 BSD Junkie sysadmin. He wants everyone to know that the timeline below is a little screwy, but things are more or less in sequential order. Things might not be exactly perfect, but hey, what do you expect after 30 hours without sleep?
Having moved the day before, none of us were truly familiar with exactly how the new hardware would handle the full burden of being 'slashdot.org'. The cluster (known affectionately as The Matrix) had handled its premiere day with flying colors, but we didn't really have an accurate feel of how things would react. Combine this with a couple of extremely high traffic stories posted on both Thursday and Friday, and it took us a while to determine that the problems were external, and not a flaw in some new component in the cluster.
The attacks began Thursday morning. Most of it came in the form of SYN floods, from obvious /16's no less, and some /24's. We didn't have any zombie-killing software or a firewall installed because of certain network topology issues. Later on, a second wave came, this one closer to 8 or 9pm, and the load balancer (an Arrowpoint CS-100) died under the load.
The DDoS, as far as I could see, was a lot of SYN and zero-port packets coming from various /16's and /24's as well as a bunch of RFC1918 reserved addresses (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16). At one point we reached 109Mbits worth of traffic into our network.
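For readers who want to see what "filter the RFC1918 sources and the zero-port junk" could look like in code, here is a minimal sketch using Python's standard `ipaddress` module. It is not what Pat and Liz ran (they used a FreeBSD bridging firewall); the function names and the policy itself are illustrative assumptions, including the reading of "zero port" as port 0 in either direction.

```python
import ipaddress

# The three RFC 1918 private ranges named in the account above.
RFC1918_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(src: str) -> bool:
    """Return True if src falls inside one of the RFC 1918 private ranges."""
    addr = ipaddress.ip_address(src)
    return any(addr in net for net in RFC1918_NETS)


def should_drop(src: str, src_port: int, dst_port: int) -> bool:
    """Hypothetical edge policy: drop private-source or port-zero packets."""
    # Assumption: "zero port" means port 0 as either source or destination.
    return src_port == 0 or dst_port == 0 or is_rfc1918(src)
```

Packets with a private source address can never be answered across the public Internet, so dropping them at the edge costs legitimate visitors nothing.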
Liz and I went back to Exodus and rebooted the Arrowpoint, then the site seemed "ok" for a bit. By 3 in the morning, Liz decided that the PIX (Cisco's firewall) could simply not do what it was supposed to do, so we went back and started building a FreeBSD box as a bridging firewall.
Just before we went to plug it in, I tried to ssh into the vpn-gate and noticed that nothing was working right: while the site worked, outgoing traffic and source groups on the Arrowpoint were screwed. As if that wasn't enough, two ports had already died on it!
At some unknown point (time blurs after 30 hours straight!) Martin and PatG show up (thank the gods!) and force us to go to sleep. They bring the site up outside the Arrowpoint while Liz and I watch from a hotel room.
As of Friday morning, the site is semi-working, but the adsystem can't be updated, and we have no access to the backend servers. I scream bloody murder to Arrowpoint, who eventually shows up to blame the router: a Cisco 6509 switch with two RSM/MSFCs.
Liz and I do packet dumps and determine it's not the router: the little CS-100 had died the night before, and that's where it all started. The Arrowpoint guy insists we did something to make the Arrowpoint not work. (CT: Explicit description of precisely where Liz and Pat wanted to store the newly deceased Arrowpoint removed to keep things rated PG.) By 7 the CS-800 CSS is up and we're almost done for the day, but we stay to make sure. By 10pm we're exhausted but stable, although we're running 4 servers on a round-robin DNS while the new load balancer waits.
Netops (Liz, Martin and I) regroup and, during Saturday afternoon, reintegrate the new Arrowpoint CS-800 and install a new FreeBSD firewall box in place of the PIX. Slashdot returns to normal. Sysadmins get well-deserved sleep.
So that was the story. It was a pretty hellish weekend for everyone involved, but thanks again to those that helped get our ducks back in a row. Again, Part #2 to this (which originally was gonna be run last Thursday, but with all this DDoS stuff got pushed aside) is a fairly detailed description of the new Slashdot setup at Exodus, complete with all the changes mentioned above. Fun for the whole family if your family is really into clusters of web servers.
Defense in Depth
(Score:5)(http://www.orion-com.com/)
The simplest case is building two small walls instead of one humongous wall. If you build a humongous wall, it takes a long time to get through... unless the enemy finds a single weak point -- then you're screwed. Two walls each take less time to get through, but if they're well-built using different techniques, the enemy may not get through to begin with and if they breach the first they lose time covering ground and then adapting. They're also very obvious as they traverse the open ground between barriers.
Network security can benefit from the same concept. Others have already mentioned heterogeneous "airgap" systems -- one of the most common and least excusable faux pas by so-called "security admins" is a single firewall protecting a herd of boxen. Second to that is identical airgap firewalls.
Of course real defense doesn't end with the walls. Even services running behind an airgap should be structured with an eye toward reasonable security, as others have pointed out. Many companies think their firewalls make them safe; come the day those firewalls are breached and the attackers make off with everything stored on the NT intranet server before wiping the drive, they'll find out differently.
Any server, no matter how well shielded, should start life in a lockdown configuration and then be made less secure only as needed ("do we really need to enable daytime on this box?"). Admittedly I haven't kept up with developments in secure distros, but does anyone make a "locked-down by default" distro based off Red Hat/Debian/*BSD? It'd be a real service to admins and if not it's something I might consider starting a project for. I know of Bastille Linux but that's (as far as I know) not so much a distro as a set of scripts to tighten up Red Hat.
The only thing we have yet to figure out is how to effectively make systems under attack "shoot back". The most they can do at the moment is call in an airstrike (i.e. alert the admins). Any return-fire capability would only be as good as the intermediate links let it be. It might not even be a good idea, as it would increase network traffic and make the attack that much more severe.
Re:What's the Cisco angle?
(Score:5)(http://parapet.net/)
I disagree 100%. Knowledge of an installation's infrastructure should never compromise the security of the setup. If it does, then you're relying (to a certain extent) on security through obscurity. Security should be provided by a well thought out layered approach: network layering (multiple firewalls, screening routers, IDS, etc...), host-based security (tcp wrappers, service minimization & replacement, tripwire, etc..), and application security (i.e. authentication, verification, etc...)
In designing networking/server infrastructures it's best to think of it as an open source project, and you should be willing to get opinions and discussion from any number of sources that could include crackers who may at some point want to use that knowledge to attack your site. This is one of the things I liked about TIS Gauntlet once upon a time..."crystal box" was the term they used to describe it.
You should prepare for an attack ASSUMING that the infiltrators know as much about your setup as you do. In the long run, if you know that your infrastructure can hold up to someone with that amount of knowledge, then you'll be doing pretty well.
My only question...did I actually see in a comment that they're using NFS to publish data to the distributed webservers??? Ew. Run.
-buffy
(Hmm...I seem to really like parentheticals, don't I? (well maybe not. (really!)))
Re:Blame Canada
(Score:5)(http://slashdot.org/ | Last Journal: Thursday September 19, @05:41PM)
Re:Blame Exodus SQUARED
(Score:4)(http://www.ackthud.net/)
I would definitely look at Exodus for some of this trouble. At times, they have been less than helpful for the service level they claim they will provide.
-They changed their security policy a while ago, and neglected to tell us until after the fact. All visitors to your cage must be announced, and just try to get replacement parts in and out without a whole rigamarole. Previously, one person "on the list" could escort others in and out of the facility, but no more. Granted this makes some sense, but when we showed up the first time after they changed their policy, before informing us, we balked, and complained. The response was (I kid you not) - "Well, we're a big company now, so we can't give the same level of service we used to." WHAT KIND OF ORGANIZATION SHOOTS THEMSELVES IN THE FOOT LIKE THAT?
-Their HVAC is substandard, and they don't truly care what equipment is placed in a cage. I pity the poor Sun techs who have to replace the Sun server at the bottom of a stack of 10 other machines (i.e., no shelf).
-They continue to abide by their own notification procedures when their "monitoring" software reports trouble. We've gone over their policy several times with them, and verified they had correct contact information for us, and yet they still follow old ways of notification. In this case, it's paging one person instead of using the paging mechanism that contacts the actual people who will do the work - the effort is the same either way.
-The number of times that we've notified them of trouble before their monitors catch it - for example, try working with them to show DNS requests from the outside to their servers aren't being handled.
END rant
I could go on, but I won't.
Nice account, but who?
(Score:5)(http://resume.prouse.org/)
Re:Why a firewall?
(Score:5)
Having a firewall in place to filter invalid packets and other crud thrown at the servers means that more of the servers' time is spent generating slashdot pages. Also, the simpler the Unix box, the easier it is to secure - hence, securing a stripped down firewall instead of a big, complex slashdot server.
MPAA
(Score:4)(http://www.sitenation.com/)
You'd better watch it with this comment... the MPAA might come after you too!
Timing
(Score:5)(http://racedot.org)
What's the Cisco angle?
(Score:5)(http://drteknikal.blogspot.com/)
Owned?
(Score:4)(http://slashdot.org/)
I wasn't going to talk about this in public because of /. silence about the DDoS, for I thought things could be somewhat related.
This is what I got this morning when I asked for www.slashdot.org:
<html>
<head>
<title>Not Slashdot.org</title>
<meta name="keywords" content="">
<meta name="description" content="">
</head>
<script language="javascript">
<!--
if (top.frames.length != 0)
{
top.location=document.location
}
//-->
</script>
<frameset
rows="*,90" marginwidth="0" marginheight="0"
framespacing=0 frameborder=no border=0
>
<frame
marginwidth="5" marginheight="2"
src="http://slashdot.org"
name=thepage framespacing=0 frameborder=no border=0
>
<frame
marginwidth="0" marginheight="0"
src="http://red.namezero.com/strip2/strip.jhtml?name=slahsdot.org&channel=www"
name=pb scrollbars=no scrolling=no
framespacing=0 frameborder=no border=0
>
</frameset>
<noframes>
Sorry
</noframes>
</html>
Weird. Did anybody else see this?
Blame Exodus
(Score:5)(Last Journal: Thursday March 28, @03:56PM)
Topology my ass. Exodus fights hard to make you use their 'value add' security services. Be honest guys, the reason you weren't protected was b/c those bastards were working you over for more money and don't want you running your own security, right? In fairness, there's some nice things about running out of an Exodus facility, but dealing with their physical and network security chimps is not one of the high points.
Re:Why a firewall?
(Score:4)(http://lemongecko.org/)
1. I'd say that they don't want to limit their functionality. A tweaked firewall will let them keep useful schtuff turned on.
2. If the firewall uses its CPU to deflect the crap, then the web servers won't have to deal with it.
3. They have a BSD uberadmin who can make that BSD box walk the dog. If something else weird goes on, it'll be in his back yard.
Dirk
Re:Why a firewall?
(Score:4)
Also, a firewall acts like a choke point -- any attack must pass through it. By monitoring the health of that one machine, you can monitor the health of the entire network. In addition, if you want to allow remote administration of the items in the cluster, you can provide a secured path through the firewall; again, you have only the one point of failure.
It's usually wise to have stacked firewalls (an "airgap") in front of a popular site, though, and it's often best to use a variety of operating systems on those firewalls. Somehow, I can't see Slashdot doing the wise thing there and putting a FreeBSD->W2K airgap at the front, with the Linux-based Slash behind it.
10.0.0.0 net
(Score:4)
I've had a lot of portscans for 31337 and 12345 in the past week on the mediaone network, all from 10.0.0.0/16 networks. I am massively annoyed that they let this through and block ports 137:139. Umm.. is this solving the problem? No! Oh, and they've taken a liking to scanning their customers' boxen.. but I digress.
DDoS is the direct result of sloppy upstream administrators. If I were in your shoes, I would be suing every person upstream, for at least a few hops, for gross negligence in passing those 10.0.0.0 packets along.
What about the children?
(Score:5)
Firstly, I don't think the blame for this DDoS can be centered on just one person or group. Obviously, those who attacked Slashdot are to blame, as are Slashdot's sysadmins, and the people at Arrowpoint. And secondly, the costs of this are much greater than you might think.
I have an eight year old daughter. We had a family pet - a rabbit, black, named Midnight, and my daughter was very fond of it. Midnight, sadly, passed away about two months ago. A week or two after Midnight died, my daughter came to me in tears and asked me, "Daddy, why won't God bring Midnight back? I've been praying like Deacon Simmons told me to."
Naturally, I had to think about how to respond to this. I finally answered, "Well, honey, God is a little like Slashdot. He can seem arbitrary, cruel, and unresponsive, but he's really a nice guy who's just a little out of touch and is a little slow at responding to requests."
This was fine, and I thought that would be the end of it. However, when Slashdot went down last week, my daughter burst into my den, positively sobbing and wailing, and managed to choke out "Daddy! Daddy! I can't get to Slashdot!" "Honey," I said, "it's just a website." But, between sobs, she said, "but you said God is just like Slashdot, remember? Does this mean God is dead?"
I tried to console her as best I could, but nothing seemed to work. When Slashdot came back up, she seemed to return to normal, but she hasn't been quite the same since. She doesn't ask me about God so much any more, and she seems less interested in Church.
As a good Christian, I will turn the other cheek, and not call for the punishment of those responsible. But to the heinous criminals and negligents responsible for this, I must ask, how do you feel about destroying a small girl's sense of innocence and wonder about the world? About crushing her childish dreams and idealism? About shattering her faith in God and his benevolence? About possibly having crushed her soul and emotion forever, leaving her to live the rest of her days in spiritual agony as a broken, scarred husk of a person?
I hope all of you think long and hard about what you've done. What is the soul of a child worth, next to a few double-checks of the router?
Thank you.
No doubt it was MS/MPAA/RIAA/Metallica/Dr Dre
(Score:4)(http://www.mozillatips.com/ | Last Journal: Saturday August 18, @10:13AM)
I'm sure that these great enemies of the Slashdot Empire have found this to be a convenient time to strike. We must systematically seek and destroy all those suspected of having sympathies with the MPAA, RIAA, or Microsoft for security reasons.
Therefore, all
Windows users
CD listeners
Movie watchers
Metallica fans
are asked to please leave now or face prosecution.
thank you.
Re:Owned? - Nope
(Score:5)(http://www.spamsu.cx/ | Last Journal: Friday January 04, @07:58AM)
requested http://slahsdot.org (slaHSdot) not
slashdot.org...
I registered that domain (for free @ namezero) to
help the people who couldn't type. Sorry if I scared you
Cpyder@slahsdot.org
_
/
\_\ sig under construction
RSM/MSFC definitions
(Score:5)
RSM - Route Switch Module
- Basically a router on a card in the switch for routing between VLANs
MSFC - Multilayer Switch Feature Card
- Once a route for a packet flow is figured out (from the first packet going through the router) all other packets from the flow get switched instead of routed.
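The "route once, switch the rest" idea can be sketched as a small flow cache. This is only a toy model, not Cisco's implementation: the class name, the `(src, dst)` flow key, and the `route_lookup` callback are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

# Assumption: a flow is identified by (source, destination) only.
# Real hardware typically keys on more fields than this.
FlowKey = Tuple[str, str]


class FlowCache:
    """Toy model: the first packet of a flow is routed, the rest are switched."""

    def __init__(self, route_lookup: Callable[[str], str]) -> None:
        self._route_lookup = route_lookup  # the slow path (the "router")
        self._cache: Dict[FlowKey, str] = {}
        self.routed = 0    # packets that needed a full route lookup
        self.switched = 0  # packets forwarded from the cached decision

    def forward(self, src: str, dst: str) -> str:
        """Return the egress port for a packet, caching the decision per flow."""
        key = (src, dst)
        if key in self._cache:
            self.switched += 1
            return self._cache[key]
        port = self._route_lookup(dst)
        self._cache[key] = port
        self.routed += 1
        return port
```

Feeding three packets of the same flow through `forward` results in one routed packet and two switched ones, which is the whole point of the card.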
A little more detail on the hardware setup
(Score:5)(http://slashdot.org/ | Last Journal: Thursday September 19, @05:41PM)
All of these machines were behind an Arrowpoint (CS-100) firewall/load balancer which took it on the chin when we got DDoSed, so basically the Arrowpoint was taking the full force of the attack. So as described above we replaced it with a CS-800 and a BSD firewall.
I guess we learned that if you're going to post a letter from a Microsoft attorney on your web site the same day you implement a few new troll filters you better be prepared for the fury of hell to rain down on you. Then again this is Slashdot, so we always should be prepared for the fury of hell to rain down on us.
DDOS != 10.0.0.0
(Score:5)
Um, no.
DDOS simply requires that a lot of compromised boxes be able to send you packets. Spoofing to nonexistent return addresses is an orthogonal issue. You reply that it's used to mask the source boxes? Any _valid_ address could also be used for that, so filtering would gain you nothing against that.
I agree that filtering of reserved addresses should be done, but that would not hinder a DDOS attack.
A link or two
(Score:4)(http://praxxus.blogspot.com/)
Jason Schwarz Ethernet Tutorial [lothlorien.net]
Lantronix Networking Tutorials [lantronix.com]
You might also try typing "ethernet tutorial" or somesuch in your favorite web search engine. Hope this helps!
--
Re:What's the Cisco angle?
(Score:5)
The idea behind the PIX, or any firewall-like object, is to allow 'good' traffic (http, smtp, etc) into the production network, and reject 'bad' traffic (oddball ports, like port 0, unauthorized UDP traffic, etc).
The problem with the PIX is that it is essentially a fairly stupid router that can do network address translation and other bells and whistles, but it does it poorly. VERY poorly. It was designed as a network address translation system back in the mid '90s (anyone remember all the "We'll run out of IPs by 1997!" talk?) by a company that Cisco later bought. Cisco took the product, did a logic problem ("Firewalls can do address translation. PIX does address translation. PIX is a firewall!"), and had themselves a firewall.
Its configuration makes a lot of sense to someone familiar with Cisco router ACL rules, but no one else.
They are probably much better off with the BSD box. Although it's not a good idea to advertise their security infrastructure layout to the world. (Hint, Hint, CmdrTaco!)
jf
Re:PIX insufficient?
(Score:5)
The problem here is that we only had one subnet to work with. The PIX we had wouldn't do the type of filtering/bridging that I wanted.
Cisco wants a DMZ on these things.
I needed a bridge...why I didn't use Linux...
It was quicker and easier for me... ipchains has always been a pain in my arse... ipfw and ipfilter I know best.
The other thing is that we fried an Arrowpoint CS-100 (little itty bitty dinky thing that was being replaced with a bigger one).
The little Arrowpoint couldn't take 109Mbits of traffic; it wasn't meant for that. We were waiting on Arrowpoint to ship us the unit we were *supposed* to have.
*BSD fills the gap because I know it inside and out, and it was the quickest to get up at that point.
As far as the router, we can't do any type of stateful filtering on the 6509, due to some setup that Exodus has with the HSRP stuff. I'm sure given enough thought I could figure out how to do it, however we were running in crisis mode.
The BSD firewall filled that gap for us...I can now do access lists on that, instead of the Cisco.
And we still have a "DMZ", but it's on the same subnet.
The Arrowpoint CS-800 was emergency shipped to us that afternoon....it's about as big as a Cisco 6509...and ummm won't die under that type of traffic/content checking (it's layer 5, remember).
-Pat
Re:Anti-spoof filters on the Exodus network
(Score:4)
Even more wacky, we were getting stuff from 0.0.0.0/8 (gee, how the F#@% do you filter that??!?!) let's filter the equivalent of "any", gee...
we have been talking to Exodus to get this problem resolved.
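A small aside on the prefix arithmetic, offered as a sketch rather than a firewall recipe: 0.0.0.0/8 is a specific block (every address whose first octet is 0), which is not the same thing as the match-everything prefix 0.0.0.0/0. Python's standard `ipaddress` module shows the difference; the helper name `in_zero_net` is made up for illustration.

```python
import ipaddress

ZERO_NET = ipaddress.ip_network("0.0.0.0/8")  # first octet is 0
ANY_NET = ipaddress.ip_network("0.0.0.0/0")   # matches every IPv4 address


def in_zero_net(src: str) -> bool:
    """Return True only for sources whose first octet is 0."""
    return ipaddress.ip_address(src) in ZERO_NET


# The /8 covers 2**24 addresses, while the /0 covers all 2**32 of them.
ZERO_NET_SIZE = ZERO_NET.num_addresses
ANY_NET_SIZE = ANY_NET.num_addresses
```

So a rule that matches on source 0.0.0.0/8 catches only those bogus sources and leaves ordinary traffic alone, provided the filtering device accepts an 8-bit mask on that prefix.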
Re:DDOS != 10.0.0.0
(Score:4)(http://slashdot.org/)
As long as you aren't in wild and wooly peering arrangements, one should be able to know all the IP addresses that are inside one's network (and within each segment of the network). Once a router sees something that can't possibly be coming from inside that network, it should be dropped and throw up alarms, bells, flashing lights, etc. 'cause something just ain't right (either a misconfigured client or someone trying something bad).
Doing this type of filtering doesn't prevent your system from being used in a DDOS attack, but it prevents your system from being used in the attack with a spoofed address. Hence: see 50mb/sec from host w.x.y.z, contact the owner of that address block and get it stopped; since it is not forged, they have a compromised box internally. If everybody started doing that the world would be a MUCH better place to live in.
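The check described above (drop and raise an alarm for any outbound packet whose source cannot belong to your own network) can be sketched in a few lines. This is a minimal illustration, assuming the operator can list their own prefixes; the `EgressFilter` class and the documentation prefix 192.0.2.0/24 in the usage note are placeholders, not anything from a real router.

```python
import ipaddress
from typing import Iterable, List


class EgressFilter:
    """Permit outbound packets only if their source is one of our own prefixes."""

    def __init__(self, local_prefixes: Iterable[str]) -> None:
        self._nets = [ipaddress.ip_network(p) for p in local_prefixes]
        self.alarms: List[str] = []  # spoofed sources seen so far

    def permit(self, src: str) -> bool:
        """Return True for legitimate local sources; record and drop the rest."""
        addr = ipaddress.ip_address(src)
        if any(addr in net for net in self._nets):
            return True
        # The source cannot have originated inside this network: either a
        # misconfigured client or someone forging addresses.
        self.alarms.append(src)
        return False
```

With `EgressFilter(["192.0.2.0/24"])`, a packet sourced from 192.0.2.17 passes, while one claiming to be from 10.9.8.7 is dropped and logged in `alarms`.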