Slashdot's Setup, Part 1 - Hardware

As part of our 10-year anniversary coverage, we intend to update our insanely dated FAQ entry that describes our system setup. Today is Part 1, where we talk mostly about the hardware that powers Slashdot. Next week we'll run Part 2, where we'll talk mostly about software. Read on to learn about our routers, our databases, our web servers, and more. And as a reminder, don't forget to bid on our charity auction for the EFF, and if you are in Ann Arbor, our anniversary party is tomorrow night.

CT: Most of the following was written by Uriah Welcome, famed sysadmin extraordinaire, responsible for our corporate intertubes. He writes...

Many of you have asked about the infrastructure that supports your favorite time sink... err news site. The question even reached the top ten questions to ask CmdrTaco. So I've been asked to share our secrets on how we keep the site up and running, as well as a look towards the future of Slashdot's infrastructure. Please keep in mind that this infrastructure not only runs Slashdot, but also all the other sites owned by SourceForge, Inc.: SourceForge.net, Thinkgeek.com, Freshmeat.net, Linux.com, Newsforge.com, et al.

Well, let's begin with the most boring and basic details. We're hosted at a Savvis data center in the Bay Area. Our data center is pretty much like every other one. Raised floors, UPSs, giant diesel generators, 24x7 security, man traps, the works. Really, once you've seen one class A data center, you've seen them all. (CT: I've still never seen one. And they won't let us take pictures. Boo savvis.)

Next, our bandwidth and network. We currently have two active-active Gigabit uplinks; again, nothing unique here, no crazy routing, just symmetric, equal-cost uplinks. The uplinks terminate in our cage at a pair of Cisco 7301s that we use as our gateway/border routers. We do some basic filtering here, but nothing too outrageous; we tier our filtering to try to spread the load. From the border routers, the bits hit our core switches/routers, a pair of Foundry BigIron 8000s. They have been our workhorses throughout the years. The BigIron 8000s have been in production since we built this data center in 2002 and actually, having just looked at it... haven't been rebooted since. These guys used to be our border routers, but alas, their CPUs just weren't up to the task after all these years of growth. Many machines plug directly into these core switches; however, for certain self-contained racks we branch off to Foundry FastIron 9604s. These are basically switches and do nothing but save us ports on the cores.

Now onto the meat: the actual systems. We've gone through many vendors over the years. Some good, some... not so much. We've had our share of problems with everyone. Currently in production we have the following: HP, Dell, IBM, Rackable, and, I kid you not, VA Linux Systems. Since this article is about Slashdot, I'll stick to its hardware. The first hop on the way to Slashdot is the load-balancing firewalls, which are a pair of Rackable Systems 1Us: P4 Xeon 2.66GHz, 2GB RAM, 2x80GB IDE, running CentOS and LVS. These guys distribute the traffic to the next hop: the web servers.
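To give a feel for what those load balancers do (this is only an illustration in Python, not our actual LVS configuration, and the hostnames are invented), the simplest scheduling policy, round robin, just hands each new connection to the next server in rotation:

    from itertools import cycle

    # Hypothetical pool of web servers sitting behind the director.
    web_servers = ["web01", "web02", "web03", "web04"]
    rotation = cycle(web_servers)

    def pick_backend():
        """Give the next incoming connection to the next server in the rotation."""
        return next(rotation)

    # Each new connection lands on a different box, spreading the load evenly.
    for conn in range(6):
        print(conn, "->", pick_backend())

LVS also supports smarter policies (weighted and least-connection variants), but the idea is the same: the director only picks a destination and forwards the traffic; the real work happens on the web servers.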

Slashdot currently has 16 web servers, all of which are running Red Hat 9. Two serve static content: JavaScript, images, and the front page for non-logged-in users. Four serve the front page to logged-in users. And the remaining ten handle comment pages. All web servers are Rackable 1U servers with 2 Xeon 2.66GHz processors, 2GB of RAM, and 2x80GB IDE hard drives. The web servers all NFS-mount the NFS server, which is a Rackable 2U with 2 Xeon 2.4GHz processors, 2GB of RAM, and 4x36GB 15K RPM SCSI drives. (CT: Just as a note, we frequently shuffle these 16 servers from one task to another to handle changes in load or performance. Next week's software story will explain in much more detail exactly what we do with those machines. Also as a note, the NFS mount is read-only, which was really the only safe way to use NFS around 1999 when we started doing it this way.)
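As a rough sketch of that split (the pool names, paths, and login cookie below are made up; the real decision is made by the load balancers, not by code like this):

    # Hypothetical mapping of requests onto the three web server pools.
    STATIC_POOL   = ["web01", "web02"]                      # javascript, images, anonymous front page
    INDEX_POOL    = ["web03", "web04", "web05", "web06"]    # front page for logged-in users
    COMMENTS_POOL = [f"web{n:02d}" for n in range(7, 17)]   # comment pages

    def choose_pool(path, cookies):
        """Decide which pool should serve a given request."""
        if path.startswith(("/images/", "/static/")):
            return STATIC_POOL
        if path == "/":
            return INDEX_POOL if "user" in cookies else STATIC_POOL
        return COMMENTS_POOL

    print(choose_pool("/", {}))                         # anonymous front page -> static pool
    print(choose_pool("/", {"user": "CmdrTaco"}))       # logged-in front page -> index pool
    print(choose_pool("/comments.pl", {"user": "x"}))   # comment page -> comments pool

Since the servers get shuffled between roles as load shifts, the pool membership is the part that changes; the routing idea stays the same.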

Besides the 16 web servers, we have 7 databases, all currently running CentOS 4. They break down as follows: 2 dual Opteron 270s with 16GB RAM and 4x36GB 15K RPM SCSI drives. These are doing multiple-master replication, with one acting as Slashdot's single write-only DB and the other acting as a reader. We have the ability to swap their functions dynamically at any time, providing an acceptable level of failover.

2 dual Opteron 270s with 8GB RAM and 4x36GB 15K RPM SCSI drives. These are Slashdot's reader DBs. Each derives data from a specific master database (listed above). The idea is that we can add more reader databases as we need to scale. These boxes are barely a year old now — and still plenty fast for our needs.
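The effect of that split is easy to sketch (an illustration only, with invented hostnames, not our actual code): writes always go to whichever master currently holds the writer role, reads are spread across everything else, and failing over is just swapping which master takes the writes.

    import random

    # Hypothetical roles. Both masters replicate from each other, so swapping
    # which one takes writes is a configuration change, not a rebuild.
    masters = ["db-master-a", "db-master-b"]
    reader_slaves = ["db-reader-1", "db-reader-2"]
    writer = 0  # index of the master currently taking writes

    def read_hosts():
        """The non-writing master plus the dedicated reader boxes serve reads."""
        return [masters[1 - writer]] + reader_slaves

    def run_query(sql):
        """Route writes to the current writer master and reads to a random reader."""
        is_write = sql.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE", "REPLACE"))
        host = masters[writer] if is_write else random.choice(read_hosts())
        print(f"{host}: {sql}")

    def fail_over():
        """Swap the writer role to the other master."""
        global writer
        writer = 1 - writer

    run_query("SELECT title FROM stories LIMIT 10")
    run_query("INSERT INTO comments (sid, body) VALUES (42, 'first post!')")
    fail_over()
    run_query("UPDATE users SET karma = karma + 1 WHERE uid = 1")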

Lastly, we have 3 quad P3 Xeon 700MHz machines with 4GB RAM and 8x36GB 10K RPM SCSI drives, which are sort of our miscellaneous 'other' boxes. They are used to host our accesslog writer, an accesslog reader, and Slashdot's search database. We need this much for accesslogs because moderation and stats require a lot of CPU time for computation.

And that is basically it, in a nutshell. There isn't anything too terribly crazy about the infrastructure. We like to keep things as simple as possible. This design is also very similar to what all the other SourceForge, Inc. sites use, and has proved to scale quite well.

CT: Thanks to Uriah and Chris Brown for the report. Now if only we remember to update the FAQ entry...

This discussion has been archived. No new comments can be posted.

Slashdot's Setup, Part 1 - Hardware

Comments Filter:
  • Windows? (Score:4, Funny)

    by mseidl ( 828824 ) * on Friday October 19, 2007 @11:02AM (#21043111) Homepage
    I'm like sooooooooo surprised you guys aren't running NT4 boxes. IIS was the sh!t back in the day.
  • Savvis (Score:5, Funny)

    by garethwi ( 118563 ) on Friday October 19, 2007 @11:07AM (#21043175) Homepage
    Nice to see you're hosted by a Microsoft Gold Partner. That's a benchmark of quality.
    • Re: (Score:3, Informative)

      by Anonymous Coward
      They've changed hands several times and names even more times since we moved in.
      • Re: (Score:3, Informative)

        by bunco ( 1432 )
        Twice, actually. Slashdot is hosted in an Exodus legacy data center. Exodus was bought by Cable & Wireless who then sold their US network assets to Savvis.

        Depending on who you talk to, you'll get different responses about Savvis. This is mainly due to the heritage of various customers. i.e. Savvis/Bridge/Intel vs Exodus reputation.

        Savvis is actually the conglomeration of _many_ companies.

        Exodus == (Exodus, AIS, Arca, Cohesive, Network-1, Global Center)
        C&W US == (MCI (IP backbone), Exodus, Digita
  • Redhat 9 (Score:5, Funny)

    by Anonymous Coward on Friday October 19, 2007 @11:07AM (#21043199)
    Tell me that's a hilarious joke...
    • Re:Redhat 9 (Score:4, Insightful)

      by MisterFuRR ( 311169 ) on Friday October 19, 2007 @01:10PM (#21045455) Journal
      If it works, and there's no need to change -- why introduce unknown incompatibility... it's a production network -- not your home box.
      • Re: (Score:2, Funny)

        by eln ( 21727 )
        I wouldn't use it for a production system because it was end-of-lifed like 3 years ago and is therefore completely unsupported. I don't think I'd want to run a website that (presumably) generates quite a bit of revenue on ancient unsupported software.
      • Amen.

        Although, in the words of a wise man, "The singular of data is not 'anecdote'", I'll cite my own little war story data point. It's trivial, of course, as every story told by a 6-digit /. ID is going to be, but still...

        My little bitty household server had been running RH variants from 5.something onwards, surviving numerous hardware transplants and live upgrades all the way up through RH 9.

        It was running pretty smooth but I felt like it was getting harder to find RPMs packaged for it, and security updat

        • Comment removed based on user account deletion
        • Re: (Score:3, Informative)

          How do you BRICK a computer short of taking an axe to it? Boot from install media and reinstall. If the hard drive is shot, the hard drive is shot. But a dead HDD doesn't mean it's bricked. I can see maybe fucking up a BIOS upgrade but even with that there are ways to undo the damage.

          You people keep using the word "brick" to refer to "broken software that can easily be reinstalled."
  • Can I...? (Score:2, Funny)

    by Kranfer ( 620510 )
    Can I play on that awesome hardware? Or perhaps run SETI on it and make it a huge waste of processing power? Oh oh, please please!!!
  • The hardware that powers slashdot?

    I wanna know about the power that powers slashdot... are you really as green as the default colour scheme?
  • We have a hard enough time using CARP, never mind specifying servers that just read or just write. I need to take a class. ;-)
  • Interesting read about the Slashdot server farm. I'm somewhat surprised to see that Slashdot subscribers have two dedicated servers to read the main page; that's as many servers dedicated to a minority of users as to the rest of the users. But well, that's good for them: they help our most trustworthy news site, so they deserve to be rewarded :-p

  • Redhat 9? (Score:4, Interesting)

    by eli pabst ( 948845 ) on Friday October 19, 2007 @11:17AM (#21043363)
    It'll be interesting to read the software section. It was surprising to see that they use an EOL'd version of Red Hat (RH 9) that is no longer supported by Red Hat. Granted, they're just web servers, but you'd think that would still require a lot of manual updating to keep things patched.
    • Not really, because as they're just web servers, you have a fairly minimal OS install footprint. There aren't that many things to keep up with. The odd kernel or basic library update, httpd update, and probably ssh-related stuff.

      I should know; my web server is on 7.3. 8^)
        That's true, but updates have to be backported; it just means increased effort, with no company to blame if something goes wrong.
  • by jolyonr ( 560227 ) on Friday October 19, 2007 @11:19AM (#21043401) Homepage
    That sounds useful! I use /dev/null as a write-only database. Very efficient.

    Jolyon
    • Re: (Score:3, Informative)

      by Ron Harwood ( 136613 )
      I'm trying to assume that's humour... but that said...

      If you have a farm of replicated MySQL servers (which are read-only, as replication is one-way here) you need a DB to write to... not reading from it reduces the load on that server.

      So, assuming that you're read-mostly, it's actually a nice way to balance the load across multiple systems.
    • I use /dev/null as a write-only database.


      That does sound fast, certainly, but I prefer the idea of using /dev/urandom as a read-only database. Or maybe /dev/zero as a source for your encryption keys......
  • by athloi ( 1075845 ) on Friday October 19, 2007 @11:21AM (#21043419) Homepage Journal
    What determines why you run Red Hat 9 on some systems, and CentOS on others? Was BSD even considered? (You wouldn't run on Macs, would you?)
    • by Precision ( 1410 ) * on Friday October 19, 2007 @11:24AM (#21043477) Homepage
      Deployment date. The Red Hat 9 machines were deployed 3 years ago and just haven't needed to be reinstalled yet. BSD, not so much... we have a team of great Linux admins; introducing another variable isn't likely to happen.
    • by saterdaies ( 842986 ) on Friday October 19, 2007 @11:42AM (#21043793)
      Usually these decisions are made based on familiarity, availability, and the like. If your staff are all really familiar with RedHat, why would you force them to run BSD or Debian? Each system has pros and cons, but to be honest, the largest pro or con is usually familiarity. It's really easy to get familiar enough with any *nix to get Apache running. The issue is whether you have the knowledge to deal with it when your live webserver suddenly stops responding to requests.

      Stability and familiarity are more important than the latest cool distro. Is there a reason that they should have picked BSD over RedHat? Of course there are some. There are others to pick RedHat over a BSD. In the end, you have to go with what you're comfortable and familiar with in order to ensure that you can deal with sudden, unexpected problems.
  • It's been 10 freakin years!!! I can remember going to Rob's page for his E apps. An amazing ride!
  • Reference Materials (Score:2, Interesting)

    by Ided ( 978291 )
    This may be slightly off topic, but I was wondering if anyone could recommend some good reading materials for setting up clustered sites, or for spreading out workloads like they're doing with their systems.
    • Re: (Score:2, Interesting)

      by the_tsi ( 19767 )
      http://www.linuxvirtualserver.org/ [linuxvirtualserver.org] or anything about F5 BigIPs. Most of understanding load balancing is about understanding (a) how to fool layers above you in the OSI stack (switching on layer 4 through 7 -- particularly 7 -- can take a while to wrap your head around) and (b) the algorithms to pick which physical server gets the next connection (round robin, least connections, predictive, whatever).
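      For instance, "least connections" just means tracking how many active connections each real server currently has and handing the next one to whichever is least loaded. A quick Python sketch (server names invented):

          # Hypothetical least-connections scheduling.
          active = {"web01": 0, "web02": 0, "web03": 0}   # active connection counts

          def assign_connection():
              """Send the new connection to the least-loaded server."""
              server = min(active, key=active.get)
              active[server] += 1
              return server

          def close_connection(server):
              active[server] -= 1

          print(assign_connection())   # web01
          print(assign_connection())   # web02
          close_connection("web01")
          print(assign_connection())   # web01 again (ties go to the first server checked)

      Round robin ignores the counts and just cycles through the pool; schemes like "predictive" also factor in whether each server's load is trending up or down.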
    • The complexity is as much in the software as the hardware. I've had some experience with this from the programming side and started a DocForge wiki page [docforge.com].
  • by Jack Malmostoso ( 899729 ) on Friday October 19, 2007 @11:27AM (#21043519)
    Oh yes, geek pornography finally appears on /. :)
    Thanks for the report, looking forward to the software part!
  • by TwoWheelTomy ( 952518 ) on Friday October 19, 2007 @11:42AM (#21043799) Homepage
    I wonder how much bandwidth Slashdot is using and how much it costs.
    • by Precision ( 1410 ) * on Friday October 19, 2007 @12:56PM (#21045215) Homepage
      The average monthly bandwidth usage for /. is around 40-50Mbit/sec, which is relatively small. As for cost, you can contact your local ISP for a guesstimate; we get fairly deep discounts since we push quite a bit more with all the sites consolidated.
    • by anticypher ( 48312 ) <anticypherNO@SPAMgmail.com> on Friday October 19, 2007 @02:56PM (#21047125) Homepage
      For a 100Mbit/sec commit on a GigE connection for a full internet feed, I've been getting quotes from California of around US$10-$16 per Mbit, depending on the data centre and provider. For a 1 Gbps commit, the price drops to around US$6/Mbit. Those are prices from sales droids without any attempt at negotiating a better price.

      For a site like /., a 95th percentile bandwidth of 50 Mbps for the month would cost between US$500 and $800, less if the total commit for all of OSDN's traffic was much higher. Add in hosting costs estimated around US$2,500 per month for a cage with room for 6 racks and matching electricity and cooling, and you can calculate roughly what /. costs to operate.
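      The billing arithmetic itself is trivial to sketch (the traffic samples and the per-Mbit price below are made up; only the method is real):

          import random

          # 95th percentile billing: sample the rate every 5 minutes for a month,
          # discard the top 5% of samples, and bill the highest remaining one.
          samples = sorted(random.uniform(30, 60) for _ in range(8640))  # Mbit/s, ~30 days
          p95 = samples[int(len(samples) * 0.95) - 1]

          price_per_mbit = 12.0  # US$/Mbit, mid-range of the quotes above
          print(f"95th percentile: {p95:.1f} Mbit/s")
          print(f"monthly bandwidth bill: ${p95 * price_per_mbit:,.0f}")

      With a 50 Mbit/s 95th percentile and prices in the $10-$16 range, that works out to the $500-$800 figure above.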

      After that, you have to count up all the amazing 7 figure salaries of Rob and the gang who keep things running :-)

      the AC
  • Why CentOS? (Score:3, Interesting)

    by bogaboga ( 793279 ) on Friday October 19, 2007 @11:49AM (#21043919)
    I am not saying that CentOS is in any way inferior, but I wonder why they chose it over all the other serious systems in the Linux world. Is there anything CentOS does better than, say, OpenSUSE or Ubuntu/Debian and the rest?
    • by Synn ( 6288 ) on Friday October 19, 2007 @11:59AM (#21044117)
      It's familiar to people who are used to working with Red Hat.
    • CentOS is redhatish for the sake of redhatishness, as opposed to the current RedHat, which we all know is redhatish for profit, and Fedora, which is redhatish for testing purposes only. Ubuntu can't be taken seriously; you should have asked about Debian. Debian is redhatish to a fault.
      • CentOS is redhatish for the sake of redhatishness

        CentOS is redhattish because they use RedHat's freely-available source code.
        • Yes indeed; RedHat with the serial numbers filed off.

          My GNU/Linux server experience is pretty much all in the RH family tree, except for a brief fling with SUSE for SPARC in the early '00s. For desktop, though, I have to admit that Kubuntu is pretty sweet. I just haven't had to (or had the chance to) learn the guts like I've had to (or had the opportunity to) learn the ugly white underbelly of server OSs.

          But Hell, I come from a SunOS and AIX background, so almost nothing in the GNU/Linux universe can really

    • Re: (Score:2, Informative)

      by pak9rabid ( 1011935 )
      CentOS is RHEL, minus the support. CentOS is 100% binary compatible with RHEL as well, meaning the RPMs you'd get from RHEL would work just fine in CentOS and vice versa.
    • Re:Why CentOS? (Score:5, Informative)

      by Precision ( 1410 ) * on Friday October 19, 2007 @01:00PM (#21045281) Homepage
      We use a combination of CentOS and RHEL. The reason we chose CentOS over, say, Debian is that it is basically identical to RHEL; we end up with a "single" platform that we have to deploy, test, and build packages for, regardless of support. Depending on the system, we will deploy either RHEL or CentOS based on support requirements.
    • by LWATCDR ( 28044 )
      Well, I cannot answer for Slashdot, but I can think of a few reasons I would.
      CentOS is based off Red Hat Enterprise. It favors stability over "new hotness", but unlike Debian it stays pretty up to date without going to "Testing" or "Unstable". Yes, I have used Debian and I am not a big fan. It may have changed, so try it for yourself.
      I also use OpenSUSE daily. YaST is a mixed blessing: I find it very slow and too GUI-like for a server. I use it on my desktop and several servers in my office. I have years of experi
    • Stability and enterprise hardware vendor support (by way of RHEL).
    • Seven year maintenance cycle. Wide ISV support, since it's RHEL compatible. Backported patches and bugfixes.
  • Multiple master DBs (Score:3, Interesting)

    by atomic777 ( 860023 ) on Friday October 19, 2007 @11:56AM (#21044077)
    "These are doing multiple-master replication, with one acting as Slashdot's single write-only DB, and the other acting as a reader."

    Isn't that a contradiction? If you have only one write DB, why do you need multiple masters? Aren't the other 6 just slaves at that point? Or are there separate master/slave pairs? (I'm assuming these are MySQL databases.)
    • by Bellum Aeternus ( 891584 ) on Friday October 19, 2007 @12:22PM (#21044547)
      Master-master allows really fast failover because you don't need to down the system to re-config a slave as a master. I've actually worked with companies that have master-master-master clusters.
    • by Unoti ( 731964 )
      Perhaps it meant the write-only database has only 1 machine that is a reader for it, and the rest of the slaves all use that one reader as their source for replication. So the master writing database only has a single client, and all the other readers read from the "master" reader.
  • Considered a CDN? (Score:5, Interesting)

    by xmpcray ( 636203 ) on Friday October 19, 2007 @12:07PM (#21044235)
    I was wondering if you ever considered using a CDN service like Akamai to serve content? Most of the big sites (Apple/MS etc) use it.
  • by foo fighter ( 151863 ) on Friday October 19, 2007 @12:15PM (#21044381) Homepage
    I always imagined slashdot ran on hundreds (perhaps thousands) of modded Dreamcast consoles powered by lucky, randomly selected registered users running in hamster wheels who were lured by blocks of Wisconsin cheese dangling just out of reach.

    Thanks for destroying my sense of childlike wonder, you insensitive clods!
  • backup? (Score:5, Interesting)

    by nido ( 102070 ) <<moc.oohay> <ta> <65odin>> on Friday October 19, 2007 @12:15PM (#21044391) Homepage

    Well, let's begin with the most boring and basic details. We're hosted at a Savvis data center in the Bay Area.
    Do you ever worry that a big earthquake will hit and your datacenter goes offline? Do you at least keep an offsite backup?

    • Re: (Score:2, Informative)

      by statikuz ( 523906 )
      From the website: "SAVVIS has done extensive engineering to ensure that any Datacenter located in a region prone to seismic activity is braced for such events. Design elements include seismic isolation equipment to cushion facilities against movement, as well as seismic bracing on all equipment racks. All SAVVIS Datacenters have racks anchored to the concrete slab below the raised floor."
        Which sounds great until the fiber link is cut by extraneous damage half a mile from the data center, or the slightly less hardened ISP data center is taken offline. The lengths they've gone to are primarily to ensure that the customers' machines and data stay undamaged. Very little is going to prevent the site from going down in a major disaster other than backups and an alternative facility.
        • Re: (Score:3, Insightful)

          by shokk ( 187512 )
          At which point, Slashdot is the least of people's worries. This is a news entertainment site, not a critical care facility.
    • Re:backup? (Score:4, Informative)

      by Eponymous Bastard ( 1143615 ) on Friday October 19, 2007 @12:51PM (#21045105)

      Well, let's begin with the most boring and basic details. We're hosted at a Savvis data center in the Bay Area.
      Do you ever worry that a big earthquake will hit and your datacenter goes offline? Do you at least keep an offsite backup?
      First rule of offsite backups: Never talk about your offsite backups.
      Second rule of offsite backups: Never talk about where you keep your offsite backups.

      You thought I was going somewhere else with that didn't you?

      In all seriousness, that sounds like it would be in the software article instead.
    • Re:backup? (Score:5, Informative)

      by Precision ( 1410 ) * on Friday October 19, 2007 @01:02PM (#21045317) Homepage
      Of course we do offsite backups, but we're also currently preparing to build a new primary data center in Chicago, away from earthquake land.
    • by brarrr ( 99867 )
      The rest of the world might be surprised to know that earthquakes are not a daily concern to Californians. There is no 4pm shake. Sorry to disappoint.
  • Thanks (Score:5, Interesting)

    by debrain ( 29228 ) on Friday October 19, 2007 @12:17PM (#21044417) Journal
    To the editors:

    Thanks for this. It's really very interesting.

    -B
  • by clem ( 5683 ) on Friday October 19, 2007 @12:21PM (#21044509) Homepage
    I can't wait for "Slashdot's Setup, Part 8 - Root Passwords".
    • I can't wait for "Slashdot's Setup, Part 8 - Root Passwords"

      And what would you do with them? Knowing the root password shouldn't get you into a properly configured and patched system.

      I even remember one cracking contest where the owner of the machine gave out the root password to the target machine. (quick google: nope)

      You could attack the bandwidth, or try to get physical access. But if Cmdr. Taco can't get in....

    • by moosesocks ( 264553 ) on Saturday October 20, 2007 @02:03PM (#21057133) Homepage
      I have this funny vision of what would happen if /. got hacked, and how it would have been done:

      The admins would wake up the next day to discover that the site was running perfectly normally, but was performing slightly faster than normal.

      After closer inspection, they'd find that their datacenter had been emptied, and replaced by a single Apple ][ that had been hacked to run the latest version of Ubuntu, and that slashcode had been rewritten so that it would perform all of the same functions as the previous slashcode, but ran at twice the speed... on the Apple ][.

      A post-it would be found stuck to the screen, stating that all of slashdot's old and now unnecessary hardware had been sold, with the proceeds being donated to the EFF. The message would likely include or be in the form of a Soviet Russia joke. Additionally, a miniaturized plastic Gnu would be left behind as a calling-card.

      The news of this would be regarded as insignificant by the editors, until over a year later, it gets posted four times in the span of two days.
  • Comment woes (Score:3, Interesting)

    by theantipop ( 803016 ) on Friday October 19, 2007 @12:55PM (#21045187)
    Any chance that with all that iron you can loosen up the crazy restrictions on comment posting? It can get rather ridiculous [case.edu] sometimes.
  • by Anonymous Coward
    "Really, once you've seen one class A data center, you've seen them all. (CT: I've still never seen one. And they won't let us take pictures. Boo savvis.)"

    Send in a courtroom artist :-)
  • I went down memory lane, so I fired up archive.org's Wayback Machine. This was a post from 1998: Booker writes "So IBM announces a 25 gig hard drive... does the world need this yet? Unless this is in a RAID, would you really want to trust 25 gigs on a single drive? What would you use this for? 400+ hours of MP3s comes to mind..."
  • The BigIron 8000s have been in production since we built this data center in 2002 and actually, having just looked at it... haven't been rebooted since.

    Gee, you really should update the firmware on your routers and switches more often than once every 5 years (or never). All I really need to do to hack Slashdot now is to look at all of the vulnerabilities on BigIron 8000s for the last 5 years and pick one to exploit. I wouldn't do that, but I'm sure a lot of miscreants could DOS you something fierce, or ju

  • Really, once you've seen one class A data center, you've seen them all. (CT: I've still never seen one. And they won't let us take pictures. Boo savvis.)

    Have you ever asked if you could take photos of your own installation? Find a manager or someone somewhat in charge of the data center, and let them know you need to get photos for insurance reasons or backup plans. Or the slashdot FAQ.

    I've never had a problem taking photos in data centers in Europe and New York, by asking permission each time. It's a grea
