Slashdot's Setup, Part 1 - Hardware

As part of our 10-year anniversary coverage, we intend to update our insanely dated FAQ entry that describes our system setup. Today is Part 1, where we talk mostly about the hardware that powers Slashdot. Next week we'll run Part 2, where we'll talk mostly about software. Read on to learn about our routers, our databases, our web servers, and more. And as a reminder, don't forget to bid on our charity auction for the EFF, and if you are in Ann Arbor, our anniversary party is tomorrow night.

CT: Most of the following was written by Uriah Welcome, famed sysadmin extraordinaire, responsible for our corporate intertubes. He writes...

Many of you have asked about the infrastructure that supports your favorite time sink... err, news site. The question even reached the top ten questions to ask CmdrTaco. So I've been asked to share our secrets for keeping the site up and running, as well as give a look toward the future of Slashdot's infrastructure. Please keep in mind that this infrastructure not only runs Slashdot, but also all the other sites owned by SourceForge, Inc.: SourceForge.net, Thinkgeek.com, Freshmeat.net, Linux.com, Newsforge.com, et al.

Well, let's begin with the most boring and basic details. We're hosted at a Savvis data center in the Bay Area. Our data center is pretty much like every other one. Raised floors, UPSs, giant diesel generators, 24x7 security, man traps, the works. Really, once you've seen one Class A data center, you've seen them all. (CT: I've still never seen one. And they won't let us take pictures. Boo, Savvis.)

Next, our bandwidth and network. We currently have two active-active gigabit uplinks; again, nothing unique here, no crazy routing, just symmetric, equal-cost uplinks. The uplinks terminate in our cage at a pair of Cisco 7301s that we use as our gateway/border routers. We do some basic filtering here, but nothing too outrageous; we tier our filtering to try to spread the load. From the border routers, the bits hit our core switches/routers, a pair of Foundry BigIron 8000s. They have been our workhorses throughout the years. The BigIron 8000s have been in production since we built this data center in 2002 and, having just checked, actually haven't been rebooted since. These guys used to be our border routers, but alas... their CPUs just weren't up to the task after all these years and growth. Many machines plug directly into these core switches; however, for certain self-contained racks we branch off to Foundry FastIron 9604s. They are basically switches and do nothing but save us ports on the cores.

Now onto the meat: the actual systems. We've gone through many vendors over the years. Some good, some... not so much. We've had our share of problems with everyone. Currently in production we have the following: HP, Dell, IBM, Rackable, and, I kid you not, VA Linux Systems. Since this article is about Slashdot, I'll stick to its hardware. The first hop on the way to Slashdot is the load-balancing firewalls, a pair of Rackable Systems 1Us: P4 Xeon 2.66GHz, 2GB RAM, 2x80GB IDE, running CentOS and LVS. These guys distribute the traffic to the next hop, which are the web servers.
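As a rough sketch of what that LVS layer is doing (the article doesn't say which scheduler is in use, so the weighted-least-connection policy, addresses, and weights below are illustrative assumptions, not Slashdot's configuration), each new connection is simply handed to whichever web server currently has the least work relative to its weight:

class WeightedLeastConnection:
    """Toy model of the scheduling decision an LVS director makes per connection.
    The real work happens in the kernel (IPVS); this only illustrates the policy."""

    def __init__(self, servers):
        # servers maps real-server IP -> weight; active connection counts start at 0
        self.weights = dict(servers)
        self.active = {ip: 0 for ip in servers}

    def pick(self):
        # Lowest active-connections-to-weight ratio wins (the "wlc" idea).
        return min(self.weights, key=lambda ip: self.active[ip] / self.weights[ip])

    def open_conn(self, ip):
        self.active[ip] += 1

    def close_conn(self, ip):
        self.active[ip] -= 1

if __name__ == "__main__":
    lb = WeightedLeastConnection({"10.0.0.11": 2, "10.0.0.12": 1})
    for _ in range(6):
        ip = lb.pick()
        lb.open_conn(ip)
        print("new connection ->", ip)   # roughly 2:1 in favor of the heavier-weighted box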

Slashdot currently has 16 web servers, all of which are running Red Hat 9. Two serve static content: JavaScript, images, and the front page for non-logged-in users. Four serve the front page to logged-in users. And the remaining ten handle comment pages. All web servers are Rackable 1U servers with 2 Xeon 2.66GHz processors, 2GB of RAM, and 2x80GB IDE hard drives. The web servers all mount the NFS server, which is a Rackable 2U with 2 Xeon 2.4GHz processors, 2GB of RAM, and 4x36GB 15K RPM SCSI drives. (CT: Just as a note, we frequently shuffle these 16 servers from one task to another to handle changes in load or performance. Next week's software story will explain in much more detail exactly what we do with those machines. Also as a note: the NFS mount is read-only, which was really the only safe way to use NFS around 1999 when we started doing it this way.)
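To make that role split concrete, here is a minimal sketch of how requests could be steered to the three pools. The pool membership, host names, and URL rules are assumptions for illustration only; the real assignment happens at the load balancer and, as noted, gets reshuffled as load changes:

# Hypothetical pools mirroring the split described above.
POOLS = {
    "static":   ["web01", "web02"],                    # JS, images, anonymous front page
    "index":    ["web03", "web04", "web05", "web06"],  # front page for logged-in users
    "comments": ["web%02d" % n for n in range(7, 17)], # the remaining ten comment servers
}

def choose_pool(path, logged_in):
    # Crude routing rules, purely for illustration.
    if path.startswith(("/images/", "/js/")):
        return "static"
    if path == "/":
        return "index" if logged_in else "static"
    return "comments"

print(choose_pool("/", logged_in=True))                      # -> index
print(choose_pool("/comments.pl?sid=123", logged_in=False))  # -> comments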

Besides the 16 web servers, we have 7 database servers, all currently running CentOS 4. They break down as follows: 2 dual Opteron 270s with 16GB RAM and 4x36GB 15K RPM SCSI drives. These are doing multiple-master replication, with one acting as Slashdot's single write-only DB and the other acting as a reader. We have the ability to swap their functions dynamically at any time, providing an acceptable level of failover.

2 dual Opteron 270s with 8GB RAM and 4x36GB 15K RPM SCSI drives. These are Slashdot's reader DBs. Each derives data from a specific master database (listed above). The idea is that we can add more reader databases as we need to scale. These boxes are barely a year old now and are still plenty fast for our needs.
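In application terms, the topology above amounts to read/write splitting with the option to promote the standby master. The sketch below uses hypothetical host names and a placeholder driver, and is not Slashdot's actual code; it only shows the idea that writes always go to the current write master, reads fan out across the readers, and a failover is just a role swap:

import random

class DBPool:
    """Toy read/write splitter over one write master plus read-only replicas."""

    def __init__(self, writer, readers):
        self.writer = writer          # current write master, e.g. "db-master-1"
        self.readers = list(readers)  # replicas that serve SELECTs

    def execute(self, sql, params=()):
        host = self.writer if self._is_write(sql) else random.choice(self.readers)
        return self._run(host, sql, params)

    def promote(self, new_writer):
        # Failover: the other multiple-master box takes over writes,
        # and the old writer drops back into the reader pool.
        if new_writer in self.readers:
            self.readers.remove(new_writer)
        self.readers.append(self.writer)
        self.writer = new_writer

    @staticmethod
    def _is_write(sql):
        return sql.lstrip().split(None, 1)[0].upper() in ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def _run(self, host, sql, params):
        # Placeholder; a real pool would hand this to a MySQL driver.
        print("[%s] %s %r" % (host, sql, params))

pool = DBPool("db-master-1", ["db-reader-1", "db-reader-2"])
pool.execute("SELECT title FROM stories WHERE sid = %s", (12345,))
pool.execute("UPDATE users SET karma = karma + 1 WHERE uid = %s", (42,))
pool.promote("db-master-2")   # dynamic role swap, as described above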

Lastly, we have 3 quad P3 Xeon 700MHz boxes with 4GB RAM and 8x36GB 10K RPM SCSI drives, which are sort of our miscellaneous 'other' boxes. They host our accesslog writer, an accesslog reader, and Slashdot's search database. We need this much for accesslogs because moderation and stats require a lot of CPU time for computation.

And that is basically it, in a nutshell. There isn't anything too terribly crazy about the infrastructure. We like to keep things as simple as possible. This design is also very similar to what all the other SourceForge, Inc. sites use, and has proved to scale quite well.

CT: Thanks to Uriah and Chris Brown for the report. Now if only we remember to update the FAQ entry...

Comments:
  • Re:Savvis (Score:3, Informative)

    by Anonymous Coward on Friday October 19, 2007 @12:14PM (#21043295)
    They've changed hands several times and names even more times since we moved in.
  • by Precision ( 1410 ) * on Friday October 19, 2007 @12:24PM (#21043477) Homepage
    Deployment date. The Red Hat 9 machines were deployed 3 years ago and just haven't needed to be reinstalled yet. BSD, not so much... we have a team of great Linux admins; introducing another variable isn't likely to happen.
  • by Ron Harwood ( 136613 ) <harwoodr@nOspaM.linux.ca> on Friday October 19, 2007 @12:37PM (#21043697) Homepage Journal
    I'm trying to assume that's humour... but that said...

    If you have a farm of replicated MySQL servers (which are read-only, as replication is one-way here), you need a DB to write to... and not reading from it reduces the load on that server.

    So, assuming that you're read-mostly, it's actually a nice way to balance the load across multiple systems.
  • by Lumpy ( 12016 ) on Friday October 19, 2007 @01:12PM (#21044337) Homepage
    That's crazy. Just lease dark fiber. We do that for a point-to-point link that is 12 miles and pay $1,500.00 a month. I bring my own gear and I'm running 1000Mb happily.

    The savings pay for the gear in less than 2 years, plus we have 10X the bandwidth as well as full control over the connection.
  • Re:Interesting (Score:5, Informative)

    by jamie ( 78724 ) * Works for Slashdot <jamie@slashdot.org> on Friday October 19, 2007 @01:13PM (#21044359) Journal

    Yeah, I wasn't sure what he meant either. We have 2 webheads serving static pages (like the non-logged-in homepage), and 4 serving specifically the dynamically-generated homepage for all logged-in users. Plus 1 that serves all SSL traffic, which subscribers can use.

    People often say "subscriber" when they mean "logged-in Slashdot user," not specifically a paying subscriber [slashdot.org].

  • by Bellum Aeternus ( 891584 ) on Friday October 19, 2007 @01:22PM (#21044547)
    Master-master allows really fast failover because you don't need to take down the system to reconfigure a slave as a master. I've actually worked with companies that have master-master-master clusters.
  • Re:backup? (Score:2, Informative)

    by statikuz ( 523906 ) <djboge@gm a i l . com> on Friday October 19, 2007 @01:42PM (#21044895)
    From the website: "SAVVIS has done extensive engineering to ensure that any Datacenter located in a region prone to seismic activity is braced for such events. Design elements include seismic isolation equipment to cushion facilities against movement as well as seismic earthquake bracing on all equipment racks. All SAVVIS Datacenters have racks anchored to the concrete slab below the raised floor."
  • Re:Why CentOS? (Score:2, Informative)

    by pak9rabid ( 1011935 ) on Friday October 19, 2007 @01:51PM (#21045099)
    CentOS is RHEL, minus the support. CentOS is 100% binary compatible with RHEL as well, meaning the RPMs you'd get from RHEL would work just fine in CentOS and vice versa.
  • Re:backup? (Score:4, Informative)

    by Eponymous Bastard ( 1143615 ) on Friday October 19, 2007 @01:51PM (#21045105)

    Well, let's begin with the most boring and basic details. We're hosted at a Savvis data center in the Bay Area.
    Do you ever worry that a big earthquake will hit and your datacenter goes offline? Do you at least keep an offsite backup?
    First rule of offsite backups: Never talk about your offsite backups.
    Second rule of offsite backups: Never talk about where you keep your offsite backups.

    You thought I was going somewhere else with that didn't you?

    In all seriousness, that sounds like it would be in the software article instead.
  • by Precision ( 1410 ) * on Friday October 19, 2007 @01:56PM (#21045215) Homepage
    The average monthly bandwidth usage for /. is around 40-50 Mbit/sec, which is relatively small. As for cost, you can contact your local ISP for a guesstimate; we get fairly deep discounts since we push quite a bit more with all the sites consolidated.
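    As a back-of-the-envelope check on that figure (the arithmetic below just restates the 40-50 Mbit/s average quoted above):

    # Rough conversion of an average bit rate into monthly transfer.
    SECONDS_PER_MONTH = 30 * 24 * 3600

    for mbit in (40, 50):
        terabytes = mbit * 1e6 * SECONDS_PER_MONTH / 8 / 1e12
        print("%d Mbit/s average ~= %.0f TB transferred per month" % (mbit, terabytes))
    # 40 Mbit/s ~= 13 TB/month; 50 Mbit/s ~= 16 TB/month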
  • Re:Why CentOS? (Score:5, Informative)

    by Precision ( 1410 ) * on Friday October 19, 2007 @02:00PM (#21045281) Homepage
    We use a combination of CentOS and RHEL. The reason we chose CentOS over, say, Debian is that it is basically identical to RHEL, so we end up with a "single" platform that we have to deploy, test, and build packages for, regardless of support. Depending on the system, we will deploy either RHEL or CentOS based on support requirements.
  • Re:backup? (Score:5, Informative)

    by Precision ( 1410 ) * on Friday October 19, 2007 @02:02PM (#21045317) Homepage
    Of course we do offsite backups, but we're also currently preparing to build a new primary data center in Chicago, away from earthquake land.
  • Re:Considered a CDN? (Score:5, Informative)

    by Precision ( 1410 ) * on Friday October 19, 2007 @02:08PM (#21045419) Homepage
    Actually, many of our sites do use a CDN; however, the /. devs long ago decided against it for some reason or another. Heck, it's even still all set up for them.
  • Re:Redhat 9 (Score:3, Informative)

    by WhatAmIDoingHere ( 742870 ) * <sexwithanimals@gmail.com> on Friday October 19, 2007 @03:44PM (#21046961) Homepage
    How do you BRICK a computer short of taking an axe to it? Boot from install media and reinstall. If the hard drive is shot, the hard drive is shot. But a dead HDD doesn't mean it's bricked. I can see maybe fucking up a BIOS upgrade but even with that there are ways to undo the damage.

    You people keep using the word "brick" to refer to "broken software that can easily be reinstalled."
  • by JohnnyComeLately ( 725958 ) on Friday October 19, 2007 @03:52PM (#21047053) Homepage Journal
    I've worked 2nd-tier technical support within Sprint PCS, and it even got to the point where I was helping their Level 3 with future system designs. Their Level 1 was often a joke. One guy wanted me to reinstall packages on a Sun Solaris machine... I said, "This is not Windows... reinstalling will result in EXACTLY the same error," which, of course, it did.

    Anyway, it did get to the point where I instantly got escalated to their tier 2 or 3, because if I couldn't fix it, or I couldn't find the answer within a Unix forum online, they would have a hard time offering a solution. This was supporting about 300 Sun Netra systems running Solaris 9.

  • Re:Savvis (Score:3, Informative)

    by bunco ( 1432 ) on Friday October 19, 2007 @04:07PM (#21047293)
    Twice, actually. Slashdot is hosted in an Exodus legacy data center. Exodus was bought by Cable & Wireless who then sold their US network assets to Savvis.

    Depending on who you talk to, you'll get different responses about Savvis. This is mainly due to the heritage of various customers, i.e., the Savvis/Bridge/Intel vs. Exodus reputations.

    Savvis is actually the conglomeration of _many_ companies.

    Exodus == (Exodus, AIS, Arca, Cohesive, Network-1, Global Center)
    C&W US == (MCI (IP backbone), Exodus, Digital Island)
    Savvis == (C&W US, Intel Hosting, Bridge Networks)
  • Re:Windows? (Score:3, Informative)

    by BadMackTuck ( 910473 ) on Friday October 19, 2007 @05:32PM (#21048707) Homepage
    Microsoft isn't running Linux servers; they're using Akamai, who does.

    http://news.netcraft.com/archives/2003/08/17/wwwmicrosoftcom_runs_linux_up_to_a_point_.html [netcraft.com]
  • Yes, MySQL. (Score:3, Informative)

    by Ayanami Rei ( 621112 ) * <rayanami&gmail,com> on Friday October 19, 2007 @06:23PM (#21049455) Journal
    NT
  • by Precision ( 1410 ) * on Friday October 19, 2007 @11:59PM (#21052487) Homepage
    Yes, all our servers are at least RAID1. As for email, this article covered Slashdot-specific machines only. There are quite a few shared systems, including the outgoing mail relay.
  • by Precision ( 1410 ) * on Saturday October 20, 2007 @12:02AM (#21052509) Homepage
    Ding, we /used/ to use them as layer 3 routers, but they couldn't keep up over the years, and alas, they've been relegated to dumb layer 2 switches now. The poor CPUs can't keep up with anything else. We do have OOB serial management on them like you mentioned, however.
  • by anticypher ( 48312 ) <anticypher.gmail@com> on Sunday October 21, 2007 @09:03PM (#21067527) Homepage
    I was shopping for transit in the U.S. this summer, and those were the reasonable prices from companies that I also work with here in Europe. I don't know of any tier-1 who will bother with 1Mbps; most tier-2s won't either. My "standardized" quote is for a 100Mbps commit on a GigE port that can handle sustained traffic of 800Mbps. This lets me compare without giving away details of my clients before contracts and NDAs can be signed.

    However, I had a strange split in the quotes I received. Some were in the range I expected, from about US$10/Mbps for a 100 Mbps commit (minimum bill about US$1000) up to around US$20-$25/Mbps. Then there was a huge jump up to the $500/Mbps range you speak of, from companies that were obviously not one of the tier-1 or 2 players, just resellers of tier-2 bandwidth, who didn't seem capable of competing.

    Quite a few places seemed to think they could obfuscate the quote by refusing to deal in Mbps/month, and would instead offer traffic totals of 100 gigabytes for inbound+outbound together. Others offered peak+offpeak or other ways to hide the usual Mbps/month quote.

    One place was offering GigE ports, but I discovered later their internet transit was just a pair of 100M copper links. They sold their traffic as a package, but when you work out 50 gigabytes in one month as a traffic figure, you come up with something like 1-2 Mbps, for the low price of US$500 (a rough conversion is sketched after this comment). This may be where you are getting your quotes from.

    As a very general rule of thumb, the tier-1s don't want to deal with a monthly bill of less than US$10,000, the tier-2s don't want anything less than US$1,000, and the tiny resellers will try to sell you everything they can (rackspace, metered electricity, port costs, traffic) to try to keep the bill upwards of $300-$500/month.

    Just for comparison, even with the US dollar in free fall this summer, US prices were well over twice what we pay in Europe for internet transit.

    the AC
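
    For reference, here is the kind of conversion the comment above is doing, turning a monthly transfer allowance into a rate. The 10x peak-to-average factor is an assumption, included only to show why a 95th-percentile bill for bursty traffic lands well above the flat average:

    SECONDS_PER_MONTH = 30 * 24 * 3600

    def avg_mbps(gigabytes_per_month):
        # Flat average rate implied by a monthly transfer total.
        return gigabytes_per_month * 1e9 * 8 / SECONDS_PER_MONTH / 1e6

    flat = avg_mbps(50)   # the 50 GB/month package mentioned in the comment
    print("flat average: %.2f Mbit/s" % flat)                     # ~0.15 Mbit/s
    print("with a 10x burst factor: ~%.1f Mbit/s" % (flat * 10))  # order of 1-2 Mbit/s billable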
