Stories
Slash Boxes
Comments

News for nerds, stuff that matters

Introducing The New Slashdot Setup

Posted by CmdrTaco on Thu May 18, '00 11:00 AM
from the its-been-a-long-time-coming dept.
At the request of countless users, we're happy to finally present a summary of the new setup over at Exodus. It's the result of over 6 months of work from a lot of people, so shadouts to Adam, Kurt, and Scoop, Team P:Pudge, PatG & Pater for the code, and Martin BSD-Pat and Liz for getting the hardware and co-loc taken care of. Now hit the link below and see what these guys did:

the original version of this document was written by Andover.Net Alpha Geek Kurt Grey. The funny jokes are his. The stupid jokes are mine.

The Backstory
We realized soon that our setup at Digital Nation was very flawed. We were having great difficulty administering the machines and making changes. But the real problem was that all the SQL traffic was flowing over the same switch. The decision was made to move to Exodus to solve these problems, as well as to go to a provider that would allow us to scatter multiple data centers around the world when we were ready to do so.

Meanwhile Slashcode kicked and screamed its way to v1.0 at the iron fists of Pudge (Chris Nandor) and CaptTofu (Patrick Galbraith). The list of bugfixes stretches many miles, and the world rejoiced, although Slashdot itself continued to run the old code until we made the move.

The Co-Loc
Slashdot's new co-location site is now at Andover.Net's own (pinky finger to the mouth) $1 million dedicated datacenter at the Exodus network facility in Waltham, Mass, which has the added advantage of being less than 30 minute drive for most of our network admins -- so they don't have to fly cross-country to install machines. We have some racks sitting at Exodus. All boxes are networked together through a Cisco 6509 w/ 2 MSFCs and a Cisco 3500 so we can rearrange our internal network topology just by reconfiguring the switch. Internet connectivity to/from the outside world all flows through an Arrowpoint CS-800 (which replaced the CS-100 that blew up last week) switch which acts as both a firewall load balancer for the front end Web servers. It also so happens that the Arrowpoint shares the same office building with Andover.Net in Acton so whenever we need Arrowpoint tech support we just walk upstairs and talk to the engineers. Like, say, last week when the 100 blew up ;)

The Hardware

  • 5 load balanced Web servers dedicated to pages
  • 3 load balanced Web servers dedicated to images
  • 1 SQL server
  • 1 NFS Server

All the boxes are VA Linux Systems FullOns running Debian (except for the SQL box). Each box (except for the SQL box) has LVD SCSI w/ 10,000 RPM drives. And they all have 2 Intel EtherExpress 100 LAN adapters.

The Software
Slashdot itself is finally running the latest release of Slashcode (it was pretty amusing being out of date with our own code: for nearly a year the code release lagged behind Slashdot, but my how the tables have turned).

Slashcode itself is based on Apache, mod_perl and MySQL. The MySQL and Apache configs are still being tweaked -- part of the trick is to keep the MaxClients setting in httpd.conf on each web server low enough to not overwhelm the connection limits of database, which in turn depends on the process limits of the kernel, which can all be tweaked until a state of perfect zen balance has been achieved ... this is one of the trickier parts. Run 'ab' (the apache bench tool) with a few different settings, then tweak SQL a bit. Repeat. Tweak httpd a bit. Repeat. Drink coffee. Repeat until dead. And every time you add or change hardware, you start over!

The Adfu ad system has been replaced with a small Apache module written in C for better performance, and that too will be open sourced When It's Ready (tm). This was done to make things consistant across all of Andover.Net (I personally prefer Adfu, but since I'm not the one who has to read the reports and maintain the list of ads, I don't really care what Slashdot runs).

Fault tolerance was a big issue. We've started by load balancing anything that could easily be balanced, but balancing MySQL is harder. We're funding development efforts with the MySQL team to add database replication and rollback capabilities to MySQL (these improvements will of course be rolled into the normal MySQL release as well).

We're also developing some in-house software (code named "Oddessey") that will keep each Slashdot box sychronized with a hot-spare box, so in case a box suddenly dies it will automatically be replaced with a hot-spare box -- kind of a RAID-for-servers solution (imagine... a Beuwolf cluster of these? *rimshot*) Yes, when it'll also be released as open source when its functional.

Security Measures
The Matrix sits behind a firewalling BSD box and an Arrowpoint Load balancer. Each filters certain kinds of attacks and frees up the httpd boxes to concentrate on just serving httpd and allows the dedicated hardware to do what it does best. All administrative access is made through a VPN (which is just another box).

Hardware Details

Type I (web server)
VA Full On 2x2 Debian Linux frozen
PIII/600 Mhz 512K cache
1 GB RAM
9.1GB LVD SCSI w/ hot swap backplane
Intel EtherExpress Pro (built-in on moboard)
Intel EtherExpress 100 adapter

Type II (kernel NFS w/ kernel locking)
VA Full On 2x2
Debian Linux frozen
Dual PIII/600 Mhz
2 GB RAM
(2) 9.1GB LVD SCSI w/ hot swap backplane
Intel EtherExpress Pro (built-in on moboard)
Intel EtherExpress 100 adapter

Type III (SQL)
VA Research 3500
Red Hat Linux 6.2 (final release + tweaks)
Quad Xeon 550 Mhz, 1MB cache
2 GB RAM
6 LVD disks, 10000 RPM (1 system disk, 5 disks for RAID5)
Mylex Extreme RAID controller 16 MB cache
Intel EtherExpress Pro (built-in on moboard)
Intel EtherExpress 100 adapter

This discussion has been archived. No new comments can be posted.
Introducing the New Slashdot Setup | Log in/Create an Account | Top | 306 comments (Spill at 50!) | Index Only | Search Discussion
Display Options Threshold:
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
(1) | 2 | 3 | 4 | 5
(1) | 2 | 3 | 4 | 5