Scaling Server Performance 349
An anonymous reader writes "When Ace's Hardware's article Hitchhiker's Guide to the Mainframe was posted on Slashdot, they got 590,000 hits and over 250,000 page requests in one day. This traffic caused only a 21% average CPU load on their Java-based web server, which is powered by a single 550 MHz UltraSPARC-II CPU. In their newest article, Scaling Server Performance, Ace's Hardware explains how this was possible."
6 per second. (Score:4, Insightful)
yes. (Score:5, Funny)
dynamic content (Score:2)
Even for dynamic content, it seems any reasonable web server should easily be able to generate half a dozen pages per second. Of course, it won't be able to if you do something stupid like put all your content into a database.
Re:yes. (Score:2)
I think that you must be having a problem on your end, since it loaded in under two seconds for someone else and loaded instantaneously for me.
Re:yes. (Score:2)
Re:yes. (Score:2)
Re:6 per second. (Score:2)
Sounds kinda weak to me too. I am currently working on a web server application that's supposed to serve highly dynamic, personalized pages (perhaps comparable to Slashdot). Our perf goal is 200 pages/sec. Of course, it would be on bigger hardware, but I think we could easily beat the number mentioned in the article on a 500 MHz PC.
Re:6 per second. (Score:3, Interesting)
*REQUEST TIMED OUT*
My 1 GHz server with 3 terabytes of RAM can handle any traffic you can throw at it!!! Now to upgrade that 56k....
Burning karma
How many per second? (Score:4, Informative)
This says nothing about what they can serve under ideal conditions; this is what they actually served up during an actual slashdotting. If you want to max out their server, you will need to get more
Read the article; on ApacheBench with one particular page they tested, the server tested out at five dozen pages served up per second.
I don't know about you, but I was somewhat impressed by all this. A $1000 Sun does seem to have been a wise choice for them.
steveha
Re:6 per second. (Score:2)
How they did it... (Score:5, Funny)
Re:How they did it... (Score:2, Informative)
So the article on preventing the /. effect ... (Score:5, Funny)
Of course, it is incumbent upon all of us to rush out and try the link to the article. And some of us to actually read it, as opposed to just reading the title.
Re:So the article on preventing the /. effect ... (Score:2)
Re:So the article on preventing the /. effect ... (Score:4, Funny)
Re:So the article on preventing the /. effect ... (Score:2)
Yep. Looks like they really put their money where their mouth is. When I try to access the page I receive the infamous "The document contains no data". They are already down, and the article will be on the /. front page for another day. Reading an article on server scaling from these guys is like taking a class in conflict resolution from Saddam and GWB.
<blows horn> (Score:5, Funny)
SLASHDOT THEM AGAIN!!!
only 600, 000 per day? (Score:5, Informative)
Re:only 600, 000 per day? (Score:5, Insightful)
The web-design and server world seems to be focused on quantity, not quality.
And frankly, much of what
Re:only 600, 000 per day? (Score:5, Informative)
Re:only 600, 000 per day? (Score:2, Interesting)
The difference between just showing a page and creating one is like the difference between a pre-rendered
I still figure bandwidth is the big killer. I mean you can only stuff watermelons through a garden hose so fast.
Re:only 600, 000 per day? (Score:4, Funny)
Maybe it's just late, but I'm having a problem following all this technical jargon
Re:only 600, 000 per day? (Score:3, Funny)
you're thinking of static pages (Score:4, Interesting)
Lots of people could use this type of performance. I only had a chance to use JSP on one project, a while back. Tomcat was notoriously difficult to install back then. But once it was up, the difference between a JSP application server and PHP became apparent. Application servers can make quite the difference.
Just having an application scope for variables saved us a trip to the LDAP server per request. PostNUKE, SquirrelMail, and lots of other large PHP apps could be sped up drastically if some of those features were available in the PHP engine.
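The application-scope trick the parent describes (look something up once, then share it across all requests) can be sketched outside JSP too. Here's a minimal Python illustration; the `fetch_from_ldap` function, TTL, and key scheme are hypothetical stand-ins, not anything from the original post:

```python
import time

class AppScopeCache:
    """Process-wide cache standing in for a JSP application scope:
    resolve a value once, then reuse it for every later request."""
    def __init__(self, loader, ttl_seconds=300):
        self._loader = loader          # the expensive lookup, e.g. LDAP
        self._ttl = ttl_seconds
        self._store = {}               # key -> (value, fetched_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and time.time() - entry[1] < self._ttl:
            return entry[0]            # served from memory, no LDAP trip
        value = self._loader(key)      # only on a miss or expiry
        self._store[key] = (value, time.time())
        return value

calls = []
def fetch_from_ldap(uid):              # hypothetical stand-in for the real lookup
    calls.append(uid)
    return {"uid": uid, "cn": "Example User"}

cache = AppScopeCache(fetch_from_ldap)
cache.get("alice"); cache.get("alice"); cache.get("alice")
# three requests, but only one trip to the directory server
```

In a process-per-request PHP setup there is no long-lived process to hang this dictionary off, which is exactly the gap the parent is pointing at.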
Re:you're thinking of static pages (Score:2)
Like I said, misconfigured ;-)
(yes, I'm joking)
Re:only 600, 000 per day? (Score:2, Funny)
But this is a Java-based server we're talking about.
Re:only 600, 000 per day? (Score:2, Insightful)
Re:Server performance in 1994 (Score:2, Funny)
Re:Server performance in 1994 (Score:2)
I remember back in '95 when my Accelerated Content Internet Daemon (ACID) was cool, but most people recommended that you not handle more than 1-2 hits every 8-12 hours.
But the ad server is slashdotted (Score:5, Insightful)
Now, a while back on
Big deal (Score:2)
Academic: MIT.edu, Stanford.edu, Maryland.edu
Business: Amazon.com, CDNow.com, Slashdot.com, Google.com
Pleasure: TheHun.net, Playboy.com, Napster.com
Re:Big deal (Score:2)
Ones that crashs (Score:5, Interesting)
Re:Ones that crashs (Score:2)
Re:Ones that crashs (Score:2)
I wish I understood/knew more about this... (Score:2)
Not to be cynical... (Score:2, Interesting)
As an example, I run a pretty popular site that pumps out about 250,000 pages as well, all CGI-created and database-fed. This is being served by two 1 GHz web heads and a 1 GHz db server. Granted, those three machines run at 100% load during peak hours, but it's still not a huge deal (this is because I haven't finished the local caching mechanism yet). Did I mention that the two webservers also toss out 1 million images a day too?
Of course, I don't want to belittle the article that much -- if anything, it shows the performance gains you get when you use efficient hardware (I have no doubt that their 550 MHz UltraSparc II has nearly the same horsepower as a 1 GHz x86) and efficient caching (caching data in RAM and serving from there, avoiding disk access penalties, is a huge performance increase).
Re:Not to be cynical... (Score:3, Insightful)
256K of cache on die, ALI chipset board that's a lot like a PC, slow PC133 (with very high latency) memory, dog-fucking-slow disk, unless they're using SCSI.
This is not your father's E450.
That's hardly impressive (Score:5, Informative)
OTOH, my puny little SDSL connection was seriously maxed out.
Even old hardware can happily serve up hundreds of documents a second, if the pages are static.
Re:That's hardly impressive (Score:2)
Re:That's hardly impressive (Score:2)
The steps:
Re:That's hardly impressive (Score:2)
Re:That's hardly impressive (Score:5, Interesting)
I am familiar with serving dynamic content of very high information density, and let me tell you, Ace's doesn't compare. The data I serve from work is updated every second; the stories on Ace's (and most other hardware-review sites) change every couple of days.
Re:That's hardly impressive (Score:2)
It did, when slashdotted. It's since been updated to a VIA C3 at 800 MHz [mini-itx.com].
No, it's a high-content site without all those frilly graphics and doesn't use dynamic methods to serve up static data. Just dumb-obvious things.
Isnt the real problem BANDwidth? (Score:5, Insightful)
Re:Isnt the real problem BANDwidth? (Score:2)
Re:Isnt the real problem BANDwidth? (Score:4, Informative)
Yes, mod_gzip is great and I use it on my own server [pjrc.com], but for any "normal" website the main advantage is an interactive speed-up for dialup users. It really doesn't save huge amounts of bandwidth (in this case, enough to matter for withstanding the slashdot effect).
As an example, the page slashdot linked to is 22443 bytes of compressible html and approx 84287 bytes of images (not including the ads and two images that didn't load, because they're not handling the slashdot effect as well as they think they can). At -9, the slowest and best compression (remember, this is a dynamic JSP site, not static content you can compress ahead of time), the html compresses to 5758 bytes, thereby reducing the total content from 106730 bytes to 90045.
That's only a 15.6% reduction in bandwidth.
Also, a typical HTTP response header, which can't be compressed, is about 300 bytes (not including TCP/IP packet overhead, which we'll ignore, hoping that HTTP/1.1 keepalives are putting it all in one connection...). There were 18 images (actually 20, but junkbuster filtered 2 out for me). That's 19 HTTP headers, at 300 bytes each, all uncompressible. Adding in HTTP overhead we're at (approx) 112430 bytes without compression and 95745 with mod_gzip. So the uncompressibility of the headers reduces the bandwidth savings to 14.8%.
The big advantage that makes mod_gzip really worthwhile for a site like that is that a dialup user can get all the html in about 2 seconds, rather than 5-6 (assuming the modem's compression is on). Then they can start reading, while the remaining 82k of images slowly appear over the next 20-30 seconds.
Now in some cases, like slashdot's comments pages, mod_gzip makes a massive difference. But for most sites, the majority of the bandwidth is images that are already compressed. That 10% to 20% reduction in bandwidth from simply installing mod_gzip is pretty small compared to a bit of effort redesigning pages to trim the fatty images.
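The percentages in the parent comment check out. A quick sketch reproducing the arithmetic, using the byte counts quoted in the post above:

```python
# Byte counts taken from the post above
html_bytes = 22443
image_bytes = 84287
html_gzipped = 5758                      # the html after gzip -9

total = html_bytes + image_bytes         # 106730 bytes uncompressed
total_gz = html_gzipped + image_bytes    # 90045 bytes with mod_gzip
saving = (total - total_gz) / total      # ~15.6% reduction in content bytes

header_bytes = 19 * 300                  # 19 uncompressible HTTP response headers
saving_with_headers = (total - total_gz) / (total + header_bytes)
# ~14.8% once the header overhead is counted against the total
```

The absolute saving (16685 bytes) is the same in both cases; the headers only grow the denominator, which is why the percentage barely moves.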
Content expiration (Score:3, Informative)
In addition, for static content, "Last-Modified" is easy to compute. Clients can request a page with an "If-Modified-Since" header carrying the timestamp of the static item, and if the item hasn't changed, the server returns a 304 response and no data.
The same can be done for dynamic content, but it requires a bit more work. Most web servers do these things for static content out of the box.
As was said in the article, the fastest request is the request that never has to be made.
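The conditional-GET exchange described above reduces to a timestamp comparison. A minimal sketch (the `respond` function and its body text are illustrative, not the article's code):

```python
from email.utils import formatdate, parsedate_to_datetime

def respond(last_modified_ts, if_modified_since=None):
    """Return (status, body) for a GET on an item last changed at last_modified_ts."""
    if if_modified_since is not None:
        client_ts = parsedate_to_datetime(if_modified_since).timestamp()
        if last_modified_ts <= client_ts:
            return 304, b""                     # unchanged: headers only, no body
    return 200, b"<html>...full page...</html>"

# The timestamp the server would have sent as its Last-Modified header:
stamp = formatdate(1_000_000_000, usegmt=True)
# Revalidating with that stamp gets a 304; requesting newer content gets a 200.
```

For dynamic content the extra work is computing a meaningful `last_modified_ts` at all (e.g. the newest timestamp among the records that feed the page), which is exactly why it "requires a bit more work" as the comment says.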
Re:The problem is dynamic content (Score:3, Informative)
Have you ever tried a test where the clients kept their connections open for a reasonable length of time??
In the real world, virtually all clients are connected via links ranging from slow dialup to 1.5 Mbit/sec. They hold connections open and tie up server memory resources for a lot longer than a fast-as-possible benchmark running on the same machine or over fast ethernet.
Any server running on a single box is probably going to have trouble with 17000+ pages per second to modem users, who require many seconds to transfer the page. If the average connection open time is 2 seconds, that's 34000 open connections. Even if the server used only 32k of RAM per connection (barely enough to buffer a few packets, allocate "window" inside the TCP layer in the OS, and maintain OS-level info and buffering for the open file), that'd be over 1 gigabyte of memory. I suspect a combination of Windows (TCP/IP & file I/O), IIS, and ASP.NET uses a lot more than 32k per connection.
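Working the parent's estimate through (the rate, hold time, and per-connection figure are the post's own assumptions):

```python
pages_per_sec = 17_000          # the benchmark rate being questioned
avg_open_secs = 2               # average time a slow client holds a connection
per_conn_bytes = 32 * 1024      # optimistic per-connection footprint

open_conns = pages_per_sec * avg_open_secs   # 34,000 concurrent connections
ram_needed = open_conns * per_conn_bytes     # bytes tied up just in connections
gib = ram_needed / 2**30                     # a little over 1 GiB, as claimed
```

This is Little's Law in miniature: concurrency = arrival rate x residence time, which is why slow clients, not raw page rate, dominate the memory bill.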
Did we /. the home page? (Score:4, Funny)
Re:Did we /. the home page? (Score:2)
The Easy Way to Reduce Webserver CPU (Score:5, Funny)
I'd like to see some of their network performance graphs from that same day. That might make more interesting reading. I recall waiting a good long time for some of those pages to come up.
that's nothing (Score:4, Funny)
I manage all that with only an Athlon something or other with some amount of ram.
I know a lot about the system - you have to when you are good like that.
Bottlenecks (Score:2)
One thing that sort of made me think though, was the focus on being able to deliver massive numbers of "pages" and "hits". For most sites, this is not an issue -- their bandwidth would be hosed before the server would be. You can only stuff eight great tomatoes into that itty-bitty can. It doesn't take much to saturate a T-1.
If you have nearly unlimited bandwidth, then these server-tuning issues start to become important. I think it is a good idea to focus on how applications are built and used when thinking about performance of servers. Too often, the sole focus is "can I do task X" and not "what is the most-efficient way I can do task X".
A nifty article, all in all.
GF.
Strange what they're saying about the CPU (Score:2)
"an overall average CPU utilization of 21% for a modest 550 MHz uniprocessor machine is not too shabby."
Firstly, when the CPU in question is an UltraSparc II, 550 MHz is anything but modest. Secondly, I would not expect the CPU to be busy during a slashdotting; it would be hanging around waiting for the disk drives and network card to come up with something useful.
Re:Strange what they're saying about the CPU (Score:3, Interesting)
The UltraSparc II only goes up to 480MHz, and the UltraIII starts at 750. In between is the grey area of the IIe and IIi, and the ONLY Sun box with a 550MHz processor is the SunBlade 100/150.
If that's their web server, then the CPU is the least of their worries--the thing has internal IDE drives, two (only) 33MHz narrow PCI slots, and not much else. Assuming that one of the PCI slots is used for a faster and/or redundant network connection (QFE card most likely), then the other one is the only connection to SCSI disks. That CPU, low-end as it is (for Sun), is definitely going to spend its time waiting for the rest of the system.
(And yes, I know that was your second point--I just wanted to back it up with some detail)
We Win! (Score:4, Funny)
The lesson here is: put your money where your mouth is and you may end up eating it.
Re:We Win! (Score:2)
Re:We Win! (Score:2)
Updating Cache data (Score:2, Interesting)
Of course, there are more complex applications where data caching can be implemented, such as discussion forums where multiple users can be adding, editing, and deleting messages simultaneously. But that's a topic for another article.
Most of the applications I write involve updating data almost as often as fetching it from the database. In an environment like Apache, where you have individual processes serving content (and database connections are process-centric), implementing caches that are updatable becomes a very complex exercise without adding an additional layer.
eToys used a b-tree (Sleepycat?) layer situated in front of the database - they would store objects in the b-tree, and fetch them from there if they had not expired. One cache shared amongst all the servers made this worth doing; a Java web server can do something similar, since the objects are stored in memory shared between the various serving threads. The end result is similar to what Ace's Hardware has done.
What have other people done? Since I use Apache, I'm leaning towards a disk-based caching system.
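One common shape for the disk-based cache the parent is leaning toward: file-per-entry with mtime-based expiry, shared by independent Apache worker processes. The directory layout, TTL, and key scheme here are all hypothetical:

```python
import hashlib
import os
import tempfile
import time

class DiskCache:
    """File-per-entry cache shared by independent worker processes,
    as in Apache's process-per-request model."""
    def __init__(self, directory, ttl_seconds=60):
        self.dir = directory
        self.ttl = ttl_seconds

    def _path(self, key):
        return os.path.join(self.dir, hashlib.sha1(key.encode()).hexdigest())

    def get(self, key):
        path = self._path(key)
        try:
            if time.time() - os.path.getmtime(path) < self.ttl:
                with open(path, "rb") as f:
                    return f.read()
        except OSError:
            pass                       # missing or racing entry: treat as a miss
        return None

    def put(self, key, data):
        # Write-then-rename so a concurrent reader never sees a half-written entry.
        fd, tmp = tempfile.mkstemp(dir=self.dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, self._path(key))

cache = DiskCache(tempfile.mkdtemp())
cache.put("/forum/page1", b"<html>rendered page</html>")
```

The atomic rename is the piece that makes this safe across processes without any locking layer, which is the hard part the parent identifies.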
Okay team. (Score:2)
Go team
/.'ed - even more experience! (Score:2)
looks like they spoke too soon... (Score:5, Interesting)
Reloading their page a couple times (2nd page of the article, not the one slashdot linked to), I'm getting occasional 503 errors, and the rest are taking a very long time to load. Usually the page comes up with some "broken" images that didn't load.
At the bottom of each page, there's a number that seems to indicate the time they believe their server spent serving the page. Usually it says something like "2 ms" or "3 ms"... That may be how long their code spent creating the html, but the real-world performance I see (via a 1.5 Mbit/sec DSL line) is many seconds for the html and many more for the images, some of which never show up, and sometimes a 503 error instead of anything useful at all.
So, Brian, if you're reading this comment (which will probably be worthy of "redundant" moderation by the time I hit the Submit button)... it ain't workin' as well as you think. Maybe the next article will be an explanation of what went wrong this time, and you can try again???
Re:looks like they spoke too soon... (Score:2)
repost (Score:2)
Better get back to work ... (Score:2)
That performance is supposed to be impressive? (Score:3, Informative)
OLD ARTICLE (Score:3, Informative)
------
It was published over a year ago, and undoubtedly was based on their spring/summer 2001 trials. Even then this info wasn't revolutionary, and is even less so now.
different meanings of "dynamic" pages (Score:5, Informative)
They've really just discovered that dynamically generating essentially static content is a bad idea: the 'dynamic' pages they are talking about are just articles which, once written, stay the same, and so the site is serving identical pages to each user.
Using scripting with database lookups to create such pages is obviously not good - much better is to compile your data into static pages and serve those. I have done this for my own website, using XSLT to generate the html pages with consistent links and menus etc. - but you do have to remember to rebuild it after making any changes or adding new content (I use GNU make to handle the dependencies of one page upon another, so it doesn't rebuild the entire site every time).
They've taken the alternative approach of still using a database for the requests but then caching future requests for the same page-ids, which has the advantage of being compatible with their original dynamic generation system. But they don't mention how they handle the dependency / cascading-alterations problem if they change the content (though they could always flush the entire cache, of course...).
Neither of these approaches can help you, though, if you have real dynamic pages where every request is unique or there are too many possible pages for caching to be feasible (for example Amazon or Google).
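The page-id caching with a blunt whole-cache flush that the parent describes might look like this in outline (the names and the render function are illustrative):

```python
class PageCache:
    """Cache of rendered pages keyed by page-id, with the crude
    invalidation the post mentions: flush everything on any edit."""
    def __init__(self, render):
        self._render = render          # fallback dynamic generator
        self._pages = {}
        self.renders = 0               # counts trips to the generator

    def get(self, page_id):
        if page_id not in self._pages:
            self.renders += 1
            self._pages[page_id] = self._render(page_id)
        return self._pages[page_id]

    def flush(self):
        # No dependency tracking: a content change empties the whole cache,
        # trading wasted re-renders for never serving a stale page.
        self._pages.clear()

cache = PageCache(lambda pid: f"<html>article {pid}</html>")
cache.get(7); cache.get(7)             # second hit served from memory
cache.flush()                          # an editor changed something
cache.get(7)                           # re-rendered once after the flush
```

The full flush is wasteful but correct; per-page invalidation would need the dependency tracking the article doesn't discuss.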
Re:different meanings of "dynamic" pages (Score:2)
big difference between 1am and 2pm! (Score:5, Interesting)
This time around, the link got posted at 2PM not 1AM, and so far as I can see, they handle this flurry of hits much less gracefully than the previous ones! There are a lot more people online at 2PM than 1AM (all arguments of nocturnal nighthawks and people in other time zones aside).
I always wondered what the rough numbers would be (Score:2)
web serving has become bloated (Score:3, Informative)
It's not apples to apples, since we weren't serving the same set of pages (we had around 500 personal homepages, each with a varied combination of static HTML, images and CGI programs) but honestly, if the numbers in this article are supposed to be impressive, we've grown too accustomed to web server feature bloat.
Thread-per-request model is a bottleneck (Score:3, Informative)
More on event-driven servers that minimize data copies and context-switching here [pl.atyp.us].
Who cares how long it takes for static pages? (Score:4, Insightful)
It all depends if you are actually doing something of interest.
Like the comments in Slashcode, most apps go from static, to dynamic, to static caching of dynamic pages.
At DTN we served up customized portal pages to people with commodity and equity quotes, news, graphs, etc. Since they didn't have any money we had to use a load balanced Pentium Pro and a Pentium II. The app had no problem serving the load, and it was fast.
Now that I work for companies that have money, our apps run really slow. Developers get expensive machines and don't know how to optimize any more.
Not so impressive (Score:2)
this is all pretty obvious (Score:2)
Yes, but we are back, stronger than ever! (Score:2)
It's nice to see a article like that, just what I was looking for [slashdot.org]
Ace HW needs a clue (Score:4, Informative)
For those of you interested in this topic, here are a few pointers and words of wisdom.
Server scalability and performance have three basic metrics: throughput (URLs/sec), simultaneous connections, and performance while overloaded. Of course, you could add latency, but I'd argue that with the correct design, latency is directly proportional to the real work you are doing; bad design inserts arbitrary waits.
I know of an HTTP proxy at a large ISP that does user authentication & URL authorization (re: database), header manipulation, and on-the-fly text compression at 3000 URLs/sec for 2000-4000 simultaneous connections, and maintains that performance under load by shedding connections - all this on a dual 1 GHz Intel PIII box running an Open Source OS that starts with "L". That is a maximum of 260 million URLs/day, three orders of magnitude greater performance than Ace's Hardware's stats.
The simple answer to the question "How do I create a scalable, fast network server?" is: event-driven GOOD, threads BAD. Event-driven network communication performs two to three orders of magnitude better than thread/thread-pool based network communication. See Dan Kegel's C10K web page [kegel.com]. That means you must use non-blocking IO to client sockets and databases. Once you accomplish that small feat, dynamic content just consumes CPU; with 2.8 GHz Xeon processors you have plenty of cycles for parsing HTML markup or whatever. Threads cause cache thrashing and context switching. While thread programmers don't see the cost in their code, just read the kernel code and you'll see how much work HAS TO BE DONE to switch threads. Event-driven programming just takes some state lookups (array manipulation) and a callback (push some pointers onto the stack and jump to a function pointer).
Design is FAR MORE IMPORTANT than which runtime you use (execution tree, byte code, or straight assembly). I have done some very high-load network programming with Perl using POE [perl.org].
Python has Twisted Python [twistedmatrix.com]
Java has java.nio [sun.com] and the brilliant event/thread hybrid library SEDA [berkeley.edu] by Matt Welsh.
I am also looking into the programming language Erlang [erlang.org], which builds concurrency and event-driven programming into the language. Further, Erlang is used by some big telco manufacturers to great effect (high performance and a claimed 99.9999999% nine-nines reliability on a big app).
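The event-driven pattern argued for above, reduced to its skeleton with Python's `selectors` module: one registered callback per connection instead of one thread per connection, and a single blocking point in the whole loop. This is a toy echo over a socketpair, not a production server:

```python
import selectors
import socket

sel = selectors.DefaultSelector()
server_side, client_side = socket.socketpair()
server_side.setblocking(False)

def on_readable(sock):
    # The selector told us data is waiting, so this recv never blocks.
    data = sock.recv(4096)
    if data:
        sock.sendall(b"echo:" + data)

# One registration per connection instead of one thread per connection.
sel.register(server_side, selectors.EVENT_READ, on_readable)

client_side.sendall(b"hello")
for key, _mask in sel.select(timeout=1):   # the loop's single wait point
    key.data(key.fileobj)                  # dispatch via the stored callback

reply = client_side.recv(4096)
```

The dispatch really is just "some state lookups and a callback" as the comment says; all the concurrency lives in the kernel's readiness notification, not in threads.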
Slashdot Handles It Better... (Score:2)
I've always thought more sites should do this. Why not have the pages you can get away with be static (updated every couple of minutes for a 'real-time' feel), and only have the pages that need to be dynamic be generated on the fly? I was playing with ab (the Apache benchmark tool) on one of my computers, and I couldn't believe the difference -- loading a static page, I got something like 100,000 hits (I don't remember the time period); PHP got about 5,000 over the same period. My numbers could be off, but assuming they're not, it would be 20x more effective to have the page generated every few minutes and saved as a static page, at least for high-traffic sites. (For low-traffic sites, this could probably consume *more* resources...)
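The regenerate-every-few-minutes scheme could be sketched like this; the file name, interval, and render function are made up for illustration:

```python
import os
import tempfile
import time

REGEN_INTERVAL = 120      # seconds between rebuilds (made-up figure)

def serve(static_path, render):
    """Serve the pre-rendered file, rebuilding it only when stale."""
    try:
        fresh = time.time() - os.path.getmtime(static_path) < REGEN_INTERVAL
    except OSError:
        fresh = False     # first request ever: no file on disk yet
    if not fresh:
        with open(static_path, "w") as f:
            f.write(render())          # the only expensive step
    with open(static_path) as f:
        return f.read()                # every other hit is a plain file read

builds = 0
def render():                          # stand-in for the slow dynamic page
    global builds
    builds += 1
    return f"<html>build {builds}</html>"

path = os.path.join(tempfile.mkdtemp(), "index.html")
first = serve(path, render)
second = serve(path, render)           # within the interval: served from disk
```

This also explains the low-traffic caveat above: if requests arrive less often than the rebuild interval, every hit pays the render cost anyway.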
4 ms? (Score:3, Interesting)
138974 ms
A little over their 4 ms goal. Specifically, 138,970 ms over.
Summing up the article. (Score:2, Interesting)
Obviously, if you don't have enough bandwidth, you are screwed anyway, but usually it's the server load that is the problem.
MfG shurdeek
It's the pipe, folks. (Score:4, Insightful)
You can saturate most any small-business-affordable pipe with a Pentium classic machine as a Web server. Or to put it another way, there's no point sticking a dual-P4-Xeon Web server with 4GB memory and a RAID-5 on a DSL line.
The computer I'm using right now (a PIII system) could run Apache very nicely in the background and would likely survive quite a hitrate without too much trouble. But if even just a few thousand people were to hit it all at once, there would be a traffic jam, some people wouldn't get served, and the ISP would probably close me down, because I'm only sitting on a 256k pipe.
still impressive, since they're back up (Score:2)
big hairy deal... thttpd does this (Score:2)
Nothing but a gigabit ethernet connection can even come close to handling that.. and last time I checked a T-1000 line was not an option on internet-1
CPU bound==something very very wrong (Score:3, Informative)
Ok, if the server actively plays chess against a hundred people, I'll let you be cpu bound.
I am sorry, but this is really just not impressive (Score:3, Insightful)
Getting it right the first time? (Score:5, Funny)
>Garth Brooks covers this in his famous book "The Mythical Man Month" where he proves in a controled lab environment that Java under X86 runs on the order of Olog(n) slower than it does on a RISC chip like an UltraSparc.
WTF? Fred Brooks wrote this book, and I don't seem to remember RISC or UltraSparc chips, not to mention Java, in 1974. Garth Brooks is (AFAIK) a country music singer. Try again.
huh? (Score:2)
Re:Maybe (Score:4, Funny)
Re:Guess... (Score:3, Interesting)
If you limit yourself to static stuff then you can cram any amount of traffic out of very limited boxes.
Even google does everything they can to cache stuff and turn dynamic requests into static ones, and they actually have a reason (lotsa traffic, complicated requests).
The fact that you can use java to write speedy code doesn't prove a thing either, it only says that it is now no longer a bottleneck.
You can probably saturate a decent sized pipe using -- aaarghh -- VB or something asinine like that as long as you do 'pictures and pages'.
Re:Guess... (Score:2)
The stats from the article indicate it was not overloaded. Therefore, they must have slow bandwidth.
Re:/. server admins? (Score:2)
Maybe the /. team needs to study this article and learn a few things.
Second this. (Score:2)
Re:What a joke! (Score:2)
HE IS RIGHT! (Score:2)
There is a problem with the page you are trying to reach and it cannot be displayed.
</I>
Ummm, did they get their article slashdotted? That is what you get when you brag!
On a more serious note, I find it very interesting that the other article gave an average load of 21%, whereas now they are somewhere in oblivion. Could this have anything to do with the time that the article is posted? (I.e., if they post it at 9:00 PST, when the majority of slashdotters are online and the first article they see is this one?)
Re:Time to break the record?-OR make it BURRRN! (Score:2)
Aaaah, the one that got away
Re:Hit != Page Request? (Score:2)
Re:jeez (Score:2)
--Mike--
Re:um... and now it's slashdotted? (Score:2)
'the ironing is delicious.'
- bart simpson