The Internet

High-Performance Web Server How-To 281

ssassen writes "Aspiring to build a high-performance web server? Hardware Analysis has an article posted that details how to build a high-performance web server from the ground up. They tackle the tough design choices, explain what hardware to pick, and end up with a web server designed to serve daily-changing content with lots of images, movies, and active forums, handling millions of page views every month."
This discussion has been archived. No new comments can be posted.

  • 10'000 RPM (Score:3, Insightful)

    by Nicolas MONNET ( 4727 ) <nicoaltiva@gmai l . c om> on Saturday October 19, 2002 @07:10AM (#4484221) Journal
    The guys use 10'000 RPM drives for "reliability" and "performance" ... 10k drives are LESS reliable, since they spin faster. Moreover, they're not even necessarily that much faster.
  • by Ed Avis ( 5917 ) <ed@membled.com> on Saturday October 19, 2002 @07:11AM (#4484226) Homepage
    Computer hardware is so fast relative to the amount of traffic coming to almost any site that any web server is a high-performance web server, if you are just serving static pages. A website made of static pages would surely fit into a gigabyte or so of disk cache, so disk speed is largely irrelevant, and so is processor speed. All the machine needs to do is stuff data down the network pipe as fast as possible, and any box you buy can do that adequately. Maybe if you have really heavy traffic you'd need to use Tux or some other accelerated server optimized for static files.

    With dynamically generated web content it's different of course. But there you will normally be fetching from a database to generate the web pages. In which case you should consult articles on speeding up database access.

    In other words: an article on 'building a fast database server' or 'building a machine to run disk-intensive search scripts' I can understand. But there is really nothing special about web servers.
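    To make the parent's point concrete, here is a minimal static file server using nothing but the Python standard library. It is only a hedged illustration of how little logic static serving involves (the OS page cache and the network do the real work), not what the article, Tux, or Apache actually does; the port and directory are arbitrary assumptions.

    # Minimal sketch: serve a directory of static files. Nearly all the work
    # is the kernel copying cached file bytes to the socket; the Python code
    # itself barely matters, which is the parent's point.
    from http.server import ThreadingHTTPServer, SimpleHTTPRequestHandler

    if __name__ == "__main__":
        # Assumption: the static site lives in the current directory.
        ThreadingHTTPServer(("", 8080), SimpleHTTPRequestHandler).serve_forever()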
  • by mumblestheclown ( 569987 ) on Saturday October 19, 2002 @07:18AM (#4484239)
    From the article:

    If we were to use, for example, Microsoft Windows 2000 Pro, our server would need to be at least three times more powerful to be able to offer the same level of performance.

    "three times?" Can somebody point me to some evidence for this sort of rather bald assertion?

  • by grahamsz ( 150076 ) on Saturday October 19, 2002 @07:18AM (#4484243) Homepage Journal
    The article seemed way too focused on hardware.

    Anyone who's ever worked on a big server in this cash-strapped world will know that you need to squeeze every last ounce of capacity out of Apache and your web applications.
  • by Ed Avis ( 5917 ) <ed@membled.com> on Saturday October 19, 2002 @07:24AM (#4484254) Homepage
    I know that in the server market you often go for tried-and-tested rather than latest-and-greatest, and that the Pentium III still sees some use in new servers. But 1.26GHz with PC133 SDRAM? Surely they'd have got better performance from a single 2.8GHz Northwood with Rambus or DDR memory, and it would have required less cooling and fewer moving parts. Even a single Athlon 2200+ might compare favourably in many applications.

    SMP isn't a good thing in itself, as the article seemed to imply: it's what you use when there isn't a single processor available that's fast enough. One processor at full speed is almost always better than two at half the speed.
  • by khuber ( 5664 ) on Saturday October 19, 2002 @07:37AM (#4484282)
    With dynamically generated web content it's different of course. But there you will normally be fetching from a database to generate the web pages. In which case you should consult articles on speeding up database access.

    I'm just a programmer, but don't big sites put caching in front of the database? I always try to cache database results if I can. Honestly, I think relational databases are overused; they become bottlenecks too often.

    -Kevin
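    A minimal sketch of the kind of result caching Kevin describes: keep query results in memory with a short TTL so repeated identical queries skip the database. The run_query callable and the 60-second TTL are placeholder assumptions, not anything from the article.

    import time

    _cache = {}          # query text -> (expires_at, rows)
    TTL_SECONDS = 60     # assumed freshness window

    def cached_query(query, run_query):
        """Return cached rows for `query`, refreshing via run_query() when stale."""
        now = time.time()
        hit = _cache.get(query)
        if hit and hit[0] > now:
            return hit[1]                         # fresh hit: skip the database
        rows = run_query(query)                   # miss or stale: hit the database once
        _cache[query] = (now + TTL_SECONDS, rows)
        return rows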

  • Almost (Score:2, Insightful)

    by Anonymous Coward on Saturday October 19, 2002 @07:37AM (#4484284)
    > One processor at full speed is almost always better than two at half the speed.

    You can safely drop that 'almost'.
  • by khuber ( 5664 ) on Saturday October 19, 2002 @07:46AM (#4484296)
    Well, not to mention that high traffic sites usually have a bunch of webservers and then a load balancer in front of them. This article obviously isn't for big league web serving.

    -Kevin

  • by chrysalis ( 50680 ) on Saturday October 19, 2002 @07:59AM (#4484317) Homepage
    The article is about high *WEB* performance.

    I don't see your point. "ping" was never designed to benchmark web servers, AFAIK.

    My servers don't answer "ping". Does that mean the web server is down? Nope... it's up and running...

    "ping" is not an all-in-one magic tool. By using "ping" you can test a "ping" server. Nothing else.
  • Quick howto (Score:1, Insightful)

    by Klerck ( 213193 ) on Saturday October 19, 2002 @08:37AM (#4484374) Homepage
    Here's a quicker howto.

    Get the fastest Athlon XP out there.
    Get a motherboard with onboard SCSI.
    Get 15,000 RPM, 160 MB/s SCSI drives.
    Get a NIC.
    Install Linux.
    Install Apache.
    Install MySQL, PHP, Perl, etc.

    And there you have it. Is it really necessary to write a long article when all you're basically saying is "get the fastest hardware out there and slap it into one machine"? Come on folks.
  • by NineNine ( 235196 ) on Saturday October 19, 2002 @08:38AM (#4484378)
    Good databases are designed for performance. If databases are your bottleneck, then you don't know what you're doing with the database. Too many people throw up a database and use it like it's some kind of flat file. There's a lot that can be done with databases that the average hack has no idea about.
  • by NineNine ( 235196 ) on Saturday October 19, 2002 @08:41AM (#4484383)
    You're absolutely right. Wish I had some mod points left...

    Hardware only comes into play in a web app when you're doing very heavy database work. Serving flat pages takes virtually no computing effort. It's all bandwidth. Hell, even scripting languages like ASP, CF, and PHP are light enough that just about any machine will work great. The database though... that's another story.
  • by januschr ( 118746 ) on Saturday October 19, 2002 @08:48AM (#4484395)

    The article seemed way too focused on hardware.

    Well, the name of the website is "Hardware Analysis"... ;-)

  • by jimfrost ( 58153 ) <jimf@frostbytes.com> on Saturday October 19, 2002 @09:43AM (#4484546) Homepage
    Yeah, big Suns are too expensive, and you do need to keep the server count high enough that a failure or a system taken down for maintenance doesn't have a really big impact on the site. I mentioned in a different posting that my take on this is that the midrange Suns, 4xxx and 5xxx class, provide good bang for the buck for high-volume sites.

    Beware of false economy when looking at hardware. While it's true that smaller boxes are cheaper, they still require about the same manpower per box to keep them running. You rapidly get to the point where manpower costs dwarf equipment cost. People are expensive!

    Capacity is an issue. We try to plan for enough excess capacity at peak that the loss of a single server won't kill the site, and hope we never suffer a multiple loss. Unfortunately, customers most often underequip even for ordinary peak loads, to say nothing of what you see when your URL sees a really high load.[1] They just don't like to spend the money. I can see their point; the machines we're talking about are not cheap. It's a matter of deciding what's more important to you: uptime and performance, or cost savings. Frankly, most customers go with cost savings initially and build up their clusters over time (especially as they learn what their peak loads are and gain experience with the reliability characteristics of their servers).

    [1] People here talk about the Slashdot effect, but trust me when I tell you that it's nothing like the effect you get when your URL appears on TV during "Friends".
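    As a back-of-the-envelope illustration of the headroom planning described above; the numbers below are made-up assumptions, not figures from the post or the article.

    import math

    peak_requests_per_sec = 400   # assumed site-wide peak traffic
    per_server_capacity = 120     # assumed sustainable requests/sec per box

    needed = math.ceil(peak_requests_per_sec / per_server_capacity)
    with_headroom = needed + 1    # enough spare to lose one server at peak

    print(f"{needed} servers to cover peak, {with_headroom} with N+1 headroom")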

  • by Anonymous Coward on Saturday October 19, 2002 @10:00AM (#4484586)
    The thing is that "a couple of hundred" clients isn't actually High Performance Web Serving. Maybe it is to your target overclocker-fan-boy audience, but to Slash-folk that's nothing...

    The lack of system setup detail isn't good. Too many variables there. Apache2 may have been a better choice for this too...

    BTW, you're possibly disk I/O (requests, not bandwidth) limited by your IDE RAID. Make sure atime is turned off - there's no point recording it for no good reason. Do whatever you can to minimise disk I/O, because your IDE RAID is done in software (and if you use Promise drivers, stiff bikkies when you need to upgrade your kernel...)

    A high "load" isn't much good info-wise either... what does "sar" have to say? Where is the "load" being generated???

  • Re:Apache 1.3x? (Score:4, Insightful)

    by GoRK ( 10018 ) on Saturday October 19, 2002 @10:08AM (#4484598) Homepage Journal
    Their IDE RAID is actually software RAID. The SCSI myth can come off the shelf, sure, but don't take the RAID one down with it.

    The Promise FastTrak, the Highpoint, and a few others are not actually hardware RAID controllers. They are regular controllers with just enough firmware to let BIOS calls do drive access via software RAID (located in the controller's firmware), plus OS drivers that implement the company's own software RAID at the driver level, doing things like making only one device appear to the OS. Some of the chips have a few performance improvements over a purely software RAID solution, such as the ability to do data comparisons between the two drives in a mirror during reads, but that's about it. If you ever boot them into a new install of Windows without preloading their "drivers", guess what? Your "RAID" of 4 drives is just 4 drives. The hardware recovery options they have are also pretty damned worthless compared with real RAID controllers - be they IDE or SCSI.

    A good solution to the IDE RAID debacle is a controller from 3Ware (very fine) or Adaptec's AAA series (also pretty fine). These are real hardware controllers with onboard cache, hardware XOR acceleration for RAID 5, and the whole bit.

    Anyway, I'm not really all that taken aback that this webserver is floundering a bit yet seems really responsive when a page request "gets through," so to speak. If it's not running low on physical RAM, it's probably got a lot of processes stuck in D state due to the shit Promise controller. A nice RAID controller would probably have everything the disks are thrashing on in a RAM cache at this point.

    ~GoRK
  • I think the big problem here is the tendency to DBify EVERYTHING POSSIBLE.

    Like the State field in an online form.

    Every single hit requires a trip to the database. Why?

    Because, heck, if we ever get another state, it'll be easy to update! Ummm, that's a LOT of cycles used for something that hasn't happened in, what, over 40 years. (Hawaii, 1959)
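    A minimal sketch of the alternative: load the rarely-changing list once and serve every form from memory. fetch_states() is a hypothetical stand-in for whatever query the form currently runs on each hit.

    _STATES = None   # filled on first use, then reused for every request

    def get_states(fetch_states):
        """Return the state list, querying the database at most once."""
        global _STATES
        if _STATES is None:
            _STATES = fetch_states()   # one query ever, instead of one per hit
        return _STATES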
  • by Anonymous Coward on Saturday October 19, 2002 @11:06AM (#4484741)
    I'm sorry, but if your server cannot handle 2000 connections then NineNine is right, you have a crappy backend. How is the fact that you have Flash animation relevant? Isn't a 200k flash animation the same as a 200k jpeg from the server's point of view? If your server cannot handle 2000 connections, what business do you have writing an article about "high performance" webservers? It would be a different story if you entitled it "high performance webserver for less than $1000," but you didn't.

    Personally I think the new trend on Slashdot of "hey, I saw this article about ____, it's really insightful and just great!" being submitted by the author of that article is sort of shitty. If anybody knows about building a high traffic webserver, it would be Slashdot, so you'd think they'd be a little pickier about what they post regarding high performance servers.
  • by Anonymous Coward on Saturday October 19, 2002 @11:12AM (#4484751)
    I'll just mention a couple of items:

    1) For a high-performance web server one *needs* SCSI. SCSI can handle multiple requests at a time and does some of the disk-related processing itself, whereas IDE can only handle requests for data one at a time and uses the CPU for disk-related processing a lot more than SCSI does.

    SCSI disks also have higher mean times to failure than IDE. The folks writing this article may have gotten benchmark results showing their RAID 0+1 array matched the SCSI setup *they* used for comparison, but most of the reasons for choosing SCSI are what I mention above -- not the comparative benchmark results.

    2) For a high-performance webserver, FreeBSD would be a *much* better choice than Red Hat Linux. If they wanted to use Linux, Slackware or Debian would have been a better choice than Red Hat for a webserver. Ask folks in the trenches, and lots will concur with what I've written on this point, due to maintenance, upgrading, and security concerns over time on a production webserver.

    3) Since their audience is US-based, it would make sense to co-lo their server in the USA, both from the standpoint of how many hops packets take from their server to their audience, and from the logistical issues of hardware support -- from replacing drives to calling the data center if there are problems. Choosing a USA data center over one in Amsterdam *should* be a no-brainer. Guess that's what happens when anybody can publish to the web. Newbies beware!!

  • by drouse ( 34156 ) on Saturday October 19, 2002 @11:42AM (#4484844) Homepage
    I wouldn't worry too much.

    Probably 90% of all non-profit websites could be run off a single 500 MHz computer and most could be run from a sub 100 MHz CPU -- especially if you didn't go crazy with dynamic content.

    A big bottleneck can be your connection to the Internet. The company I work for was once "slashdotted" (not by Slashdot) for *days*. What happened was that our Frame Relay connection ran at 100%, while our web server -- a 300 MHz machine running Mac OS 8.1 at the time -- had plenty of capacity left over.
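    A rough back-of-the-envelope shows why the pipe, not the box, is often the ceiling. The link speed and page size below are illustrative assumptions, not figures from this post.

    LINK_MBIT = 1.5   # assumed uplink, roughly T1-class
    PAGE_KB = 50      # assumed average page weight, HTML plus a few images

    link_bytes_per_sec = LINK_MBIT * 1_000_000 / 8
    pages_per_sec = link_bytes_per_sec / (PAGE_KB * 1024)
    print(f"~{pages_per_sec:.1f} pages/sec before the server itself even matters")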
  • by Anonymous Coward on Saturday October 19, 2002 @12:24PM (#4485012)
    This is exactly the stuff you CACHE! But there are VERY GOOD REASONS for putting state/country data in the database.
  • by Anonymous Coward on Saturday October 19, 2002 @12:37PM (#4485069)
    Too bad "millions of page views every month" is simply not even in the realm that would require "High-Performance Web Server"(s). These guys need to come back and write an article once they've served up 5+ million page views per day. Not hits. Page views.
  • Re:10'000 RPM (Score:5, Insightful)

    by Syre ( 234917 ) on Saturday October 19, 2002 @03:48PM (#4485898)
    It's pretty clear that whoever wrote that article has never run a really high-volume web site.

    I've designed and implemented sites that actually handle millions of dynamic pageviews per day, and they look rather different from what these guys are proposing.

    A typical configuration includes some or all of:

    - Firewalls (at least two redundant)
    - Load balancers (again, at least two redundant)
    - Front-end caches (usually several) -- these cache entire pages or parts of pages (such as images) which are re-used within some period of time (the cache timeout period, which can vary by object)
    - Webservers (again, several) -- these generate the dynamic pages using whatever page-generation technology you're using -- JSP, PHP, etc.
    - Back-end caches (two or more) -- these are used to cache the results of database queries so you don't have to hit the database for every request.
    - Read-only database servers (two or more) -- this depends on the application, and would be used in lieu of the back end caches in certain applications. If you're serving lots of dynamic pages which mainly re-use the same content, having multiple, cheap read-only database servers which are updated periodically from a master can give much higher efficiency at lower cost.
    - One clustered back-end database server with RAID storage. Typically this would be a big Sun box running clustering/failover software -- all the database updates (as opposed to reads) go through this box.

    And then:

    - The entire setup duplicated in several geographic locations.

    If you build -one- server and expect it to do everything, it's not going to be high-performance.
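    As one hedged sketch of the read-only replica item above: send writes to the single master and spread reads over the cheap replicas. The connection objects and the SELECT-based classification are simplifying assumptions, not a description of any particular product in the list.

    import itertools

    class QueryRouter:
        """Toy read/write splitter: writes to the master, reads round-robin over replicas."""

        def __init__(self, master, replicas):
            self.master = master
            self._replicas = itertools.cycle(replicas)

        def connection_for(self, sql):
            # Crude rule: anything that isn't a SELECT goes to the write master.
            if sql.lstrip().upper().startswith("SELECT"):
                return next(self._replicas)
            return self.master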
