Squid, FreeBSD Rock the House at Caching Bake-Off
Posted by timothy on Tue Feb 29, 2000 10:34 PM
from the baked-with-pride-seafood dept.
Blue Lang writes: "Saw on the squid mailing list today that the results of the second polygraph Web-cache benchmarks are in, and squid on FreeBSD captured a few top marks, as well as performing exceptionally well overall. Interesting reading, especially as a comparison of free and open systems versus some very well-architected proprietary solutions."
Re:Rob - you learn anything from this article? (Score:4)
1.
2. Last I checked
3. Given (2) above and the highly dynamic nature of
4. As I said
As for
Re:It didn't win. (not flamebait!) (Score:3)
Re:Squid and Akamai (Score:3)
The CTO of Akamai is Daniel Lewin; his bio page at Akamai [akamai.com] says nothing about Squid.
You may, perhaps, be thinking of Peter Danzig, who is the VP of Technology at Akamai; his bio page at Akamai [akamai.com] says:
I think the Squid project was originally derived from the Harvest cache; the NetApp NetCache software was also originally Harvest-derived, although much, perhaps most, of it was done at Internet Middleware (a company founded by Peter and bought by NetApp) and NetApp. (I suspect much of Squid might also be non-Harvest code.)
It didn't win. (not flamebait!) (Score:4)
No offense, but you call that winning? It lost to its competitors categorically and across the board: hits, latency, cost/performance... what's the good news? Anyone?
Why the BSD vs Linux flames? (Score:5)
Out of sheer curiosity I tried out FreeBSD. Their kernel is incredible. I know the benchmarks aren't there to show it, but their "claims" are true.
Their TCP/IP stack is better, heavy loads are handled with ease even on extremely low-end systems, and their memory management is out of this world. I was impressed at how fast my shitty Unix boxes went.
Now, I know that Linux heads like myself get defensive, but Linux has made big improvements and a lot of issues are being addressed in the upcoming 2.4 kernel. Their "claims" will be seriously tested soon.
I have decided to go back to Linux because I prefer it. There's more software and it makes a better desktop for me. Plus it is stable enough, user-friendly enough, fast enough and damn good!
However, FreeBSD is a great Unix OS, and the only way to find out is to try a BSD yourself. Even a Linux head like me can defend FreeBSD.
Keep up the good work, all BSD contributors.
My experiences with Squid (Score:5)
We have roughly 100 machines on our network, and Internet access was coming to a standstill - especially when everyone in the computer lab was on the Internet. Imagine a 128Kb/s fractional T1 with 25 *active* users all trying to look at mega-image-rich content, plus some other users on campus accessing the Internet at the same time (can you say sub-300 baud and ping times measured in whole-second increments?). I was having to pre-load web sites before a class came into the computer lab because just loading the first page could take roughly five minutes on a good day.
Then I configured and installed a Squid server on a rejuvenated Compaq Deskpro running Linux 2.2 that was donated with the above-said specs. I was a little hesitant to deploy it across the entire campus at first, because I had always heard that proxy servers were a Bad Thing. So I quietly pointed browsers in a few classrooms at the Squid machine to see whether I would hear anything from anyone. I got calls that very day. People were asking me how I had finally coaxed our school district into buying us such a fast connection!
As it turned out, the more classrooms I pointed at the proxy server, the faster things got (as the cache grew and the hit rate increased), and the more happy teachers I had. In a school, many sites are visited multiple times by different students and classrooms. In the computer lab, a whole class often visits the same site at the same time. So having a caching proxy server helps a great deal! I really believe that every school with less than a T1 should have one.
As for statistics, I have an average 'hit' rate of well over 80% because the same sites are viewed many times. Initially I set aside 2GB for caching (on a 2.1GB Samsung IDE drive), and I found that as the cache reached capacity the server got way too slow. So first I brought it down to 1.5GB, and now I have it at 1GB (I may even take it down to 750MB). It has been running pretty fast at 1GB, far faster than having no caching proxy server at all, but with my particular hardware I do see performance start to degrade at about 750MB.
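The poster doesn't say how the 80% figure was measured. One way to check a number like that is to tally Squid's own access.log. The sketch below assumes Squid's native log format, where the fourth field is the action/status pair (e.g. `TCP_HIT/200`) and the fifth is the reply size in bytes. The sample lines are made up, and counting every action code that contains "HIT" as a hit is a simplification. It reports both the request hit ratio and the byte hit ratio, since on a 128Kb/s line the bytes saved are what users actually feel.

```python
from dataclasses import dataclass


@dataclass
class HitStats:
    requests: int = 0
    hits: int = 0
    bytes_total: int = 0
    bytes_hit: int = 0

    @property
    def hit_ratio(self) -> float:
        """Fraction of requests served from cache (document hit ratio)."""
        return self.hits / self.requests if self.requests else 0.0

    @property
    def byte_hit_ratio(self) -> float:
        """Fraction of bytes served from cache, i.e. bytes that never crossed the WAN link."""
        return self.bytes_hit / self.bytes_total if self.bytes_total else 0.0


def hit_stats(lines) -> HitStats:
    """Tally hits from lines in Squid's native access.log format.

    Assumed field order: time elapsed client action/status bytes method URL ...
    """
    stats = HitStats()
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        action = fields[3].split("/")[0]  # "TCP_HIT/200" -> "TCP_HIT"
        size = int(fields[4])
        stats.requests += 1
        stats.bytes_total += size
        # Simplification: any action code containing "HIT" (TCP_HIT, TCP_MEM_HIT,
        # TCP_IMS_HIT, ...) is counted as a cache hit.
        if "HIT" in action:
            stats.hits += 1
            stats.bytes_hit += size
    return stats


# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = [
    "951782400.100 12 10.0.0.5 TCP_HIT/200 5000 GET http://example.com/logo.gif - NONE/- image/gif",
    "951782400.200 840 10.0.0.6 TCP_MISS/200 15000 GET http://example.com/index.html - DIRECT/192.0.2.10 text/html",
    "951782401.300 8 10.0.0.7 TCP_MEM_HIT/200 5000 GET http://example.com/logo.gif - NONE/- image/gif",
]

stats = hit_stats(SAMPLE_LOG)
print(f"hit ratio {stats.hit_ratio:.0%}, byte hit ratio {stats.byte_hit_ratio:.0%}")
```

For the three sample lines this reports a 67% hit ratio but only a 40% byte hit ratio, because the single miss is also the largest object.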
Sure, faster server hardware would be *great* and is probably necessary to handle our unusually heavy load, given all the graphics on the sites we visit, but right now that just isn't an option because we live on donations. My point is that even though we are running Squid on such a crappy box, it has worked wonders on our network. Internet access seems very fast now, whereas before it was almost unbearable. Most importantly, people are happy and are making use of our technology to its fullest extent, whereas before they may not have been able to. I must admit, though, that I am writing grants in hopes of getting a faster, newer box, because ours is getting tired and I worry about what will happen when the hardware finally kicks the bucket.
For a school in our situation, Squid is great because it helps even on otherwise nearly worthless hardware, and the price is just right.
Anyways, I'd like to thank everyone who has donated their time to the Squid project; you've done great work and you're helping people more than you realize!
--SONET
http://www.hbcsd.k12.ca.us/peterson/technology
Architecture of Caching to large-scale sites (Score:3)
The technical report can be found at http://www.netapp.com/tech_library/3071.html [netapp.com]
We would all save a scary amount of bandwidth if more sites were designed with public caches such as (the awesome) Squid in mind, and it's a really simple use of headers that makes it possible.
For those who use Apache and are interested in making their own sites more cache-friendly, I recommend looking at mod_expires [apache.org], which is part of the standard Apache distribution, although it is not compiled in by default. If you have large, static images that rarely change, go ahead and put week-, month- or even year-long expiry headers on them, and watch the hits for those redundant images drop right down on your web server. And if you suddenly need to change them, it's no real problem: just change the image's URL and it becomes a "new" entity for caching purposes.
Yeah, granted, bandwidth is getting cheaper now, but for us poor Europeans it's still a scarce commodity and we need to worry about these things.
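On the Apache side this is configured with mod_expires directives such as `ExpiresActive On` and `ExpiresByType image/gif "access plus 1 month"`. The Python sketch below only illustrates what such a policy puts on the wire (an `Expires` date plus the equivalent HTTP/1.1 `Cache-Control: max-age`), together with the rename-on-change trick from the paragraph above, done here with a content hash. The function names, the 30-day lifetime and the hash-in-filename scheme are hypothetical choices, not something mod_expires does for you.

```python
import hashlib
import posixpath
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiry_headers(lifetime: timedelta, now: datetime) -> dict:
    """Freshness headers for an "expire <lifetime> after access" policy.

    `now` must be a UTC-aware datetime; HTTP dates are always expressed in GMT.
    """
    return {
        "Expires": format_datetime(now + lifetime, usegmt=True),
        "Cache-Control": f"max-age={int(lifetime.total_seconds())}",
    }


def versioned_url(path: str, content: bytes) -> str:
    """Embed a short content hash in the file name, so a changed image gets a new URL
    (and therefore a new cache entry) instead of waiting for the old copy to expire."""
    digest = hashlib.sha1(content).hexdigest()[:8]
    root, ext = posixpath.splitext(path)  # "/images/logo.gif" -> ("/images/logo", ".gif")
    return f"{root}.{digest}{ext}"


# Hypothetical example: a 30-day lifetime computed at the time of this story.
now = datetime(2000, 2, 29, 22, 34, tzinfo=timezone.utc)
print(expiry_headers(timedelta(days=30), now))
print(versioned_url("/images/logo.gif", b"GIF89a..."))
```

Sending both headers covers older caches that only understand `Expires` as well as HTTP/1.1 caches that prefer `max-age`.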
-anil-
Re:Accelerate your website -- it's awesome! (Score:3)
well-tuned operating systems that eliminate traditional OS overhead for these numbers.
True, but the operating system that Squid was running on (and you were talking about the operating systems) was FreeBSD, which also runs the iMimic, which captured the highest hits/sec and reqs/sec per $1000, by a large margin. Interestingly enough, the only Linux-based entry, the Swell-1000, didn't do very well. Which goes to show that a good starting point doesn't guarantee success.
And, of course, the amazingly expensive Cisco products probably (I don't know, just assuming) do a lot more than just cache -- and are probably a lot more reliable (MTBF) and redundant, which is important if your cache is a vital business component. (And if cache == internet access, then, well, it probably is).
Re:Accelerate your website -- it's awesome! (Score:3)
A connection destined for "www.excitestores.com" ends up at the external DS/3 (T3, T1, insert your fast link here) port on our router. The router runs a rule against the packet and says, "Hey, this is www traffic bound for the servers that are to be accelerated. Therefore my next hop is (insert IP address of cache here)!" It route-maps the packet to the cache server as its next hop. The caching server is set up to "hijack" any incoming connections as if they were destined for itself, and makes the request to the origin web server on behalf of the requesting client. At this point, this does not differ much from standard forward transparent proxying, except that you normally have an access control list that only permits transparent proxying for a limited set of URLs or IP addresses. You don't want to run an "open proxy" for the world to use to cache whatever they want.
Of course, there are alternate methods of accelerating sites, depending on the cache you choose and your infrastructure. The basic idea is to get the packets to your cache instead of the web server, however you choose to do it. Common methods include placing the cache in the natural route of the packets; pointing the web server's address at the cache and giving the cache a non-public DNS that resolves the site to the origin server on a non-routable private network; or specifying on the cache that incoming connections on a certain IP are to accelerate a particular origin server.
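The router decision described above amounts to a simple predicate on each packet. The sketch below models it in Python rather than in any router's configuration language; the origin subnet, cache address and port set are hypothetical values taken from documentation address ranges. Restricting the rule to a fixed list of origins is the same idea as the access control list that keeps the cache from becoming an open proxy.

```python
from ipaddress import ip_address, ip_network

# Hypothetical addresses (documentation ranges), for illustration only.
ACCELERATED_ORIGINS = [ip_network("192.0.2.0/28")]  # origin web servers to accelerate
CACHE_NEXT_HOP = "198.51.100.10"                    # the reverse-proxy cache
WEB_PORTS = {80}


def next_hop(dst_ip: str, dst_port: int, default_hop: str) -> str:
    """Model of a policy-routing rule: divert web traffic bound for the
    accelerated origin servers to the cache; route everything else normally."""
    dst = ip_address(dst_ip)
    if dst_port in WEB_PORTS and any(dst in net for net in ACCELERATED_ORIGINS):
        return CACHE_NEXT_HOP
    return default_hop


print(next_hop("192.0.2.5", 80, "203.0.113.1"))   # web traffic for an origin: goes to the cache
print(next_hop("192.0.2.5", 25, "203.0.113.1"))   # mail to the same host: normal route
```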
Anyway, the benefits of this are enormous in our case. We have a (*&$load of modules compiled into our Apache server, tons of virtual hosts and modules to handle them all, and each daemon runs about 12 MB. Each web server has a gigabyte of RAM, therefore you do the math:
1024 MB / 12 MB = 85 1/3, so about 85 connections run us out of physical RAM on each web server. Realize this is a rough estimate; our web servers can handle many more, but performance degrades quickly as more connections are served from virtual memory. I've also not taken into account OS overhead, other services running on the servers, or anything else you may think of. However, modem users in particular saturate web server connections, because it is so slow to deliver objects to them.
CNN.com, for instance, uses ICS caching boxes purely for connection management to handle these slower connections that could bog their servers down. Novell's ICS is rated at over 100,000 simultaneous connections on each box in reverse proxy mode. A big difference from 85 connections for one machine, no?
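The 85-connection estimate is plain division of physical RAM by the per-child footprint, rounded down. A minimal sketch, where the optional `reserved_mb` argument is a hypothetical stand-in for the OS overhead and other services the poster deliberately left out:

```python
def max_resident_children(ram_mb: float, per_child_mb: float, reserved_mb: float = 0.0) -> int:
    """Number of Apache children that fit in physical RAM before the box starts
    swapping. `reserved_mb` stands in for the OS and other services."""
    return int((ram_mb - reserved_mb) // per_child_mb)


print(max_resident_children(1024, 12))        # the poster's rough estimate
print(max_resident_children(1024, 12, 128))   # hypothetical 128 MB reserved for the OS
```

Even the rough figure of 85 is more than a thousand times smaller than the 100,000 connections an ICS box is rated for (100,000 / 85 is about 1,176), which is why handing slow modem connections to a cache pays off.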
I'd love to discuss this in more depth, if you require a better answer. Better yet, check the FAQ at Squid's site [squid-cache.org] regarding transparent reverse proxying.
Seriously, this is what takes web sites to the next level, regardless of whether you use Squid, ICS, NetCache, or another type of reverse cache. Keep smiling!
Accelerate your website -- it's awesome! (Score:5)
It's really important to note that IRCache has no desire to point to any "winner" in this bakeoff, but instead to have real non-partisan numbers to point to when evaluating cache performance. Squid captured top honors in cache hit ratios, but nothing else (AFAICT), showing that those "expensive, proprietary systems" also can be very well-tuned operating systems that eliminate traditional OS overhead for these numbers.
One of the frequently overlooked uses of cache is as a web site accelerator, instead of the standard forward proxy. Using a few simple access control lists and a policy on a router, reverse-proxy caches managed to reduce the instantaneous load on our web servers by up to 94%. We serve about 3.5 million hits a day. A "reverse proxy" is an EXCELLENT use of a proxy cache, and after these technology evaluations I've been involved with in past weeks I'd recommend it to anybody considering running a high-traffic website. This allows your Apache servers to function more as the "cgi engine" of your site, and lets the static images, text, banners, etc. be delivered from a box that can handle 100,000 simultaneous connections. Very cool.
While I'm not allowed to post a "review" of any one of these units, because of various agreements for the evaluation boxes we tested, I can clearly state that Squid, NetCache, and ICS-based systems can and will vastly reduce infrastructure scalability costs for businesses when deployed in a reverse-proxy configuration. Our earlier estimates guessed we'd need to expand our web farm three times to handle our estimated load by the end of the year. Now we can reliably predict that our farm can serve 10 times the amount of hits we're running now by using a cache as an accelerator. VERY cool stuff.
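To put the 94% and 3.5-million-hit figures in per-second terms, here is a rough sketch. It treats traffic as evenly spread over the day (real peaks are higher) and uses 94% as if it were a steady offload fraction, although the poster describes it as a best case for instantaneous load. `ideal_headroom` is a hypothetical helper giving the idealized factor by which traffic could grow before the origin servers see today's load again.

```python
SECONDS_PER_DAY = 86_400


def load_split(hits_per_day: float, offload: float) -> tuple[float, float]:
    """Average requests/second arriving at the site, and the share still reaching the
    origin servers when a fraction `offload` is answered by the reverse-proxy cache."""
    edge_rps = hits_per_day / SECONDS_PER_DAY
    return edge_rps, edge_rps * (1.0 - offload)


def ideal_headroom(offload: float) -> float:
    """Idealized growth factor before origin load returns to today's level."""
    return 1.0 / (1.0 - offload)


edge, origin = load_split(3_500_000, 0.94)
print(f"{edge:.1f} req/s at the edge, {origin:.1f} req/s at the origin")
print(f"ideal headroom: {ideal_headroom(0.94):.1f}x")
```

Under these assumptions the cache leaves roughly 2.4 requests per second for the origin servers out of about 40.5, and the idealized headroom is about 16.7x; the poster's own 10x estimate sits comfortably below that ceiling.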
Be sure and check out the system configurations in the bakeoff review. It's very illustrative that the boxes tested have VERY specific audiences. Don't be fooled by the "fastest hit response time" or "most throughput" -- you can spend $6,000 or $150,000 for any setup, depending on your needs.
Noticeably absent from the review was Inktomi, for the second year in a row. I'm hearing FUD from vendors that their performance isn't up to snuff-- any truth to these rumors?
My boss will love this article. (Score:3)
When I spec'd it out, all the techies I talked to asked me three questions; this article validates my answers to all three:
My answer to each was two parts:
Semi-Off-Topic
What do I mean by a 'Killer caching proxy'?
A pair of identical rackmount servers (load balancing and transparent failover via BigIP), each with a PIII 600 CPU, 256MB of RAM, a 2940UW SCSI controller and 20GB of disk. And let's not forget the triply-redundant T3s to three distinct Tier-1 Internet providers.
All this just so I can read slashdot.
Distributed caches with 'proxy.pac' (Score:3)
I've done a lot of work with 'proxy.pac [squid-cache.org]' files in the last year- it's amazing how much decision-making power you can put into the autoproxy script, letting the client machine take on some of the responsibilities of smart proxying.
For example, right now I have two distinct sites with their own Squid proxies, and users at both sites use identical 'proxy.pac' files. The browser decides whether to go direct or via a proxy based on the host/domain of the destination, then chooses a proxy based on its own source IP address.
This means that every Netscape and IE browser in the enterprise has the same configuration, and even roaming users will always get their closest proxy server each time they connect.
If a business unit later gets their own internet firewall and proxy, it takes a line or two in the global script, and clients automagically use the new proxy.
You can also specify multiple proxies in the file- if the first one times out, all future requests (until the browser is restarted) will go to the next server in the list.
Now if only Lynx would parse the (javascript) proxy.pac file...
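A real proxy.pac is JavaScript: the browser calls `FindProxyForURL(url, host)`, and the script can use helpers such as `isPlainHostName()`, `dnsDomainIs()`, `myIpAddress()` and `isInNet()`. To keep to one language here, the sketch below models the same decision in Python: go direct for internal hosts, otherwise pick a site's proxy list from the client's own address. The domain, subnets, proxy names and fallback list are all hypothetical; port 3128 is simply Squid's default.

```python
from ipaddress import ip_address, ip_network

# Hypothetical names and subnets, for illustration only.
INTERNAL_SUFFIXES = (".corp.example.com",)
SITE_PROXIES = [
    (ip_network("10.1.0.0/16"), ["proxy-a.corp.example.com:3128", "proxy-b.corp.example.com:3128"]),
    (ip_network("10.2.0.0/16"), ["proxy-b.corp.example.com:3128", "proxy-a.corp.example.com:3128"]),
]
DEFAULT_PROXIES = ["proxy-a.corp.example.com:3128"]  # for clients outside any known site


def find_proxy_for_url(host: str, client_ip: str) -> str:
    """Python model of the decision a proxy.pac FindProxyForURL() makes."""
    # Unqualified names and internal domains bypass the proxy
    # (isPlainHostName() / dnsDomainIs() in the JavaScript version).
    if "." not in host or host.endswith(INTERNAL_SUFFIXES):
        return "DIRECT"
    # Pick the site's proxies from the client's own address (myIpAddress() / isInNet()).
    client = ip_address(client_ip)
    proxies = next((p for net, p in SITE_PROXIES if client in net), DEFAULT_PROXIES)
    # An ordered list: the browser moves on to the next entry if one fails.
    return "; ".join(f"PROXY {p}" for p in proxies)


print(find_proxy_for_url("slashdot.org", "10.2.0.9"))
```

The returned string follows the PAC return format, where semicolon-separated entries are tried in order; that ordering is what provides the failover between proxies. Adding a new business unit's proxy is one more entry in `SITE_PROXIES`.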
Look a little closer at the numbers people! (Score:4)
Squid showed perfect cacheability (why buy a cache except to cache?), whereas some others in its price range (except the Swell box, also running Squid) displayed much lower cacheability. Response times from a lot of boxes were not so good either, while Squid's was excellent (the other reason to cache...browsing speed). When you see a box with long response times and a low cache hit rate, you are looking at a box that was being pushed WAY too hard. You would not run a cache with 30 or 40% DHR and mean response times of 2 seconds...ideally, you run it so that cacheability is near perfect and response times are very, very fast. Squid did that. Microbits didn't.
The Squid team have done a great job with Squid, and it gets better every time around. Even compared to the ICS products (many of which are very very fast these days...but you pay the price for them...ICS on low end boxes suffers a bit), Squid didn't do so bad at all.
Anyway, if you'd like to see some more Squid numbers, we've got a $2139 squid box in the lab doing 110 reqs/sec from dual IDE drives, whereas the Squid team got 160 from a $4k box with 6 SCSI 10k drives. We will be posting pretty specific specs for it sometime in the future so that others who want to roll their own can do so (it takes a lot of work). Some of our recent benchmarks (using Bake-off rules and benches) are posted on the Swell Technology web page. Currently, the posted benches are for a run at 100 reqs/sec. The 110 run will be posted sometime soon.
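Since the bake-off compares boxes on throughput per $1000 (the metric mentioned earlier in the thread), the two Squid figures in this comment can be put on the same scale. A minimal sketch using the numbers as posted; the "$4k" price is approximate, and the $2139 box is the poster's lab result rather than an official bake-off entry.

```python
def reqs_per_sec_per_1000_usd(reqs_per_sec: float, price_usd: float) -> float:
    """Throughput per $1,000 of hardware, a cost/performance style of metric."""
    return reqs_per_sec / (price_usd / 1000.0)


# Figures as quoted in the comment above; the $4k price is approximate.
boxes = {
    "$2139 box, dual IDE": (110, 2139),
    "~$4k box, 6 x 10k SCSI": (160, 4000),
}
for name, (rps, price) in boxes.items():
    print(f"{name}: {reqs_per_sec_per_1000_usd(rps, price):.1f} req/s per $1000")
```

By this measure the cheaper dual-IDE box comes out ahead, at about 51.4 versus 40.0 requests per second per $1000, even though the SCSI box has higher absolute throughput.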
Those interested in caching should check out the squid devel list lately. Discussion has centered on a couple of new filesystem ideas that should improve squid performance markedly. Fascinating stuff. I suspect the ICS guys will be a little more worried come next bake-off.
Re:Very true (Score:4)
Nothing is a problem once you debug the code.
John Carmack
Almighty squid (Score:4)
Way too many times, open source software is dismissed as a sort of dull knife: it gets the job done, but not in an elegant or efficient way. Take Apache, for example: how many people rag on Apache because of its focus on compatibility over speed?
For Squid, I honestly can't think of better overall proxy software. If www.proxymate.com can handle the massive amount of traffic it does running Squid on Linux, all but the most stump-headed ignoramuses would realize that businesses needn't drop a couple thousand $$ on a specialized platform.