New Breed Of Web Accelerators Actually Work
axlrosen writes "Web accelerators first came around years ago, and they didn't live up to the hype. Now TV commercials are advertising accelerators that speed up your dial-up connection by up to 5 times, they say. AOL and EarthLink throw them in for free; some ISPs charge a monthly fee. Tests by
PC World, PC Magazine and CNET show that they do speed up your surfing quite a bit. They work by using improved compression and caching. The downside is they don't help streaming video or audio." And they require non-Free software on the client's end, too.
rproxy -- also actually works, and open source (Score:5, Informative)
Just remember (Score:5, Informative)
Methods (Score:3, Informative)
Compression.. Now there's something! I have in the past used an ssh tunnel (with compression switched on) to my university's web proxy, and that sped up things quite a bit! Why isn't this switched on by default on my PPPoA connection? Doesn't apache handle gzip'ing these days? Doesn't seem to be used much, though.. This speed up might be less pronounced on dial-up links though, because POTS modems usually switch on compression anyway (again YMMV).
Some download accelerators simply download different chunks of the same file in multiple sessions from either one server (shouldn't matter - unless with roundrobin DNS) or even from mirrors (better!). That's quite effective as well, but we know this, and that's why we use bittorrent for big files, don't we?
But it has to be said.. Most download accelerators are just bloaty spyware and don't do *zilch* to help your download speed.. Feh!
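The chunking trick the parent describes is simple to sketch. Here's a hypothetical Python illustration of how the byte ranges for the parallel sessions might be computed (the actual request loop and URL are left out; a real client would send each pair as an HTTP `Range:` header):

```python
def byte_ranges(total: int, parts: int):
    """Split a download of `total` bytes into `parts` contiguous
    spans suitable for HTTP Range headers like "bytes=0-499"."""
    step = -(-total // parts)  # ceiling division
    return [(lo, min(lo + step, total) - 1)
            for lo in range(0, total, step)]

# Each (lo, hi) pair goes out as a separate request, e.g.
#   Range: bytes=0-24
# and the pieces get reassembled in order on disk.
print(byte_ranges(100, 4))  # [(0, 24), (25, 49), (50, 74), (75, 99)]
```

Against a single server this only helps if the server throttles per-connection; against mirrors it genuinely multiplies sources, which is exactly the idea bittorrent generalizes.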
Didn't AOL use to convert GIF graphics to their own lossy ART format?
Re:Just remember (Score:5, Informative)
For a given representation these are all compressed. In every case, though, the compression is lossy: you can degrade the quality of the final output and send a smaller bitrate over the wire. Want me to prove my point? Take your favorite CD-quality MP3 - let's say it's encoded at about 100 kbps. Now convert it to minimum quality - the bitrate will be more like 20 kbps (if even that much)... you can still hear what is going on, but the quality will suck. You can do the same thing with the rest of the compressed formats as well.
Cache the Suckage (Score:3, Informative)
Basically, it proxied all requests through that ISP on port 80. If it found a request to an IP or sitename it had visited before, it tried to serve it out of cache. If it didn't, it proxied the result through and returned the results from the requested IP or sitename.
The problems:
The server had a difficult time with virtual hosting of any kind. About 4 out of 5 requests to a virtual host would go through. About 20% of the time, there was some critical piece of information that the cache server would mangle so that the vhost mechanism would be unable to serve the right data. This was a couple years ago, so bugfixes might have happened. Maybe.
The server definitely had a hard time with dynamic content that wasn't built with a GET url (thus triggering the pass-thru proxy). If the request was posted, encrypted, hashed, or referenced a server-side directive of some kind (server-side redirects were especially nasty) the cache would fail. A server-side link equating something like "http://www.server.net/redirect/" to a generated URL or dynamic content of some kind was the most frequent case we ran into. The server simply couldn't parse each and every http request of every variety and try to decide whether it should pass-thru or not. I can't think of a logical way around this that wouldn't break any given implementation. Can you?
We used dynamically assigned IPs at the time, so proxy requests made from one PC were sometimes returned erroneously to another if the IP changed between uses -- after a modem hangup, say. This was a rare event, but I listened to at least one person complaining that he was getting someone else's Hotmail. The fix is either to blacklist sites from being cached -- infeasible for every site that could possibly be requested -- or to assign static IPs. DHCP broadband users may have similar problems, especially those who get new IPs every so often.
Finally, if something got corrupted on the cache server due to disk error, stalled transfer, or some other reason, the server had little or no way to throw out the bad data. It would discard data that it *knew* was corrupt due to unfinished downloads, etc., but often this check failed, or data was assumed to be correct even when it wasn't. Everyone who requested the same piece of corrupt data got it. I had to answer this complaint a few times: "I downloaded it on one computer connected to your ISP and got a bad download. I downloaded it on my other computer from the same ISP and got the same bad download. Then I connected to another ISP from the first computer and got a complete download. What's up wit' dat, yo?"
Cache servers are a bad idea. The very premise is to be an end-all be-all for everyone who uses them. There are bug-fixes for some of the problems, but no way to solve the essential one: MOST data on the web is dynamic now. Using cache servers with dynamic data is inviting trouble.
Free Web Accelerators (Score:3, Informative)
Re:Cache the Suckage (Score:1, Informative)
Try free software next time, it often works better.
Re:You mean... (Score:5, Informative)
Sounds pretty much like that... Which Apache already supports, and the major browsers already support, making something like this redundant.
Moreover, dialup modems already use a fairly high level of compression at the hardware layer. While not exactly "gzip -9" quality, you can only realistically squeeze another 10% out of those streams no matter how much CPU power you devote to the task.
Others have mentioned image recompression, which has traditionally used VERY poor implementations, nothing more than converting everything to a low-quality JPEG. I would point out that a more intelligent approach to image compression could yield a 2-3x savings without noticeable loss of quality (smoothing undifferentiated backgrounds, stripping headers, dropping the quality a tad (ie, to 75-85%, not the 20-40% AOL tried to pass off as acceptable), downgrading the chroma subsampling on anything better than 2x2, etc). But no, not a 5x savings.
There is an RFC for this (Score:1, Informative)
RFC 2394 - IP Payload Compression Using DEFLATE
www.faqs.org/rfcs/rfc2394.htm
Re:tradeoff (Score:3, Informative)
"If you choose the highest speed (and hence the greatest compression), the image quality is downright poor." - PC Magazine
Doesn't sound lossless at all to me. If admins knew what they were doing, the HTML, CSS, etc. would already be compressed with mod_gzip or the little compression checkbox in IIS.
Re:You mean... (Score:3, Informative)
In fact, using mod_gzip I've experienced a very discernible speed-up in access and display over DSL!
Re:Methods (Score:3, Informative)
Wrong. Most of the larger residential ISPs probably do, but mine certainly doesn't, and of the last four ISPs I worked for, only two did any caching at all, and one of those only did caching in certain limited situations.
The downside is that whenever I've used an ISP's squid proxy, it slowed things down! Turning proxies off almost invariably helps speeds, instead of hurting them.
Hogwash. I've been using a caching proxy server on my LAN for the past several years, precisely because it increases download speed. Now that I'm using broadband the speed increase isn't noticeable, but on dialup it was - somehow Squid is actually more efficient than Netscape or IE at downloading pages. Of course, with multiple computers you get the caching benefits on top of that.
If your downloads go slower through the proxy, it's because the proxy server is overloaded and your ISP needs to upgrade it - not because proxy servers inherently slow things down.
Plus, if the proxy goes down, you can still use the web.
Of course routing web traffic through a proxy server adds an additional single point of failure, in addition to the other points of failure already in place.
Compression.. Now there's something! I have in the past used an ssh tunnel (with compression switched on) to my university's web proxy, and that sped up things quite a bit! Why isn't this switched on by default on my PPPoA connection? Doesn't apache handle gzip'ing these days? Doesn't seem to be used much, though..
Compression is the main thing these "web accelerators" do. Your ISP runs the server and you run the client, and data exchanged between the two is compressed.
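A minimal sketch of that client/server arrangement, using zlib's streaming API (illustrative only - the commercial products use their own protocols). The key detail is `Z_SYNC_FLUSH`, which makes each chunk decodable as soon as it arrives instead of at end-of-stream:

```python
import zlib

# One compressor lives at the ISP end of the link, one decompressor
# at the client end. State persists across chunks, so repeated
# strings in later data compress against earlier data too.
comp = zlib.compressobj(9)
decomp = zlib.decompressobj()

def isp_to_modem(data: bytes) -> bytes:
    # Compress a chunk and flush so the client can decode it immediately.
    return comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)

def modem_to_browser(wire: bytes) -> bytes:
    return decomp.decompress(wire)

page = b"<html><body>" + b"<p>hello world</p>" * 200 + b"</body></html>"
wire = isp_to_modem(page)
assert modem_to_browser(wire) == page
assert len(wire) < len(page) // 10  # repetitive HTML shrinks a lot
```

The asserts pass because HTML is highly repetitive; a JPEG pushed through the same pair would come out essentially unchanged in size.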
To compress your PPPoA connection would require the router where your connection terminates to compress and decompress ATM cells as they're tunneled inside a PPP link. That's not necessarily very efficient. It would require extra processing at both ends, and may not result in that significant a speed improvement. You probably want higher level compression - fitting more data into a packet, instead of making the packets smaller.
And yes, mod_gzip is available for Apache. Slashdot is using it, didn't you notice? No? Then quit complaining about how little it's used.
Re:because IIS's is garbage (Score:3, Informative)
I used to host Slashcode based sites. The default home page was about 50k. With mod_gzip, it literally got down to about 6k. Really sweet!
Re:Non-Free software? (Score:2, Informative)
Get Broadband (Score:2, Informative)
Where I live, DSL is $30 a month and cable is $40 a month. How can you argue with that?
Because (Score:2, Informative)
Re:I still don't understand (Score:3, Informative)
--jeff++
Re:You mean... -- You miss the point (Score:2, Informative)
A service like this that acts as a compression proxy can dramatically knock down the size of content. We implemented this at my last client and saw 78-93% compression of everything other than images. That includes css, javascript, dynamic web content, etc. I don't know about images, but this alone is very significant for today's clumsily table-laden pages.
Useful for portable connections unlike DSL (Score:3, Informative)
Likewise I often find myself in some crappy hotel where the connection is so noisy I can barely squeeze a 14.4K connection out of it. I just want to check my web-based e-mail, not download the Encyclopedia Britannica, so anything that can make a dialup work painlessly on common web pages is a good thing.
Re: Awwww boo hoo (Score:5, Informative)
Running Squid [squid-cache.org] with a 256mb ram disk cache is all the speedup we need, and it does so without altering the data being fed from upstream.
Re:I still don't understand (Score:2, Informative)
But you're right: there's no reason not to use it if you can. The bandwidth savings can be massive, saving $$.
Re:Faster porn? (Score:4, Informative)
2) If you're already using compression (stac, predictor, MPPC, etc.), this will make ZERO difference. The cache has to be on the near side of the slowest link -- which is the dialup user's modem. Now, in the instances where the ISP disables software compression -- like, for instance, the "idiots" at Bellsouth.Net who disable CCP to "speed up connection times" [exact words of the Cisco engineer who helped them set things up], never mind that the time it takes to connect and pass traffic is 100% modem training -- it'll help some. (For us ISDN users, 3 of the 3 seconds it takes to connect are IPCP; I'll accept that, as they do tend to return the same IP most of the time.)
3) A lot of what's moving around the internet isn't measurably compressible... GIFs, JPGs, MPEGs, zip files, etc. (I shall have to perform an analysis.)
Re:You mean... (Score:5, Informative)
Because v.42bis has a maximum compression ratio of 4:1 (MNP5 only does 2:1).
Now, for a file of all zeros, hey, I agree, you can do a lot better. So, how often do you download files containing nothing but zeros? For a typical text file, you might get better than 90% with gzip (while still only 75% from v.42bis). But for binary content? Very rarely better than 50%.
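The text-versus-binary gap is easy to demonstrate (a quick Python sketch; the sample data is made up, with random bytes standing in for a JPEG or zip payload):

```python
import gzip
import random

# Repetitive markup, standing in for a typical HTML page
html = b"<tr><td class='price'>$19.99</td></tr>\n" * 200
# Incompressible bytes, standing in for already-compressed content
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(html)))

zhtml = gzip.compress(html, compresslevel=9)
znoise = gzip.compress(noise, compresslevel=9)
# The markup collapses to a few percent of its original size; the
# random bytes actually grow slightly (gzip adds header overhead).
print(len(html), len(zhtml), len(znoise))
```

Running it shows exactly the asymmetry described above: the repeated table rows deflate to next to nothing, while the "binary" buffer gains a few bytes.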
In any case, web content consists of five basic types of information - text, graphics, sound, multimedia (Flash, MPEGs, AVIs, etc), and already-zipped packages.
Of those, only the first benefits much from any lossless method, and only the second really leaves any room for saving bits via lossy compression without horrible loss of quality. (Some of the fourth type could also endure lossy compression, but it takes far too long to recompress on-the-fly.)
Unfortunately, text comprises the least bothersome (in terms of relative size) of all of those major types of web content.
Don't get me wrong, I fully encourage people to turn mod_gzip on in their Apache installs. But when a company hawks its product with claims that simply cannot occur in a normal web browsing situation, I have to call foul.
I see only two situations whereby they could claim 5:1 compression - Either VERY text-heavy material, such as reading something from Project Gutenberg, or they strip every possible non-critical image from a page. I already do the latter via my hosts file and a paranoid userContent.css, so what does that leave?
Hope you only like reading text, in which case, have you ever heard of "Gopher"?
Re:Well, actually . . . (Score:2, Informative)
Re:I still don't understand (Score:2, Informative)
I wrote/am writing a rather nifty chat script [no-hype.com] that runs realtime through http just by keeping the connection open and flush()ing at the end of the main loop; with mod_gzip installed you would never see anything until you interrupted the transfer (by hitting stop, or back - effectively ending the script and freeing up the server to transmit the data).
mod_gzip is cool for cats, but it's not without its drawbacks.
Comment removed (Score:2, Informative)
compression with PHP (Score:5, Informative)
If you have a recent version of PHP, you don't even need mod_gzip. Just put the following lines in your .htaccess file:
php_flag zlib.output_compression on

Does everything on the fly. I once had a shell script that would wget a URL with the Accept-Encoding: gzip header, and then wget it again without, and show the percent savings. It was fairly interesting to see which sites were using compression, and how much bandwidth the sites that weren't could have saved.
Congratulations: you're clueless. (Score:3, Informative)
Wow. I've never encountered anyone who has spoken so confidently on a topic without knowing a damn thing about it. You are so absolutely wrong that the example you gave is the exact opposite of what you think it is. You've demonstrated precisely what not to do when writing good markup.
I don't even know where to begin to correct your cluelessness. Here it goes...
When I talk about writing good markup, I'm speaking in a purely structural sense. In other words, mark up text based on what it is, not what it looks like. Separation of content from presentation is the principle to follow.
For example, this is bad HTML, something along these lines:

<font size="5" color="red"><b>Important Heading</b></font><br><font size="3">Some paragraph text.</font>

That's bad for many reasons. The tags say nothing about what the text *is*, only what it should look like, and that presentation is welded into every single page.

Now, here's good HTML:

<h1>Important Heading</h1>
<p>Some paragraph text.</p>

And then a stylesheet:

h1 { color: red; font-size: 150%; }
p { font-size: 100%; }
Now, the user can easily define what stuff should look like based on what it is. A user could supply a stylesheet that increases the size of paragraph text by a percentage. A user could specify their own margins for list items. A user could eliminate colors or adjust contrast to their liking.
On top of that, a machine can read that markup and know precisely what it is. This is a header, that's a paragraph, this is a list, etc. Search engines and screen readers would know precisely what to do with the text.
Furthermore, the styling data is in one location, so whether you have 1 page or 1,000 pages, the work to make them all look different is the same.
Of course, this is only the tip of the iceberg. If you use XHTML and CSS properly, you get more visual flexibility than otherwise. (This also implies discarding tables for use in layout in favor of layers. Tables are for organizing tabular data, not positioning things visually.) XHTML also gives you forwards compatibility.
Re:Cache the Suckage (Score:3, Informative)
I worked at a local ISP who managed to get a demo for a cache server a while back. (I don't anymore.) The machine arrived. We plugged it in, and started to take tech calls.
Sounds like this was your mistake. You "managed to get a demo" speaks volumes. Sounds like an expensive proprietary product from a small company. If you had just downloaded Squid, I don't think you would have encountered all these problems.
The server had a difficult time with virtual hosting of any kind. About 4 out of 5 requests to a virtual host would go through. About 20% of the time, there was some critical piece of information that the cache server would mangle so that the vhost mechanism would be unable to serve the right data. This was a couple years ago, so bugfixes might have happened. Maybe.
Sounds like it only supported IP-based virtual hosting. Back in the day, most sites would have had separate IPs for everything they hosted, so that percentage sounds right. The 20% that failed needed a "Host: www.virtual.com" header.
The server definitely had a hard time with dynamic content that wasn't built with a GET url (thus triggering the pass-thru proxy). If the request was posted, encrypted, hashed, or referenced a server-side directive of some kind (server-side redirects were especially nasty) the cache would fail. A server-side link equating something like "http://www.server.net/redirect/" to a generated URL or dynamic content of some kind was the most frequent case we ran into. The server simply couldn't parse each and every http request of every variety and try to decide whether it should pass-thru or not. I can't think of a logical way around this that wouldn't break any given implementation. Can you?
First, I think you mean a POST query would trigger the pass-through. GET is the normal method.
Second, there are pretty simple rules for deciding whether to cache or not. The full rules are here [w3.org], but very roughly a page should be cached if and only if it was fetched with a plain GET and the response carried a Last-modified: or Expires: header.
If a copy is in the cache with anything other than an Expires (which doesn't even need to be checked until it passes), the proxy will query the server and check if the content is the same as before (with a special header that instructs the server not to send anything if it's unchanged).
This is a nice rule because webservers tend to automatically set the Last-modified: date on plain files and never do on dynamic stuff, so you have to explicitly add it in your dynamic code after considering it. So it gets most of the static stuff automatically (and that's generally the big stuff - images) but is cautious with anything dynamic. That's the correct approach.
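That rough rule fits in a few lines. This is a sketch only - the real HTTP/1.1 rules in the linked spec have many more cases (Vary:, cache-control directives, heuristic freshness) - but it captures the GET-plus-validator heuristic described above:

```python
def cacheable(method: str, headers: dict) -> bool:
    """Rough version of the rule described above: cache only plain
    GETs whose response carries a validator (Last-Modified:) or an
    explicit lifetime (Expires:), and skip anything the server has
    marked as uncacheable."""
    if method != "GET":
        return False  # POSTs and friends always pass through
    cc = headers.get("Cache-Control", "").lower()
    if "no-store" in cc or "private" in cc:
        return False
    return "Last-Modified" in headers or "Expires" in headers
```

Because webservers set Last-modified: automatically on plain files and not on dynamic output, this errs on the side of passing dynamic content through, which is the safe direction.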
This scheme really only breaks down when the clock used by the requester (the cache server in this case, but also the browser) or by the content generator (possibly the webserver, but also maybe the desktop used to generate stuff and then upload it to the webserver) is skewed. And even then, it just gives stale data; never one person logged in as someone else. This problem occurs even without a caching server, since browsers implement caching by themselves. Really, having a correct time is important to lots of things on computers. For example, don't ever try to do development with your clock wrong; build tools will be completely unable to tell what's up-to-date and what's not. Ticket-based network authentication systems (like Kerberos implementations, including Microsoft's shiny new ActiveDirectory) will refuse to log you on. Etc, etc.
We used dynamically assigned IPs at the time, so proxy requests made
Free software for faster browsing (Score:3, Informative)
If you have a shell account on another machine, and that machine has access to a proxy server, then you can tunnel port 3128 or 8080 (common http proxy ports) through ssh. This makes browsing a lot quicker because there is only a single TCP/IP connection going over the modem link - you don't have to connect separately for each page downloaded. Unfortunately I found that while this gave very fast browsing for half an hour or so, eventually it would freeze up and the ssh connection would have to be killed and restarted. Perhaps this has been fixed with newer OpenSSH releases.
RabbIT [sourceforge.net] is a proxy server you can run on the upstream host which compresses text and images (lossily).
The author of rsync mentioned something about an rsync-based web proxy where only differences in pages would be sent, but I don't know if this program was ever released.
MS does this already on any connection (Score:2, Informative)
A drawback used to be that the server at the provider side was often overloaded, so I set up several "accounts" to switch between hard- and software compression, with and without the proxy. Now that my ISP is no longer "free", I haven't seen the server become overloaded, so I use software compression and their proxy all the time. HTML and text download at 20 kB/s over a 48 kbps connection. Of course, there's no gain on already-compressed content like images and audio.
rproxy (rsync) (Score:2, Informative)
The idea is to store Web pages on your hard drive upon your first visit to the pages, and then to limit the information you download on subsequent visits to those pages to only the data that changes, making for a faster download.
Makes me think about: http://rproxy.sourceforge.net/ [sourceforge.net].
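The rproxy idea can be sketched with a grossly simplified, fixed-offset variant (real rsync uses a rolling checksum so matching blocks are found at any byte alignment; the 64-byte block size here is arbitrary):

```python
import hashlib

BLOCK = 64

def make_signature(old: bytes):
    # Map each block's hash to its offset in the locally cached copy.
    sig = {}
    for off in range(0, len(old), BLOCK):
        h = hashlib.md5(old[off:off + BLOCK]).hexdigest()
        sig.setdefault(h, off)
    return sig

def make_delta(sig, new: bytes):
    # Server side: emit ("ref", offset) for blocks the client already
    # has, ("lit", data) for blocks that changed. Only literals cost
    # real bandwidth.
    delta = []
    for off in range(0, len(new), BLOCK):
        chunk = new[off:off + BLOCK]
        h = hashlib.md5(chunk).hexdigest()
        if h in sig:
            delta.append(("ref", sig[h]))
        else:
            delta.append(("lit", chunk))
    return delta

def apply_delta(old: bytes, delta):
    # Client side: rebuild the new page from cached blocks + literals.
    out = bytearray()
    for kind, val in delta:
        if kind == "ref":
            out += old[val:val + BLOCK]
        else:
            out += val
    return bytes(out)
```

For a page where only one block changed between visits, only that block's bytes cross the wire; everything else is reconstructed from the cache.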