New Breed Of Web Accelerators Actually Work
axlrosen writes "Web accelerators first came around years ago, and they didn't live up to the hype. Now TV commercials are advertising accelerators that speed up your dial-up connection by up to 5 times, they say. AOL and EarthLink throw them in for free; some ISPs charge a monthly fee. Tests by
PC World, PC Magazine and CNET show that they do speed up your surfing quite a bit. They work by using improved compression and caching. The downside is they don't help streaming video or audio." And they require non-Free software on the client's end, too.
rproxy -- also actually works, and open source (Score:5, Informative)
Just remember (Score:5, Informative)
Methods (Score:3, Informative)
Compression.. Now there's something! I have in the past used an ssh tunnel (with compression switched on) to my university's web proxy, and that sped up things quite a bit! Why isn't this switched on by default on my PPPoA connection? Doesn't apache handle gzip'ing these days? Doesn't seem to be used much, though.. This speed up might be less pronounced on dial-up links though, because POTS modems usually switch on compression anyway (again YMMV).
Some download accelerators simply download different chunks of the same file in multiple sessions from either one server (shouldn't matter - unless with roundrobin DNS) or even from mirrors (better!). That's quite effective as well, but we know this, and that's why we use bittorrent for big files, don't we?
But it has to be said.. Most download accelerators are just bloaty spyware and don't do *zilch* to help your download speed.. Feh!
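The chunking trick the parent describes is simple to sketch. Here's a hypothetical Python illustration of how the byte ranges for the parallel sessions might be computed (the actual request loop and URL are left out; a real client would send each pair as an HTTP `Range:` header):

```python
def byte_ranges(total: int, parts: int):
    """Split a download of `total` bytes into `parts` contiguous
    spans suitable for HTTP Range headers like "bytes=0-499"."""
    step = -(-total // parts)  # ceiling division
    return [(lo, min(lo + step, total) - 1)
            for lo in range(0, total, step)]

# Each (lo, hi) pair goes out as a separate request, e.g.
#   Range: bytes=0-24
# and the pieces get reassembled in order on disk.
print(byte_ranges(100, 4))  # [(0, 24), (25, 49), (50, 74), (75, 99)]
```

Against a single server this only helps if the server throttles per-connection; against mirrors it genuinely multiplies sources, which is exactly the idea bittorrent generalizes.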
Didn't AOL use to convert GIF graphics to their own lossy ART format?
Re:Just remember (Score:5, Informative)
For a given representation these are all compressed. In every case, though, the compression is lossy: you can degrade the quality of the final output and send a smaller bitrate over the wire. Want me to prove my point? Take your favorite CD-quality MP3 - let's say it's encoded at about 100 kbps. Now convert it to minimum quality - the bitrate will be more like 20 kbps (if even that much)... you can still hear what is going on, but the quality will suck. You can do the same thing with the rest of the compressed formats as well.
Cache the Suckage (Score:3, Informative)
Basically, it proxied all requests through that ISP on port 80. If it found a request to an IP or sitename it had visited before, it tried to serve it out of cache. If it didn't, it proxied the result through and returned the results from the requested IP or sitename.
The problems:
The server had a difficult time with virtual hosting of any kind. About 4 out of 5 requests to a virtual host would go through. About 20% of the time, there was some critical piece of information that the cache server would mangle so that the vhost mechanism would be unable to serve the right data. This was a couple years ago, so bugfixes might have happened. Maybe.
The server definitely had a hard time with dynamic content that wasn't built with a GET url (thus triggering the pass-thru proxy). If the request was posted, encrypted, hashed, or referenced a server-side directive of some kind (server-side redirects were especially nasty) the cache would fail. A server-side link equating something like "http://www.server.net/redirect/" to a generated URL or dynamic content of some kind was the most frequent case we ran into. The server simply couldn't parse each and every http request of every variety and try to decide whether it should pass-thru or not. I can't think of a logical way around this that wouldn't break any given implementation. Can you?
We used dynamically assigned IPs at the time, so proxy requests made from one PC were sometimes returned erroneously to another if the IP changed between uses -- after a modem hangup, say. This was a rare event, but I listened to at least one person complaining that he was getting someone else's Hotmail. The fix is either to blacklist sites from being cached -- infeasible for every site that could possibly be requested -- or to assign static IPs. DHCP broadband users may have similar problems, especially those who get new IPs every so often.
Finally, if something got corrupted on the cache server due to disk error, stalled transfer, or some other reason, the server had little or no way to throw out the bad data. It would discard data that it *knew* was corrupt due to unfinished downloads, etc., but often this check failed, or data was assumed to be correct even when it wasn't. Everyone who requested the same piece of corrupt data got it. I had to answer this complaint a few times: "I downloaded it on one computer connected to your ISP and got a bad download. I downloaded it on my other computer from the same ISP and got the same bad download. Then I connected to another ISP from the first computer and got a complete download. What's up wit' dat, yo?"
Cache servers are a bad idea. The very premise is to be an end-all be-all for everyone who uses them. There are bug-fixes for some of the problems, but no way to solve the essential one: MOST data on the web is dynamic now. Using cache servers with dynamic data is inviting trouble.
Free Web Accelerators (Score:3, Informative)
Re:Cache the Suckage (Score:1, Informative)
Try free software next time, it often works better.
Re:You mean... (Score:5, Informative)
Sounds pretty much like that... Which Apache already supports, and the major browsers already support, making something like this redundant.
Moreover, dialup modems already use a fairly high level of compression at the hardware layer. While not exactly "gzip -9" quality, you can only realistically squeeze another 10% out of those streams no matter how much CPU power you devote to the task.
Others have mentioned image recompression, which has traditionally used VERY poor implementations, nothing more than converting everything to a low-quality JPEG. I would point out that a more intelligent approach to image compression could yield a 2-3x savings without noticeable loss of quality (smoothing undifferentiated backgrounds, stripping headers, dropping the quality a tad (ie, to 75-85%, not the 20-40% AOL tried to pass off as acceptable), downgrading the chroma subsampling on anything better than 2x2, etc). But no, not a 5x savings.
There is an RFC for this (Score:1, Informative)
RFC 2394 - IP Payload Compression Using DEFLATE
www.faqs.org/rfcs/rfc2394.htm
Re:tradeoff (Score:3, Informative)
"If you choose the highest speed (and hence the greatest compression), the image quality is downright poor." - PC Magazine
Doesn't sound lossless at all to me. If admins knew what they were doing, the HTML, CSS, etc. would already be compressed with mod_gzip or the little compression checkbox in IIS.
Re:You mean... (Score:3, Informative)
In fact, using mod_gzip I've experienced a very discernible speed-up in access and display over DSL!
Re:Methods (Score:3, Informative)
Wrong. Most of the larger residential ISPs probably do, but mine certainly doesn't, and of the last four ISPs I worked for, only two did any caching at all, and one of those only did caching in certain limited situations.
The downside is that whenever I've used an ISP's squid proxy, it slowed things down! Turning proxies off almost invariably helps speeds, instead of hurting them.
Hogwash. I've been using a caching proxy server on my LAN for the past several years, precisely because it increases download speed. Now that I'm using broadband the speed increase isn't noticeable, but on dialup it was - somehow Squid is actually more efficient than Netscape or IE at downloading pages. Of course, with multiple computers you get the caching benefits on top of that.
If your downloads go slower through the proxy, it's because the proxy server is overloaded and your ISP needs to upgrade it - not because proxy servers inherently slow things down.
Plus, if the proxy goes down, you can still use the web.
Of course routing web traffic through a proxy server adds an additional single point of failure, in addition to the other points of failure already in place.
Compression.. Now there's something! I have in the past used an ssh tunnel (with compression switched on) to my university's web proxy, and that sped up things quite a bit! Why isn't this switched on by default on my PPPoA connection? Doesn't apache handle gzip'ing these days? Doesn't seem to be used much, though..
Compression is the main thing these "web accelerators" do. Your ISP runs the server and you run the client, and data exchanged between the two is compressed.
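A minimal sketch of that client/server arrangement, using zlib's streaming API (illustrative only - the commercial products use their own protocols). The key detail is `Z_SYNC_FLUSH`, which makes each chunk decodable as soon as it arrives instead of at end-of-stream:

```python
import zlib

# One compressor lives at the ISP end of the link, one decompressor
# at the client end. State persists across chunks, so repeated
# strings in later data compress against earlier data too.
comp = zlib.compressobj(9)
decomp = zlib.decompressobj()

def isp_to_modem(data: bytes) -> bytes:
    # Compress a chunk and flush so the client can decode it immediately.
    return comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)

def modem_to_browser(wire: bytes) -> bytes:
    return decomp.decompress(wire)

page = b"<html><body>" + b"<p>hello world</p>" * 200 + b"</body></html>"
wire = isp_to_modem(page)
assert modem_to_browser(wire) == page
assert len(wire) < len(page) // 10  # repetitive HTML shrinks a lot
```

The asserts pass because HTML is highly repetitive; a JPEG pushed through the same pair would come out essentially unchanged in size.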
To compress your PPPoA connection would require the router where your connection terminates to compress and decompress ATM cells as they're tunneled inside a PPP link. That's not necessarily very efficient. It would require extra processing at both ends, and may not result in that significant a speed improvement. You probably want higher level compression - fitting more data into a packet, instead of making the packets smaller.
And yes, mod_gzip is available for Apache. Slashdot is using it, didn't you notice? No? Then quit complaining about how little it's used.
Re:because IIS's is garbage (Score:3, Informative)
I used to host Slashcode based sites. The default home page was about 50k. With mod_gzip, it literally got down to about 6k. Really sweet!
Re:Non-Free software? (Score:2, Informative)
Get Broadband (Score:2, Informative)
Where I live, DSL is $30 a month and cable is $40 a month. How can you argue with that?
Because (Score:2, Informative)
Re:I still don't understand (Score:3, Informative)
--jeff++
Re:You mean... -- You miss the point (Score:2, Informative)
A service like this that acts as a compression proxy can dramatically knock down the size of content. We implemented this at my last client and saw 78-93% compression of everything other than images. That includes css, javascript, dynamic web content, etc. I don't know about images, but this alone is very significant for today's clumsily table-laden pages.
Useful for portable connections unlike DSL (Score:3, Informative)
Likewise I often find myself in some crappy hotel where the connection is so noisy I can barely squeeze a 14.4K connection out of it. I just want to check my web-based e-mail, not download the Encyclopedia Britannica, so anything that can make a dialup work painlessly on common web pages is a good thing.
Re: Awwww boo hoo (Score:5, Informative)
Running Squid [squid-cache.org] with a 256mb ram disk cache is all the speedup we need, and it does so without altering the data being fed from upstream.
Re:I still don't understand (Score:2, Informative)
But you're right: there's no reason not to use it if you can. The bandwidth savings can be massive, saving $$.
Re:Faster porn? (Score:4, Informative)
2) If you're already using compression (stac, predictor, MPPC, etc.), this will make ZERO difference. The cache has to be on the near side of the slowest link -- which is the dialup user's modem. Now, in the instances where the ISP disables software compression -- like, for instance, the "idiots" at Bellsouth.Net who disable CCP to "speed up connection times" [exact words of the Cisco engineer who helped them set things up], never mind that the time it takes to connect and pass traffic is 100% modem training -- it'll help some. (For us ISDN users, 3 of the 3 seconds it takes to connect are IPCP; I'll accept that, as they do tend to return the same IP most of the time.)
3) A lot of what's moving around the internet isn't measurably compressible... GIFs, JPGs, MPEGs, zip files, etc. (I shall have to perform an analysis.)
Re:You mean... (Score:5, Informative)
Because v.42bis has a maximum compression ratio of 4:1 (MNP5 only does 2:1).
Now, for a file of all zeros, hey, I agree, you can do a lot better. So, how often do you download files containing nothing but zeros? For a typical text file, you might get better than 90% with gzip (while still only 75% from v.42bis). But for binary content? Very rarely better than 50%.
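The text-versus-binary gap is easy to demonstrate (a quick Python sketch; the sample data is made up, with random bytes standing in for a JPEG or zip payload):

```python
import gzip
import random

# Repetitive markup, standing in for a typical HTML page
html = b"<tr><td class='price'>$19.99</td></tr>\n" * 200
# Incompressible bytes, standing in for already-compressed content
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(html)))

zhtml = gzip.compress(html, compresslevel=9)
znoise = gzip.compress(noise, compresslevel=9)
# The markup collapses to a few percent of its original size; the
# random bytes actually grow slightly (gzip adds header overhead).
print(len(html), len(zhtml), len(znoise))
```

Running it shows exactly the asymmetry described above: the repeated table rows deflate to next to nothing, while the "binary" buffer gains a few bytes.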
In any case, web content consists of five basic types of information - text, graphics, sound, multimedia (Flash, MPEGs, AVIs, etc), and already-zipped packages.
Of those, only the first benefits much from any lossless method, and only the second really leaves any room for saving bits via lossy compression without horrible loss of quality. (Some of the fourth type could also endure lossy compression, but it takes far too long to recompress on-the-fly.)
Unfortunately, text comprises the least bothersome (in terms of relative size) of all of those major types of web content.
Don't get me wrong, I fully encourage people to turn mod_gzip on in their Apache installs. But when a company hawks its product with claims that simply cannot occur in a normal web browsing situation, I have to call foul.
I see only two situations whereby they could claim 5:1 compression - Either VERY text-heavy material, such as reading something from Project Gutenberg, or they strip every possible non-critical image from a page. I already do the latter via my hosts file and a paranoid userContent.css, so what does that leave?
Hope you only like reading text, in which case, have you ever heard of "Gopher"?
Re:Well, actually . . . (Score:2, Informative)
Re:I still don't understand (Score:2, Informative)
I wrote/am writing a rather nifty chat script [no-hype.com] that runs realtime through http just by keeping the connection open and flush()ing at the end of the main loop; with mod_gzip installed you would never see anything until you interrupted the transfer (by hitting stop, or back - effectively ending the script and freeing up the server to transmit the data).
mod_gzip is cool for cats, but it's not without its drawbacks.
Comment removed (Score:2, Informative)
compression with PHP (Score:5, Informative)
If you have a recent version of PHP, you don't even need mod_gzip. Just put the following lines in your .htaccess file:
php_flag zlib.output_compression on

Does everything on the fly. I once had a shell script that would wget a URL with the Accept-Encoding: gzip header, and then wget it again without, and show the percent savings. It was fairly interesting to see which sites were using compression, and how much bandwidth the sites that weren't could have saved.
Congratulations: you're clueless. (Score:3, Informative)
Wow. I've never encountered anyone who has spoken so confidently on a topic without knowing a damn thing about it. You are so absolutely wrong that the example you gave is the exact opposite of what you think it is. You've demonstrated precisely what not to do when writing good markup.
I don't even know where to begin to correct your cluelessness. Here it goes...
When I talk about writing good markup, I'm speaking in a purely structural sense. In other words, mark up text based on what it is, not what it looks like. Separation of content from presentation is the principle to follow.
For example, this is bad HTML, something along these lines:

<font size="5" color="red"><b>Important Heading</b></font><br><font size="3">Some paragraph text.</font>

That's bad for many reasons. The tags say nothing about what the text *is*, only what it should look like, and that presentation is welded into every single page.

Now, here's good HTML:

<h1>Important Heading</h1>
<p>Some paragraph text.</p>

And then a stylesheet:

h1 { color: red; font-size: 150%; }
p { font-size: 100%; }
Now, the user can easily define what stuff should look like based on what it is. A user could supply a stylesheet that increases the size of paragraph text by a percentage. A user could specify their own margins for list items. A user could eliminate colors or adjust contrast to their liking.
On top of that, a machine can read that markup and know precisely what it is. This is a header, that's a paragraph, this is a list, etc. Search engines and screen readers would know precisely what to do with the text.
Furthermore, the styling data is in one location, so whether you have 1 page or 1,000 pages, the work to make them all look different is the same.
Of course, this is only the tip of the iceberg. If you use XHTML and CSS properly, you get more visual flexibility than otherwise. (This also implies discarding tables for use in layout in favor of layers. Tables are for organizing tabular data, not positioning things visually.) XHTML also gives you forwards compatibility.
Re:Cache the Suckage (Score:3, Informative)
I worked at a local ISP who managed to get a demo for a cache server a while back. (I don't anymore.) The machine arrived. We plugged it in, and started to take tech calls.
Sounds like this was your mistake. You "managed to get a demo" speaks volumes. Sounds like an expensive proprietary product from a small company. If you had just downloaded Squid, I don't think you would have encountered all these problems.
The server had a difficult time with virtual hosting of any kind. About 4 out of 5 requests to a virtual host would go through. About 20% of the time, there was some critical piece of information that the cache server would mangle so that the vhost mechanism would be unable to serve the right data. This was a couple years ago, so bugfixes might have happened. Maybe.
Sounds like it only supported IP-based virtual hosting. Back in the day, most sites would have had separate IPs for everything they hosted, so that percentage sounds right. The 20% that failed needed a "Host: www.virtual.com" header.
The server definitely had a hard time with dynamic content that wasn't built with a GET url (thus triggering the pass-thru proxy). If the request was posted, encrypted, hashed, or referenced a server-side directive of some kind (server-side redirects were especially nasty) the cache would fail. A server-side link equating something like "http://www.server.net/redirect/" to a generated URL or dynamic content of some kind was the most frequent case we ran into. The server simply couldn't parse each and every http request of every variety and try to decide whether it should pass-thru or not. I can't think of a logical way around this that wouldn't break any given implementation. Can you?
First, I think you mean a POST query would trigger the pass-through. GET is the normal method.
Second, there are pretty simple rules for deciding whether to cache or not. The full rules are here [w3.org], but very roughly a page should be cached if and only if it was fetched with a plain GET and the response carried a Last-modified: or Expires: header.
If a copy is in the cache with anything other than an Expires (which doesn't even need to be checked until it passes), the proxy will query the server and check if the content is the same as before (with a special header that instructs the server not to send anything if it's unchanged).
This is a nice rule because webservers tend to automatically set the Last-modified: date on plain files and never do on dynamic stuff, so you have to explicitly add it in your dynamic code after considering it. So it gets most of the static stuff automatically (and that's generally the big stuff - images) but is cautious with anything dynamic. That's the correct approach.
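That rough rule fits in a few lines. This is a sketch only - the real HTTP/1.1 rules in the linked spec have many more cases (Vary:, cache-control directives, heuristic freshness) - but it captures the GET-plus-validator heuristic described above:

```python
def cacheable(method: str, headers: dict) -> bool:
    """Rough version of the rule described above: cache only plain
    GETs whose response carries a validator (Last-Modified:) or an
    explicit lifetime (Expires:), and skip anything the server has
    marked as uncacheable."""
    if method != "GET":
        return False  # POSTs and friends always pass through
    cc = headers.get("Cache-Control", "").lower()
    if "no-store" in cc or "private" in cc:
        return False
    return "Last-Modified" in headers or "Expires" in headers
```

Because webservers set Last-modified: automatically on plain files and not on dynamic output, this errs on the side of passing dynamic content through, which is the safe direction.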
This scheme really only breaks down when the clock used by the requester (the cache server in this case, but also the browser) or by the content generator (possibly the webserver, but also maybe the desktop used to generate stuff and then upload it to the webserver) is skewed. And even then, it just gives stale data; never one person logged in as someone else. This problem occurs even without a caching server, since browsers implement caching by themselves. Really, having a correct time is important to lots of things on computers. For example, don't ever try to do development with your clock wrong; build tools will be completely unable to tell what's up-to-date and what's not. Ticket-based network authentication systems (like Kerberos implementations, including Microsoft's shiny new ActiveDirectory) will refuse to log you on. Etc, etc.
We used dynamically assigned IPs at the time, so proxy requests made
Free software for faster browsing (Score:3, Informative)
If you have a shell account on another machine, and that machine has access to a proxy server, then you can tunnel port 3128 or 8080 (common http proxy ports) through ssh. This makes browsing a lot quicker because there is only a single TCP/IP connection going over the modem link - you don't have to connect separately for each page downloaded. Unfortunately I found that while this gave very fast browsing for half an hour or so, eventually it would freeze up and the ssh connection would have to be killed and restarted. Perhaps this has been fixed with newer OpenSSH releases.
RabbIT [sourceforge.net] is a proxy server you can run on the upstream host which compresses text and images (lossily).
The author of rsync mentioned something about an rsync-based web proxy where only differences in pages would be sent, but I don't know if this program was ever released.
MS does this already on any connection (Score:2, Informative)
A drawback used to be that the server at the provider side was often overloaded, so I set up several "accounts" to switch between hard- and software compression, with and without the proxy. Now that my ISP is no longer "free", I haven't seen the server become overloaded, so I use software compression and their proxy all the time. HTML and text download at 20 kB/s over a 48 kbps connection. Of course, there's no gain on already-compressed content like images and audio.
rproxy (rsync) (Score:2, Informative)
The idea is to store Web pages on your hard drive upon your first visit to the pages, and then to limit the information you download on subsequent visits to those pages to only the data that changes, making for a faster download.
Makes me think about: http://rproxy.sourceforge.net/ [sourceforge.net].
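The rproxy idea can be sketched with a grossly simplified, fixed-offset variant (real rsync uses a rolling checksum so matching blocks are found at any byte alignment; the 64-byte block size here is arbitrary):

```python
import hashlib

BLOCK = 64

def make_signature(old: bytes):
    # Map each block's hash to its offset in the locally cached copy.
    sig = {}
    for off in range(0, len(old), BLOCK):
        h = hashlib.md5(old[off:off + BLOCK]).hexdigest()
        sig.setdefault(h, off)
    return sig

def make_delta(sig, new: bytes):
    # Server side: emit ("ref", offset) for blocks the client already
    # has, ("lit", data) for blocks that changed. Only literals cost
    # real bandwidth.
    delta = []
    for off in range(0, len(new), BLOCK):
        chunk = new[off:off + BLOCK]
        h = hashlib.md5(chunk).hexdigest()
        if h in sig:
            delta.append(("ref", sig[h]))
        else:
            delta.append(("lit", chunk))
    return delta

def apply_delta(old: bytes, delta):
    # Client side: rebuild the new page from cached blocks + literals.
    out = bytearray()
    for kind, val in delta:
        if kind == "ref":
            out += old[val:val + BLOCK]
        else:
            out += val
    return bytes(out)
```

For a page where only one block changed between visits, only that block's bytes cross the wire; everything else is reconstructed from the cache.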