Gzip Encoding of Web Pages?
Both Brendan Quinn and msim were curious about the ability to send gzip-encoded Web pages. Brendan asks: "It's possible to make Apache detect the 'Accept-encoding: gzip' field sent by NS 4.7+, IE 4+ and Lynx, and send a gzip-encoded page, thus saving lots of bandwidth all over the place. So why don't people do it?
Here is a module written by the Mozilla guys a couple of years ago that -almost- does what I want, and I could change it pretty easily... but I thought someone else would have done it by now? eXcite do it; does anyone know of any other large-scale sites that use gzip encoding?"
"If you have LWP installed, you can check with:
GET -p '<my proxy>' -H 'Accept-encoding: gzip' -e http://www.site.com/ | less
Try that with 'www.excite.com' and you'll get binary (gzipped) data. That's what I want to do."
Re:err... (Score:1)
Why do it at all? (Score:1)
Why bother compressing data? Face it, 99% of all web pages out there consist of the following:
Of course, for high-text, heavy traffic sites (for example, right here on /.), this may make some sense. But for the majority of sites, it doesn't seem to make sense to me.
On the other hand, I might just be a grumpy old man who can't understand all these new-fangled things... :=]
________________________
Re:Security through obfuscation (Score:1)
Solutions are numerous... (Score:1)
Have the PHP script write out an HTML copy and a gzipped copy of its output whenever it is called (there's a bunch of ob_* functions in PHP that can help you do that). Then use mod_rewrite to have the server serve:
the gzipped copy if it's available and the client supports gzip
the HTML copy if it's available and gzip is not supported
the PHP file if neither copy is available
You can refresh the content by deleting the HTML and gzipped copies... that way you have optimal load on the server and a bandwidth-friendly site.
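A minimal mod_rewrite sketch of the scheme described above (the file layout and rule details are my assumptions, not from the original post): suppose the script at /page.php writes out page.html and page.html.gz next to itself.

```apache
RewriteEngine On

# 1. Client accepts gzip and a pre-gzipped copy exists: serve it.
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{DOCUMENT_ROOT}/page.html.gz -f
RewriteRule ^/page\.php$ /page.html.gz [L]

# 2. Otherwise, a plain HTML copy exists: serve that.
RewriteCond %{DOCUMENT_ROOT}/page.html -f
RewriteRule ^/page\.php$ /page.html [L]

# 3. Neither copy exists: the request falls through to the PHP
#    script, which regenerates both copies via its ob_* handlers.

# Label .gz files so browsers know to gunzip the body.
AddEncoding gzip .gz
```

Deleting page.html and page.html.gz forces the next hit through to PHP, exactly as the comment suggests.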
Re:Why do it at all? (Score:1)
Of course, I could be incorrect about Slashdot archives, in which case I will look like a complete idiot.
Would you do the whole thing? (Score:1)
What about parts and pieces?
Here is a tar command I use to move files around from system to system occasionally:
tar -cf -
It goes in chunks - not the whole thing. Maybe you should think about incorporating this type of data movement...
Re:Why do it at all? (Score:1)
Re:Would you do the whole thing? (Score:1)
Re:Does it work with Windows? (Score:1)
I was thinking more along the lines that NS for *nix might be able to handle it, but the Win version might not. I just went from a T1 in my college dorm to a 56k on my dad's computer at home, and anything to speed up the downloads would keep hair on my head. So my original question stands.
This is something I don't know how to test, and I don't know where to start an intelligent search, so if anyone has a good place for me to start looking, I would be grateful. Thanks. BTW, my criteria for a new place to live just grew to include DSL/cable modem access. How do people live on 56k?
Louis Wu
"Where do you want to go ...
Does it work with Windows? (Score:1)
Does this trick need gzip installed already, or is it included in the huge download of NS?
Louis Wu
"Where do you want to go ...
Static HTML can be gzipped in advance. (Score:1)
I realised a long, long time ago that I could save space on my Linux box's hard drive by going into the HTML documentation directories and doing a gzip -9 `find . -name "*.html"` .
Since I was opening these files through the file system, not via http, Netscape had no problem whatsoever opening and displaying them.
I just tried this using Netscape on an SGI over http (like this: http://server/path/page.html.gz) and it still works... I seem to remember that when I tried this at home with Linux, it didn't work...
I'm running a server, dishing up static HTML batch generated from source files once per month. The saving can be enormous... two HTML files of 25kB and 13kB were reduced to just 2kB each! Admittedly, the body of files only takes up 100MB, so I'm not going to run out of space anytime soon...
Now surely the server would fetch a small file off the disc faster than it could fetch a bigger file. And since I'm not compressing these files on the fly, there's no overhead on the server side. The LAN should get some benefit, too, since there is less data being whizzed around. There's going to be some overhead on the client side, as Netscape needs to gunzip the data at some point...
However, I was under the impression that analog modems already had some dedicated data compression hardware... so if you have a server grabbing gzipped data off its discs, pushing that out to an analog modem, then the hardware of the modem won't be able to compress it (much) further in any case. And if your server is generating the HTML on the fly, maybe it would be better to just push uncompressed data to the modem, and let the hardware compression take care of things.
errare humanum est, sed merda futare machinem necessit
Re:Does it work with Windows? (Score:1)
Re:err... (Score:1)
Will you have my babies?
Re:err... (Score:1)
Re:err... (Score:1)
Re:Does it work with Windows? (Score:1)
Slightly off topic, but interesting: ever notice that it claims to accept these whether or not the applications to handle them are actually installed? It's probably the same with gzip.
Re:Does it work with Windows? (Score:1)
Ah... but that's the rub. IE *says* it accepts gzip, but it can't handle it. And with their market share, that means that the % of hits that can accept gzip is probably not big enough to make it worthwhile.
Security through obfuscation (Score:1)
err... (Score:2)
- A.P.
--
* CmdrTaco is an idiot.
Re:err... (Score:2)
The only place where it might not make sense is in an academic environment where (for artificial reasons) the bandwidth is very cheap, and the servers might still be overwhelmed.
--
Re:err... (Score:2)
I'm talking about people on slow links USING your web site. People with modems. I don't care how fast your pipe is outgoing - these people on slow modems can effectively crush your site, shocking as it may sound, because they end up spawning more httpd's, eventually either forcing you to your httpd limit (if you've taken the time to set it sensibly), or forcing your server into swap. And you don't want that to happen.
Please go and read some real quality information on people who have worked with these high end solutions before thinking about replying again. Such as the mod_perl guide, at http://perl.apache.org/guide/
Re:err... (Score:2)
Not if you don't want them! ;-)
AxKit does this (Score:2)
Yes, it's not much help for images, but then you just shouldn't enable this concept for images.
Apache::GzipChain can also provide this option for people working with static pages on mod_perl enabled servers, but it has a serious memory leak in it that I found last week (and posted details of to the mod_perl mailing list).
Re:err... (Score:2)
People easily forget that, and assume that their bandwidth is big enough that the file will just instantly disappear down the pipe. Your server will get overloaded an awful lot quicker if every httpd is waiting on a slow client to download 700K when they could be downloading 100K.
How do you get Netscape to do this? (Score:2)
Whenever I try to open a file that's been gzipped, Netscape (4.75 on Linux) automatically prompts me with a file dialog box, even when I'm reading it straight from the file system. Thanks
Re:file-by-file is okay, but all together is better (Score:2)
You just made it so that pages can't incrementally load any more. The browser would have to wait until the whole .pak was downloaded before it could start laying out the page.
compression of compressed files problem (Score:2)
Yes, there are many places along the transmission lines where compression is attempted, but like the standard setting in most disk compression packages it's a little simple and typically does the worst job of compression in the system. Since compression in a modem is handled independent of any CPU, if you can do better somewhere else, then it doesn't really matter if the modem's efforts are wasted.
In addition, people have been saying it isn't worth compressing .gif or .jpg files. While that's typically true with .gif files, .jpgs can usually have 10-15% of their bulk squeezed out even with the humble zip program.
I'm a huge fan of compression and I strongly believe that transmission of compressed HTML files will have a major positive impact on the 'Net. Don't just think of the lower serving overhead on the servers, think of all the (caching) proxies and other routers and gateways. HTML files seriously lose 80% of their bulk when compressed.
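The "80% of their bulk" figure is easy to check on tag-heavy markup. A quick sketch (the synthetic page is mine; real pages will vary):

```python
import gzip

# Synthetic but realistic tag-heavy HTML: the markup is highly
# repetitive, which is exactly what deflate-style compression exploits.
rows = "".join(
    '<tr><td class="name">item %d</td><td class="desc">description %d</td></tr>\n'
    % (i, i)
    for i in range(500)
)
page = ("<html><body><table>\n%s</table></body></html>" % rows).encode("ascii")

packed = gzip.compress(page, 9)
saving = 100 - 100 * len(packed) // len(page)
print("%d bytes -> %d bytes (%d%% saved)" % (len(page), len(packed), saving))
```

On markup like this the saving comfortably clears 80%; prose-heavy pages compress somewhat less but still substantially.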
But we need to go further. We need to start bringing in a new highly compressed image format now so it's in popular use before 2005. There are a couple of nice fractal formats around that result in smaller files than the equivalent zipped .jpg -- we need to get at least one into the standard installation of the next IE or NS.
Re:file-by-file is okay, but all together is better (Score:2)
file-by-file is okay, but all together is better (Score:2)
Something like:
Re:file-by-file is okay, but all together is better (Score:2)
Doesn't Keep-Alive in HTTP/1.1 take care of the problem of sending multiple resources for one page?
Though I definitely agree with you about the whole multiple-version of a single resource thing (foopic2.jpg/foopic2.fractal)
We've been doing this in production for a year (Score:2)
GZIP compression is supported in NS4.5 and higher, IE4.01 and higher, and all versions of Mozilla. We have, in the past year, never had a reported problem with the GZIP compression. There are some known bugs if you try to compress mime types other than HTML.
On a side note: in probably about a month or so, I will be releasing into open source a Java servlet web application framework. Included, among other goodies, is a layer which can automatically do GZIP encoding if the browser supports it. So anybody writing a web application using this automatically gets the benefits. Eventually coming to http://www.projectapollo.org
Re:err... (Score:2)
I wasn't either...
...that was just sort of an extra thought I tacked on at the end, the rest of it wasn't directed in that fashion...
I agree with your points here. As I had said, my previous posts were coming from a viewpoint where everyone had fairly high bandwidth, especially considering the increasing availability of DSL/cable modems. I know there are a lot of lower-bandwidth links out there, and from a time perspective, they spend megapercentages more time connected to each httpd.
If you can afford the hardware to throw at it for gzipping and large dynamic generation, that's fine. I've found that you fill a (even large) pipe faster than you run out of CPU time on a fairly powerful system (which agrees with your assessments more than mine).
I was trying to provide a different viewpoint (since almost none of the people who use my site are on anything slower than 256k DSL, and it runs with a small amount of mem/CPU reserve). If you have a quad-Xeon with a couple of gigs of memory, or an S80, then by all means, go right ahead - it can and will save the slow people time. I haven't run anything at the scale that would need any of this (mostly since I have 80% static content), and my end-user demographic is much more bandwidth-enabled than the typical cross-section.
>Please go and read some real quality information on people who have worked with these high end solutions before thinking about replying again.
Thanks for the kind comment... I have read the mod_perl guide... relax a little, will ya?
--
Re:err... (Score:2)
I'll have to see if I can get one of those modules, and give something a shot with webbench.
--
Re:err... (Score:2)
--
Re:Why do it at all? (Score:2)
Ah, but (like I mentioned in another comment) when you have a page that is say 500k of text (a hundred or so comments), dynamically generated for each hit, the overhead of compression is rather dangerous, and if a server is already somewhat near capacity, it could slow it dramatically... if you can't cache it, and have high traffic, it's a big problem.
[Insert your own joke about Jon Katz wasting even more time with compression]
--
Re:Does it work with Windows? (Score:2)
Also, IE4+ does work correctly with gzipped pages.
Re:Does it work with Windows? (Score:2)
I set up a program to listen on port 80 and told NS to browse to localhost. It sent the "Accept-encoding: gzip". I then telnetted to www.excite.com:80 and sent that data. I got gzipped data in return. I then browsed the site using Netscape, and it loaded properly; therefore, Netscape 4.75 can handle gzipped downloads.
I then tricked IE 5.5 into sending the same HTTP request: I connected it to a proxy (127.0.0.1) that transparently forwarded to excite.com, filtered out IE's HTTP request, and pasted in Netscape's; the page also loaded properly.
So yes, gzip downloads work fine under Windows systems using Netscape 4.75 or IE5.5 (not sure about older versions, though), though IE5.5 sends an odd "Accept-encoding: gzip, deflate" which results in some sites not compressing it at all.
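The manual telnet test above boils down to two steps: check the Content-Encoding response header, and gunzip the body if it's marked gzip. A small sketch of that decoding step (the decode_body helper and its header-dict shape are mine, not from the post):

```python
import gzip

def decode_body(headers, body):
    """Return the response body as bytes, gunzipping it when the server
    marked it with Content-Encoding: gzip.
    Assumes header names have already been lowercased."""
    encoding = headers.get("content-encoding", "").lower()
    if "gzip" in encoding:
        return gzip.decompress(body)
    return body

# Simulate a server that honored "Accept-encoding: gzip".
raw = b"<html><body>hello</body></html>"
assert decode_body({"content-encoding": "gzip"}, gzip.compress(raw)) == raw
# And one that sent plain text.
assert decode_body({}, raw) == raw
```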
-- Sig (120 chars) --
Your friendly neighborhood mIRC scripter.
Re:Does it work with Windows? (Score:2)
-- Sig (120 chars) --
Your friendly neighborhood mIRC scripter.
Re:Why do it at all? (Score:2)
For conventional web pages, I agree. The slowness of most web sites is either due to graphics, or they are using some slow CGI on the server side. Compression of HTML wouldn't help them much.
There are also cases where the HTML is just plain resource-intensive for the browser to render (lots of nested tables, for example). Adding in the extra step of de-compressing wouldn't help there either.
However, I could see clients (not necessarily browsers) sucking down large chunks of XML in a gzipped form. It could be used for things like sending thousands of raw database records to a client application for further processing and presentation to the end user.
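A quick sketch of that idea: serialize a batch of records as XML, gzip the batch, and the wire size drops sharply because the repeated element names compress away (the record fields here are made up for illustration):

```python
import gzip

# A hypothetical dump of database records as XML; the repeated element
# names are what make XML compress so well.
records = "".join(
    "<record><id>%d</id><name>user%d</name><status>active</status></record>\n"
    % (i, i)
    for i in range(1000)
)
payload = ("<records>\n%s</records>" % records).encode("utf-8")

wire = gzip.compress(payload)
print("uncompressed: %d bytes, gzipped: %d bytes" % (len(payload), len(wire)))
```

For thousands of records, the tag overhead that makes XML verbose is almost free once the stream is gzipped.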
How about this? (Score:3)
Re:Does it work with Windows? (Score:3)
GET / HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
Host: 127.0.0.1
Connection: Keep-Alive
In comparison, Netscape 4.75:
GET / HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.75 [en] (Win98; U)
Host: 127.0.0.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
The main points of interest are that IE5.5 speaks HTTP/1.1 while Netscape only requests HTTP/1.0, and that IE5.5 also claims to handle both gzip AND deflate encoding, even though they're nearly the same thing (gzip is essentially the deflate algorithm with a small extra header and checksum).
I also tried sending the IE5.5 HTTP request via telnet to www.excite.com; it returned plain text, whereas Netscape's HTTP request returned gzipped data.
-- Sig (120 chars) --
Your friendly neighborhood mIRC scripter.
Re:Why do it at all? (Score:4)
Incidentally, no extra load would be necessary on the server for static content if it were pre-compressed.