Optimizing Page Load Times
John Callender writes, "Google engineer Aaron Hopkins has written an interesting analysis of optimizing page load time. Hopkins simulated connections to a web page consisting of many small objects (HTML file, images, external javascript and CSS files, etc.), and looked at how things like browser settings and request size affect perceived performance. Among his findings: For web pages consisting of many small objects, performance often bottlenecks on upload speed, rather than download speed. Also, by spreading static content across four different hostnames, site operators can achieve dramatic improvements in perceived performance."
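One way to implement the "spread static content across four hostnames" suggestion is to hash each asset path to a hostname, so a given URL always maps to the same host and the browser cache stays warm. A minimal Python sketch (the hostnames are made up for illustration):

```python
import hashlib

# Hypothetical static-asset hostnames; the analysis suggests four.
HOSTS = ["img1.example.com", "img2.example.com",
         "img3.example.com", "img4.example.com"]

def shard_url(path):
    """Map an asset path to one of the static hosts.

    Hashing the path (rather than round-robin) keeps the mapping
    stable across page views, so cached copies stay valid.
    """
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    host = HOSTS[int(digest, 16) % len(HOSTS)]
    return "http://%s%s" % (host, path)
```

The same path always resolves to the same host, while different paths spread roughly evenly over the four.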
HTTP Pipelining (Score:5, Informative)
For those that don't know what that means: http://www.mozilla.org/projects/netlib/http/pipel
I've had it switched on for ages. I sometimes wonder why it's off by default.
Re:Erm.. huh? (Score:4, Informative)
First, if the ISP has a proxy server, using it will reduce the round-trip time for cached content, which may only have to travel a few hops instead of perhaps all the way across the world. You can also look at something like Onspeed [onspeed.com], a paid-for product that compresses images (though it makes them look worse) and other content; it can give a decent boost on very slow (GPRS/3G) connections and also lets you get more out of your transfer quota.
Simulation software available? (Score:4, Informative)
What (free) simulation software is available for this? I only know of dummynet, which requires a dedicated server and some advanced routing. But surely there is more. Is there?
Css and Scripts (Score:5, Informative)
I've done some benchmarks and measurements in the past which will never be made public (I work for Yahoo!). And the most important bits in those have been CSS and scripts. A lot of performance has been squeezed out of the HTTP layers (Akamai, Expires headers), but not enough attention has been paid to the render section of the experience. You could probably reproduce the benchmarks with a PHP script that calls sleep() to introduce delays at various points, and a weekend to waste [dotgnu.info].
The page does not start rendering until the last CSS stream has completed, which means that if your CSS has @import url() entries, the delay before render increases (until that file is pulled and parsed too). It really pays to make the CSS data the quickest load of all, because without it, all you'll get is a blank page for a while.
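Since every @import adds one more blocking fetch before first render, it can be worth auditing stylesheets for them. A rough Python check (regex only, not a full CSS parser; the filenames in the test are made up):

```python
import re

# Matches both "@import url(...)" and '@import "..."' forms --
# a rough approximation of the CSS grammar, good enough for auditing.
IMPORT_RE = re.compile(r'@import\s+(?:url\(\s*)?["\']?([^"\')\s;]+)')

def find_imports(css_text):
    """Return the stylesheet URLs that @import pulls in.

    Each hit is one more file the browser must fetch and parse
    before it can start rendering the page.
    """
    return IMPORT_RE.findall(css_text)
```

Running it over your CSS tells you how many extra render-blocking round trips you are paying for; inlining those rules or switching to <link> elements removes the serialization.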
Scripts marked defer do not always defer: so much inline code in <script> tags depends on those scripts that many browsers just pull them as and when they find them. There seem to be just two threads downloading data in parallel (from one hostname), which means a couple of large (but rarely used) scripts in the code will block the rest of the CSS/image fetches. See Flickr's Organizr [flickr.com] for an example of that in action.
You should understand that these resources have different priorities in render land, and you should really only venture here after you've optimized the other bits (server [yahoo.com] and application [php.net]).
All said and done, a good tutorial by Aaron Hopkins - a lot of us have had to rediscover all that (and more) by ourselves.
Re:Spreading content across hostnames... (Score:1, Informative)
Why would a sub-domain confuse anyone?
rss.slashdot.org
apple.slashdot.org
ask.slashdot.org
backslash.slashdot.org
Those tenths of seconds add up (Score:5, Informative)
Re:Css and Scripts (Score:3, Informative)
I have also found that cached CSS and JavaScript can play tricks on you. When developing a site you tend to see an expected set of behaviors based on your own experience with it, but you can find later that having the external files either cached or not cached has an effect on things (e.g., a cached JavaScript file with a load event may fire before the DOM is ready if you aren't checking for DOM readiness yourself).
ETag headers are very important as well. Running "tail -f access.log" while you browse your own site will show a lot of redundant requests for JavaScript, CSS, and image files that should be cached but aren't. IE has a setting along the lines of "Check for newer versions of stored pages" that really fouls up CSS background images without proper expiration headers (lots of flickering).
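Getting those redundant requests down mostly means sending consistent validators and expiry times. A hedged sketch of generating an ETag from file content plus a far-future Expires header (the header values are illustrative, not a drop-in for any particular server):

```python
import email.utils
import hashlib
import time

def cache_headers(body, max_age=86400 * 30):
    """Build ETag, Expires, and Cache-Control headers for a static body.

    Hashing the content makes the ETag stable across restarts and
    mirrored servers, so conditional requests (If-None-Match) can get
    a 304 instead of re-downloading the file.
    """
    etag = '"%s"' % hashlib.sha1(body).hexdigest()
    expires = email.utils.formatdate(time.time() + max_age, usegmt=True)
    return {
        "ETag": etag,
        "Expires": expires,
        "Cache-Control": "public, max-age=%d" % max_age,
    }
```

The same bytes always yield the same ETag, which is exactly what a browser needs for revalidation to work across a server farm.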
There is still a significant portion of the web community that uses dialup connections. These users are seemingly ignored by many popular sites. I try to get pages to load in under 8 seconds for dialup users, but with any significant JavaScript or CSS it is sometimes a difficult task. It's much easier on subsequent page loads if you force caching, but that doesn't matter one bit if the user goes elsewhere because the initial page load was too slow.
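That 8-second target translates into a hard byte budget. A back-of-the-envelope estimate, assuming a 56 kbit/s modem delivering roughly 5 KB/s after overhead and a handful of DNS/TCP/request round trips (all numbers are illustrative assumptions):

```python
def page_budget(seconds=8, throughput_bytes=5000, rtt=0.2, round_trips=6):
    """Bytes a page can weigh and still load within `seconds` on dialup.

    Subtracts the latency spent on DNS lookups, TCP setup, and request
    round trips, then multiplies the remaining time by the effective
    throughput. A toy model: every parameter here is a rough guess.
    """
    transfer_time = seconds - rtt * round_trips
    return int(transfer_time * throughput_bytes)
```

Under these assumptions the whole page, images and scripts included, gets about 34 KB, which shows why a couple of large script files can blow the budget on their own.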
There are certainly a plethora of optimization techniques not even touched on in this article. I know that Google and Yahoo are very keen on these subjects, and it's worth taking a look at the source of some of their pages for ideas. Last I checked, they couldn't care less about validation, though. But with the bandwidth they must consume, saving a few bytes here and there can mean a significant dollar difference at the end of the month, and what truly matters is whether or not the browser renders the page correctly.
Re:Pipelining (Score:2, Informative)
Pipelining means "multiple requests can be sent before any responses are received."
Re:Erm.. huh? (Score:4, Informative)
1 - Keepalive/pipelining means only one DNS lookup is performed, and since the result is often cached on your local machine, this delay is minimal.
2 - The DNS lookup for the second host can happen while connections to the first host are still downloading, rather than stopping everything while the second host is looked up. This hides the latency of the second lookup.
3 - Most browsers limit the number of connections to each server to 2. If you're loading lots of images, this means you can only be loading two at once (or one while the rest of the page is still downloading). If you put images on a different host, you can get extra connections to it. Also, cookies will usually stop an object from taking advantage of proxies/caches. Putting images on a different host is an easy way to make sure they're not cookied.
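The effect of point 3 can be sketched with a little round-trip arithmetic. A toy model that assumes each small object costs one round trip and the browser opens two connections per hostname (it ignores DNS, TCP setup, and bandwidth entirely):

```python
import math

def round_trips(num_objects, hosts, per_host_limit=2):
    """Estimate serial round trips to fetch num_objects small objects.

    With `hosts` hostnames and `per_host_limit` connections to each,
    up to hosts * per_host_limit objects transfer in parallel, so the
    fetches serialize into ceil(num_objects / parallel) rounds.
    """
    parallel = hosts * per_host_limit
    return math.ceil(num_objects / parallel)
```

For a page with 24 images, one hostname gives 12 serial round trips; four hostnames cut that to 3, which is the kind of perceived speedup the article describes.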
Re:HTTP/1.1 Design (Score:3, Informative)
3. SHOULD: This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
4. SHOULD NOT: This phrase, or the phrase "NOT RECOMMENDED", mean that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.
They don't say DO NOT or MUST NOT. Like they say, the behavior can be useful... and they could see this would be the case IN 1997!
It is time we updated things. It's particularly funny that this RFC, of all things, is the one Microsoft chose to obey.
Re:Pipelining (Score:4, Informative)
Keep-alive no:
Open connection
-Request
-Response
Close Connection
Open connection
-Request
-Response
Close Connection
-Repeat-
Keep-alive yes:
Open connection
-Request
-Response
-Request
-Response
-Repeat-
Close Connection
Pipe-lining yes:
Open connection
-Request
-Request
-Repeat-
-Response
-Response
-Repeat-
Close Connection
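The three sequences above differ mainly in how many round trips they serialize. A toy latency model, counting one round trip for connection setup and one per request/response pair (it deliberately ignores TCP slow start, parallel connections, and server think time):

```python
def total_rtts(requests, keep_alive=False, pipelined=False):
    """Round trips to fetch `requests` objects under the three schemes.

    - no keep-alive: (setup + request/response) per object;
    - keep-alive: one setup, then one round trip per request;
    - pipelining: one setup, then all requests overlap in ~1 round trip.
    """
    if pipelined:
        return 1 + 1          # setup + one overlapped burst
    if keep_alive:
        return 1 + requests   # setup + serialized request/response pairs
    return requests * 2       # open/close for every single object
```

For 10 objects this gives 20, 11, and 2 round trips respectively, which is why the diagrams above matter so much on high-latency links.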
Re:Connection Limits (Score:3, Informative)
If you have a second webserver for all static data, that can be a simpler HTTP daemon using 1 MB per connection or less. You can handle more parallel connections (and Akamai the setup if needed!).
Yes, it's best to avoid inline images, Google text-ad objects, etc. But by allowing parallel loading of the objects (and that's the trick with using several separate hosts for images), you can run 8 or 16 round trips at the same time; there's your perceived speedup.
Re:Pipelining (Score:3, Informative)
Pipelining sends requests out without having to wait for the previous one to complete. (This also requires a Content-Length: header. That is fine for static files such as images, but many scripts whose output is sent straight to the browser as it's being generated will break it, since the content length isn't known until generation has completed.)
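Why pipelining needs Content-Length becomes obvious when you parse responses back-to-back off one connection: the length header is the only thing telling the client where one response ends and the next begins (chunked encoding aside). A minimal parser sketch over an in-memory buffer, assuming well-formed responses with CRLF line endings and no chunked encoding:

```python
def split_responses(buf):
    """Split a byte buffer holding back-to-back HTTP responses.

    Relies entirely on Content-Length to find each body's end --
    exactly the framing that pipelining depends on. Returns a list
    of (header_bytes, body_bytes) pairs.
    """
    responses = []
    while buf:
        head, _, rest = buf.partition(b"\r\n\r\n")
        length = 0
        for line in head.split(b"\r\n"):
            if line.lower().startswith(b"content-length:"):
                length = int(line.split(b":", 1)[1])
        responses.append((head, rest[:length]))
        buf = rest[length:]
    return responses
```

If a script omits Content-Length, the parser has no way to find the boundary, and every pipelined response behind it is misframed; that is the breakage described above.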
Some reasons (Score:3, Informative)
Some reasons against pipelining [mozillazine.org].