Wednesday, June 6, 2007

Testing Page Load Speed

One of the most problematic tasks when working on a Web browser is getting an accurate measurement of how long you're taking to load Web pages. In order to understand why this is tricky, we'll need to understand what exactly browsers do when you ask them to load a URL.

So what happens when you go to a URL like Well, the first step is to start fetching the data from the network. This is typically done on a thread other than the main UI thread.

As the data for the page comes in, it is fed to an HTML tokenizer. It's the tokenizer's job to take the data stream and figure out what the individual tokens are, e.g., a start tag, an attribute name, an attribute value, an end tag, etc. The tokenizer then feeds the individual tokens to an HTML parser.

The parser's job is to build up the DOM tree for a document. Some DOM elements also represent subresources like stylesheets, scripts, and images, and those loads need to be kicked off when those DOM nodes are encountered.

In addition to building up a DOM tree, modern CSS2-compliant browsers also build up separate rendering trees that represent what is actually shown on your screen when painting. It's important to note two things about the rendering tree vs. the DOM tree.

(1) If stylesheets are still loading, it is wasteful to construct the rendering tree, since you don't want to paint anything at all until all stylesheets have been loaded and parsed. Otherwise you'll run into a problem called FOUC (the flash of unstyled content problem), where you show content before it's ready.

(2) Image loads should be kicked off as soon as possible, and that means they need to happen from the DOM tree rather then the rendering tree. You don't want to have to wait for a CSS file to load just to kick off the loads of images.

There are two options for how to deal with delayed construction of the render tree because of stylesheet loads. You can either block the parser until the stylesheets have loaded, which has the disadvantage of keeping you from parallelizing resource loads, or you can allow parsing to continue but simply prevent the construction of the render tree. Safari does the latter.

External scripts must block the parser by default (because they can document.write). An exception is when defer is specified for scripts, in which case the browser knows it can delay the execution of the script and keep parsing.

What are some of the relevant milestones in the life of a loading page as far as figuring out when you can actually reliably display content?

(1) All stylesheets have loaded.
(2) All data for the HTML page has been received.
(3) All data for the HTML page has been parsed.
(4) All subresources have loaded (the onload handler time).

Benchmarks of page load speed tend to have one critical flaw, which is that all they typically test is (4). Take, for example, the aforementioned Frequently is capable of displaying virtually all of its content at about the 350ms mark, but because it can't finish parsing until an external script that wants to load an advertisement has completed, the onload handler typically doesn't fire until the 2-3 second mark!

A browser could clearly optimize for only overall page load speed and show nothing until 2-3 seconds have gone by, thus enabling a single layout and paint. That browser will likely load the overall page faster, but feel literally 10 times slower than the browser that showed most of the page at the 300 ms mark, but then did a little more work as the remaining content came in.

Furthermore benchmarks have to be very careful if they measure only for onload, because there's no rule that browsers have to have done any layout or painting by the time onload fires. Sure, they have to have parsed the whole page in order to find all the subresources, and they have to have loaded all of those subresources, but they may have yet to lay out the objects in the rendering tree.

It's also wise to wait for the onload handler to execute before laying out anyway, because the onload handler could redirect you to another page, in which case you don't really need to lay out or paint the original page at all, or it could alter the DOM of the page (and if you'd done a layout before the onload, you'd then see the changes that the onload handler made happen in the page, such as flashy DHTML menu initialization).

Benchmarks that test only for onload are thus fundamentally flawed in two ways, since they don't measure how quickly a page is initially displayed and they rely on an event (onload) that can fire before layout and painting have occurred, thus causing those operations to be omitted from the benchmark.

i-bench 4 suffers from this problem. i-bench 5 actually corrected the problem by setting minimal timeouts to scroll the page to the offsetTop of a counter element on the page. In order to compute offsetTop browsers must necessarily do a layout, and by setting minimal timers, all browsers paint as well. This means i-bench 5 is doing an excellent job of providing an accurate assessment of overall page load time.

Because tests like i-bench only measure overall page load time, there is a tension between performing well on these sorts of tests and real-world perception, which typically involves showing a page as soon as possible.

A naive approach might be to simply remove all delays and show the page as soon as you get the first chunk of data. However, there are drawbacks to showing a page immediately. Sure, you could try to switch to a new page immediately, but if you don't have anything meaningful to show, you'll end up with a "flashy" feeling, as the old page disappears and is replaced by a blank white canvas, and only later does the real page content come in. Ideally transitions between pages should be smooth, with one page not being replaced by another until you can know reliably that the new page will be reasonably far along in its life cycle.

In Safari 1.2 and in Mozilla-based browsers, the heuristic for this is quite simple. Both browsers use a time delay, and are unwilling to switch to the new page until that time threshold has been exceeded. This setting is configurable in both browsers (in the former using WebKit preferences and in the latter using about:config).

When I implemented this algorithm (called "paint suppression" in Mozilla parlance) in Mozilla I originally used a delay of 1 second, but this led to the perception that Mozilla was slow, since you frequently didnt see a page until it was completely finished. Imagine for example that a page is completely done except for images at the 50ms mark, but that because you're a modem user or DSL user, the images aren't finished until the 1 second mark. Despite the fact that all the readable content could have been shown at the 50ms mark, this delay of 1 second in Mozilla caused you to wait 950 more ms before showing anything at all.

One of the first things I did when working on Chimera (now Camino) was lower this delay in Gecko to 250ms. When I worked on Firefox I made the same change. Although this negatively impacts page load time, it makes the browser feel substantially faster, since the user clicks a link and sees the browser react within 250ms (which to most users is within a threshold of immediacy, i.e., it makes them feel like the browser reacted more or less instantly to their command).

Firefox and Camino still use this heuristic in their latest releases. Safari actually uses a delay of one second like older Mozilla builds used to, and so although it is typically faster than Mozilla-based browsers on overall page load, it will typically feel much slower than Firefox or Camino on network connections like cable modem/modem/DSL.

However, there is also a problem with the straight-up time heuristic. Suppose that you hit the 250ms mark but all the stylesheets haven't loaded or you haven't even received all the data for a page. Right now Firefox and Camino don't care and will happily show you what they have so far anyway. This leads to the "white flash" problem, where the browser gets flashy as it shows you a blank white canvas (because it doesn't yet know what the real background color for the page is going to be, it just fills in with white).

So what I wanted to achieve in Safari was to replicate the rapid response feel of Firefox/Camino, but to temper that rapid response when it would lead to gratuitous flashing. Here's what I did.

(1) Create two constants, cMinimumLayoutThreshold and cTimedLayoutDelay. At the moment the settings for these constants are 250ms and 1000ms respectively.

(2) Don't allow layouts/paints at all if the stylesheets haven't loaded and if you're not over the minimum layout threshold (250ms).

(3) When all data is received for the main document, immediately try to parse as much as possible. When you have consumed all the data, you will either have finished parsing or you'll be stuck in a blocked mode waiting on an external script.

If you've finished parsing or if you at least have the body element ready and if all the stylesheets have loaded, immediately lay out and schedule a paint for as soon as possible, but only if you're over the minimum threshold (250ms).

(4) If stylesheets load after all data has been received, then they should schedule a layout for as soon as possible (if you're below the minimum layout threshold, then schedule the timer to fire at the threshold).

(5) If you haven't received all the data for the document, then whenever a layout is scheduled, you set it to the nearest multiple of the timed layout delay time (so 1000ms, 2000ms, etc.).

(6) When the onload fires, perform a layout immediately after the onload executes.

This algorithm completely transforms the feel of Safari over DSL and modem connections. Page content usually comes screaming in at the 250ms mark, and if the page isn't quite ready at the 250ms, it's usually ready shortly after (at the 300-500ms mark). In the rare cases where you have nothing to display, you wait until the 1 second mark still. This algorithm makes "white flashing" quite rare (you'll typically only see it on a very slow site that is taking a long time to give you data), and it makes Safari feel orders of magnitude faster on slower network connections.

Because Safari waits for a minimum threshold (and waits to schedule until the threshold is exceeded, benchmarks won't be adversely affected as long as you typically beat the minimum threshold. Otherwise the overall page load speed will degrade slightly in real-world usage, but I believe that to be well-worth the decrease in the time required to show displayable content.