Thursday, November 16, 2006

What Real People Use On The Web

At last year's Web 2.0 Conference, one of the most discussed panels featured a group of teenagers telling everyone what Web products they use. This year the concept has been taken a step further by inviting the teenagers' parents as well. The panel was moderated by Safa Rashtchy.

Most of the panel has Google as their main search engine. One adult panelist uses a different engine because she can put questions into the search box. One adult panelist says she uses Google out of habit. Another adult panelist uses Yahoo for the maps and other information. One of the teenagers says she uses Google because her school wants her to. Most of the panelists did not know MSN had a search engine.

Regarding online video, one adult panelist said she spends 3-4 hours per week on YouTube. A teenager spends 2-3 hours a day at the library with his friends watching YouTube. One teenager says he uses Google Video. Another says she uses some download software (Shakespeare something??). Safa asks whether any of them would pay $1 to watch video, e.g. Lost. The majority opinion is no. As for free video with ads, one teenager says she would watch it, and that in fact it's better than downloading.

Website recognition

Safa rattles off some names of websites:

  • Skype - just two people know what it is and use it, both for international calls. One says "it rarely works" in terms of sound quality.
  • craigslist - yes
  • yelp - no
  • judyslist - no
  • blogs - yes, about half read them; a couple mentioned reading them on MySpace, but also for their interests/hobbies

Regarding uploading, most have done this. One adult panelist says she uploads pictures.

Internet Companies

Safa now asks about companies:

  • Yahoo: most of the adult panelists use it - e.g. for horoscopes. A lot of them use Yahoo Mail (a few use Gmail). One teenager thinks Yahoo is "silly", as in entertaining. One adult panelist says she likes Yahoo for things like games and email, and scans the news.
  • Google: one teenager says Google "seems more like a friend", and he also uses Gmail because it's easy and user-friendly; another says he uses Google Video and likes Google; all the teenagers would trust Google over Yahoo!
  • MSN: nobody has much to say about it; one teenager says he likes Xbox; one says they like the little (cartoon) characters. Word was mentioned.
  • eBay: some of them use it, e.g. for concert tickets, books etc.
  • Amazon: a few people, one says for "mostly media type things" like books, CDs.

Instant Messaging

One teenager uses AOL "all day" to talk to his friends. Same for another teenage boy. The three most mentioned by teenagers were AIM, MSN and Yahoo. One says 2-3 hours per day.


MySpace

One teenager compares MySpace to xmas presents, because he sees something new or a new friend every day - he spends around 3 hours per day. Another says 2-3 hours per day - "making sure my profile's good". One mother signed up to monitor what her child was doing - she found out her 14-year-old son was 17 on MySpace.

Troubles with Asynchronous Ajax Requests and PHP Sessions

As I sit here watching “The Muppets Take Manhattan” in Spanish in the middle of a Costa Rican thunderstorm, my mind drifts back to a recent project where I spent a day debugging a frustrating problem: a user would visit the web application I was working on, and after a given page loaded, all of the session data associated with their visit would suddenly be gone. The user would no longer be logged into the site, and any changes they had made (which were recorded in session data) were lost.

I spent a ton of time in the debugger (while at times unreliable and frustrating on huge projects, the Zend debugger is still an invaluable aid for the PHP application developer) and kept seeing the same thing: the session data were simply being erased at some point, and the storage in the database would register '' (an empty string) as the data for the session.

It was driving me crazy. I would sit there in the debugger and go through the same sequence each time:

  • Debug the request for the page load.
  • Debug the first Ajax request that the page load fired off.
  • Debug the second Ajax request that the page load simultaneously fired off.
  • Debug the third Ajax request that the page load initiated.

In retrospect, looking at the above list, it seems blindingly obvious what I had been running into, but it was very late in a very long contract, and I blame the fatigue for missing what now seems patently obvious: A race condition.

For those unfamiliar with what exactly this is, a race condition is seen most often in applications involving multiple “threads of execution” – which include either separate processes or threads within a process – when two of these threads (which are theoretically executing at the same time) try to modify the same piece of data.

If two threads of execution that are running more or less simultaneously (but never in exactly the same way, because of CPU load, other processes, and chance) try to write to the same variable or data storage location, the value of that storage location depends on the order in which the threads got there. Given that it is impossible to predict that order, you end up not knowing the value of the variable after the threads of execution are finished; in effect, “the last one to write, wins” (see Figure 1).

Racing to destroy values
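The “last one to write, wins” behaviour is easy to sketch outside of PHP. Here is a minimal Python illustration (the names are invented for the example) of two threads of execution clobbering the same storage location:

```python
import threading

shared = {"value": None}   # the contested storage location

def writer(name):
    # Each "thread of execution" writes its own value into the same slot.
    shared["value"] = name

t1 = threading.Thread(target=writer, args=("request-1",))
t2 = threading.Thread(target=writer, args=("request-2",))
t1.start(); t2.start()
t1.join(); t2.join()

# Which value survives depends on scheduling: the last one to write, wins.
print(shared["value"])
```

Run it a few times and the surviving value can differ from run to run; neither outcome is wrong, which is exactly what makes race conditions so hard to reproduce in a debugger.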

Normally, when you write web applications in PHP, this is not really an issue, as each page request gets its own execution environment, and a user is only visiting one page at a time. Each page request coming from a particular user arrives more or less sequentially and shares no data with other page requests.

Ajax changes all of this, however: suddenly, one page visit can result in a number of simultaneous requests to the server. While the separate PHP processes cannot directly share data, a mechanism familiar to most PHP programmers lets them get around this limitation: sessions. The session data that the various requests modify are now susceptible to being overwritten with bad data by other requests after a given request thinks it has written out updated and correct data (see Figure 2).

When requests go bad - clobbering data

In the web application I was working on, all of the Ajax requests were being routed through the same code that called session_start() and, implicitly, session_write_close() (PHP calls this function automatically when a script ends with a session still open). One of the Ajax requests would, however, set some session data to help the application “remember” which data the user was browsing. Depending on the order in which the various requests were processed by the server, sometimes those data would overwrite other session data and the user data would be “forgotten”.

As an example of this problem, consider a page (race1.php) which, when fully loaded, executes two asynchronous Ajax requests to the server.

The code is divided into three main sections:
  • The first contains the call to session_start() and opens the HTML headers.
  • The second contains the Javascript code to execute the asynchronous requests to the server. The biggest function, getNewHTTPObject(), is used to create new XMLHttpRequest objects. The onLoadFunction() is executed when the page finishes loading and starts the ball rolling, while the other two functions simply wait for and handle the responses and results from the asynchronous requests.
  • In the final section, we just write out the body of the document, which contains a single element to hold the results and an onload attribute on the body element to make sure that onLoadFunction() is called when the document finishes loading.

The asynchronous Ajax requests are then made to race2.php, which can handle two different types of Ajax work request.

This PHP script handles the two request types differently, and creates the race condition by having the second request type, req1, set the session data to '' (an empty string). (In a real-world application, you might accidentally have had this request set some value you thought was meaningful.)

If you install the two files race1.php and race2.php on your server and then load race1.php into your browser, you will sometimes see that the test string is set after the page is completely loaded, and other times that it is “(empty)”, indicating that the second Ajax request has clobbered the value.
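The clobbering can be sketched in a few lines of Python (a simulation, not the original PHP): each request reads a snapshot of the session at session_start() and writes the whole record back at session_write_close(), so with an unlucky interleaving the last write erases the other request's update:

```python
session = {"user": "alice"}            # the stored session record

def handle(snapshot, changes):
    # Each request works on its own copy (like a separate PHP process)
    # and hands back the whole record to be written out at the end.
    updated = dict(snapshot)
    updated.update(changes)
    return updated

# Unlucky interleaving: both requests read before either writes back.
snap_a = dict(session)                  # request A: session_start()
snap_b = dict(session)                  # request B: session_start()
result_a = handle(snap_a, {"page": 3})  # A remembers what the user browsed
result_b = handle(snap_b, {"test": ""}) # B (req1) empties the test string
session = result_a                      # A: session_write_close()
session = result_b                      # B writes last: A's update is gone

print(session)   # A's "page" key has vanished
```

With a luckier ordering (A reads after B writes), both updates survive; which outcome you get depends entirely on how the server happens to schedule the two requests.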

Now that we are aware of this problem and how it can manifest itself, the next question is, of course, how do we solve it? Unfortunately, I think this is one of those problems best solved by avoiding it. Building logic into our web application to lock the threads of execution (i.e. individual requests) would be prohibitively expensive and would eliminate much of the fun and many of the benefits of asynchronous requests via Ajax. Instead, we will avoid modifying session data when we are executing multiple simultaneous requests.

Please note that this is much more specific than simply saying that we will avoid modifying session data during any Ajax request. Indeed, that would be a disaster: in a Web 2.0 application, we are most likely using Ajax for form submission and for updating the state of the user data (i.e. session data) as the data are processed and we respond to the changes. However, for those requests we use to update parts of pages dynamically, we should be careful to avoid modifying the session data, or at least to do so in a way that no other request's results depend on these session data.

Ajax requests and session data do not have to be problematic when used together: with a little bit of care and attention, we can write web applications that are powerful, dynamic, and not plagued by race-condition bugs.

Optimize your Page Load Time

It is widely accepted that fast-loading pages improve the user experience. In recent years, many sites have started using AJAX techniques to reduce latency. Rather than round-trip through the server retrieving a completely new page with every click, often the browser can either alter the layout of the page instantly or fetch a small amount of HTML, XML, or javascript from the server and alter the existing page. In either case, this significantly decreases the amount of time between a user click and the browser finishing rendering the new content.

However, for many sites that reference dozens of external objects, the majority of the page load time is spent in separate HTTP requests for images, javascript, and stylesheets. AJAX probably could help, but speeding up or eliminating these separate HTTP requests might help more, yet there isn't a common body of knowledge about how to do so.

While working on optimizing page load times for a high-profile AJAX application, I had a chance to investigate how much I could reduce latency due to external objects. Specifically, I looked into how the HTTP client implementation in common browsers and characteristics of common Internet connections affect page load time for pages with many small objects.

I found a few things to be interesting:

  • IE, Firefox, and Safari ship with HTTP pipelining disabled by default; Opera is the only browser I know of that enables it. No pipelining means each request has to be answered and its connection freed up before the next request can be sent. This incurs average extra latency of the round-trip (ping) time to the user divided by the number of connections allowed. Or if your server has HTTP keepalives disabled, doing another TCP three-way handshake adds another round trip, doubling this latency.

  • By default, IE allows only two outstanding connections per hostname when talking to HTTP/1.1 servers or eight-ish outstanding connections total. Firefox has similar limits. Using up to four hostnames instead of one will give you more connections. (IP addresses don't matter; the hostnames can all point to the same IP.)

  • Most DSL or cable Internet connections have asymmetric bandwidth, at rates like 1.5Mbit down/128Kbit up, 6Mbit down/512Kbit up, etc. Ratios of download to upload bandwidth are commonly in the 5:1 to 20:1 range. This means that for your users, a request takes the same amount of time to send as it takes to receive an object of 5 to 20 times the request size. Requests are commonly around 500 bytes, so this should significantly impact objects that are smaller than maybe 2.5k to 10k. This means that serving small objects might mean the page load is bottlenecked on the users' upload bandwidth, as strange as that may sound.
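The arithmetic behind that last point is easy to check. A quick sketch in Python for a hypothetical 1.5Mbit down/128Kbit up connection:

```python
# How big an object can come down in the time a 500-byte request goes up?
request_bytes = 500
down_Bps = 1.5e6 / 8    # downstream bytes/sec on a 1.5Mbit link
up_Bps = 128e3 / 8      # upstream bytes/sec on a 128Kbit link

upload_time = request_bytes / up_Bps       # seconds to send the request
breakeven_bytes = upload_time * down_Bps   # what could download meanwhile
print(round(upload_time * 1000), round(breakeven_bytes))  # → 31 5859
```

So on this link a 500-byte request costs as much wire time as downloading a ~5.9k object: for objects below that size, the user's upstream, not downstream, is the bottleneck.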

Using these, I came up with a model to guesstimate the effective bandwidth of users of various flavors of network connections when loading various object sizes. It assumes that each HTTP request is 500 bytes and that the HTTP reply includes 500 bytes of headers in addition to the object requested. It is simplistic and only covers connection limits and asymmetric bandwidth, and doesn't account for the TCP handshake of the first request of a persistent (keepalive) connection, which is amortized when requesting many objects from the same connection. Note that this is best-case effective bandwidth and doesn't include other limitations like TCP slow-start, packet loss, etc. The results are interesting enough to suggest avenues of exploration but are no substitute for actually measuring the difference with real browsers.
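One way such a model might be formulated (a sketch with invented structure, not the author's actual model) is to charge every request its share of the upstream, every reply its share of the downstream, and one round trip per request per connection:

```python
def effective_bandwidth_bps(obj_bytes, n_objects, down_bps, up_bps,
                            rtt_s, connections=2,
                            req_bytes=500, hdr_bytes=500):
    """Best-case effective bandwidth while fetching n_objects objects
    over persistent connections: requests share the upstream, replies
    share the downstream, and without pipelining each connection pays
    one round trip per request (overlapped across parallel connections)."""
    upload_s = n_objects * req_bytes * 8 / up_bps
    download_s = n_objects * (hdr_bytes + obj_bytes) * 8 / down_bps
    waiting_s = (n_objects / connections) * rtt_s
    total_s = max(upload_s, download_s) + waiting_s
    return n_objects * obj_bytes * 8 / total_s

# Small objects waste most of a 1.5Mbit downstream; large ones fill it.
small = effective_bandwidth_bps(2_000, 40, 1.5e6, 384e3, 0.1)
large = effective_bandwidth_bps(100_000, 40, 1.5e6, 384e3, 0.1)
```

Even this crude version reproduces the shape of the graphs below: tiny objects leave most of the downstream idle, and raising `connections` (more hostnames) helps mainly by dividing the per-request round-trip cost.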

To show the effect of keepalives and multiple hostnames, I simulated a user on net offering 1.5Mbit down/384Kbit up who is 100ms away with 0% packet loss. This roughly corresponds to medium-speed ADSL on the other side of the U.S. from your servers. Shown here is the effective bandwidth while loading a page with many objects of a given size, with effective bandwidth defined as total object bytes received divided by the time to receive them:

[1.5megabit 100ms graph]

Interesting things to note:

  • For objects of relatively small size (the left-hand portion of the graph), you can see from the empty space above the plotted line how little of the user's downstream bandwidth is being used, even though the browser is requesting objects as fast as it can. This user has to be requesting objects larger than 100k before he's mostly filling his available downstream bandwidth.

  • For objects under roughly 8k in size, you can double his effective bandwidth by turning keepalives on and spreading the requests over four hostnames. This is a huge win.

  • If the user were to enable pipelining in his browser (such as setting Firefox's network.http.pipelining in about:config), the number of hostnames we use wouldn't matter, and he'd make even more effective use of his available bandwidth. But we can't control that server-side.

Perhaps more clearly, the following is a graph of how much faster pages could load for an assortment of common access speeds and latencies with many external objects spread over four hostnames and keepalives enabled. Baseline (0%) is one hostname and keepalives disabled.

[Speedup of 4 hostnames and keepalives on]

Interesting things from that graph:

  • If you load many objects smaller than 10k, both local users and ones on the other side of the world could see substantial improvement from enabling keepalives and spreading requests over 4 hostnames.

  • There is a much greater improvement for users further away.

  • This will matter more as access speeds increase. The user on 100meg ethernet only 20ms away from the server saw the biggest improvement.

One more thing I examined was the effect of request size on effective bandwidth. The above graphs assumed 500 byte requests and 500 bytes of reply headers in addition to the object contents. How does changing that affect performance of our 1.5Mbit down/384Kbit up and 100ms away user, assuming we're already using four hostnames and keepalives?

[Effective bandwidth at various request sizes]

This shows that at small object sizes, we're bottlenecked on the upstream bandwidth. The browser sending larger requests (such as ones laden with lots of cookies) seems to slow the requests down by 40% worst-case for this user.

As I've said, these graphs are based on a simulation and don't account for a number of real-world factors. But I've unscientifically verified the results with real browsers on real net and believe them to be a useful gauge. I'd like to find the time and resources to reproduce these using real data collected from real browsers over a range of object sizes, access speeds, and latencies.

Tips to reduce your page load time

After you gather some page-load times and effective bandwidth for real users all over the world, you can experiment with changes that will improve those times. Measure the difference and keep any that offer a substantial improvement.

Try some of the following:

  • Turn on HTTP keepalives for external objects. Otherwise you add an extra round-trip for every HTTP request. If you are worried about hitting global server connection limits, set the keepalive timeout to something short, like 5-10 seconds. Also look into serving your static content from a different webserver than your dynamic content. Having thousands of connections open to a stripped-down static file webserver can cost as little as 10 megs of RAM total, whereas your main webserver might easily eat 10 megs of RAM per connection.

  • Load fewer external objects. Due to request overhead, one bigger file just loads faster than two smaller ones half its size. Figure out how to globally reference the same one or two javascript files and one or two external stylesheets instead of many; if you have more, try preprocessing them when you publish them. If your UI uses dozens of tiny GIFs all over the place, consider switching to a much cleaner CSS-based design which probably won't need so many images. Or load all of your common UI images in one request using a technique called "CSS sprites".

  • If your users regularly load a dozen or more uncached or uncacheable objects per page, consider evenly spreading those objects over four hostnames. This usually means your users can have 4x as many outstanding connections to you. Without HTTP pipelining, this results in their average request latency dropping to about 1/4 of what it was before.

    When you generate a page, evenly spreading your images over four hostnames is most easily done with a hash function, like MD5. Rather than having all image tags load objects from a single hostname, create four hostnames (all of which can point at the same servers) and use two bits from an MD5 of the image path to choose which of the four hosts you reference in the tag. Make sure all pages consistently reference the same hostname for the same image URL, or you'll end up defeating caching.

    Beware that each additional hostname adds the overhead of an extra DNS lookup and an extra TCP three-way handshake. If your users have pipelining enabled or a given page loads fewer than around a dozen objects, they will see no benefit from the increased concurrency and the site may actually load more slowly. The benefits only become apparent on pages with larger numbers of objects. Be sure to measure the difference seen by your users if you implement this.
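The hashing scheme above can be sketched in a few lines of Python (the hostnames here are invented placeholders):

```python
import hashlib

# Placeholder hostnames -- all four can resolve to the same IP address.
STATIC_HOSTS = ["static0.example.com", "static1.example.com",
                "static2.example.com", "static3.example.com"]

def static_host(path):
    # Two bits of the MD5 of the path pick one of the four hosts, so a
    # given image URL always maps to the same hostname, which keeps
    # browser caching effective across pages.
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return STATIC_HOSTS[digest[0] & 0b11]

print(static_host("/img/logo.gif"))
```

Because the mapping is a pure function of the path, every page that references `/img/logo.gif` emits the same hostname, while a large set of images spreads roughly evenly across all four.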

  • Possibly the best thing you can do to speed up pages for repeat visitors is to allow static images, stylesheets, and javascript to be unconditionally cached by the browser. This won't help the first page load for a new user, but can substantially speed up subsequent ones.

    Set an Expires header on everything you can, with a date days or even months into the future. This tells the browser it is okay to not revalidate on every request, which can add latency of at least one round-trip per object per page load for no reason.

    Instead of relying on the browser to revalidate its cache, if you change an object, change its URL. One simple way to do this for static objects, if you have staged pushes, is to have the push process create a new directory named by the build number, and teach your site to always reference objects under the current build's base URL. (Instead of a fixed static path, you'd reference each object under the current build number's directory; when you do another build next week, all references change to the new build's path.) This also nicely solves problems with browsers sometimes caching things longer than they should -- since the URL changed, they think it is a completely different object.

    If you conditionally gzip HTML, javascript, or CSS, you probably want to add a "Cache-Control: private" if you set an Expires header. This will prevent problems with caching by proxies that won't understand that your gzipped content can't be served to everyone. (The Vary header was designed to do this more elegantly, but you can't use it because of IE brokenness.)

    For anything where you always serve the exact same content when given the same URL (e.g. static images), add "Cache-Control: public" to give proxies explicit permission to cache the result and serve it to different users. If a local cache has the content, it is likely to have much less latency than you; why not let it serve your static objects if it can?

    Avoid the use of query params in image URLs, etc. At least the Squid cache refuses to cache any URL containing a question mark by default. I've heard rumors that other things won't cache those URLs at all, but I don't have more information.
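The change-the-URL idea above amounts to a tiny helper in your templating layer; a sketch (the path layout and build number are hypothetical):

```python
BUILD = "20061116"   # build number supplied by the push process

def static_url(path):
    # Embedding the build number in the path gives every push brand-new
    # URLs, so old copies cached under far-future Expires headers are
    # simply never referenced again.
    return "/static/%s/%s" % (BUILD, path.lstrip("/"))

print(static_url("css/site.css"))   # → /static/20061116/css/site.css
```

Every template goes through `static_url()` rather than hard-coding paths, so a new build changes every static reference in one place.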

  • On pages where your users are often sent the exact same content over and over, such as your home page or RSS feeds, implementing conditional GETs can substantially improve response time and save server load and bandwidth in cases where the page hasn't changed.

    When serving static files (including HTML) off of disk, most webservers will generate Last-Modified and/or ETag reply headers for you and make use of the corresponding If-Modified-Since and/or If-None-Match mechanisms on requests. But as soon as you add server-side includes, dynamic templating, or have code generating your content as it is served, you are usually on your own to implement these.

    The idea is pretty simple: When you generate a page, you give the browser a little extra information about exactly what was on the page you sent. When the browser asks for the same page again, it gives you this information back. If it matches what you were going to send, you know that the browser already has a copy and send a much smaller 304 (Not Modified) reply instead of the contents of the page again. And if you are clever about what information you include in an ETag, you can usually skip the most expensive database queries that would've gone into generating the page.
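A minimal sketch of that handshake (here the ETag is just an MD5 of the generated body; a real application might hash something cheaper to compute, like a row version):

```python
import hashlib

def respond(body, if_none_match=None):
    # Derive an ETag from exactly what we were about to send.
    etag = '"%s"' % hashlib.md5(body.encode("utf-8")).hexdigest()
    if if_none_match == etag:
        return 304, etag, ""          # browser's copy is still current
    return 200, etag, body            # full reply plus the ETag header

status1, etag, _ = respond("<html>home page</html>")
status2, _, body2 = respond("<html>home page</html>", if_none_match=etag)
print(status1, status2)   # 200 on first visit, 304 on revalidation
```

The browser stores the ETag with its cached copy and echoes it back in If-None-Match; when it matches, you send a handful of header bytes instead of the whole page.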

  • Minimize HTTP request size. Often cookies are set domain-wide, which means they are also unnecessarily sent by the browser with every image request from within that domain. What might've been a 400 byte request for an image could easily turn into 1000 bytes or more once you add the cookie headers. If you have a lot of uncached or uncacheable objects per page and big, domain-wide cookies, consider using a separate domain to host static content, and be sure to never set any cookies in it.

  • Minimize HTTP response size by enabling gzip compression for HTML and XML for browsers that support it. For example, the 17k document you are reading takes 90ms of the full downstream bandwidth of a user on 1.5Mbit DSL. Or it will take 37ms when compressed to 6.8k. That's 53ms off of the full page load time for a simple change. If your HTML is bigger and more redundant, you'll see an even greater improvement.

    If you are brave, you could also try to figure out which set of browsers will handle compressed Javascript properly. (Hint: IE4 through IE6 ask for their javascript compressed, then break badly if you send it that way.) Or look into Javascript obfuscators that strip out whitespace, comments, etc and usually get it down to 1/3 to 1/2 its original size.
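You can reproduce this arithmetic yourself: compress some redundant HTML and convert byte counts into transfer time on a 1.5Mbit downstream (the sizes here are synthetic, so they won't match the 17k example above exactly):

```python
import gzip

html = ("<p>yet another table row</p>\n" * 600).encode()  # ~17k of markup

def ms_at_1_5mbit(n_bytes):
    # milliseconds to move n_bytes over a 1.5Mbit/s downstream
    return n_bytes * 8 / 1.5e6 * 1000

compressed = gzip.compress(html)
saved_ms = ms_at_1_5mbit(len(html)) - ms_at_1_5mbit(len(compressed))
print(len(html), len(compressed), round(saved_ms))
```

Highly repetitive markup like this compresses extremely well, so the saving is dramatic; real HTML compresses less, but a 3-5x reduction is still common.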

  • Consider locating your small objects (or a mirror or cache of them) closer to your users in terms of network latency. For larger sites with a global reach, either use a commercial Content Delivery Network, or add a colo within 50ms of 80% of your users and use one of the many available methods for routing user requests to your colo nearest them.

  • Regularly use your site from a realistic net connection. Convincing the web developers on my project to use a "slow proxy" that simulates bad DSL in New Zealand (768Kbit down, 128Kbit up, 250ms RTT, 1% packet loss) rather than the gig ethernet a few milliseconds from the servers in the U.S. was a huge win. We found and fixed a number of usability and functional problems very quickly.

    To implement the slow proxy, I used the netem and HTB kernel modules available in the Linux 2.6 kernel, both of which are set up with the tc command line tool. These offer the most accurate simulation I could find, but are definitely not for the faint of heart. I've not used them, but supposedly Tamper Data for Firefox, Fiddler for Windows, and Charles for OSX can all rate-limit and are probably easier to set up, but they may not simulate latency properly.

  • Use Google's Load Time Analyzer extension for Firefox from a realistic net connection to see a graphical timeline of what it is doing during a page load. This shows where Firefox has to wait for one HTTP request to complete before starting the next one and how page load time increases with each object loaded. The Tamper Data extension can offer a similar graph if you make use of its hidden graphing feature. And the Safari team offers a tip on a hidden feature in their browser that offers some timing data too.

    Or if you are familiar with the HTTP protocol and TCP/IP at the packet level, you can watch what is going on using tcpdump, ngrep, or ethereal. These tools are indispensable for all sorts of network debugging.

  • Try benchmarking common pages on your site from a local network with ab, which comes with the Apache webserver. If your server is taking longer than 5 or 10 milliseconds to generate a page, you should make sure you have a good understanding of where it is spending its time.

    If your latencies are high and your webserver process (or CGI if you are using that) is eating a lot of CPU during this test, it is often a result of using a scripting language that needs to recompile your scripts with every request. Software like eAccelerator for PHP, mod_perl for perl, mod_python for python, etc can cache your scripts in a compiled state, dramatically speeding up your site. Beyond that, look at finding a profiler for your language that can tell you where you are spending your CPU. If you improve that, your pages will load faster and you'll be able to handle more traffic with fewer machines.

    If your site relies on doing a lot of database work or some other time-consuming task to generate the page, consider adding server-side caching of the slow operation. Most people start with writing a cache to local memory or local disk, but that starts to fall down if you expand to more than a few web server machines. Look into using memcached, which essentially creates an extremely fast shared cache that's the combined size of the spare RAM you give it off of all of your machines. It has clients available in most common languages.

  • (Optional) Petition browser vendors to turn on HTTP pipelining by default on new browsers. Doing so will remove some of the need for these tricks and make much of the web feel much faster for the average user. (Firefox has this disabled supposedly because some proxies, some load balancers, and some versions of IIS choke on pipelined requests. But Opera has found sufficient workarounds to enable pipelining by default. Why can't other browsers do similarly?)

The above list covers improving the speed of communication between browser and server and can be applied generally to many sites, regardless of what web server software they use or what language the code behind your site is written in. There is, unfortunately, a lot that isn't covered.

While the tips above are intended to improve your page load times, a side benefit of many of them is a reduction in the server bandwidth and CPU needed for the average page view. Reducing your costs while improving your user experience seems like it should be worth spending some time on.

-- Amol Kulkarni