Thursday, November 16, 2006

Troubles with Asynchronous Ajax Requests and PHP Sessions


As I sit here watching “The Muppets Take Manhattan” in Spanish in the middle of a Costa Rican thunderstorm, I find my mind drifting back to a recent project where I spent a day debugging a frustratingly annoying problem: A user would visit the web application I was working on, and after a given page was loaded, all of the session data associated with their visit would be suddenly gone. The user would no longer be logged into the site, and any changes they made (which were logged in session data) were lost.

I spent tonnes of time in the debugger (while at times unreliable and frustrating on huge projects, the Zend debugger is still an invaluable aid for the PHP application developer) and kept seeing the same thing: the session data were simply being erased at some point, and the storage in the database would register ’’ as the data for the session.

It was driving me crazy. I would sit there in the debugger and go through the same sequence each time:

  • Debug the request for the page load.
  • Debug the request for the first Ajax request that the page load fired off.
  • Debug the request for the second Ajax request that the page load simultaneously fired off.
  • Debug the request for the third Ajax request that the page initiated.

In retrospect, looking at the above list, it seems blindingly obvious what I had been running into, but it was very late in a very long contract, and I blame the fatigue for missing what now seems patently obvious: A race condition.

For those unfamiliar with what exactly this is, a race condition is seen most often in applications involving multiple “threads of execution” – which include either separate processes or threads within a process – when two of these threads (which are theoretically executing at the same time) try to modify the same piece of data.

If two threads of execution that are executing more or less simultaneously (but never in exactly the same way, because of CPU load, other processes, and chance) try to write to the same variable or data storage location, the value of that storage location depends on which thread got there first. Given that it is impossible to predict which one got there first, you end up not knowing the value of the variable after the threads of execution are finished (in effect, “the last one to write, wins”) (see Figure 1).

Figure 1: Racing to destroy values
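To make "the last one to write, wins" concrete, here is a minimal sketch in JavaScript. The function and writer names are made up for illustration, and the two "threads" are modelled as plain functions run in a chosen order rather than as real concurrent code.

```javascript
// Hypothetical model: two writers each store a different value in one shared slot.
// The "order" argument stands in for whichever thread happens to get there first.
function runWriters(order) {
  const shared = { value: "initial" };
  const writers = {
    A: () => { shared.value = "written by A"; },
    B: () => { shared.value = "written by B"; },
  };
  for (const name of order) {
    writers[name]();
  }
  return shared.value; // whoever wrote last decides the final value
}

console.log(runWriters(["A", "B"])); // written by B
console.log(runWriters(["B", "A"])); // written by A
```

The code is identical in both runs; only the arrival order changes, and with it the result.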

Normally, when you write web applications in PHP, this is really not an issue, as each page request gets its own execution environment, and a user is only visiting one page at a time. Each page request coming from a particular user arrives more or less sequentially and shares no data with other page requests.

Ajax changes all of this, however: suddenly, one page visit can result in a number of simultaneous requests to the server. While the separate PHP processes cannot directly share data, a solution with which most PHP programmers are familiar exists to get around this problem: sessions. The session data that the various requests want to modify are now susceptible to being overwritten with stale data by another request after a given request thinks it has written out updated and correct data (see Figure 2).

Figure 2: When requests go bad - clobbering data

In the web application I was working on, all of the Ajax requests were being routed through the same code that called session_start() and implicitly session_write_close() (when PHP ends and there is a running session, this function is called). One of the Ajax requests would, however, set some session data to help the application “remember” which data the user was browsing. Depending on the order in which the various requests were processed by the server, sometimes those data would overwrite other session data and the user data would be “forgotten”.
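The pattern can be modelled in a few lines of JavaScript. This is only a sketch under assumptions: sessionTable, sessionStart() and sessionWriteClose() are hypothetical stand-ins for a database-backed session store that does no locking between requests, and the session keys are invented for the example.

```javascript
// Hypothetical stand-in for a session table with no locking between requests.
const sessionTable = new Map();

// Like session_start(): each request gets its own private copy of the stored data.
function sessionStart(id) {
  return { ...(sessionTable.get(id) || {}) };
}

// Like the implicit session_write_close(): the whole copy replaces what is stored.
function sessionWriteClose(id, data) {
  sessionTable.set(id, { ...data });
}

sessionTable.set("abc", { user: "alice", cart: 3 });

// Two Ajax requests start at nearly the same time and read the same stored state.
const first = sessionStart("abc");
const second = sessionStart("abc");

// The first request records a change and finishes.
first.cart = 4;
sessionWriteClose("abc", first);

// The second request remembers what the user is browsing and finishes later,
// writing back a copy that never saw the first request's change.
second.browsing = "reports";
sessionWriteClose("abc", second);

console.log(sessionTable.get("abc"));
// { user: 'alice', cart: 3, browsing: 'reports' }
```

The first request's update to cart is silently lost, because the second request wrote back the complete copy it had read before that update was saved.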

To illustrate this problem, consider the following page (race1.php), which, when fully loaded, executes two asynchronous Ajax requests to the server.

The code is divided into three main sections:
  • The first contains the call to session_start() and opens the HTML headers.
  • The second contains the JavaScript code to execute the asynchronous requests to the server. The biggest function, getNewHTTPObject(), is used to create new HTTP request objects. The onLoadFunction() is executed when the page finishes loading and starts the ball rolling, while the other two functions are simply used to wait for and handle the responses and results from the asynchronous requests.
  • In the final section, we just write out the body of the document, which contains a single element to hold the results and an onload attribute on the body element to make sure that the onLoadFunction() is called when the document finishes loading.
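As a rough idea of what the JavaScript section of such a page can look like, here is a minimal sketch. It is not the original listing: it assumes a browser that provides XMLHttpRequest, the sendRequest() helper is invented, and the "type" query parameter and its values are hypothetical.

```javascript
// Sketch only: assumes a browser that provides XMLHttpRequest.
function getNewHTTPObject() {
  return new XMLHttpRequest();
}

// Hypothetical helper: start one asynchronous GET and pass the response text to onDone.
function sendRequest(url, onDone, createRequest = getNewHTTPObject) {
  const xhr = createRequest();
  xhr.onreadystatechange = function () {
    // readyState 4 means the response has been received completely.
    if (xhr.readyState === 4 && xhr.status === 200) {
      onDone(xhr.responseText);
    }
  };
  xhr.open("GET", url, true); // true = asynchronous
  xhr.send(null);
}

// Called when the page has finished loading. Both requests are started
// before either response has arrived, so the server may handle them in any order.
function onLoadFunction(show, createRequest = getNewHTTPObject) {
  sendRequest("race2.php?type=req0", show, createRequest); // hypothetical parameter
  sendRequest("race2.php?type=req1", show, createRequest); // hypothetical parameter
}
```

In the page, onLoadFunction() would be passed a callback that writes the response text into the results element. The createRequest parameter exists only so the sketch can be exercised outside a browser with a fake request object.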

The asynchronous Ajax requests are then made to race2.php and are processed by the following code, which can handle two different Ajax work requests:



This PHP script handles the two request types differently, and creates the race condition by having the second request type, req1, set the session data to ’’. (In a real-world application, you might have accidentally had this request set some value you thought was meaningful.)

If you install the two files race1.php and race2.php on your server, and then load race1.php into your browser, you will sometimes see that the test string is set after the page is completely loaded, and at other times it will be “(empty)”, indicating that the second Ajax request has clobbered the value.

Now that we are aware of this problem and how it can manifest itself, the next question is, of course, how do we solve it? Unfortunately, I think this is one of those problems best solved by avoiding it. Building locking logic into our web application to serialize the threads of execution (i.e. individual requests) would be prohibitively expensive and eliminate much of the fun and many of the benefits of asynchronous requests via Ajax. Instead, we will avoid modifying session data when we are executing multiple simultaneous requests that share a session.

Please note that this is much more specific than saying simply that we will avoid modifying session data during any Ajax request. Indeed, that would be a disaster: in a Web 2.0 application, we are most likely using Ajax for form submission and updating the state of the user data (i.e. session data) as the data are processed and we are responding to the changes. However, for those requests we are using to update parts of pages dynamically, we should be careful to avoid modifying the session data, or at least do so in a way that none of the other requests will see changes in their results depending on these session data.
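One way to convince yourself that this rule is enough is to try every possible ordering. The sketch below is a JavaScript model, not PHP, and all names in it are hypothetical: a form submission request that owns a state change runs alongside a page-fragment request, and we count the orderings in which the login is lost, with and without the fragment request writing its copy of the session back.

```javascript
// All orderings of two requests' steps that keep each request's own steps in order.
function interleavings(a, b) {
  if (a.length === 0) return [b];
  if (b.length === 0) return [a];
  return [
    ...interleavings(a.slice(1), b).map((rest) => [a[0], ...rest]),
    ...interleavings(a, b.slice(1)).map((rest) => [b[0], ...rest]),
  ];
}

// Count the orderings in which the form request's login does not survive.
function countLostLogins(fragmentWritesBack) {
  let lost = 0;
  const orders = interleavings(["formLoad", "formSave"], ["fragLoad", "fragEnd"]);
  for (const order of orders) {
    let stored = {};        // stand-in for the stored session
    let formCopy, fragCopy; // each request's private copy of the session
    const steps = {
      formLoad: () => { formCopy = { ...stored }; },
      formSave: () => { stored = { ...formCopy, user: "alice" }; },
      fragLoad: () => { fragCopy = { ...stored }; },
      fragEnd: () => {
        // Only a fragment request that modifies the session writes its copy back.
        if (fragmentWritesBack) stored = { ...fragCopy, browsing: "reports" };
      },
    };
    for (const step of order) steps[step]();
    if (stored.user !== "alice") lost += 1;
  }
  return lost;
}

console.log(countLostLogins(true));  // 2 (of 6 possible orderings)
console.log(countLostLogins(false)); // 0
```

When the fragment request leaves the session alone, no ordering can clobber the login, and no locking is needed to get that guarantee.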

Ajax requests and session data do not have to be problematic when used together: with a little bit of care and attention, we can write web applications that are powerful, dynamic, and not plagued by race condition-type bugs.
