Wednesday, November 22, 2006

Client Side Page Caching

This Issue

In this issue we will discuss page caching, including the caching schemes of different browsers. We will also discuss how Microsoft Proxy Server page caching works, how to get your pages cached, and how to keep them from being cached. Examples are given in Active Server pages that manipulate the Cache-Control header.

Page Caching

Client side page caching is where the client (browser) caches pages to the hard drive. When a page is requested from the server, the browser writes the response to the hard drive as a file. If the page is needed again, the client uses the page from the cache, as long as the page hasn't expired. If the page has expired, the browser asks the server whether there is a newer page on the site and rewrites the cache with the response. The reason for client side page caching, as for any caching, is performance. It is much faster to read the file from the hard drive than to wait for the page or graphic to download from the web server. The theory is that pages that have been accessed are likely to be accessed again in the near future. The hard drive cache is limited in size through a setting in the browser. When the cache fills, the older files in the cache are erased to make room for the newer ones.
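The behavior described above can be modeled in a few lines. The following is a minimal sketch in Python (used here only so that it runs standalone; it is not how any browser is actually implemented). The class name BrowserCache and its methods are hypothetical, and the size limit is counted in entries rather than bytes to keep the model small.

```python
from collections import OrderedDict
from datetime import datetime, timedelta


class BrowserCache:
    """Toy model of a size-limited browser cache (hypothetical names)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()  # url -> (body, expires_at), oldest first

    def store(self, url, body, expires_at):
        # Re-storing a URL replaces the old copy and makes it the newest entry.
        self._entries.pop(url, None)
        self._entries[url] = (body, expires_at)
        # When the cache is full, erase the oldest files first.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def lookup(self, url, now):
        """Return the cached body, or None if the page is missing or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        body, expires_at = entry
        if now >= expires_at:
            return None  # expired: the browser must ask the server again
        return body
```

A lookup that returns None stands for the case where the browser has to go back to the server.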

You can view Internet Explorer's cache by looking in this directory:

Windows NT

c:\winnt\Temporary Internet Files

Windows 95

c:\windows\Temporary Internet Files

Many browsers let you modify how they use the cache. Internet Explorer works this way as well.

Configuring the Browser

You can also configure how the browser caches files. By default, Internet Explorer checks each file in the cache once after Explorer is started. This means that on the first viewing of the page, the browser requests the page from the server using the If-Modified-Since header. The If-Modified-Since header was discussed in the last issue. If the server returns "304 Not Modified", the page is used from the cache, and every later view also uses the cache unless the page expires or the user "refreshes." If a new page is returned, its last modified date and expiration date are written to the cache.
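To make the conditional request concrete, here is a minimal sketch of the server side of that exchange, written in Python so it can run on its own. The function name respond and its arguments are hypothetical, and the sketch assumes last_modified is a UTC datetime with no fractional seconds.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(page_body, last_modified, request_headers):
    """Return (status, headers, body) for a GET, honoring If-Modified-Since."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            since_dt = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            since_dt = None  # unparseable date: fall through to a full response
        if since_dt is not None:
            if since_dt.tzinfo is None:
                # Treat a date without zone information as UTC.
                since_dt = since_dt.replace(tzinfo=timezone.utc)
            if last_modified <= since_dt:
                # The browser's copy is current: send no body.
                return 304, headers, ""
    return 200, headers, page_body
```

A 304 response carries no body, which is where the savings come from: the browser reuses the file it already has on disk.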

You can also configure the browser never to request the page from the server again once it is cached, unless the page expires or the user requests a refresh. Or you can have the page requested from the server every time. When the page is requested every time, the cached copy is still used whenever a "304 Not Modified" status is returned. To set the Internet Explorer browser configuration:

From within Internet Explorer 3.0:

* Click on Tools | Options, and the Options dialog will come up.
* Click on the Advanced tab.
* Within the Temporary Internet Files group, click on Settings and the Settings dialog will appear.
* Under "Check for newer version of stored file", choose either "Every visit to the page", "Every time you start Internet Explorer", or "Never".
* Click on OK to close the Settings Dialog and again to close the Options Dialog.
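The three settings can be summarized as a small decision function. This is only a sketch of the behavior described in this section, written in Python; the function name and the policy strings "every_visit", "every_start", and "never" are hypothetical labels for the three dialog choices.

```python
def must_contact_server(policy, expired, checked_this_session, user_refreshed):
    """Decide whether the browser asks the server about a cached page."""
    # An expired page or an explicit refresh always goes back to the server.
    if user_refreshed or expired:
        return True
    if policy == "every_visit":
        return True
    if policy == "every_start":
        # Only the first view after the browser starts triggers a check.
        return not checked_this_session
    if policy == "never":
        return False
    raise ValueError("unknown policy: %r" % (policy,))
```

When the function returns True, the browser sends a conditional request and may still end up showing the cached copy if the server answers "304 Not Modified".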

Caching and the Expires Header

In the last issue, we discussed the Expires header and how to set it from the server. When the Expires header is set to the current date, pages that are accessed by typing a URL into the address box, by using the browser's navigation buttons, or through a link are requested again from the server. This is true even if the page is in the cache. To set the Expires header to the current date in an Active Server page, add the following line:

Example 1

Response.Expires = 0

It might seem like there is no difference between a page that has expired and a page that is not cached. If a page is not cached on the client side and the page is accessed by typing in a URL into the address box, using the browser's navigation buttons, or through a link, the page is requested again from the server.

However, there is a difference. In the last issue we discussed what happens when a cached page has the same last modified date as the server response: Internet Explorer displays the cached page, while Netscape displays the page in the response. So when the page is expired and the last modified date is unchanged, Internet Explorer shows the page from the cache. If the page is not cached at all, Internet Explorer always shows the page in the response. Because of this difference, you might want to force the browser not to cache the page; the result will look very much like setting the Expires header to the current date.

Pragma

In HTTP 1.0 there is a pragma command that is documented to control page caching. The pragma is a command header that is passed back from the server to the client in the response. To send the command not to cache the page in Active Server pages, add this line to the top of the Active Server page:

Example 2

Response.AddHeader "Pragma","no-cache"

This will prevent the Netscape browser from caching the page on the disk. However, the Internet Explorer 3.x browser will continue to cache the page even with the pragma.

Proxy Caching

The following information about proxy caching is based on the functionality of a majority of proxies, including Microsoft Proxy Server. However, there is a wide range of proxies available and they do not all function the same way. Also note that not all proxies cache, and not all of those that can have caching turned on.

Proxies will cache files in hopes of better network performance. When a response from a web server returns from a request through the proxy, the proxy takes the page and caches it. If another browser makes the same request, the proxy uses the cached file and the request never makes it to the server. Like the browser cache, the proxy has a limited amount of room so most proxies remove pages from the cache based on inactivity.

The caching mechanism for the proxy uses the Last-Modified header and the Expires header to determine whether and for how long to cache the page. Because proxy caches are all programmed differently, there is no broad statement that applies to all proxies. Instead, let us look at a specific proxy, the Microsoft Proxy Server.

If an expiration date exists, the Microsoft Proxy Server honors that expiration date. When the cached file has expired, the proxy server removes the page from the cache. The next request then passes through the proxy and the response is cached with the new expiration date. If the expiration date is equal to the current date and time, the page is not cached at all. Such is the case with Example 3:

Example 3

<% Response.Expires=0 %>
<HTML>
<BODY>
Example 3
</BODY>
</HTML>

If there is no Expires header, then the proxy server bases the expiration of the cached page on the Last-Modified header. The Microsoft Proxy Server caches the page for twenty percent of the difference between the Last-Modified date and the current date. If the page is five days old, the proxy server sets the expiration to twenty percent of five days from now; the page expires in a day.

If there is neither the Last-Modified header nor the Expires header, then Microsoft Proxy Server sets the expiration date to 10 minutes.
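The three rules above (honor Expires, otherwise use twenty percent of the page's age, otherwise ten minutes) can be written out as a short calculation. This is a sketch in Python of the rules as described in this issue, not Microsoft Proxy Server's actual code; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

DEFAULT_TTL = timedelta(minutes=10)   # used when neither header is present
LAST_MODIFIED_FACTOR = 0.2            # twenty percent of the page's age


def proxy_expiration(now, expires=None, last_modified=None):
    """Return when the proxy's copy expires, or None if it is not cached."""
    if expires is not None:
        if expires <= now:
            return None  # already expired on arrival: do not cache at all
        return expires
    if last_modified is not None:
        # Guard against a Last-Modified date that lies in the future.
        age = max(now - last_modified, timedelta(0))
        return now + age * LAST_MODIFIED_FACTOR
    return now + DEFAULT_TTL
```

For a page last modified five days ago, the function returns a time one day from now, matching the example in the text.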

Manipulating the Proxy

One of the problems with proxies is that they assume a response for one person is going to be the same for everyone requesting the page. This isn't true in certain situations, especially when you are returning a dynamic page that contains information specific to the user requesting it. For this reason you can tell proxies that certain responses are private and should not be shared publicly. The way to do this is to send a Cache-Control header of private, like this:

Cache-Control: private

Fortunately, Internet Information Server adds this header to all Active Server pages by default. There might be instances where you want to change the Cache-Control header in an Active Server page, for example if you are returning content that is dynamically generated but not based on the individual. IIS 3.0 does not let you change the Cache-Control header. However, IIS 4.0 does. You can set Cache-Control like this:

Response.CacheControl = "Public"

If you don't set the Cache-Control header in IIS 4.0, it will default to private for Active Server pages.
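From the proxy's point of view, the private setting comes down to a single check before a response is stored. A minimal sketch in Python, assuming the response headers arrive as a dictionary; the function name is hypothetical, and real proxies look at more than this one directive.

```python
def shared_cache_may_store(response_headers):
    """Return True if a shared (proxy) cache may keep this response."""
    value = response_headers.get("Cache-Control", "")
    # Directives are comma separated; keep only the name before any "=".
    names = {
        part.split("=", 1)[0].strip().lower()
        for part in value.split(",")
        if part.strip()
    }
    return "private" not in names
```

The browser's own cache is not affected by this check, since private only restricts caches that are shared between users.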

Notice that there is only one way to manipulate the Last-Modified header and the Expires header so that the proxy does not cache the page. That is to set the Expires header to the current time, in other words Response.Expires = 0. However, this adversely affects the client. With the Expires header set to the current date, the page is loaded from the server every time the user views it. For this reason, we advise you to use the Cache-Control header.

Active Server Pages

The real knowledge that you need to take away from the last issue and this issue is how to manipulate the HTTP headers to get the results that you want. There are two extreme cases: either you want the view to be "new" every time the user visits the Active Server page, or you want the browser to cache the Active Server page and save server resources and network bandwidth.

Default

Without modifying or adding any headers, the default setting for an Active Server page is to send only the Cache-Control header, set to private. This means that the page will be cached and will not be requested from the server again unless the user "refreshes." The exception is when the browser is closed and restarted, since by default the browser requests the page again on the first view.

No Caching

If you are writing Active Server pages and want the view in the browser to change every time the user navigates to the page or refreshes it, you must set the Expires header to the current date for both Internet Explorer and Netscape 4.0. You must also set the no-cache pragma header for Netscape 3.x. Either send no Last-Modified header at all, or send one that is always newer than the last one sent; do not send a Last-Modified header that never changes. Because the Last-Modified header is optional, we recommend that you leave it out entirely if you want the view to change every time the user navigates to the page. Make sure to leave the Cache-Control header set to private, the default of Internet Information Server.
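Put together, the no-caching recipe is a small, fixed set of response headers. The sketch below builds that set in Python so the result can be inspected. The function name is hypothetical, and it assumes now is a UTC datetime. In an Active Server page the same headers would come from Response.Expires and Response.AddHeader as shown in Examples 1 and 2.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def no_cache_headers(now):
    """Headers for a page that should look new on every visit."""
    return {
        # Expire immediately, for Internet Explorer and Netscape 4.0.
        "Expires": format_datetime(now, usegmt=True),
        # Pragma for Netscape 3.x.
        "Pragma": "no-cache",
        # Keep the IIS default so proxies do not share the page.
        "Cache-Control": "private",
        # Deliberately no Last-Modified header.
    }
```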

Caching

If you want your Active Server page to be cached by the browser, set the Last-Modified header and the Expires header. Make sure not to send the no-cache pragma command. You will also need to follow the instructions for returning the status of "304 Not Modified" documented in the last issue. If you want to share the cache on the proxy between multiple users, set the Cache-Control header to public. Remember that you cannot set the Cache-Control header in IIS 3.0, only in IIS 4.0.
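The caching recipe can be sketched the same way. This Python function is a hypothetical illustration of which headers to send, assuming now and last_modified are UTC datetimes and lifetime is how long the page should stay fresh. It does not cover the "304 Not Modified" handling from the last issue.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cacheable_headers(now, last_modified, lifetime, shared=False):
    """Headers for a page the browser (and optionally proxies) may cache."""
    return {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Expires": format_datetime(now + lifetime, usegmt=True),
        # public lets a proxy share one copy between users (IIS 4.0 only).
        "Cache-Control": "public" if shared else "private",
        # Deliberately no Pragma: no-cache header.
    }
```

Passing shared=True corresponds to the Response.CacheControl setting shown earlier and is only appropriate when the page is the same for every user.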
