Cache-Friendly PHP Applications

To take advantage of caches, PHP applications must be made cache friendly. A cache-friendly application understands how the caching policies in browsers and proxies work and how cacheable its own data is. The application can then be configured to send browsers and proxies the cache-related directives that achieve the desired results.

There are four HTTP headers that you need to be aware of in making an application cache friendly:

  • Last-Modified

  • Expires

  • Pragma: no-cache

  • Cache-Control

The Last-Modified HTTP header is a keystone of HTTP 1.0 cache negotiation. Last-Modified is the date, in Coordinated Universal Time (UTC; formerly GMT), on which the page was last modified. When a cache attempts a revalidation, it sends the Last-Modified date as the value of its If-Modified-Since header field, which tells the server which copy of the content it should be revalidated against.
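Both Last-Modified and If-Modified-Since carry dates in the same fixed GMT format. As a minimal sketch of that date handling (http_date() and parse_http_date() are hypothetical helper names, not PHP built-ins), the following renders a Unix timestamp in that format and turns such a header value back into a timestamp:

```php
<?php
// Render a Unix timestamp as an HTTP date, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
// gmdate() always formats in UTC, regardless of the server's local time zone.
function http_date($timestamp)
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Parse an HTTP date (for example, the value of If-Modified-Since) back into
// a Unix timestamp; returns null when the value cannot be parsed.
function parse_http_date($value)
{
    $timestamp = strtotime($value);
    return $timestamp === false ? null : $timestamp;
}
```

Because both directions go through UTC, a date the server sends out as Last-Modified comes back from the client as the same timestamp.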

The Expires header field is the nonrevalidating component of HTTP 1.0 cache control: Until the given date, a cache may use its copy without contacting the server at all. The Expires value consists of a GMT date after which the contents of the requested document should no longer be considered valid.

Many people also view Pragma: no-cache as a header that should be set to keep objects from being cached. Although nothing is lost by setting this header, the HTTP specification does not define an explicit meaning for it in responses, so its usefulness rests on its being a de facto standard implemented in many HTTP 1.0 caches.

In the late 1990s, when many clients spoke only HTTP 1.0, the cache negotiation options for applications were rather limited. It used to be standard practice to add the following headers to all dynamic pages:

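function http_1_0_nocache_headers()
{
    // The current time in HTTP date format (note the space before GMT)
    $pretty_modtime = gmdate('D, d M Y H:i:s') . ' GMT';
    header("Last-Modified: $pretty_modtime");
    // Expiring "now" makes the copy stale as soon as it is received
    header("Expires: $pretty_modtime");
    header("Pragma: no-cache");
}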

This effectively tells all intervening caches that the data is not to be cached and always should be refreshed.

When you look over the possibilities given by these headers, you see that there are some glaring deficiencies:

  • Setting expiration time as an absolute timestamp requires that the client and server system clocks be synchronized.

  • The cache in a client's browser is quite different from the cache at the client's ISP. A browser cache could reasonably hold personalized data for a page, but a proxy cache shared by numerous users must not, and HTTP 1.0 offers no way to tell the two apart.
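To make the clock-synchronization problem concrete, here is a deliberately simplified sketch (the helper names are hypothetical, and real caches also take the Date and Age headers into account) comparing an absolute Expires check with a relative max-age check when the client's clock runs five minutes fast:

```php
<?php
// HTTP 1.0 style: compare an absolute expiry time chosen by the server's
// clock against the current time on the client's clock.
function fresh_by_expires($expires_ts, $client_now)
{
    return $client_now < $expires_ts;
}

// HTTP 1.1 max-age style: measure how long the client has held the copy,
// using only the client's own clock, so server/client skew cancels out.
function fresh_by_max_age($received_at, $max_age, $client_now)
{
    return ($client_now - $received_at) < $max_age;
}

// Server time is 1000 and the copy is valid for 60 seconds,
// but the client's clock is 300 seconds ahead of the server's.
$server_now = 1000;
$client_now = $server_now + 300;
$by_expires = fresh_by_expires($server_now + 60, $client_now); // false: "expired" on arrival
$by_max_age = fresh_by_max_age($client_now, 60, $client_now);  // true: fresh for 60 seconds
```

The relative form is what HTTP 1.1's max-age directive provides.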

These deficiencies were addressed in the HTTP 1.1 specification, which added the Cache-Control directive set. The possible values for a Cache-Control response header are defined in RFC 2616 by the following syntax (slightly simplified here):

Cache-Control = "Cache-Control" ":" 1#cache-response-directive

cache-response-directive =
           "public"
         | "private"
         | "no-cache"
         | "no-store"
         | "no-transform"
         | "must-revalidate"
         | "proxy-revalidate"
         | "max-age" "=" delta-seconds
         | "s-maxage" "=" delta-seconds
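In RFC 2616's notation, the 1# prefix means "one or more elements, separated by commas," so a response carries all of its Cache-Control directives as a single comma-separated list. A sketch of assembling such a header line from a PHP array (build_cache_control() is a hypothetical helper, not a PHP built-in):

```php
<?php
// Assemble a Cache-Control header line from a list of directives.
// Bare directives ("public", "no-store") are plain array values;
// directives that take delta-seconds are given as name => seconds.
function build_cache_control(array $directives)
{
    $parts = array();
    foreach ($directives as $name => $value) {
        if (is_int($name)) {
            $parts[] = $value;                     // e.g. "public"
        } else {
            $parts[] = $name . '=' . (int)$value;  // e.g. "max-age=60"
        }
    }
    return 'Cache-Control: ' . implode(', ', $parts);
}

// Usage: header(build_cache_control(array('public', 'max-age' => 60)));
```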

The Cache-Control directive specifies the cacheability of the document requested. According to RFC 2616, all caches and proxies must obey these directives, and the headers must be passed along through all proxies to the browser making the request.

To specify whether a response is cacheable, you can use the following directives:

  • public: The response can be cached by any cache.

  • private: The response may be cached only in a nonshared cache. This means that it is to be cached only by the requestor's browser and not by any intervening caches.

  • no-cache: A cache must not use the response to satisfy a later request without first successfully revalidating it with the origin server.

  • no-store: The information being transmitted is sensitive and must not be stored in nonvolatile storage by any cache.

If an object is cacheable, the remaining directives specify how long it may be stored in a cache and when it must be revalidated:

  • must-revalidate: A cache must not serve a stale copy of the page without first revalidating it with the origin server. During revalidation, the cache sends an If-Modified-Since header in the request. If the server determines that the cached copy is still the most current version of the page, it should return a 304 Not Modified response to the client. Otherwise, it should send back the requested page in full.

  • proxy-revalidate: This directive is like must-revalidate, but only shared caches are required to revalidate their contents.

  • max-age: The time, in seconds, for which an entry is considered fresh without revalidation.

  • s-maxage: The maximum time that an entry should be considered valid in a shared cache. Note that according to the HTTP 1.1 specification, if max-age or s-maxage is specified, it overrides any expiration set via an Expires header; in a shared cache, s-maxage also overrides max-age.
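The precedence rules among s-maxage, max-age, and Expires can be sketched from the cache's side. This is a simplified illustration, not a complete RFC 2616 implementation: parse_cache_control() and freshness_lifetime() are hypothetical helpers, and the heuristic freshness that the specification permits when no explicit expiry is given is deliberately left out.

```php
<?php
// Parse a Cache-Control value such as "public, max-age=60, s-maxage=300"
// into an array of directive => value (true for bare directives).
function parse_cache_control($value)
{
    $result = array();
    foreach (explode(',', $value) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue;
        }
        $pair = explode('=', $part, 2);
        $name = strtolower(trim($pair[0]));
        $result[$name] = isset($pair[1]) ? trim($pair[1], " \"") : true;
    }
    return $result;
}

// Freshness lifetime in seconds: s-maxage (shared caches only) wins,
// then max-age, then Expires minus the response's Date.
// $expires_ts and $date_ts are Unix timestamps or null.
function freshness_lifetime(array $cc, $shared_cache, $expires_ts, $date_ts)
{
    if ($shared_cache && isset($cc['s-maxage'])) {
        return (int)$cc['s-maxage'];
    }
    if (isset($cc['max-age'])) {
        return (int)$cc['max-age'];
    }
    if ($expires_ts !== null && $date_ts !== null) {
        return max(0, $expires_ts - $date_ts);
    }
    return 0; // simplification: no heuristic freshness
}
```

For "public, max-age=60, s-maxage=300", a shared proxy would compute a lifetime of 300 seconds while a browser would compute 60.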

The following function sets the headers for pages that every cache must revalidate before reusing:

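function validate_cache_headers($my_modtime)
{
    $pretty_modtime = gmdate('D, d M Y H:i:s', $my_modtime) . ' GMT';
    // PHP exposes the request's If-Modified-Since header as HTTP_IF_MODIFIED_SINCE
    if(isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
       $_SERVER['HTTP_IF_MODIFIED_SINCE'] == $pretty_modtime) {
        // The client's copy is current; send no body
        header("HTTP/1.1 304 Not Modified");
        exit;
    }
    else {
        // max-age=0 makes the copy stale at once, so must-revalidate
        // forces a revalidation on every reuse
        header("Cache-Control: max-age=0, must-revalidate");
        header("Last-Modified: $pretty_modtime");
    }
}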

It takes as a parameter the last modification time of a page and compares that time with the If-Modified-Since header sent by the client browser. If the two times are identical, the cached copy is good, so a status code 304 is returned to the client, signifying that the cached copy can be used; otherwise, the Last-Modified header is set, along with a Cache-Control header that mandates revalidation.
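Because header() calls are awkward to test, one option is to separate the decision from the output. The sketch below (conditional_get() is a hypothetical name) returns the status code and headers for a conditional GET instead of sending them. As a variation on the exact string match described above, it compares parsed timestamps, so any client copy at least as new as the page counts as current. It sends must-revalidate together with max-age=0 so that the cached copy is stale immediately and every reuse is revalidated.

```php
<?php
// Decide how to answer a conditional GET for a page last modified at
// $last_modified (Unix timestamp). $if_modified_since is the raw header
// value sent by the client, or null if it sent none.
function conditional_get($if_modified_since, $last_modified)
{
    $cache_control = 'Cache-Control: max-age=0, must-revalidate';
    if ($if_modified_since !== null) {
        $client_time = strtotime($if_modified_since);
        // The client's copy is current if it is at least as new as the page
        if ($client_time !== false && $client_time >= $last_modified) {
            return array('status' => 304, 'headers' => array($cache_control));
        }
    }
    return array(
        'status'  => 200,
        'headers' => array(
            'Last-Modified: ' . gmdate('D, d M Y H:i:s', $last_modified) . ' GMT',
            $cache_control,
        ),
    );
}

// In a web request this might be driven by:
// $ims = isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ? $_SERVER['HTTP_IF_MODIFIED_SINCE'] : null;
// $answer = conditional_get($ims, $mtime);
```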

To utilize this function, you need to know the last modification time for a page. For a static page (such as an image or a "plain" nondynamic HTML page), this is simply the modification time on the file. For a dynamically generated page (PHP or otherwise), the last modification time is the last time that any of the data used to generate the page was changed.
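A minimal sketch of both cases follows. The helpers static_last_modified() and dynamic_last_modified() are hypothetical. For the dynamic case, the sketch simply takes the newest of the modification times of the data the page depends on.

```php
<?php
// Last modification time of a static file: its filesystem mtime,
// or null if the file cannot be stat()ed.
function static_last_modified($path)
{
    clearstatcache(true, $path);   // avoid a stale cached stat() result
    $mtime = @filemtime($path);
    return $mtime === false ? null : $mtime;
}

// Last modification time of a dynamic page: the newest of the
// modification times of every piece of data used to build it.
function dynamic_last_modified(array $source_mtimes)
{
    return empty($source_mtimes) ? null : max($source_mtimes);
}
```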

Consider a Web log application that displays on its main page all the recent entries:

$dbh = new DB_MySQL_Prod();
// Assumes the timestamp column holds Unix timestamps, as validate_cache_headers() expects
$result = $dbh->execute("SELECT max(timestamp)
               FROM weblog_entries");
if($result) {
    list($ts) = $result->fetch_row();
    validate_cache_headers($ts);
}

The last modification time for this page is the timestamp of the latest entry.

If you know that a page is going to be valid for a period of time and you're not concerned about it occasionally being stale for a user, you can drop the must-revalidate directive and set an explicit Expires value. Accepting that the data will sometimes be somewhat stale is important: When you tell a proxy cache that the content you served it is good for a certain period of time, you lose the ability to update it for that proxy's clients during that window. This is okay for many applications.

Consider, for example, a news site such as CNN's. Even with breaking news stories, having the splash page be up to one minute stale is not unreasonable. To achieve this, you can set headers in a number of ways.

If you want to allow a page to be cached by shared proxies for one minute, you could call a function like this:

function cache_novalidate($interval = 60)
{
    $now = time();
    $pretty_lmtime = gmdate('D, d M Y H:i:s', $now) . ' GMT';
    $pretty_extime = gmdate('D, d M Y H:i:s', $now + $interval) . ' GMT';
    // Backwards Compatibility for HTTP/1.0 clients
    header("Last-Modified: $pretty_lmtime");
    header("Expires: $pretty_extime");
    // HTTP/1.1 support
    header("Cache-Control: public,max-age=$interval");
}

If instead you have a page that includes personalization (say, for example, the splash page also contains local news), you can allow it to be cached only by the browser:

function cache_browser($interval = 60)
{
    $now = time();
    $pretty_lmtime = gmdate('D, d M Y H:i:s', $now) . ' GMT';
    $pretty_extime = gmdate('D, d M Y H:i:s', $now + $interval) . ' GMT';
    // Backwards Compatibility for HTTP/1.0 clients
    header("Last-Modified: $pretty_lmtime");
    header("Expires: $pretty_extime");
    // HTTP/1.1 support
    header("Cache-Control: private,max-age=$interval,s-maxage=0");
}

Finally, if you want to try as hard as possible to keep a page from being cached anywhere, the best you can do is this:

function cache_none()
{
    // Backwards Compatibility for HTTP/1.0 clients
    header("Expires: 0");  // an invalid date, which caches treat as already expired
    header("Pragma: no-cache");
    // HTTP/1.1 support
    header("Cache-Control: no-cache,no-store,max-age=0,s-maxage=0,must-revalidate");
}
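The three functions above differ only in the headers they emit. As a sketch, they can be folded into one helper that returns header lines rather than sending them, which also makes each policy easy to check without a web server. The name cache_policy_headers() and its policy names are hypothetical.

```php
<?php
// Return the header lines for one of three caching policies instead of
// sending them. $policy is 'public', 'private', or anything else for
// "no caching"; $now is a Unix timestamp (defaults to the current time).
function cache_policy_headers($policy, $interval = 60, $now = null)
{
    if ($now === null) {
        $now = time();
    }
    $lmtime = gmdate('D, d M Y H:i:s', $now) . ' GMT';
    $extime = gmdate('D, d M Y H:i:s', $now + $interval) . ' GMT';
    switch ($policy) {
        case 'public':   // shared proxies and browsers may cache
            return array("Last-Modified: $lmtime", "Expires: $extime",
                         "Cache-Control: public,max-age=$interval");
        case 'private':  // only the requesting browser may cache
            return array("Last-Modified: $lmtime", "Expires: $extime",
                         "Cache-Control: private,max-age=$interval,s-maxage=0");
        default:         // try as hard as possible to prevent caching
            return array("Expires: 0", "Pragma: no-cache",
                         "Cache-Control: no-cache,no-store,max-age=0,s-maxage=0,must-revalidate");
    }
}

// Usage: foreach (cache_policy_headers('private') as $h) { header($h); }
```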

By default, the PHP session extension sends no-cache headers like these when session_start() is called (this behavior is governed by the session.cache_limiter setting). If you feel you know your session-based application better than the extension authors, you can simply reset the headers you want after the call to session_start().
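Rather than overwriting the headers afterward, you can also choose a different policy up front with PHP's session_cache_limiter() and session_cache_expire(). Both must be called before session_start(). A minimal sketch follows. configure_session_cache() is a hypothetical wrapper, and 'private_no_expire' is just one of the limiter values PHP accepts ('nocache', 'private', 'private_no_expire', 'public').

```php
<?php
// Choose the session extension's caching headers before the session starts.
// 'private_no_expire' permits caching by the browser only and omits the
// Expires header; session_cache_expire() sets the cache lifetime in minutes.
function configure_session_cache($minutes = 5)
{
    session_cache_limiter('private_no_expire');
    session_cache_expire($minutes);
}

// Usage:
// configure_session_cache(5);
// session_start();
```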

The following are some caveats to remember in using external caches:

  • Pages that are requested via the POST method cannot be cached with this form of caching.

  • This form of caching does not mean that you will serve a page only once. It just means that you will serve it only once to a particular proxy during the cacheability time period.

  • Not all proxy servers are RFC compliant. When in doubt, you should err on the side of caution and render your content uncacheable.
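One way to apply these caveats is to decide on a policy per request before sending any caching headers. The sketch below is only an assumption about how such a rule might look. choose_cache_policy() and its $proxies_trusted flag are hypothetical. It allows caching only for GET and HEAD requests, keeps personalized pages out of shared caches, and falls back to "no caching" when proxies cannot be trusted to follow the RFC.

```php
<?php
// Pick 'public', 'private', or 'none' for the current request.
// $method is the HTTP method; $personalized says whether the page contains
// per-user content; $proxies_trusted is false when in doubt about proxies.
function choose_cache_policy($method, $personalized, $proxies_trusted = true)
{
    $method = strtoupper($method);
    if (!in_array($method, array('GET', 'HEAD'), true) || !$proxies_trusted) {
        return 'none';     // e.g. POST responses are never advertised as cacheable
    }
    return $personalized ? 'private' : 'public';
}
```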

