Server-Side Sessions

In designing a server-side session system that works in a distributed environment, it is critical to guarantee that the machine that receives a request will have access to its session information.

Returning to our analogy of medical records, a server-side, or office-managed, implementation has two options: The user can be brought to the data or the data can be brought to the user. Lacking a centralized data store, we must require the user to always return to the same server. This is like requiring a patient to always return to the same doctor's office. While this methodology works well for small-town medical practices and single-server setups, it is not very scalable and breaks down when you need to service the population at multiple locations. To handle multiple offices, HMOs implement centralized patient information databases, where any of their doctors can access and update the patient's record.

In content load balancing, the act of guaranteeing that a particular user is always delivered to a specific server is known as session stickiness. Session stickiness can be achieved by using a number of hardware solutions (almost all the "Level 7" or "content switching" hardware load balancers support session stickiness) or software solutions (mod_backhand for Apache supports session stickiness). Just because we can do something, however, doesn't mean we should. While session stickiness can enhance cache locality, too many applications rely on session stickiness in order to function correctly, and that is bad design. Relying on session stickiness exposes an application to a number of vulnerabilities:

  • Undermined resource/load balancing: Resource balancing is a difficult task. Every load balancer has its own approach, but all of them attempt to optimize the given request based on current trends. When you require session stickiness, you are effectively committing resources to that session in perpetuity. This can lead to suboptimal load balancing and undermines many of the "smart" algorithms that the load balancer applies to distribute requests.

  • More prone to failure: Consider this mathematical riddle. All things being equal, which is safer, a twin-engine plane that requires both engines to fly or a single-engine plane? The single-engine plane is safer because the chance of one of two engines failing is greater than the chance of a single engine failing. (If you prefer to think of this in terms of dice, you are more likely to get at least one 6 when rolling two dice than to get a 6 when rolling one die.) Similarly, a distributed system that breaks when any one of its nodes fails is poorly designed. You should instead strive to have a system that is fault tolerant as long as one of its nodes functions correctly. (In terms of airplanes, a twin-engine plane that needs only one engine to fly is probabilistically safer than a single-engine plane.)
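The riddle reduces to a little arithmetic. Assuming each engine (or node) fails independently with the same probability p, a system that needs all n components fails with probability 1 - (1 - p)^n, while a system that needs only one of them fails with probability p^n. The 1% failure rate below is an arbitrary illustrative value, not measured data, and the function names are hypothetical.

```php
<?php
// Illustrative only: assumes every engine/node fails independently with probability $p.

// A system that needs ALL $n components fails if at least one fails: 1 - (1 - p)^n.
function failIfAnyFails(float $p, int $n): float
{
    return 1 - (1 - $p) ** $n;
}

// A system that needs only ONE of $n components fails only if all fail: p^n.
function failIfAllFail(float $p, int $n): float
{
    return $p ** $n;
}

$p = 0.01; // hypothetical per-engine failure probability
printf("single engine:             %.4f\n", failIfAnyFails($p, 1)); // 0.0100
printf("twin engine, needs both:   %.4f\n", failIfAnyFails($p, 2)); // 0.0199
printf("twin engine, needs either: %.4f\n", failIfAllFail($p, 2));  // 0.0001
```

The dice version works the same way: with p = 1/6, two dice show at least one 6 with probability 11/36 (about 0.31), versus 6/36 for a single die.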

The major disadvantage of ensuring that client data is available wherever it is needed is that it is resource intensive. Session caches by their very nature tend to be updated on every request, so if you are supporting a site with 100 requests per second, you need a storage mechanism that is up to that task. Supporting 100 updates and selects per second is not a difficult task for most modern RDBMS solutions; but when you scale that number to 1,000, many of those solutions will start to break down. Even using replication for this sort of solution does not provide a large scalability gain because it is the cost of the session updates and not the selects that is the bottleneck, and as discussed earlier, replication of inserts and updates is much more difficult than distribution of selects. This should not necessarily deter you from using a database-backed session solution; many applications will never reasonably grow to that level, and it is silly to avoid something that is unscalable if you never intend to use it to the extent that its scalability breaks down. Still, it is good to know these things and design with all the potential limitations in mind.

PHP Sessions and Reinventing the Wheel

I will admit that while writing this chapter I have vacillated a number of times on whether to focus on custom session management or on PHP's session extension. I have often preferred to reinvent the wheel (under the guise of self-education) rather than use a boxed solution that does much of what I want. For me personally, sessions sit on the cusp between features I would rather implement myself and those I would prefer to use out of the box. PHP sessions are very robust, and while the default session handlers fail to meet a number of my needs, the ability to set custom handlers enables us to address most of the deficiencies I find.


The following sections focus on PHP's session extension for lightweight sessions. Let's start by reviewing basic use of the session extension.

Tracking the Session ID

The first hurdle you must overcome in tracking the session ID is identifying the requestor. Much as you must present your health insurance or Social Security number when you go to the doctor's office so that the doctor can retrieve your records, a session must present its session ID to PHP so that the session information can be retrieved. As discussed in Chapter 13, session hijacking is a problem that you must always consider. Because the session extension is designed to operate completely independently of any authentication system, it uses random session ID generation to attempt to deter hijacking.

Native Methods for Tracking the Session ID

The session extension natively supports two methods for transmitting a session ID:

  • Cookies

  • Query string munging

The cookies method uses a dedicated cookie to manage the session ID. By default the name of the cookie is PHPSESSID, and it is a session cookie (that is, it has an expiration time of 0, meaning that it is destroyed when the browser is shut down). Cookie support is enabled by setting the following in your php.ini file (it defaults to on):

session.use_cookies=1

The query string munging method works by automatically adding a named variable to the query string of tags present in the document. Query munging is off by default, but you can enable it by using the following php.ini setting:

session.use_trans_sid=1

In this setting, trans_sid stands for "transparent session ID," and it is so named because tags are automatically rewritten when it is enabled. For example, when use_trans_sid is true, the following:

<?php
  session_start();
?>
<a href="/foo.php">Foo</a>

will be rendered as this:

<a href="/foo.php?PHPSESSID=12345">Foo</a>

Using cookie-based session ID tracking is preferred to using query string munging for a couple of reasons, which we touched on in Chapter 13:

  • Security: It is easy for a user to accidentally mail a friend a URL with his or her active session ID in it, resulting in an unintended hijacking of the session. There are also attacks that trick users into authenticating a bogus session ID by using the same mechanism.

  • Aesthetics: Adding yet another parameter to a query string is ugly and produces cryptic-looking URLs.

For both cookie- and query-managed session identifiers, the name of the session identifier can be set with the php.ini parameter session.name. For example, to use MYSESSIONID as the cookie name instead of PHPSESSID, you can simply set this:

session.name=MYSESSIONID

In addition, the following parameters are useful for configuring cookie-based session support:

  • session.cookie_lifetime: Defaults to 0 (a pure session cookie). Setting this to a nonzero number of seconds enables you to create sessions that expire even while the browser is still open (which is useful for "timing out" sessions) or sessions that span multiple browser sessions. (However, be careful with this, both for security reasons and for maintaining the data storage for the session backing.)

  • session.cookie_path: Sets the path for the cookie. Defaults to /.

  • session.cookie_domain: Sets the domain for the cookie. Defaults to "", which sets the cookie domain to the hostname that was requested by the client browser.

  • session.cookie_secure: Defaults to false. Determines whether cookies should only be sent over SSL sessions. This is an anti-hijacking setting that is designed to prevent your session ID from being read, even if your network connection is being monitored. Obviously, this only works if all the traffic for that cookie's domain is over SSL.

Similarly, the following parameters are useful for configuring query string session support:

  • session.use_only_cookies: Disables the reading of session IDs from the query string. This is an additional security parameter that should be set when use_trans_sid is set to false.

  • url_rewriter.tags: Defaults to a=href,frame=src,input=src,form=fakeentry. Sets the tags that will be transparently rewritten with the session parameters if use_trans_sid is set to true. For example, to have session IDs also sent for images, you would add img=src to the list of tags to be rewritten.
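Most of these settings can also be applied per script instead of globally in php.ini. The following is a sketch rather than a recommended configuration: the cookie name, path, and 30-minute lifetime are placeholder values, and the calls must run before session_start() and before any output is sent.

```php
<?php
// Per-script equivalents of some of the php.ini settings above.
// Must run before session_start() and before any output.

ini_set('session.use_only_cookies', '1'); // never read session IDs from the query string
ini_set('session.use_trans_sid', '0');    // don't rewrite tags with the session ID

session_name('MYSESSIONID');              // same effect as session.name

// lifetime (seconds), path, domain, secure: placeholder values
session_set_cookie_params(1800, '/', '', true);
```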

A Brief Introduction to PHP Sessions

To use basic sessions in a script, you simply call session_start() to initialize the session and then add key/value pairs to the $_SESSION autoglobals array. The following code snippet creates a session that counts the number of times you have visited the page and displays it back to you. With default session settings, this will use a cookie to propagate the session information and reset itself when the browser is shut down.

Here is a simple script that uses sessions to track the number of times the visitor has seen this page:

<?php
  session_start();
  if(isset($_SESSION['viewnum'])) {
    $_SESSION['viewnum']++;
  } else {
    $_SESSION['viewnum'] = 1;
  }
?>
<html>
<body>
Hello There.<br>
You have seen a page on this site <?= $_SESSION['viewnum'] ?> times.<br>
</body>
</html>

session_start() initializes the session, reading in the session ID from either the specified cookie or through a query parameter. When session_start() is called, the data store for the specified session ID is accessed, and any $_SESSION variables set in previous requests are reinstated. When you assign to $_SESSION, the variable is marked to be serialized and stored via the session storage method at request shutdown.

If you want to flush all your session data before the request terminates, you can force a write by using session_write_close(). One reason to do this is that the built-in session handlers provide locking (for integrity) around access to the session store. If you are using sessions in multiple frames on a single page, the user's browser will attempt to fetch them in parallel; but the locks will force this to occur serially, meaning that the frames with session calls in them will be loaded and rendered one at a time.
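A minimal sketch of releasing the lock early, assuming PHP 7 or later and the default files handler; the viewnum key is borrowed from the page counter above, and the slow work is only a placeholder comment.

```php
<?php
// Sketch: update the session early, then release its lock before slow work.
session_start();                          // the files handler locks this session
$viewnum = ($_SESSION['viewnum'] ?? 0) + 1;
$_SESSION['viewnum'] = $viewnum;
session_write_close();                    // writes the data and releases the lock

// ... slow work that does not need to modify the session goes here ...
// $_SESSION can still be read, but changes made from here on are not saved.
```

Because the lock is released before the slow work begins, another frame that calls session_start() for the same session does not have to wait for this request to finish.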

Sometimes you might want to permanently end a session. For example, with a shopping cart application that uses a collection of session variables to track items in the cart, when the user has checked out, you might want to empty the cart and destroy the session. Implementing this with the default handlers is a two-step process:

...
// clear the $_SESSION globals
$_SESSION = array();
// now destroy the session backing
session_destroy();
...

While the order in which you perform these two steps does not matter, it is necessary to perform both. session_destroy() clears the backing store to the session, but if you do not unset $_SESSION, the session information will be stored again at request shutdown.
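With cookie-based session IDs, these two steps still leave the session cookie in the browser. A common companion step, modeled on the session_destroy() example in the PHP manual, is to expire that cookie as well. This sketch assumes PHP 7.1 or later; end_session() is a hypothetical helper name and the cart contents are placeholder data.

```php
<?php
// Hypothetical helper: fully end the current session.
function end_session(): void
{
    // 1. Clear the session variables so nothing is re-saved at shutdown.
    $_SESSION = [];

    // 2. If cookies carry the session ID, expire the cookie in the browser.
    if (ini_get('session.use_cookies')) {
        $params = session_get_cookie_params();
        setcookie(session_name(), '', time() - 42000,
            $params['path'], $params['domain'],
            $params['secure'], $params['httponly']);
    }

    // 3. Remove the server-side session backing.
    session_destroy();
}

session_start();
$_SESSION['cart'] = ['widget' => 2];   // placeholder cart contents
end_session();
```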

You might have noticed that we have not discussed how this session data is managed internally in PHP. You have seen in Chapters 9, "External Performance Tunings," 10, "Data Component Caching," and 11, "Computational Reuse," that it is easy to quickly amass a large cache in a busy application. Sessions are not immune to this problem and require cleanup as well. The session extension takes a probabilistic approach to garbage collection. On every request, it has a certain probability of invoking its internal garbage-collection routines to maintain the session cache. The probability that the garbage collector is invoked is set with this php.ini setting:

; with the default session.gc_divisor of 100, this sets the probability
; of garbage collection on a given request to 1%
session.gc_probability=1

The garbage collector also needs to know how old a session must be before it is eligible for removal. This is also set with a php.ini setting (it defaults to 1,440 seconds, or 24 minutes):

; sessions can be collected after 15 minutes (900 seconds)
session.gc_maxlifetime=900

Figure 14.1 shows the actions taken by the session extension during normal operation. The session handler starts up, initializes its data, performs garbage collection, and reads the user's session data. Then the page logic after session_start() is processed. The script may use or modify the $_SESSION array as it chooses. When the session is shut down, the information is written back to disk and the session extension's internals are cleaned up.

Figure 14.1. Handler callouts for a session handler.

Custom Session Handler Methods

It seems a shame to invest so much effort in developing an authentication system and not tie it into your session data propagation. Fortunately, the session extension provides the session_id function, which allows for setting custom session IDs, meaning that you can integrate it directly into your authentication system.

If you want to tie each user to a unique session, you can simply use each user's user ID as the session ID. Normally this would be a bad idea from a security standpoint because it would provide a trivially guessable session ID that is easy to exploit; however, in this case you will never transmit or read the session ID from a plaintext cookie; you will grab it from your authentication cookie.

To extend the authentication example from Chapter 13, you can change the page visit counter to this:

<?php
// Cookie and AuthException come from the Chapter 13 authentication example.
try {
  $cookie = new Cookie();
  $cookie->validate();
  session_id($cookie->userid);
  session_start();
}
catch (AuthException $e) {
  // Array elements cannot be interpolated with quoted keys, so concatenate.
  header("Location: /login.php?originating_uri=" .
         urlencode($_SERVER['REQUEST_URI']));
  exit;
}
if(isset($_SESSION['viewnum'])) {
  $_SESSION['viewnum']++;
} else {
  $_SESSION['viewnum'] = 1;
}
?>
<html>
<body>
Hello There.<br>
You have seen a page on this site <?= $_SESSION['viewnum'] ?> times.<br>
</body>
</html>

Note that you set the session ID before you call session_start(). This is necessary for the session extension to behave correctly. As the example stands, the user's user ID will be sent in a cookie (or in the query string) on the response. To prevent this, you need to disable both cookies and query munging in the php.ini file:

session.use_cookies=0
session.use_trans_sid=0

And for good measure (even though you are manually setting the session ID), you need to use this:

session.use_only_cookies=1

These settings disable all the session extension's methods for propagating the session ID to the client's browser. Instead, you can rely entirely on the authentication cookies to carry the session ID.

If you want to allow multiple sessions per user, you can simply augment the authentication cookie to contain an additional property, which you can set whenever you want to start a new session (on login, for example). Allowing multiple sessions per user is convenient for accounts that may be shared; otherwise, the two users' experiences may become merged in strange ways.
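A minimal sketch of one way to do this, assuming the Chapter 13 Cookie class were extended with a hypothetical integer property, sessionnum, that is incremented at each login. Deriving the session ID from both values gives each login its own session store, while the ID still never travels outside the encrypted authentication cookie. session_id_for() is a hypothetical helper.

```php
<?php
// Hypothetical helper: combine the user ID with a per-login session number
// (assumed to be carried in the encrypted authentication cookie).
// The files handler only accepts a-z, A-Z, 0-9, ',' and '-' in session IDs,
// so the two parts are joined with '-'.
function session_id_for(int $userid, int $sessionnum): string
{
    return $userid . '-' . $sessionnum;
}

// In the page-counter example this would replace session_id($cookie->userid):
//   session_id(session_id_for($cookie->userid, $cookie->sessionnum));
//   session_start();
```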

Note

We discussed this at length in Chapter 13, but it bears repeating: Unless you are absolutely unconcerned about sessions being hijacked or compromised, you should always encrypt session data by using strong cryptography. Using ROT13 on your cookie data is a waste of time. You should use a proven symmetric cipher such as Triple DES, AES, or Blowfish. This is not paranoia; it is simple common sense.
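As one illustration of using AES, here is a sketch that encrypts and authenticates a session payload with AES-256 in GCM mode through PHP's openssl extension. It assumes PHP 7.1 or later (for the GCM tag parameter) and a 32-byte key that is generated once and stored securely; key storage and rotation are out of scope, and encrypt_payload() and decrypt_payload() are hypothetical names.

```php
<?php
// Sketch: authenticated encryption of a session payload with AES-256-GCM.
function encrypt_payload(string $plaintext, string $key): string
{
    $iv  = random_bytes(12);            // 96-bit nonce, fresh for every message
    $tag = '';
    $ct  = openssl_encrypt($plaintext, 'aes-256-gcm', $key,
                           OPENSSL_RAW_DATA, $iv, $tag);
    if ($ct === false) {
        throw new RuntimeException('encryption failed');
    }
    // Pack nonce (12 bytes) + tag (16 bytes) + ciphertext into one cookie-safe string.
    return base64_encode($iv . $tag . $ct);
}

function decrypt_payload(string $encoded, string $key): ?string
{
    $raw = base64_decode($encoded, true);
    if ($raw === false || strlen($raw) < 28) {
        return null;
    }
    $iv  = substr($raw, 0, 12);
    $tag = substr($raw, 12, 16);
    $ct  = substr($raw, 28);
    $pt  = openssl_decrypt($ct, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    return $pt === false ? null : $pt;  // null if tampered with or wrong key
}

$key   = random_bytes(32);              // 256-bit key; keep it secret
$token = encrypt_payload('userid=42', $key);
```

GCM is chosen here because it detects tampering as well as hiding the data: decrypt_payload() returns null for a modified token or the wrong key instead of handing back garbage.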


Now that you know how to use sessions, let's examine the handlers by which they are implemented. The session extension is basically a set of wrapper functions around multiple storage back ends. The method you choose does not affect how you write your code, but it does affect the applicability of the code to different architectures. The session handler to be used is set with this php.ini setting:

session.save_handler='files'

PHP has two prefabricated session handlers:

  • files: The default, files uses an individual file for storing each session.

  • mm: This is an implementation that uses BSD shared memory, available only if you have libmm installed and build PHP by using the --with-mm configure flag.

We've looked at methods similar to these in Chapters 9, 10, and 11. They work fine if you are running on a single machine, but they don't scale well with clusters. Of course, unless you are running an extremely simple setup, you probably don't want to be using the built-in handlers anyway. Fortunately, there are hooks for userspace session handlers, which allow you to implement your own session storage functions in PHP. You can set them by using session_set_save_handler. If you want to have distributed sessions that don't rely on sticky connections, you need to implement them yourself.

The user session handlers work by calling out for six basic storage operations:

  • open

  • close

  • read

  • write

  • destroy

  • gc
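To see the six callouts in action without a database, here is a minimal sketch of an in-memory handler. It assumes PHP 8 and uses the SessionHandlerInterface form of session_set_save_handler() rather than the six-callback form; ArraySessionHandler is a hypothetical name, and because its data lives in a PHP array it only survives for the current process, which suits experiments and tests but not production.

```php
<?php
// Hypothetical in-memory save handler (assumes PHP 8). It implements the six
// callouts and records the order in which the session extension calls them.
class ArraySessionHandler implements SessionHandlerInterface
{
    public array $store = [];   // session ID => ['data' => string, 'modtime' => int]
    public array $calls = [];   // names of the callouts, in call order

    public function open(string $path, string $name): bool
    {
        $this->calls[] = 'open';
        return true;
    }

    public function close(): bool
    {
        $this->calls[] = 'close';
        return true;
    }

    public function read(string $id): string|false
    {
        $this->calls[] = 'read';
        // An unknown ID is a new session: return '' rather than false.
        return $this->store[$id]['data'] ?? '';
    }

    public function write(string $id, string $data): bool
    {
        $this->calls[] = 'write';
        $this->store[$id] = ['data' => $data, 'modtime' => time()];
        return true;
    }

    public function destroy(string $id): bool
    {
        $this->calls[] = 'destroy';
        unset($this->store[$id]);
        return true;
    }

    public function gc(int $max_lifetime): int|false
    {
        $this->calls[] = 'gc';
        $cutoff  = time() - $max_lifetime;
        $removed = 0;
        foreach ($this->store as $id => $row) {
            if ($row['modtime'] < $cutoff) {
                unset($this->store[$id]);
                $removed++;
            }
        }
        return $removed;
    }
}

$handler = new ArraySessionHandler();
session_set_save_handler($handler, true);
ini_set('session.use_cookies', '0');     // command-line demo: no cookie needed
ini_set('session.gc_probability', '0');  // keep gc() out of this run
session_id('demo123');
session_start();                         // calls open() and read()
$_SESSION['count'] = 5;
session_write_close();                   // calls write() and then close()
```

Afterward, $handler->store['demo123']['data'] holds the serialized string count|i:5;, the same format the MySQL handler reads and writes. Because gc() runs only probabilistically, it does not appear in a normal request's call sequence.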

For example, you can implement a MySQL-backed session handler. This will give you the ability to access consistent session data from multiple machines.

The table schema is simple, as illustrated in Figure 14.2. The session data is keyed by session_id. The serialized contents of $_SESSION will be stored in session_data. You use the CLOB (character large object) column type text so that you can store arbitrarily large amounts of session data. modtime allows you to track the modification time for session data for use in garbage collection.

Figure 14.2. An updated copy of Figure 14.1 that shows how the callouts fit into the session life cycle.

For clean organization, you can put the custom session handlers in the MySession class:

class MySession {
  static $dbh;

MySession::open is the session opener. This function must be prototyped to accept two arguments: $save_path and $session_name. $save_path is the value of the php.ini parameter session.save_path. For the files handler, this is the root of the session data caching directory. In a custom handler, you can set this parameter to pass in location-specific data as an initializer to the handler. $session_name is the name of the session (as specified by the php.ini parameter session.name). If you maintain multiple named sessions in distinct hierarchies, this might prove useful. For this example, you do not care about either of these, so you can simply ignore both passed parameters and open a handle to the database, which you can store for later use. Note that because open is called in session_start() before cookies are sent, you are not allowed to generate any output to the browser here unless output buffering is enabled. You can return true at the end to indicate to the session extension that the open() function completed correctly:

  // Static, so it can be registered as array('MySession', 'open')
  public static function open($save_path, $session_name) {
    MySession::$dbh = new DB_MySQL_Test();
    return true;
  }

MySession::close is called to clean up the session handler when a request is complete and data is written. Because you are using persistent database connections, you do not need to perform any cleanup here. If you were implementing your own file-based solution or using any other nonpersistent resource, you would want to make sure to close any resources you may have opened. You return true to indicate to the session extension that close() completed correctly:

  public static function close() {
    return true;
  }

MySession::read is the first handler that does real work. You look up the session by using $id and return the resulting data. If you look at the session_data column you are reading from, you see data like this:

count|i:5;

This should look extremely familiar to anyone who has used the functions serialize() and unserialize(). It looks a great deal like the output of the following:

<?php
        $count = 5;
        print serialize($count);
?>
> php ser.php
i:5;

This isn't a coincidence: The session extension uses the same internal serialization routines as serialize() and unserialize().

After you have selected your session data, you can return it in serialized form. The session extension itself handles unserializing the data and reinstantiating $_SESSION:

  public static function read($id) {
    $result = MySession::$dbh->prepare("SELECT session_data
                                        FROM sessions
                                        WHERE session_id = :1")->execute($id);
    $row = $result->fetch_assoc();
    // A brand-new session has no row yet: return an empty string, not null
    return $row ? $row['session_data'] : '';
  }

MySession::write is the companion function to MySession::read. It takes the session ID $id and the session data $sess_data and handles writing it to the backing store. Much as you had to hand back serialized data from the read function, you receive pre-serialized data as a string here. You also make sure to update your modification time so that you are able to accurately dispose of idle sessions:

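  public static function write($id, $sess_data) {
    // Bind both values instead of escaping by hand ($id was never escaped,
    // and mysql_escape_string() no longer exists in current PHP). Assumes the
    // DB wrapper's execute() binds its arguments to :1, :2 in order.
    MySession::$dbh->prepare("REPLACE INTO sessions
                                (session_id, session_data, modtime)
                              VALUES(:1, :2, now())")->execute($id, $sess_data);
    return true;
  }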

MySession::destroy is the function called when you use session_destroy(). You use this function to expire an individual session by removing its data from the backing store. Although it is inconsistent with the built-in handlers, you also need to destroy the contents of $_SESSION. Whether done inside the destroy function or after it, it is critical that you destroy $_SESSION to prevent the session from being re-registered automatically.

Here is a simple destructor function:

  public static function destroy($id) {
    MySession::$dbh->prepare("DELETE FROM sessions
                              WHERE session_id = :1")->execute($id);
    $_SESSION = array();
    return true;
  }

Finally, you have the garbage-collection function, MySession::gc. The garbage-collection function is passed in the maximum lifetime of a session in seconds, which is the value of the php.ini setting session.gc_maxlifetime. As you've seen in previous chapters, intelligent and efficient garbage collection is not trivial. We will take a closer look at the efficiency of various garbage-collection methods in the following sections. Here is a simple garbage-collection function that simply removes any sessions older than the specified $maxlifetime:

  public static function gc($maxlifetime) {
    $ts = time() - $maxlifetime;
    // MySQL's FROM_UNIXTIME() turns the cutoff into a comparable datetime
    MySession::$dbh->execute("DELETE FROM sessions
                              WHERE modtime < FROM_UNIXTIME($ts)");
    return true;
  }
}

Garbage Collection

Garbage collection is tough. Overaggressive garbage-collection efforts can consume large amounts of resources. Underaggressive garbage-collection methods can quickly overflow your cache. As you saw in the preceding section, the session extension handles garbage collection by calling the save handler's gc function every so often. A simple probabilistic algorithm helps ensure that sessions get collected even if child processes are short-lived.

In the php.ini file, you set session.gc_probability. When session_start() is called, a random number between 0 and session.gc_divisor (default 100) is generated, and if it is less than gc_probability, the garbage-collection function for the installed save handler is called. Thus, if session.gc_probability is set to 1, the garbage collector will be called on 1% of requests, that is, once every 100 requests on average.
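The check can be modeled in PHP for illustration. This is a simplified sketch of the behavior just described, not the extension's actual C code (which uses its own internal random number source); the function names are hypothetical.

```php
<?php
// Simplified model of the per-request garbage-collection trigger.
function should_run_gc(int $gc_probability, int $gc_divisor): bool
{
    if ($gc_probability <= 0) {
        return false;   // trigger disabled, e.g. when a scheduled job does cleanup
    }
    // Random integer in [0, divisor - 1]; GC fires when it falls below probability.
    return random_int(0, $gc_divisor - 1) < $gc_probability;
}

// On average, divisor / probability requests pass between GC runs.
function expected_requests_between_gc(int $gc_probability, int $gc_divisor): float
{
    return $gc_divisor / $gc_probability;
}
```

With the defaults of 1 and 100, expected_requests_between_gc(1, 100) gives 100, matching the once-every-100-requests figure. Setting the probability to 0 disables the trigger entirely.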

Garbage Collection in the files Handler

In a high-volume application, garbage collection in the files session handler is an extreme bottleneck. The garbage-collection function, which is implemented in C, basically looks like this:

function files_gc_collection($cachedir, $maxlifetime)
{
    $now = time();
    $dir = opendir($cachedir);
    while(($file = readdir($dir)) !== false) {
        // Skip anything that is not a session file (named sess_<id>)
        if(strncmp("sess_", $file, 5) !== 0) {
            continue;
        }
        if($now - filemtime($cachedir."/".$file) > $maxlifetime) {
            unlink($cachedir."/".$file);
        }
    }
    closedir($dir);
}

The issue with this cleanup function is that extensive input/output (I/O) must be performed on the cache directory. Constantly scanning that directory can cause serious contention.

One solution for this is to turn off garbage collection in the session extension completely (by setting session.gc_probability = 0) and then implement a scheduled job such as the preceding function, which performs the cleanup completely asynchronously.

Garbage Collection in the mm Handler

In contrast to garbage collection in the files handler, garbage collection in the mm handler is quite fast. Because the data is all stored in shared memory, the process simply needs to take a lock on the memory segment, walk the session hash in memory, and expunge stale session data.

Garbage Collection in the MySession Handler

So how does the garbage collection in the MySession handler stack up against garbage collection in the files and mm handlers? It suffers from the same problems as the files handler. In fact, the problems are even worse for the MySession handler.

MySQL's MyISAM tables require an exclusive table lock to perform deletes. With high-volume traffic, this can cause serious contention as multiple processes attempt to maintain the session store simultaneously while everyone else is attempting to read and update their session information. Fortunately, the solution from the files handler works equally well here: You can simply disable the built-in garbage-collection trigger and implement cleanup as a scheduled job.

Choosing Between Client-Side and Server-Side Sessions

In general, I prefer client-side managed sessions for systems where the amount of session data is relatively small. The magic number I use as "relatively small" is 1KB of session data. Below 1KB of data, it is still likely that the client's request will fit into a single network packet. (It is likely below the path maximum transmission unit [MTU] for all intervening links.) Keeping the HTTP request inside a single packet means that the request will not have to be fragmented (on the network level), and this reduces latency.
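One rough way to apply the 1KB rule of thumb is to measure what a candidate session would occupy as a cookie value. This sketch assumes the data is serialized with serialize() and base64-encoded for transport; the helper names are hypothetical, and encrypting the payload would add some further overhead on top of this estimate.

```php
<?php
// Hypothetical helpers: estimate the cookie payload size of a client-side session.
function encoded_session_size(array $session): int
{
    // base64 grows data by 4/3, rounded up to a multiple of 4 bytes.
    return strlen(base64_encode(serialize($session)));
}

function fits_client_side(array $session, int $budget = 1024): bool
{
    return encoded_session_size($session) <= $budget;
}
```

For the page counter's ['viewnum' => 5], serialize() produces 24 bytes, which base64-encode to 32: comfortably under budget.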

When choosing a server-side session-management strategy, be very conscious of your data read/update volumes. It is easy to overload a database-backed session system on a high-traffic site. If you do decide to go with such a system, use it judiciously: Only update session data where it needs to be updated.

Implementing Native Session Handlers

If you would like to take advantage of the session infrastructure but are concerned about the performance impact of having to run user code, writing your own native session handler in C is surprisingly easy. Chapter 22, "Detailed Examples and Applications," demonstrates how to implement a custom session extension in C.

