
Tuning Database Access

In almost all the applications I have worked with, database access has consistently been the number-one bottleneck in application performance. The reason for this is pretty simple: In many Web applications, a large portion of content is dynamic and is contained in a database. No matter how fast your database access is, reaching across a network socket to pull data from your database is slower than pulling it from local process memory. Chapters 9, "External Performance Tunings," 10, "Data Component Caching," and 11, "Computational Reuse," show various ways to improve application performance by caching data. Caching techniques aside, you should ensure that your database interactions are as fast as possible. The following sections discuss techniques for improving query performance and responsiveness.

Limiting the Result Set

One of the simplest techniques for improving query performance is to limit the size of your result sets. Say you have a forum application from which you need to extract posts N through N+M. The forum table looks like this:

CREATE TABLE forum_entries (
  id int not null auto_increment,
  author varchar(60) not null,
  posted_at timestamp not null default CURRENT_TIMESTAMP,
  data text,
  primary key (id)
);

The posts are ordered by timestamp, and entries can be deleted, so a simple range search based on the posting ID won't work. A common way I've seen the range extraction implemented is as follows:

function returnEntries($start, $numrows)
{
  $entries = array();
  $i = 0;
  $dbh = new DB_Mysql_Test;
  $query = "SELECT * FROM forum_entries ORDER BY posted_at";
  $res = $dbh->execute($query);
  while($data = $res->fetch_assoc()) {
    // skip rows outside the requested window
    if ( $i++ < $start || $i > $start + $numrows ) {
      continue;
    }
    array_push($entries, new Entry($data));
  }
  return $entries;
}

The major problem with this methodology is that you end up pulling over every single row in forum_entries. Even if you add logic to break out of the loop once $i exceeds $start + $numrows, you have still pulled over every row up to that point. When you have 10,000 forum entry postings and are trying to display records 9,981 through 10,000, this will be very, very slow. If your average forum entry is 1KB, running through 10,000 of them will result in 10MB of data being transferred across the network to you. That's quite a bit of data for the 20 entries that you want.

A better approach is to limit the SELECT statement inside the query itself. In MySQL this is extremely easy; you can simply use a LIMIT clause in the SELECT, as follows:

function returnEntries($start, $numrows)
{
  $entries = array();
  $dbh = new DB_Mysql_Test;
  $query = "SELECT * FROM forum_entries ORDER BY posted_at LIMIT :1, :2";
  $res = $dbh->prepare($query)->execute($start, $numrows);
  while($data = $res->fetch_assoc()) {
    array_push($entries, new Entry($data));
  }
  return $entries;
}
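
For example, a quick usage sketch tying this back to the earlier numbers: requesting 20 entries starting at row 9,980 now transfers only those 20 rows rather than all 10,000.

// Only the 20 requested rows cross the network; the database does
// the skipping via LIMIT.
$entries = returnEntries(9980, 20);
print count($entries);   // 20 (assuming at least 10,000 rows exist)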

The LIMIT syntax is not part of the SQL92 standard, so it might not be available on your platform. For example, on Oracle you can page through results with the pseudocolumn rownum, but because rownum is assigned as rows are returned, the range check must be done in an outer query:

$query = "SELECT a.* FROM
            (SELECT * FROM forum_entries ORDER BY posted_at) a
          WHERE rownum BETWEEN :1 AND :2";
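
Note that here :1 and :2 are the first and last row numbers of the window rather than an offset and a row count, so the calling code must adjust its arguments. Here is a minimal sketch; DB_Oracle_Test is a hypothetical wrapper assumed to offer the same prepare()/execute() interface as the DB_Mysql_Test class used above:

function returnEntriesOracle($start, $numrows)
{
  $entries = array();
  $dbh = new DB_Oracle_Test;
  $query = "SELECT * FROM
              (SELECT a.*, rownum rnum FROM
                 (SELECT * FROM forum_entries ORDER BY posted_at) a
               WHERE rownum <= :2)
            WHERE rnum >= :1";
  // rows are numbered from 1, so rows $start .. $start+$numrows-1
  // correspond to rownum $start+1 .. $start+$numrows
  $res = $dbh->prepare($query)->execute($start + 1, $start + $numrows);
  while($data = $res->fetch_assoc()) {
    array_push($entries, new Entry($data));
  }
  return $entries;
}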

This same argument applies to the fields you select as well. In the case of forum_entries, you most likely need all the fields. In other cases, particularly where a table is very wide (meaning that it contains a number of large varchar or LOB columns), you should be careful not to request fields you don't need.

SELECT * is also evil because it encourages writing code that depends on the position of fields in a result row. Field positions are subject to change when a table is altered (for example, when you add or remove a column). Fetching result rows into associative arrays mitigates this problem.

Remember: Any data you SELECT must be pulled across the network and processed by PHP. Also, memory for the result set is tied up on both the server and the client. The network and memory costs can be extremely high, so be pragmatic in what you select.
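
To illustrate, here is a minimal sketch of an index-page query that names only the columns it needs; the function name returnEntrySummaries is made up for this example, and it assumes the same DB_Mysql_Test wrapper used above:

function returnEntrySummaries($start, $numrows)
{
  $summaries = array();
  $dbh = new DB_Mysql_Test;
  $query = "SELECT id, author, posted_at
            FROM forum_entries
            ORDER BY posted_at
            LIMIT :1, :2";
  $res = $dbh->prepare($query)->execute($start, $numrows);
  while($data = $res->fetch_assoc()) {
    // fetching into an associative array keeps the code independent
    // of column positions, so later table alterations do not break it
    array_push($summaries, $data);
  }
  return $summaries;
}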

Lazy Initialization

Lazy initialization is a classic tuning strategy that involves not fetching data until you actually need it. This is particularly useful where the data to be fetched is expensive and the fetching is performed only occasionally. A typical example of lazy initialization is lookup tables. If you wanted a complete two-way mapping of ISO country codes to country names, you might create a Countries library that looks like this:

class Countries {
  public static $codeFromName = array();
  public static $nameFromCode = array();

  public static function populate()
  {
    $dbh = new DB_Mysql_Test;
    $query = "SELECT name, countrycode FROM countries";
    $res = $dbh->execute($query)->fetchall_assoc();
    foreach($res as $data) {
      self::$codeFromName[$data['name']] = $data['countrycode'];
      self::$nameFromCode[$data['countrycode']] = $data['name'];
    }
  }
}
Countries::populate();

Here, populate() is called when the library is first loaded, to initialize the lookup tables. The drawback is that the query is executed on every request, whether or not the maps are ever consulted.

With lazy initialization, you do not perform the country lookup until you actually need it. Here is an implementation that uses accessor functions that handle the population and caching of results:

class Countries {
  private static $nameFromCodeMap = array();
  private static $codeFromNameMap = array();

  public static function nameFromCode($code)
  {
    if(!array_key_exists($code, self::$nameFromCodeMap)) {
      $query = "SELECT name FROM countries WHERE countrycode = :1";
      $dbh = new DB_Mysql_Test;
      list ($name) = $dbh->prepare($query)->execute($code)->fetch_row();
      self::$nameFromCodeMap[$code] = $name;
      if($name) {
        self::$codeFromNameMap[$name] = $code;
      }
    }
    return self::$nameFromCodeMap[$code];
  }
  public static function codeFromName($name)
  {
    if(!array_key_exists($name, self::$codeFromNameMap)) {
      $query = "SELECT countrycode FROM countries WHERE name = :1";
      $dbh = new DB_Mysql_Test;
      list ($code) = $dbh->prepare($query)->execute($name)->fetch_row();
      self::$codeFromNameMap[$name] = $code;
      if($code) {
        self::$nameFromCodeMap[$code] = $name;
      }
    }
    return self::$codeFromNameMap[$name];
  }
}
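
A minimal usage sketch (assuming rows for these particular values exist in the countries table): only the entries you actually ask about are queried, and repeated lookups are served from the in-memory maps.

print Countries::nameFromCode('us')."\n";       // one query against countries
print Countries::nameFromCode('us')."\n";       // served from the cached map
print Countries::codeFromName('Germany')."\n";  // one more single-row query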

Another application of lazy initialization is in tables that contain large fields. For example, my Web logging software uses a table to store entries that looks like this:

CREATE TABLE entries (
  id int(10) unsigned NOT NULL auto_increment,
  title varchar(200) default NULL,
  timestamp int(10) unsigned default NULL,
  body text,
  PRIMARY KEY  (id)
);

I have an Active Record pattern class Entry that encapsulates individual rows in this table. There are a number of contexts in which I use the timestamp and title fields of an Entry object but do not need its body. For example, when generating an index of entries on my Web log, I only need their titles and times of posting. Because the body field can be very large, it is silly to pull this data if I do not think I will use it. This is especially true when generating an index, as I may pull tens or hundreds of Entry records at one time.

To avoid this type of wasteful behavior, you can lazily initialize the body field. Here is an example that uses the overloaded attribute accessors __get() and __set() to make the lazy initialization of body completely transparent to the user:

class Entry {
  public $id;
  public $title;
  public $timestamp;
  private $_body;

  public function __construct($id = false)
  {
    if(!$id) {
      return;
    }
    $dbh = new DB_Mysql_Test;
    $query = "SELECT id, title, timestamp
              FROM entries
              WHERE id = :1";
    $data = $dbh->prepare($query)->execute($id)->fetch_assoc();
    $this->id = $data['id'];
    $this->title = $data['title'];
    $this->timestamp = $data['timestamp'];
  }

  public function __get($name) {
    if($name == 'body') {
      if($this->id && !$this->_body) {
        $dbh = new DB_Mysql_Test;
        $query = "SELECT body FROM entries WHERE id = :1";
        list($this->_body) =
          $dbh->prepare($query)->execute($this->id)->fetch_row();
      }
      return $this->_body;
    }
  }
  public function __set($name, $value)
  {
    if($name == 'body') {
      $this->_body = $value;
    }
  }
  /** Active Record update(), delete(), and insert() methods omitted **/
}

When you instantiate an Entry object by id, you get all the fields except for body. As soon as you request body, though, the overloaded accessor fetches it and stashes it in the private variable $_body. Using overloaded accessors for lazy initialization is an extremely powerful technique because it can be entirely transparent to the end user, which makes refactoring simple.
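
A minimal usage sketch (the entry id 42 is hypothetical): the constructor pulls only id, title, and timestamp; body is fetched on first access and cached thereafter.

$entry = new Entry(42);       // one query: id, title, timestamp only
print $entry->title."\n";     // no additional query
print $entry->body."\n";      // triggers the single-column SELECT of body
print $entry->body."\n";      // served from $_body; no new query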

