Building a Benchmarking Harness

Because you plan on benchmarking a lot of code, you should build a benchmarking harness to help automate the testing process. Having a benchmarking infrastructure not only helps to standardize benchmarks, it also makes it easy to incorporate benchmarks into a unit testing framework so that you can test the performance effects of library changes and PHP version changes.

The following are some of the features required in a usable harness:

  • Ease of use: Obviously, if the suite is hard to use, you will not use it. In particular, the benchmarking suite should not require you to modify your code in order to test it.

  • Low or measurable overhead: The benchmarking harness itself takes resources to run. You need the ability to either minimize this overhead or (better yet) measure it so that you can remove it from the measured results.

  • Good ability to select initial data: A benchmark is only as good as the data you run it against. The ability to specify arbitrary input data is crucial.

  • Extensibility: It would be nice to be able to extend or modify the statistics that are gathered.

PEAR's Benchmarking Suite

PEAR has a built-in benchmarking suite, Benchmark_Iterate, that satisfies almost all the needs described in the preceding section. Benchmark_Iterate is suitable for many simple benchmarking tasks.

Benchmark_Iterate works by running a function in a tight loop, recording execution times around each execution, and providing accessors for getting summary information on the results.

To start, you need to install the Benchmark libraries. Prior to PHP 4.3, the Benchmark class suite was packaged with PHP. After version 4.3, you need to either download the classes from http://pear.php.net or use the PEAR installer for a one-step installation:

# pear install Benchmark

To benchmark the performance of the function foo() over 1,000 iterations, you create a Benchmark_Iterate object, invoke the run method that specifies 1,000 iterations, and report the average runtime:

require 'Benchmark/Iterate.php';
$benchmark = new Benchmark_Iterate;
$benchmark->run(1000, 'foo');
$result = $benchmark->get();
print "Mean execution time for foo: $result[mean]\n";

A simple example of this is to use the suite to compare the speed of the built-in function max() with the PHP userspace implementation my_max(). This is a simple example of how iterating over arrays with built-in functions can be significantly faster than using a userspace implementation.

The my_max() function will work identically to the built-in max() function, performing a linear search over its input array and keeping track of the largest element it has seen to date:

function my_max($array) {
    $max = $array[0];
    foreach ($array as $element) {
        if ($element > $max) {
            $max = $element;
        }
    }
    return $max;
}

For testing array functions, it is nice to have random test data. You can write a convenience function for generating such arrays and add it to an include file, test_data.inc, so that you can reuse it later:

function random_array($size) {
    $array = array();
    for ($i = 0; $i < $size; $i++) {
        $array[] = mt_rand();
    }
    return $array;
}

Now that the basics are done, it is simple to use Benchmark_Iterate to put together a quick comparison on a number of different array sizes:

<?
require "test_data.inc";
require "Benchmark/Iterate.php";

$benchmark = new Benchmark_Iterate;
print " size        my_max          max   my_max/max\n";
foreach (array(10, 100, 1000) as $size) {
    // Generate a test array.  Benchmark_Iterate does not
    // support generating random data for each iteration,
    // so we need to be careful to use the same $test_array
    // for testing both functions.
    $test_array = random_array($size);
    foreach (array('my_max', 'max') as $func ) {
        $benchmark->run(1000, $func, $test_array);
        $result = $benchmark->get();
        $summary[$func][$size] = $result['mean'];
    }
    printf("%5d %12.6f %12.6f %12.2f\n", $size,
           $summary['my_max'][$size],
           $summary['max'][$size],
           $summary['my_max'][$size]/$summary['max'][$size]);
}
?>

On my laptop this yields the following:

size       my_max          max   my_max/max
   10     0.000303     0.000053         5.74
  100     0.001604     0.000072        22.43
 1000     0.015813     0.000436        36.28

This example is clearly contrived. (You would never implement your own max() function, if for no reason other than laziness.) However, it illustrates a few important ideas.

Built-in functions, when properly used, will always be faster than equivalent userspace implementations. This is because an interpreted language (such as PHP) works basically by converting user code into a set of instructions and then executing them on its own virtual machine. Stepping through code in the PHP executor will always have significant overhead compared with stepping through instructions in a compiled language such as C.

Benchmark_Iterate does not allow for data randomization on each iteration in the benchmark. Although this does not affect this benchmark in particular, it easily could. Imagine that you decided to test another max candidate, sort_max, which works by using the built-in asort() function to sort the test array and then just pops off the last element:

function sort_max($array) {
  asort($array);            // sort ascending, in place
  return array_pop($array); // the largest element is now last
}

Many sorting algorithms (including quicksort, which is the sorting algorithm used internally in all the PHP sorting functions) exhibit very different best-case and worst-case times. An unlucky "random" data choice can generate misleading results. One solution to this problem is to run benchmarks multiple times to eliminate edge cases. Of course, a robust benchmarking suite should handle that for you.
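
For instance, one rough way to approximate this with the stock suite is to repeat the entire run over several freshly generated arrays and average the per-run means. The following is only a sketch; it assumes the my_max() and random_array() functions defined earlier:

require 'test_data.inc';        // defines random_array()
require 'Benchmark/Iterate.php';

$runs  = 5;   // number of independent random data sets
$total = 0;
for ($run = 0; $run < $runs; $run++) {
    // Each run gets a freshly generated array, so one unlucky data set
    // cannot dominate the final figure.
    $benchmark = new Benchmark_Iterate;
    $benchmark->run(1000, 'my_max', random_array(100));
    $result = $benchmark->get();
    $total += $result['mean'];
}
printf("my_max() mean over %d randomized runs: %1.6f\n", $runs, $total / $runs);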

Benchmark_Iterate is slow. Very slow. This is because Benchmark_Iterate does much more work than is strictly necessary. The main loop of the run() method looks like this:

for ($i = 1; $i <= $iterations; $i++) {
  $this->setMarker('start_' . $i);
  call_user_func_array($function_name, $arguments);
  $this->setMarker('end_' . $i);
}

setMarker(), in this case, is a method inherited from Benchmark_Timer, and it basically just calls microtime() (which is a front end for the system call gettimeofday()). Accessing the system clock is not a particularly cheap operation in any language. That overhead shows up in the measurements here, and it is unnecessary: unless you are interested in calculating more complex statistical metrics than the mean runtime, you do not need to record the runtime of every individual iteration.
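
If all you need is the mean, a cheaper structure is to read the clock only twice, once before the loop and once after, and divide by the iteration count. Here is a minimal sketch of that idea; it assumes PHP 5's microtime(true), which returns the current time as a float, and the random_array() helper from earlier:

require 'test_data.inc';   // defines random_array()

// Time the whole loop with a single pair of clock reads and report the
// mean; per-iteration detail is lost, but so is the per-iteration overhead.
function mean_runtime($iterations, $function_name, $arguments) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func_array($function_name, $arguments);
    }
    return (microtime(true) - $start) / $iterations;
}

printf("mean runtime for max(): %1.6f\n",
       mean_runtime(1000, 'max', array(random_array(100))));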

Benchmark_Iterate returns wall clock timings. Sometimes you might like to collect more detailed information, such as augmenting the collected statistics with getrusage() statistics.

Calling userspace functions and class methods is not cheap. For functions that execute extremely quickly, or for a block of code that is not contained in a function at all, wrapping the timing calls in a userspace harness may introduce overhead that obscures the result.

Building a Testing Harness

Because this book is decidedly not about reinventing the wheel, I presume that you would like to address as many issues as possible without writing a harness by hand. Fortunately, Benchmark_Iterate has a clean object-oriented design that makes extending its functionality relatively quick and easy.

First, you should look closer at the Benchmark_Timer and Benchmark_Iterate class diagram. Figure 19.1 is a stripped-down version of the UML diagram for Benchmark_Iterate and its parent classes. Attributes and methods not used by Benchmark_Iterate have been culled from the figure.

Figure 19.1. A class diagram of Benchmark_Iterate that shows the major class methods you might want to override to build a custom testing harness.

As you can see in Figure 19.1, the main methods used in a benchmarking case are run() and get(). Under the hood, run() calls setMarker() immediately before and after every call to the function being benchmarked. setMarker() calls microtime() to get the current time, with microsecond accuracy, and adds a marker to the markers array with that time.

The get() method uses the timeElapsed() method to track the time changes between markers. get() returns an array consisting of the execution time for every iteration, plus two additional keys: iterations, which is the number of times the function was executed, and mean, which is the mean execution time across all the iterations.
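
To make that format concrete, the array handed back by get() looks roughly like this (the timing values are purely illustrative):

$result = $benchmark->get();
// array(
//     1            => 0.000301,   // runtime of iteration 1, in seconds
//     2            => 0.000298,   // runtime of iteration 2, and so on
//     'iterations' => 1000,       // number of times the function was run
//     'mean'       => 0.000303,   // mean runtime across all iterations
// )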

Adding Data Randomization on Every Iteration

Random data is a good thing. When you are authoring a function, you can seldom be sure exactly what data will be passed to it, and testing against random data minimizes the chance of hitting performance edge cases. The problem with the stock benchmark classes, though, is that they require you to specify your inputs before you enter the execution loop. If you generate random data once and pass it in, you are not testing a range of data at all, but just a single (albeit randomly chosen) case, which does little beyond making your initial conditions inconsistent from run to run. What you would like is to be able to randomize the data on every iteration, so that you really test a wide distribution of potential inputs.

Ideally, the API would let you specify your own random data-generation function and have it called before each iteration. Here is an extension of Benchmark_Iterate that allows for randomized data:

require 'Benchmark/Iterate.php';

class RandomBench extends Benchmark_Iterate {
  function run_random() {
    $arguments          = func_get_args();
    $iterations         = array_shift($arguments);
    $function_name      = array_shift($arguments);
    // The generator is called before every iteration and must return an
    // array of arguments for the benchmarked function.
    $argument_generator = array_shift($arguments);
    if (strstr($function_name, '::')) {
      // Static method: convert 'Class::method' into a callable array.
      $function_name = explode('::', $function_name);
    }
    elseif (strstr($function_name, '->')) {
      // Method call on a named global object, e.g. 'object->method'.
      list($objectname, $objectmethod) = explode('->', $function_name);
      global ${$objectname};
      for ($i = 1; $i <= $iterations; $i++) {
        $random_data = $argument_generator();
        $this->setMarker('start_' . $i);
        call_user_func_array(array(${$objectname}, $objectmethod), $random_data);
        $this->setMarker('end_' . $i);
      }
      return(0);
    }
    for ($i = 1; $i <= $iterations; $i++) {
      $random_data = $argument_generator();
      $this->setMarker('start_' . $i);
      call_user_func_array($function_name, $random_data);
      $this->setMarker('end_' . $i);
    }
  }
}
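
Here is how you might exercise run_random() with the random_array() generator from earlier. This is a sketch: the filename RandomBench.inc is assumed, my_max() is the userspace function from earlier, and the generator must return an array of call arguments, so random_array() is wrapped accordingly:

require 'test_data.inc';    // defines random_array()
require 'RandomBench.inc';  // the RandomBench class above (assumed filename)

// The generator must return an array of call arguments, so wrap
// random_array() to return array($input_array).
function my_max_args() {
    return array(random_array(100));
}

$benchmark = new RandomBench;
$benchmark->run_random(1000, 'my_max', 'my_max_args');
$result = $benchmark->get();
printf("my_max() mean over randomized inputs: %1.6f\n", $result['mean']);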

Removing Harness Overhead

To remove the overhead of the harness itself, you just need to measure the time it takes to benchmark nothing and deduct that from your averages. You can accomplish this by creating your own class that extends Benchmark_Iterate and replaces the run method with your own, which also calculates the overhead of doing a no-op (that is, no operation) between setting the start and stop timers. Here's how it would look:

<?
require_once 'Benchmark/Iterate.php';
class MyBench extends Benchmark_Iterate {
  public function run() {
    $arguments     = func_get_args();
    $iterations = array_shift($arguments);
    $function_name = array_shift($arguments);
    $arguments = array_shift($arguments);
    parent::run($iterations, $function_name, $arguments);
    // Time an equal number of empty marker pairs to measure the overhead
    // of the harness itself, so that it can be subtracted from the mean.
    $oh = new Benchmark_Iterate;
    for ($i = 1; $i <= $iterations; $i++) {
      $oh->setMarker('start_' . $i);
      $oh->setMarker('end_' . $i);
    }
    $oh_result = $oh->get();
    $this->overhead = $oh_result['mean'];
    return(0);
  }
  public function get() {
    $result = parent::get();
    $result['mean'] -= $this->overhead;
    $result['overhead'] = $this->overhead;
    return $result;
  }
}
?>

You can use your new class by simply changing all the Benchmark_Iterate references in the sample test script:

require "test_data.inc";
require "MyBench.inc";

$benchmark = new MyBench;
print " size        my_max          max   my_max/max\n";
foreach (array(10, 100, 1000) as $size) {
  // Generate a test array.  Benchmark_Iterate does not
  // support generating random data for each iteration,
  // so we need to be careful to use the same $test_array
  // for testing both functions.
  $test_array = random_array($size);
  foreach (array('my_max', 'max') as $func ) {
    $benchmark->run(1000, $func, $test_array);
    $result = $benchmark->get();
    $summary[$func][$size] = $result['mean'] ;
  }
  printf("%5d %12.6f %12.6f %12.2f\n", $size,
         $summary['my_max'][$size], $summary['max'][$size],
         $summary['my_max'][$size]/$summary['max'][$size]);
}

Interestingly, by using this process, you see that in fact the harness overhead did bias the results for the test case:

size       my_max          max   my_max/max
   10     0.000115     0.000007        16.41
  100     0.001015     0.000031        33.27
 1000     0.011421     0.000264        43.31

The benefit of using the built-in linear search over using a userspace search is even greater than you originally estimated, even for small arrays.

Timing Fast Functions

If you are timing very fast functions (for example, functions that perform only a few basic operations), the overhead might appear to be greater than the function call time itself; that is, the mean time may come out negative. Increasing the iteration count should improve the statistics by minimizing the effect of outliers.
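
One quick sanity check is to rerun the same measurement at several iteration counts and watch the mean settle down. The following is a sketch that assumes the MyBench.inc and test_data.inc includes from earlier:

require 'test_data.inc';
require 'MyBench.inc';

$benchmark  = new MyBench;
$test_array = random_array(10);
foreach (array(1000, 10000, 100000) as $iterations) {
    // As the iteration count grows, per-run noise and any residual
    // harness overhead have less influence on the reported mean.
    $benchmark->run($iterations, 'max', $test_array);
    $result = $benchmark->get();
    printf("%7d iterations: mean %1.8f (overhead %1.8f)\n",
           $iterations, $result['mean'], $result['overhead']);
}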


Adding Custom Timer Information

Sometimes you would like to know more about a function's resource usage than just wall-clock time. On systems that support the getrusage() call (most modern Unix systems, and Windows via Cygwin), you can get detailed process-accounting information from PHP's getrusage() function, which returns an associative array containing the values described in Table 19.1.

Table 19.1. getrusage() Resource Values

Key                   Description

[ru_oublock]          The number of output block operations
[ru_inblock]          The number of input block operations
[ru_msgsnd]           The number of SYS V IPC messages sent
[ru_msgrcv]           The number of SYS V IPC messages received
[ru_maxrss]           The maximum resident set size
[ru_ixrss]            The shared memory size
[ru_idrss]            The data size
[ru_minflt]           The number of (memory) page reclamations
[ru_majflt]           The number of (memory) page faults
[ru_nsignals]         The number of signals received by the process
[ru_nvcsw]            The number of voluntary context switches
[ru_nivcsw]           The number of involuntary context switches
[ru_utime.tv_sec]     The number of seconds of user time used
[ru_utime.tv_usec]    The number of microseconds of user time used
[ru_stime.tv_sec]     The number of seconds of system time used
[ru_stime.tv_usec]    The number of microseconds of system time used


Different systems implement these timers differently. On BSD systems the full set of statistics is available, while on Linux 2.4 kernels only ru_stime, ru_utime, ru_minflt, and ru_majflt are available. That information is still enough to make the exercise worthwhile, though.

When you use the standard microtime() timers, what you get is wall-clock time, so called because it is the actual "real" amount of time spent executing a function. If the system executed only a single task at a time, this measure would be fine; in practice, though, it is almost certainly handling multiple tasks concurrently. Because your benchmarks are all relative anyway, your results with the microtime() timers should still be useful as long as the total amount of free processor time is the same between benchmarks; but peaks or lulls in system activity can introduce significant skew into the results.

The system and user time statistics in getrusage() track the actual amount of time the process spends executing kernel-level system calls and userspace code, respectively. This gives you a much better idea of the "true" CPU resources used by the function. Of course, 10ms of uninterrupted processor time is very different from two 5ms blocks of processor time, and the getrusage() statistics do not compensate for the effects of processor cache or register reuse, which vary under system load and can have a very beneficial impact on performance.
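
To make the user/system split concrete, here is a small standalone helper (a sketch, not part of the Benchmark classes) that turns two getrusage() snapshots into a single count of CPU seconds consumed between them:

// Convert a getrusage() snapshot into CPU seconds (user + system).
function cpu_seconds($usage) {
    return $usage['ru_utime.tv_sec'] + $usage['ru_utime.tv_usec'] / 1e6
         + $usage['ru_stime.tv_sec'] + $usage['ru_stime.tv_usec'] / 1e6;
}

$before = getrusage();
for ($i = 0; $i < 1000; $i++) {
    parse_url("http://www.example.com/foo/bar.php?example=yes");
}
$after = getrusage();
printf("CPU time used: %1.6f sec\n", cpu_seconds($after) - cpu_seconds($before));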

To incorporate these statistics into your benchmarking suite, you simply need to overload the setMarker() method (inherited from Benchmark_Timer), which handles statistics collection. You also need to overload the get method to handle organizing the statistics at the end of the run. Here's how you do this:

require_once 'Benchmark/Iterate.php';

class RusageBench extends Benchmark_Iterate {
  public function setMarker($name) {
    // Record a full getrusage() snapshot for this marker, plus combined
    // floating-point user and system times for convenience.
    $this->markers[$name] = getrusage();
    $this->markers[$name]['ru_utime'] =
      sprintf("%6d.%06d", $this->markers[$name]['ru_utime.tv_sec'],
                          $this->markers[$name]['ru_utime.tv_usec']);
    $this->markers[$name]['ru_stime'] =
      sprintf("%6d.%06d", $this->markers[$name]['ru_stime.tv_sec'],
                          $this->markers[$name]['ru_stime.tv_usec']);
  }
  public function get() {
    $result = array();
    $iterations = count($this->markers) / 2;
    // Track every getrusage() key plus the combined ru_utime/ru_stime values.
    $keys = array_merge(array_keys(getrusage()), array('ru_utime', 'ru_stime'));
    foreach ($keys as $key) {
      $result['mean'][$key] = 0;
    }
    for ($i = 1; $i <= $iterations; $i++) {
      $delta = array();
      foreach ($keys as $key) {
        $delta[$key] = $this->markers['end_'.$i][$key]
                     - $this->markers['start_'.$i][$key];
        $result['mean'][$key] += $delta[$key];
      }
      $result[$i] = $delta;
    }
    foreach ($keys as $key) {
      $result['mean'][$key] /= $iterations;
    }
    $result['iterations'] = $iterations;

    return $result;
  }
}

Because of all the additional resource information being collected, the API is slightly broken: the format of the get() method's return value has changed. Instead of the mean array key containing the mean execution time of the function, it is now an associative array of average resource-utilization values.
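
Concretely, the array returned by RusageBench::get() now has roughly this shape (the values shown are made up for illustration):

$result = $b->get();
// array(
//     'mean' => array(
//         'ru_utime'  => 0.000210,   // average user CPU seconds per call
//         'ru_stime'  => 0.000040,   // average system CPU seconds per call
//         'ru_minflt' => 0,          // ...and so on for every getrusage() key
//     ),
//     1 => array(/* per-iteration resource deltas */),
//     'iterations' => 1000,
// )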

You can put your new suite to use by looking at what happened to parse_url between PHP 4.2.3 and 4.3.0. parse_url() is a built-in function that takes a URL and breaks it into its primitive components: service type, URI, query string, and so on. Prior to PHP 4.3.0, a number of bug reports complained that parse_url()'s performance was abysmal. For perspective, you can roll back the clock to PHP 4.2.3 and benchmark parse_url() against a userspace reimplementation:

require 'RusageBench.inc';

$fullurl =
  "http://george:george@www.example.com:8080/foo/bar.php?example=yes#here";

function preg_parse_url($url) {
  $regex = '!^(([^:/?#]+):)?(//(([^/:?#@]+):([^/:?#@]+)@)?([^/:?#]*)'.
           '(:(\d+))?)?([^?#]*)(\\?([^#]*))?(#(.*))?!';
  preg_match($regex, $url, $matches);
  $parts = array();  // URL components
  list(,,$parts['scheme'],,,$parts['user'],$parts['pass'],$parts['host'],,
       $parts['port'],$parts['path'],,$parts['query']) = $matches;
  return $parts;
}
foreach(array('preg_parse_url', 'parse_url') as $func) {
  $b = new RusageBench;
  $b->run('1000', $func, $fullurl);
  $result = $b->get();
  print "$func\t";
  printf("System + User Time: %1.6f\n",
         $result['mean']['ru_utime'] + $result['mean']['ru_stime']);
}

When I run this under PHP version 4.2.3, my laptop returns the following:

PHP 4.2.3
preg_parse_url  System + User Time: 0.000280
parse_url       System + User Time: 0.002110

So much for built-in functions always being faster! The preg_match() solution is a full order of magnitude faster than parse_url(). What might be causing this problem? If you delve into the 4.2.3 source code for the parse_url function, you see that it uses the system (POSIX-compatible) regular expression library and, on every call, does the following:

/* pseudo-C code */
regex_t re;           /* locally scoped regular expression variable */
regmatch_t subs[11];  /* the equivalent of $matches in our userspace parser */
/* compile the pattern */
regcomp(&re, pattern, REG_EXTENDED);
/* execute the regex on our input string and stick the matches in subs */
regexec(&re, string, stringlen, subs, 0);

So on every call, the regular expression is recompiled before it is executed. The userspace reimplementation uses preg_match(), which is smart enough to cache the compiled regular expression in case it is needed again later.

In PHP 4.3.0, the parse_url function was fixed not by adding caching to the regular expression but by hand-coding a URL parser. Here is the same code as before, executed under PHP 4.3.0:

PHP 4.3.0
preg_parse_url  System + User Time: 0.000210
parse_url       System + User Time: 0.000150

The built-in function is now faster, as well it should be. It is worth noting that the performance edge of the built-in function over your reimplementation is only about 30%. This goes to show that it is hard to beat the Perl-Compatible Regular Expression (PCRE) functions (the preg functions) for speed when you're parsing complex strings.

Writing Inline Benchmarks

Tracking benchmark results over time is a good way to keep an eye on the general health of an application as a whole. To make tracking long-term data useful, you need to standardize your tests. You could do this by creating a separate test case, or you could take a cue from your unit testing experiences and include the benchmarks inline in the same file as the library they test.

For include files, which are never executed directly, you can write a benchmark so that it is run if the file is run directly:

// url.inc
function preg_parse_url($url) {
    // ...
}
// add a check to see if we are being executed directly
if ($_SERVER['PHP_SELF'] == __FILE__) {
  // if so, run our benchmark
  require 'RusageBench.inc';
  $testurl =
    "http://george:george@www.example.com:8080/foo/bar.php?example=yes#here";
  $b = new RusageBench;
  $b->run(1000, 'preg_parse_url', $testurl);
  $result = $b->get();
  printf("preg_parse_url(): %1.6f CPU secs/call\n",
         $result['mean']['ru_utime'] + $result['mean']['ru_stime']);
}

Now if you include url.inc, the benchmarking loop is bypassed and the code behaves normally. If you call the library directly, however, you get these benchmark results back:

$ php /home/george/devel/Utils/Uri.inc
preg_parse_url(): 0.000215 CPU secs/call

