Building a Benchmarking Harness

Because you plan on benchmarking a lot of code, you should build a benchmarking harness to help automate the testing process. Having a benchmarking infrastructure not only helps to standardize benchmarks, it also makes it easy to incorporate benchmarks into a unit testing framework so that you can test the performance effects of library changes and PHP version changes. A usable harness needs, at a minimum, the ability to run a test over many iterations, support for randomized input data, low measurement overhead, and extensible timing statistics.
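To make those requirements concrete before reaching for a library, here is a minimal sketch of the core run()/get() shape such a harness takes. The class name and details here are hypothetical, purely illustrative, and not part of PEAR:

```php
<?php
// Hypothetical minimal harness: run() times a callable over many
// iterations; get() reports summary statistics.
class SimpleHarness
{
    private $iterations = 0;
    private $mean = 0.0;

    public function run($iterations, $callable, $args = array())
    {
        $start = microtime(true);
        for ($i = 0; $i < $iterations; $i++) {
            call_user_func_array($callable, $args);
        }
        $this->mean = (microtime(true) - $start) / $iterations;
        $this->iterations = $iterations;
    }

    public function get()
    {
        return array('iterations' => $this->iterations,
                     'mean'       => $this->mean);
    }
}

$h = new SimpleHarness();
$h->run(1000, 'strrev', array('hello world'));
$r = $h->get();
printf("strrev: mean %1.8f sec over %d iterations\n",
       $r['mean'], $r['iterations']);
```

Note that this sketch times the whole loop once rather than each call individually, which trades per-iteration detail for lower measurement overhead; the rest of the chapter explores both sides of that trade-off.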
PEAR's Benchmarking Suite

PEAR has a built-in benchmarking suite, Benchmark_Iterate, that satisfies almost all the needs described in the preceding section and is suitable for many simple benchmarking tasks. Benchmark_Iterate works by running a function in a tight loop, recording execution times around each call, and providing accessors for getting summary information on the results.

To start, you need to install the Benchmark libraries. Prior to PHP 4.3, the Benchmark class suite was packaged with PHP. After version 4.3, you need to either download the classes from http://pear.php.net or use the PEAR installer for a one-step installation:

# pear install Benchmark

To benchmark the performance of the function foo() over 1,000 iterations, you create a Benchmark_Iterate object, invoke the run() method specifying 1,000 iterations, and report the average runtime:

require 'Benchmark/Iterate.php';

$benchmark = new Benchmark_Iterate;
$benchmark->run(1000, 'foo');
$result = $benchmark->get();
print "Mean execution time for foo: $result[mean]\n";

A simple use of the suite is to compare the speed of the built-in function max() with a PHP userspace implementation, my_max(). This is a simple example of how iterating over arrays with built-in functions can be significantly faster than using a userspace implementation. The my_max() function works identically to the built-in max() function, performing a linear search over its input array and keeping track of the largest element it has seen so far:

function my_max(&$array)
{
    $max = $array[0];
    foreach ($array as $element) {
        if ($element > $max) {
            $max = $element;
        }
    }
    return $max;
}

For testing array functions, it is nice to have random test data.
You can write a convenience function for generating such arrays and add it to the include file test_data.inc so that you can reuse it later:

function random_array($size)
{
    $array = array();
    for ($i = 0; $i < $size; $i++) {
        $array[] = mt_rand();
    }
    return $array;
}

Now that the basics are done, it is simple to use Benchmark_Iterate to put together a quick comparison over a number of different array sizes:

<?php
require "test_data.inc";
require "Benchmark/Iterate.php";

$benchmark = new Benchmark_Iterate;
print "  size    my_max       max  my_max/max\n";
foreach (array(10, 100, 1000) as $size) {
    // Generate a test array.  Benchmark_Iterate does not
    // support generating random data for each iteration,
    // so we need to be careful to use the same $test_array
    // for testing both functions.
    $test_array = random_array($size);
    foreach (array('my_max', 'max') as $func) {
        $benchmark->run(1000, $func, $test_array);
        $result = $benchmark->get();
        $summary[$func][$size] = $result['mean'];
    }
    printf("%5d  %6.6f  %6.6f  %3.2f\n",
           $size,
           $summary['my_max'][$size],
           $summary['max'][$size],
           $summary['my_max'][$size] / $summary['max'][$size]);
}
?>

On my laptop this yields the following:

 size    my_max       max  my_max/max
   10  0.000303  0.000053        5.74
  100  0.001604  0.000072       22.43
 1000  0.015813  0.000436       36.28

This example is clearly contrived. (You would never implement your own max() function, if for no reason other than laziness.) However, it illustrates a few important ideas.

Built-in functions, when properly used, will always be faster than userspace functions. This is because an interpreted language such as PHP works basically by converting user code into a set of instructions and then executing them on its own virtual machine. Stepping through code in the PHP executor will always have significant overhead compared to stepping through instructions in a compiled language such as C.

Benchmark_Iterate does not allow for data randomization on each iteration of the benchmark.
Although this does not affect this benchmark in particular, it easily could. Imagine that you decided to test another max candidate, sort_max(), which works by sorting the test array with the built-in asort() function and then popping off the last (and therefore largest) element:

function sort_max($array)
{
    asort($array);
    return array_pop($array);
}

Many sorting algorithms (including quicksort, which is the sorting algorithm used internally by all the PHP sorting functions) exhibit very different best-case and worst-case times. An unlucky "random" data choice can generate misleading results. One solution to this problem is to run benchmarks multiple times to smooth out edge cases. Of course, a robust benchmarking suite should handle that for you.

Benchmark_Iterate is slow. Very slow. This is because Benchmark_Iterate does much more work than is strictly necessary. The main loop of the run() method looks like this:

for ($i = 1; $i <= $iterations; $i++) {
    $this->setMarker('start_' . $i);
    call_user_func_array($function_name, $arguments);
    $this->setMarker('end_' . $i);
}

setMarker() is a method inherited from Benchmark_Timer that basically just calls microtime() (which is a front end for the system call gettimeofday()). Accessing the system clock is not a particularly cheap operation in any language, and much of this overhead is unnecessary: unless you are interested in calculating more complex statistical metrics than the mean runtime, you do not need to record the runtime of every individual iteration.

Benchmark_Iterate returns only wall-clock timings. Sometimes you might like to collect more detailed information, such as augmenting the collected statistics with getrusage() statistics.

Calling userspace functions and class methods is not cheap. For functions that execute extremely quickly, or for testing a code block that is not contained in a function, the act of calling a userspace wrapper around the timing functions may introduce overhead that obscures the result.
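To get a feel for how expensive those clock reads are, you can measure them directly. This is a rough standalone sketch, not part of the PEAR suite; since each Benchmark_Iterate iteration makes two setMarker() calls, the per-iteration measurement overhead is at least twice the per-call figure it prints:

```php
<?php
// Estimate the cost of reading the clock by timing a tight loop of
// microtime() calls and dividing by the number of samples.
$samples = 100000;
$start = microtime(true);
for ($i = 0; $i < $samples; $i++) {
    microtime(true);
}
$perCall = (microtime(true) - $start) / $samples;
printf("approx. cost per microtime() call: %1.9f sec\n", $perCall);
```

On any recent machine the result is well under a microsecond, but that is still easily on the order of a fast function call, which is exactly why it can dominate the timings of quick functions.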
Building a Testing Harness

Because this book is decidedly not about reinventing the wheel, I presume that you would like to address as many of these issues as possible without writing a harness by hand. Fortunately, Benchmark_Iterate has a clean object-oriented design that makes extending its functionality relatively quick and easy. First, you should look more closely at the Benchmark_Timer and Benchmark_Iterate class diagram. Figure 19.1 is a stripped-down version of the UML diagram for Benchmark_Iterate and its parent classes; attributes and methods not used by Benchmark_Iterate have been culled from the figure.

Figure 19.1. A class diagram of Benchmark_Iterate that shows the major class methods you might want to override to build a custom testing harness.
As you can see in Figure 19.1, the main methods used in a benchmarking case are run() and get(). Under the hood, run() calls setMarker() immediately before and after every call to the function being benchmarked. setMarker() calls microtime() to get the current time, with microsecond accuracy, and adds a marker with that time to the markers array. The get() method uses the timeElapsed() method to track the time changes between markers. get() returns an array consisting of the execution time for every iteration, plus two additional keys: iterations, which is the number of times the function was executed, and mean, which is the mean execution time across all the iterations.

Adding Data Randomization on Every Iteration

Random data is a good thing. When you are authoring a function, you can seldom be sure exactly what data will be passed to it, and testing on random data minimizes the chance of hitting performance edge cases. The problem with the stock benchmark classes, though, is that they require you to specify your inputs before you enter the execution loop. If you generate random data once and pass it to your function, you are not testing a range of data at all, but just a single (albeit randomly chosen) case; this does little but create confusing and inconsistent initial conditions. What you want is to randomize the data on every iteration, so that you really test a wide distribution of potential inputs. The ideal API would let you specify your own random data-generation function and have it called before each iteration.
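Independent of PEAR, the idea can be sketched in a few lines: the generator is just a zero-argument callback that returns a fresh argument list, and it is invoked outside the timed window so its cost is not charged to the function under test. All names in this sketch are illustrative:

```php
<?php
// Per-iteration randomization: call a user-supplied generator before
// each timed call so every iteration sees fresh random input.
function benchmark_random($iterations, $function, $generator)
{
    $total = 0.0;
    for ($i = 0; $i < $iterations; $i++) {
        $args = call_user_func($generator);   // fresh data, not timed
        $start = microtime(true);
        call_user_func_array($function, $args);
        $total += microtime(true) - $start;
    }
    return $total / $iterations;              // mean seconds per call
}

// Example generator: a single random-array argument for max().
function random_max_args()
{
    $array = array();
    for ($i = 0; $i < 100; $i++) {
        $array[] = mt_rand();
    }
    return array($array);   // argument list containing one array
}

printf("max() mean: %1.8f sec\n",
       benchmark_random(100, 'max', 'random_max_args'));
```
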
Here is an extension of Benchmark_Iterate that allows for randomized data:

require 'Benchmark/Iterate.php';

class RandomBench extends Benchmark_Iterate
{
    function run_random()
    {
        $arguments = func_get_args();
        $iterations = array_shift($arguments);
        $function_name = array_shift($arguments);
        $argument_generator = array_shift($arguments);
        if (strstr($function_name, '::')) {
            $function_name = explode('::', $function_name);
            $objectmethod = $function_name[1];
        }
        if (strstr($function_name, '->')) {
            $function_name = explode('->', $function_name);
            $objectname = $function_name[0];
            global ${$objectname};
            $objectmethod = $function_name[1];
            for ($i = 1; $i <= $iterations; $i++) {
                $random_data = $argument_generator();
                $this->setMarker('start_' . $i);
                call_user_method_array($function_name[1], ${$objectname}, $random_data);
                $this->setMarker('end_' . $i);
            }
            return(0);
        }
        for ($i = 1; $i <= $iterations; $i++) {
            $random_data = $argument_generator();
            $this->setMarker('start_' . $i);
            call_user_func_array($function_name, $random_data);
            $this->setMarker('end_' . $i);
        }
    }
}

Removing Harness Overhead

To remove the overhead of the harness itself, you just need to measure the time it takes to benchmark nothing and deduct that from your averages. You can accomplish this by creating your own class that extends Benchmark_Iterate and replaces the run() method with one of your own, which also calculates the overhead of doing a no-op (that is, no operation) between setting the start and stop timers. Here's how it would look:

<?php
require_once 'Benchmark/Iterate.php';

class MyBench extends Benchmark_Iterate
{
    public function run()
    {
        $arguments = func_get_args();
        $iterations = array_shift($arguments);
        $function_name = array_shift($arguments);
        $arguments = array_shift($arguments);
        parent::run($iterations, $function_name, $arguments);
        $oh = new Benchmark_Iterate;
        for ($i = 1; $i <= $iterations; $i++) {
            $oh->setMarker('start_' . $i);
            $oh->setMarker('end_' . $i);
        }
        $oh_result = $oh->get();
        $this->overhead = $oh_result['mean'];
        return(0);
    }

    public function get()
    {
        $result = parent::get();
        $result['mean'] -= $this->overhead;
        $result['overhead'] = $this->overhead;
        return $result;
    }
}
?>

You can use your new class by simply changing all the Benchmark_Iterate references in the earlier test script:

require "test_data.inc";
require "MyBench.inc";

$benchmark = new MyBench;
print "  size    my_max       max  my_max/max\n";
foreach (array(10, 100, 1000) as $size) {
    // Generate a test array.  Benchmark_Iterate does not
    // support generating random data for each iteration,
    // so we need to be careful to use the same $test_array
    // for testing both functions.
    $test_array = random_array($size);
    foreach (array('my_max', 'max') as $func) {
        $benchmark->run(1000, $func, $test_array);
        $result = $benchmark->get();
        $summary[$func][$size] = $result['mean'];
    }
    printf("%5d  %6.6f  %6.6f  %3.2f\n",
           $size,
           $summary['my_max'][$size],
           $summary['max'][$size],
           $summary['my_max'][$size] / $summary['max'][$size]);
}

Interestingly, using this process shows that the harness overhead did in fact bias the results of the test case:

 size    my_max       max  my_max/max
   10  0.000115  0.000007       16.41
  100  0.001015  0.000031       33.27
 1000  0.011421  0.000264       43.31

The benefit of using the built-in linear search over a userspace search is even greater than originally estimated, even for small arrays.

Adding Custom Timer Information

Sometimes you would like to know more about a function's resource usage than just wall-clock times. On systems that support the getrusage() call (most modern Unix systems, and Windows systems via Cygwin), you can get detailed process accounting information via the getrusage() PHP function, which returns an associative array containing the values described in Table 19.1.
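You can inspect those fields directly with a short sketch. Field availability varies by platform (and getrusage() is not available on native Windows builds); only the ru_utime fields, present on Linux and BSD alike, are assumed here:

```php
<?php
// Read the getrusage() counters before and after a busy loop to see
// how much user-mode CPU time it consumed.
function user_seconds(array $usage)
{
    return $usage['ru_utime.tv_sec'] + $usage['ru_utime.tv_usec'] / 1e6;
}

$before = getrusage();
for ($i = 0; $i < 500000; $i++) {
    $x = $i * $i;   // burn some user-mode CPU
}
$after = getrusage();
printf("user CPU consumed: %1.6f sec\n",
       user_seconds($after) - user_seconds($before));
```

Note that the seconds and microseconds arrive as separate dotted keys (ru_utime.tv_sec and ru_utime.tv_usec), which is why the class developed next recombines them into a single value.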
Different systems implement these timers differently. On BSD systems the full set of statistics is available, whereas on Linux 2.4 kernels only ru_stime, ru_utime, ru_minflt, and ru_majflt are available. That information is still enough to make the exercise worthwhile, though.

When you use the standard microtime() timers, the information you get is wall-clock time, so called because it is the actual total "real" amount of time spent executing a function. If a system executed only a single task at a time, this measure would be fine; the problem is that it is almost certainly handling multiple tasks concurrently. Because your benchmarks are all relative anyway, the microtime() results will still be useful as long as the total amount of free processor time is the same across benchmarks; but if there are peaks or lulls in system activity, significant skew can be introduced into the results. The system and user time statistics in getrusage() track the actual amount of time the process spends executing kernel-level system calls and userspace code, respectively. This gives you a much better idea of the "true" CPU resources used by the function. Of course, 10ms of uninterrupted processor time is very different from two 5ms blocks of processor time, and the getrusage() statistics do not compensate for the effects of processor cache or register reuse, which vary under system load and can have a very beneficial impact on performance.

To incorporate these statistics into your benchmarking suite, you simply need to overload the setMarker() method (inherited from Benchmark_Timer), which handles statistics collection. You also need to overload the get() method to organize the statistics at the end of the run.
Here's how you do this:

require_once 'Benchmark/Iterate.php';

class RusageBench extends Benchmark_Iterate
{
    public function setMarker($name)
    {
        $this->markers[$name] = getrusage();
        $this->markers[$name]['ru_utime'] = sprintf("%6d.%06d",
            $this->markers[$name]['ru_utime.tv_sec'],
            $this->markers[$name]['ru_utime.tv_usec']);
        $this->markers[$name]['ru_stime'] = sprintf("%6d.%06d",
            $this->markers[$name]['ru_stime.tv_sec'],
            $this->markers[$name]['ru_stime.tv_usec']);
    }

    public function get()
    {
        $result = array();
        $iterations = count($this->markers) / 2;
        // The composite ru_utime/ru_stime markers are added by
        // setMarker() above; they are not raw getrusage() keys.
        $keys = array_merge(array_keys(getrusage()),
                            array('ru_stime', 'ru_utime'));
        for ($i = 1; $i <= $iterations; $i++) {
            $temp = array();
            foreach ($keys as $key) {
                $delta = $this->markers['end_' . $i][$key]
                       - $this->markers['start_' . $i][$key];
                $temp[$key] = $delta;
                if (!isset($result['mean'][$key])) {
                    $result['mean'][$key] = 0;
                }
                $result['mean'][$key] += $delta;
            }
            $result[$i] = $temp;
        }
        foreach ($keys as $key) {
            $result['mean'][$key] /= $iterations;
        }
        $result['iterations'] = $iterations;
        return $result;
    }
}

Because of all the additional resource information, the API is now slightly broken: the format of the get() method's return value has changed. Instead of the mean array key containing the mean execution time of the function, it is now an associative array of average resource-utilization values.

You can put your new suite to use by looking at what happened to parse_url() between PHP 4.2.3 and 4.3.0. parse_url() is a built-in function that takes a URL and breaks it into its primitive components: service type, URI, query string, and so on. Prior to PHP 4.3.0, a number of bug reports said that the parse_url() function's performance was abysmally poor.
For perspective, you can roll back the clock to PHP 4.2.3 and benchmark parse_url() against a userspace reimplementation:

require 'RusageBench.inc';

$fullurl = "http://george:george@www.example.com:8080/foo/bar.php?example=yes#here";

function preg_parse_url($url)
{
    $regex = '!^(([^:/?#]+):)?(//(([^/:?#@]+):([^/:?#@]+)@)?([^/:?#]*)'.
             '(:(\d+))?)?([^?#]*)(\\?([^#]*))?(#(.*))?!';
    preg_match($regex, $url, $matches);
    list(,,$url['scheme'],,$url['user'],$url['pass'],$url['host'], ,
         $url['port'],$url['path'],,$url['query']) = $matches;
    return $url;
}

foreach (array('preg_parse_url', 'parse_url') as $func) {
    $b = new RusageBench;
    $b->run(1000, $func, $fullurl);
    $result = $b->get();
    print "$func\t";
    printf("System + User Time: %1.6f\n",
           $result['mean']['ru_utime'] + $result['mean']['ru_stime']);
}

When I run this under PHP 4.2.3, my laptop returns the following:

PHP 4.2.3
preg_parse_url  System + User Time: 0.000280
parse_url       System + User Time: 0.002110

So much for built-in functions always being faster! The preg_match() solution is nearly an order of magnitude faster than parse_url(). What might be causing the problem? If you delve into the 4.2.3 source code for the parse_url() function, you see that it uses the system (POSIX-compatible) regular expression library and executes the following on every call:

/* pseudo-C code */
regex_t re;           /* locally scoped regular expression variable */
regmatch_t subs[11];  /* the equivalent of $matches in our userspace parser */

/* compile the pattern */
regcomp(&re, pattern, REG_EXTENDED);
/* execute the regex on our input string and stick the matches in subs */
regexec(&re, string, stringlen, subs, 0);

So on each call, the regular expression is recompiled before it is executed. The userspace reimplementation uses preg_match(), which is smart enough to cache the compiled regular expression in case it wants to use it later.
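For reference, this is the component breakdown both parsers aim to produce for the benchmark's test URL; parse_url() itself demonstrates the target output:

```php
<?php
// The test URL from the benchmark, broken into its components.
$fullurl = "http://george:george@www.example.com:8080/foo/bar.php?example=yes#here";
print_r(parse_url($fullurl));
// Yields scheme "http", user "george", pass "george",
// host "www.example.com", port 8080, path "/foo/bar.php",
// query "example=yes", and fragment "here".
```
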
In PHP 4.3.0, the parse_url() function was fixed, not by adding caching to the regular expression, but with a hand-coded URL parser. Here is the same code as before, executed under PHP 4.3.0:

PHP 4.3.0
preg_parse_url  System + User Time: 0.000210
parse_url       System + User Time: 0.000150

The built-in function is now faster, as well it should be. It is worth noting, though, that the performance edge of the built-in function over the reimplementation is only about 30%. This goes to show that it is hard to beat the Perl-Compatible Regular Expression (PCRE) functions (the preg functions) for speed when you're parsing complex strings.

Writing Inline Benchmarks

Tracking benchmark results over time is a good way to keep an eye on the general health of an application as a whole. To make long-term tracking useful, you need to standardize your tests. You could do this by creating a separate test case, or you could take a cue from your unit testing experiences and include the benchmarks inline, in the same file as the library they test. For include files, which are never executed directly, you can write a benchmark so that it runs when the file is executed directly:

// url.inc

function preg_parse_url()
{
    // ...
}

// add a check to see if we are being executed directly
if ($_SERVER['PHP_SELF'] == __FILE__) {
    // if so, run our benchmark
    require 'RusageBench.inc';
    $testurl = "http://george:george@www.example.com:8080/foo/bar.php?example=yes#here";
    $b = new RusageBench;
    $b->run(1000, 'preg_parse_url', $testurl);
    $result = $b->get();
    printf("preg_parse_url(): %1.6f secs/exec\n",
           $result['mean']['ru_utime'] + $result['mean']['ru_stime']);
}

Now if you include url.inc, the benchmarking loop is bypassed and the code behaves normally. If you execute the library directly, however, you get these benchmark results back:

$ php /home/george/devel/Utils/Uri.inc
preg_parse_url(): 0.000215 secs/exec
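One caveat: comparing $_SERVER['PHP_SELF'] against __FILE__ only matches when the script is invoked by its full path, as in the example above. A slightly sturdier check, shown here as my own sketch rather than anything from the PEAR suite, resolves both sides to absolute paths before comparing:

```php
<?php
// A more robust "am I being run directly?" test.  PHP_SELF may be a
// relative path under the CLI, so resolve both sides with realpath().
function invoked_directly($file)
{
    if (!isset($_SERVER['SCRIPT_FILENAME'])) {
        return false;
    }
    return realpath($_SERVER['SCRIPT_FILENAME']) === realpath($file);
}

if (invoked_directly(__FILE__)) {
    print "running inline benchmarks\n";
}
```
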