Benchmarking Basics

When comparing benchmarks, you need to make sure the runs differ in only one degree of freedom. This means that you should vary only one independent factor at a time in a test, holding the rest of the data and algorithms constant as a control. Let's say, for example, that you are writing a class that reads in a document and calculates its Flesch readability score. If you simultaneously change the algorithms for counting words and for counting sentences, you will be unable to determine which change accounts for the performance difference.
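As a minimal sketch of varying one factor at a time, the following snippet times two word counters against the same fixed text. The function names (countWordsSplit, countWordsBuiltin, timeIt) and the sample text are hypothetical, chosen only for illustration, and hrtime() assumes PHP 7.3 or later.

```php
<?php
// Hypothetical word counter A: split on runs of whitespace.
function countWordsSplit(string $text): int
{
    return count(preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY));
}

// Hypothetical word counter B: PHP's built-in str_word_count().
function countWordsBuiltin(string $text): int
{
    return str_word_count($text);
}

// Call $fn on $text $iterations times; return elapsed seconds.
function timeIt(callable $fn, string $text, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn($text);
    }
    return (hrtime(true) - $start) / 1e9;
}

// Control: the same document and iteration count for both runs.
$text = str_repeat("The quick brown fox jumps over the lazy dog. ", 200);
$iterations = 1000;

// Independent factor: only the word-counting algorithm changes.
$elapsedSplit   = timeIt('countWordsSplit', $text, $iterations);
$elapsedBuiltin = timeIt('countWordsBuiltin', $text, $iterations);

printf("preg_split:     %.4f s\n", $elapsedSplit);
printf("str_word_count: %.4f s\n", $elapsedBuiltin);
```

Because the sentence-counting code is left out of both runs entirely, any difference in the two timings can be attributed to the word counter alone.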

You should also keep in mind that benchmarks are highly relative. If I compare array_walk() on my laptop against a for loop on my development server, I will likely prove only that a for loop on a more powerful machine is faster than array_walk() on a less powerful one. This is not a very useful statement. To make this a benchmark that has relevance, I should run both tests on the same machine. If the goal is instead a laptop-versus-server shootout, I should hold the function I am testing constant and vary only the machine.
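A same-machine comparison of the two constructs might look like the sketch below. The helper bestTime(), the array size, and the doubling operation are assumptions made for illustration; hrtime() assumes PHP 7.3 or later.

```php
<?php
// Run $fn several times and return the best elapsed time in seconds.
// Taking the minimum reduces noise from other activity on the machine.
function bestTime(callable $fn, int $runs = 5): float
{
    $best = INF;
    for ($r = 0; $r < $runs; $r++) {
        $start = hrtime(true);
        $fn();
        $elapsed = (hrtime(true) - $start) / 1e9;
        $best = min($best, $elapsed);
    }
    return $best;
}

// Same input for both candidates.
$data = range(1, 100000);

// Candidate A: array_walk() with a by-reference callback.
$walk = function () use ($data) {
    $copy = $data;
    array_walk($copy, function (&$value) {
        $value = $value * 2;
    });
    return $copy;
};

// Candidate B: a plain for loop doing the same work.
$loop = function () use ($data) {
    $copy = $data;
    $n = count($copy);
    for ($i = 0; $i < $n; $i++) {
        $copy[$i] = $copy[$i] * 2;
    }
    return $copy;
};

printf("array_walk: %.5f s\n", bestTime($walk));
printf("for loop:   %.5f s\n", bestTime($loop));
```

Both candidates run in the same process, on the same machine, against the same array, so the only thing that differs between the two timings is the looping construct.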

Standardized initial data is also extremely important. Many functions (regular expressions being a prime example) exhibit very different performance characteristics as the size and makeup of their operands change. To make a fair comparison, you need to use equivalent data sets for all the functions you want to compare. If you are using statically specified data for the test, reuse the same data for every function. If you are using random data, use statistically equivalent data, such as sets of the same size drawn from the same distribution.
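One way to standardize random input is to seed the generator so that every function under test receives exactly the same values. The sketch below assumes a hypothetical helper, makeDataSet(), and uses sort() and usort() purely as example candidates; the arrow function assumes PHP 7.4 or later.

```php
<?php
// Build a reproducible data set: the same seed and size yield the same
// sequence (within a given PHP version).
function makeDataSet(int $size, int $seed): array
{
    mt_srand($seed);
    $data = [];
    for ($i = 0; $i < $size; $i++) {
        $data[] = mt_rand(0, 1000000);
    }
    return $data;
}

// Generate the input once and give each candidate its own copy,
// so neither sees data already sorted by the other.
$input = makeDataSet(10000, 42);

$forSort = $input;
sort($forSort);

$forUsort = $input;
usort($forUsort, fn($x, $y) => $x <=> $y);
```

If reusing one data set is not practical, calling makeDataSet() with different seeds but the same size and value range gives statistically equivalent inputs.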
