Spotting General Inefficiencies

Profilers excel at spotting general inefficiencies. Examples include repeatedly calling a moderately expensive user function when a built-in function would do, or calling a function on every iteration of a loop where a single built-in call would do the whole job. The analysis earlier in this chapter relied on inclusive timings; mild but widespread issues like these are often better diagnosed by ordering on exclusive time.
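As a minimal illustration of the second pattern, consider a hand-written summing helper next to the built-in array_sum(). The helper name total() and the sample data are hypothetical and serve only to show the shape of the problem:

```php
<?php
// Hypothetical userspace helper of the kind a profiler tends to surface.
function total($values) {
  $sum = 0;
  foreach ($values as $v) {
    $sum += $v;
  }
  return $sum;
}

$values = range(1, 1000);

$slow = total($values);      // loop executed in PHP userspace
$fast = array_sum($values);  // same result from a single built-in call
```

Neither call is expensive on its own; the cost only becomes visible when the helper is called widely, which is why exclusive-time ordering tends to expose it.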

My favorite example of this sort of "obvious" yet largely undetectable inefficiency occurred during the birth of APD. At the company where I was working, there were some functions to make binary data (specifically, encrypted user data) 8-bit safe so that it could be stored in cookies. On every request to a page that required member credentials, the user's cookie would be decrypted and used both for authentication and as a basic cache of the user's personal data. User sessions were to be timed out, so the cookie contained a timestamp that was reset on every request and used to ensure that the session was still valid.

This code had been in use for three years and was authored in the days of PHP 3, when non-binary-safe data (for example, data containing nulls) was not correctly handled in the PHP cookie-handling code, and before rawurlencode() was binary safe. The functions looked something like this:

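// Convert a binary string to lowercase hexadecimal, two digits per byte.
function hexencode($data) {
  // unpack("C*") yields an array of unsigned byte values (0-255).
  $ascii = unpack("C*", $data);
  $retval = '';
  foreach ($ascii as $v) {
    $retval .= sprintf("%02x", $v);
  }
  return $retval;
}

// Convert a hexadecimal string produced by hexencode() back to binary.
function hexdecode($data) {
  $len = strlen($data);
  $retval = '';
  // Consume two hex digits at a time and pack each pair into one byte.
  for ($i = 0; $i < $len; $i += 2) {
    $retval .= pack("C", hexdec(substr($data, $i, 2)));
  }
  return $retval;
}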

On encoding, a string of binary data was broken down into its component characters with unpack(). The component characters were then converted to their hexadecimal values and reassembled. Decoding did the reverse. On the surface, these functions are pretty efficient, or at least as efficient as they can be when written in PHP.

When I was testing APD, I discovered to my dismay that these two functions consumed almost 30% of the execution time of every page on the site. The problem was that the user cookies were not small (they were about 1KB on average), and looping through an array of that size, appending to a string, is extremely slow in PHP. Because the functions were relatively optimal from a PHP perspective, we had a couple of choices:

  • Fix the cookie encoding inside PHP itself to be binary safe.

  • Use a built-in function that achieves a result similar to what we were looking for (for example, base64_encode()).

We ended up choosing the former option, and current releases of PHP have binary-safe cookie handling. However, the second option would have been just as good.
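For comparison, the second option might have looked roughly like the sketch below. The wrapper names cookie_encode() and cookie_decode() and the sample bytes are hypothetical; only base64_encode() and base64_decode() are the built-ins referred to above:

```php
<?php
// Hypothetical wrappers; these names are not from the original code base.
function cookie_encode($binary) {
  return base64_encode($binary);
}

function cookie_decode($text) {
  // Strict mode makes base64_decode() return false on characters
  // outside the base64 alphabet instead of silently skipping them.
  $binary = base64_decode($text, true);
  return ($binary === false) ? null : $binary;
}

// Stand-in for encrypted user data: arbitrary bytes, including a null.
$ciphertext = "\x00\xff\x10binary\x80data";
$encoded = cookie_encode($ciphertext);
$decoded = cookie_decode($encoded);
```

Base64 output grows by about a third, whereas hex encoding doubles the size, so this choice would also have shrunk the cookie. The '+', '/', and '=' characters it can produce are URL-encoded by setcookie() when the cookie is sent. Where a hex representation is specifically required, current PHP also offers bin2hex() and, since PHP 5.4, hex2bin().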

A simple fix resulted in a significant speedup. This was not a single-script speedup, but a capacity increase of 30% across the board. As with all technical problems that have simple answers, the question from on top was "How did this happen?" The answer is multifaceted but simple, and it is the reason all high-traffic scripts should be profiled regularly:

  • The data had changed. When the code had been written (years before), user cookies had been much smaller (less than 100 bytes), and so the overhead was much lower.

  • It didn't actually break anything. A 30% slowdown since inception is inherently hard to track. The difference between 100ms and 130ms is impossible to spot with the human eye. When machines are running below capacity (as is common in many projects), these cumulative slowdowns do not affect traffic levels.

  • It looked efficient. The encoding functions are efficient, for code written in PHP. With more than 2,000 internal functions in PHP's standard library, it is not hard to imagine failing to find base64_encode() when you are looking for a built-in hex-encoding function.

  • The code base was huge. With nearly a million lines of PHP code, the application code base was so large that a manual inspection of all the code was impossible. Worse still, with PHP lacking a hexencode() internal function, you need to have specific information about the context in which the userspace function is being used to suggest that base64_encode() will provide equivalent functionality.
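The "it looked efficient" point can be checked with a rough timing sketch like the one below, which runs the userspace encoder and base64_encode() over a payload of about 1KB. The helper time_call(), the iteration count, and the synthetic payload are assumptions made for illustration, and a micro-benchmark like this is no substitute for profiling real requests:

```php
<?php
// Userspace encoder from the text, repeated so the sketch is self-contained.
function hexencode($data) {
  $ascii = unpack("C*", $data);
  $retval = '';
  foreach ($ascii as $v) {
    $retval .= sprintf("%02x", $v);
  }
  return $retval;
}

// Hypothetical helper: average seconds per call of $fn over $iterations runs.
function time_call(callable $fn, $arg, $iterations = 1000) {
  $start = microtime(true);
  for ($i = 0; $i < $iterations; $i++) {
    $fn($arg);
  }
  return (microtime(true) - $start) / $iterations;
}

// Assumed payload: 1,024 bytes covering every byte value, including nulls.
$payload = str_repeat(implode('', array_map('chr', range(0, 255))), 4);

$userspace = time_call('hexencode', $payload);
$builtin   = time_call('base64_encode', $payload);

printf("hexencode():     %.2f us per call\n", $userspace * 1e6);
printf("base64_encode(): %.2f us per call\n", $builtin * 1e6);
```

The absolute numbers depend on the PHP version and the hardware, so no expected output is shown here; the useful signal is the ratio between the two lines.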

Without a profiler, this issue would never have been caught. The code was too old and buried too deep to ever be found otherwise.

Note

There is an additional inefficiency in this cookie strategy. Resetting the user's cookie on every access could guarantee that a user session was expired after exactly 15 minutes, but it required the cookie to be re-encrypted and reset on every access. By changing the expiration window to a fuzzy one (between 15 and 20 minutes), you can change the cookie-setting strategy so that the cookie is reset only if it is already more than 5 minutes old. This will buy you a significant speedup as well.
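A minimal sketch of that fuzzy-window policy follows. The constant names and the function cookie_action() are hypothetical; the 20-minute and 5-minute values are taken from the note:

```php
<?php
// Assumed policy constants, derived from the 15-to-20-minute window in the note.
define('SESSION_TIMEOUT', 20 * 60);  // expire when the stamp is over 20 minutes old
define('REFRESH_AFTER', 5 * 60);     // re-issue only when the stamp is over 5 minutes old

// Hypothetical helper: classify a request from the cookie's timestamp.
// Returns 'expired', 'refresh' (re-encrypt and reset the cookie), or 'keep'.
function cookie_action($stamp, $now) {
  $age = $now - $stamp;
  if ($age > SESSION_TIMEOUT) {
    return 'expired';
  }
  if ($age > REFRESH_AFTER) {
    return 'refresh';
  }
  return 'keep';
}
```

Because the stamp may lag the user's real last access by up to 5 minutes, a 20-minute cutoff on the stamp corresponds to between 15 and 20 minutes of actual inactivity. Requests that arrive within 5 minutes of the last reset skip the re-encryption entirely.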


