Приглашаем посетить
Сладков (sladkov.lit-info.ru)

Profiling a Larger Application

Previous
Table of Contents
Next

Profiling a Larger Application

Now that you understand the basics of using APD, let's employ it on a larger project. Serendipity is open-source Web log software written entirely in PHP. Although it is most commonly used for private individuals' Web logs, Serendipity was designed with large, multiuser environments in mind, and it supports an unlimited number of authors.

In this sense, Serendipity is an ideal starting point for a community-based Web site to offer Web logs to its users. As far as features go, Serendipity is ready for that sort of high-volume environment, but the code should first be audited to make sure it will be able to scale well. A profiler is perfect for this sort of analysis.

One of the great things about profiling tools is that they give you easy insight into any code base, even one you might be unfamiliar with. By identifying bottlenecks and pinpointing their locations in code, APD allows you to quickly focus your attention on trouble spots.

A good place to start is profiling the front page of the Web log. To do this, the index.php file is changed to a dump trace. Because the Web log is live, you do not generate a slew of trace files by profiling every page hit, so you can wrap the profile call to make sure it is called only if you manually pass PROFILE=1 on the URL line:

<?php
if($_GET['PROFILE'] == 1) {
  apd_set_pprof_trace();
}
/* ... regular serendipity code starts here ... */

Figure 18.3 shows the profile results for the Serendipity index page, sorted by inclusive real times (using -R). I prefer to start my profiling efforts with -R because it helps give me a good idea which macro-level functions in an application are slow. Because the inclusive timing includes all child calls as well, "top-level" functions tend to be prominent in the listing.

Figure 18.3. Initial profiling results for the Serendipity index page.

Profiling a Larger Application


The total time for this page was 0.1231 seconds, which isn't bad if you are running your own personal site, but it might be too slow if you are trying to implement Serendipity for a large user base or a high-traffic site. include_once() is the top-ranked time-consumer, which is not uncommon in larger applications where a significant portion of the logic is implemented in include files. Note, though, that include_once() not only dominates the inclusive listing, but it seems to dominate the exclusive listing as well. Figure 18.4 verifies this: Rerunning the profile with pprofp -r shows that include_once() takes 29.7% of the runtime, without counting any child function calls.

Figure 18.4. An exclusive call summary for the Serendipity index page.

Profiling a Larger Application


What you are seeing here is the cost of compiling all the Serendipity includes. Remember the discussion of compiler caches in Chapter 9, "External Performance Tunings," that one of the major costs associated with executing PHP scripts is the time spent parsing and compiling them into intermediate code. Because include files are all parsed and compiled at runtime, you can directly see this cost in the example shown in Figure 18.4. You can immediately optimize away this overhead by using a compiler cache. Figure 18.5 shows the effect of installing APC and rerunning the profiles. include_once() is still at the top of inclusive times (which is normal because it includes a large amount of the page logic), but its exclusive time has dropped completely out of the top five calls. Also, script execution time has almost been cut in half.

Figure 18.5. A Serendipity index profile running with an APC compiler cache.

Profiling a Larger Application


If you look at the calls that remain, you can see that these are the three biggest offenders:

  • serendipity_plugin_api::generate_plugins

  • serendipity_db_query

  • mysql_db_query

You might expect database queries to be slow. Database accesses are commonly the bottleneck in many applications. Spotting and tuning slow SQL queries is covered in Chapter 12, "Interacting with Databases," so this chapter does not go into detail about that. As predicted earlier, the high real-time cost of the database queries is matched with no user and system time costs because the time that is spent in these queries is exclusively spent on waiting for a response from the database server.

The generate_plugins() function is a different story. Serendipity allows custom user plug-ins for side navigation bar items and comes with a few bundled examples, including a calendar, referrer tracking, and archive search plug-ins. It seems unnecessary for this plug-in generation to be so expensive.

To investigate further, you can generate a complete call tree with this:

> pprofp -tcm /tmp/pprof.28986

Figure 18.6 shows a segment of the call tree that is focused on the beginning of the first call to serendipity_plugin_api::generate_plugins(). The first 20 lines or so show what seems to be normal lead-up work. A database query is run (via serendipity_db_query()), and some string formatting is performed. About midway down the page, in the serendipity_drawcalendar() function, the trace starts to look very suspicious. Calling mktime() and date() repeatedly seems strange. In fact, date() is called 217 times in this function. Looking back up to the exclusive trace in Figure 18.5, you can see that the date() function is called 240 times in total and accounts for 14.8% of the script's execution time, so this might be a good place to optimize.

Figure 18.6. A call tree for the Serendipity index page.

Profiling a Larger Application


Fortunately, the call tree tells you exactly where to look:

serendipity_functions.inc.php, lines 245261. Here is the offending code:

227 print ("<TR CLASS='serendipity_calendar'>");
228 for ($y=0; $y<7; $y++) {
229   // Be able to print borders nicely
230   $cellProp      = "";
231   if ($y==0) $cellProp = "FirstInRow";
232   if ($y==6) $cellProp = "LastInRow";
233   if ($x==4) $cellProp = "LastRow";
234   if ($x==4 && $y==6) $cellProp = "LastInLastRow";
235
236   // Start printing
237   if (($x>0 || $y>=$firstDayWeekDay) && $currDay<=$nrOfDays) {
238     if ($activeDays[$currDay] > 1) $cellProp.='Active';
239       print("<TD CLASS='serendipity_calendarDay$cellProp'>");

240
241      // Print day
242      if ($serendipity["rewrite"]==true)
243            $link = $serendipity["serendipityHTTPPath"]."archives/".
244                    date("Ymd", mktime(0,0,0, $month, $currDay, $year)).
245                    ".html";
246      else
247        $link = $serendipity["serendipityHTTPPath"];;
248      if (date("m") == $month &&
249        date("Y") == $year &&
250        date("j") == currDay) {
251        echo "<I>";
252      }
253      if ($activeDays[$currDay] > 1) {
254        print ("<AHREF='$link'>");
255      }
256      print ($currDay);
257      if ($activeDays[$currDay] > 1) print ("</A>");
258      if (date("m") == $month &&
259        date("Y") == $year &&
260        date("j") == $currDay) {
261        echo "</I>";
262      }
263      print("</TD>");
264      $currDay++;
265    }
266    else {
267      print "<TD CLASS='serendipity_calendarBlankDay$cellProp'>";
268      print "&nbsp;</TD>";
269    }
270  }
271  print ("</TR>");

This is a piece of the serendipity_drawcalendar() function, which draws the calendar in the navigation bar. Looking at line 244, you can see that the date() call is dependent on $month, $currDay, and $year. $currDay is incremented on every iteration through the loop, so you cannot cleanly avoid this call. You can, however, replace it:

date("Ymd", mktime(0,0,0, $month, $currDay, $year))

This line makes a date string from $month, $currDay, and $year. You can avoid the date() and mktime() functions by simply formatting the string yourself:

sprintf("%4d%02d%02d:, $year, $month, $currDay)

However, the date calls on lines 248, 249, 250, 258, 259, and 260 are not dependent on any variables, so you can pull their calculation to outside the loop. When you do this, the top of the loop should precalculate the three date() results needed:

227 $date_m = date("m");
228 $date_Y = date("Y");
229 $date_j = date("j");
230 print ("<TR CLASS='serendipity_calendar'>");
231 for ($y=0; $y<7; $y++) {
232 /* ... */

Then lines 248250 and 258261 should both become this:

if ($date_m == $month &&
    $date_Y == $year &&
    $date_j == $currDay) {

Implementing this simple change reduces the number of date() calls from 240 to 38, improves the speed of serendipity_plugin_api::generate_plugins() by more than 20%, and reduces the overall execution time of the index page by 10%. That's a significant gain for a nine-line change and 15 minutes' worth of work.

This particular example is easy to categorize as simply being a case of programmer error. Putting an invariant function inside a loop is a common mistake for beginners; dismissing it is a mistake, though, for a number of reasons:

  • Experienced programmers as well as beginners make these sorts of mistakes, especially in large loops where it is easy to forget where variables change.

  • In a team environment, it's extremely easy for simple inefficiencies like these to crop up. For example, a relatively simple task (such as writing a calendar) may be dispatched to a junior developer, and a casual audit of the work might fail to turn up this sort of error.

  • Inefficiencies like these are almost never revealed by intuition. If you approach the code base from afar, it is unlikely that you'll think that the calendar (largely an afterthought in the application design) is a bottleneck. Small features like these often contain subtle inefficiencies; 10% here, 15% therethey quickly add up to trouble in a performance-sensitive application.


Previous
Table of Contents
Next