Language-Level Tunings

Language-level tunings are changes that you can make to PHP itself to enhance performance. PHP has a nice engine-level API (which is examined in depth in Chapter 21, "PHP and Zend Engine Internals," and Chapter 23, "Writing SAPIs and Extending the Zend Engine") that allows you to write extensions that directly affect how the engine processes and executes code. You can use this interface to speed the compilation and execution of PHP scripts.

Compiler Caches

If you could choose only one server modification to improve the performance of a PHP application, installing a compiler cache should be it. Installing a compiler cache can yield a huge benefit, and unlike many technologies that yield diminishing returns as the size of the application increases, a compiler cache actually yields increasing returns as the size and complexity increase. So what is a compiler cache? And how can it produce such impressive performance gains? To answer these questions, we must take a quick peek into the way the Zend Engine executes PHP scripts. When PHP is called on to run a script, it executes a two-step process:

1. The script is parsed and compiled into an intermediate representation (a sequence of Zend Engine opcodes).
2. The engine executes that intermediate code.
There are some important things to note about this process: the compilation step is repeated on every request; it is performed not only for the main script but for every file it includes; and as long as the source files do not change, the intermediate code generated is identical from one request to the next.
So you can see that you can reap great benefit from caching the generated intermediate code from step 1 for every script and include. This is what a compiler cache does. Figure 9.1 shows the work that is involved in executing a script without a compiler cache. Figure 9.2 shows the work with a compiler cache. Note that only on the first access to any script or include is there a cache miss. After that, the compilation step is avoided completely. Figure 9.1. Executing a script in PHP.
Figure 9.2. Script execution with a compiler cache.
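The caching logic in Figure 9.2 can be sketched in userspace PHP. This is illustrative only: a real compiler cache such as APC does this in C, storing actual opcodes in shared memory, and the function name and key scheme here are invented for the sketch.

```php
<?php
// Illustrative sketch of compiler-cache behavior, not APC's real
// implementation. The array stands in for a shared-memory segment,
// and the string result stands in for compiled opcodes.
function compile_with_cache(string $file, array &$cache): string
{
    // Key on path + mtime so that an edited file is recompiled.
    $key = $file . ':' . filemtime($file);
    if (!isset($cache[$key])) {
        // Cache miss: "compile" the script and store the result.
        $cache[$key] = 'opcodes-for:' . $file;
    }
    // Cache hit: the compilation step is skipped entirely.
    return $cache[$key];
}
```

Keying on the modification time is what lets the cache notice edited scripts without any manual invalidation step.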
There are three major compiler caches for PHP; the one examined in detail here is APC.
Chapter 23, which looks at how to extend PHP and the Zend Engine, also looks in depth at the inner workings of APC. The APC compiler cache is available through the PEAR Extension Code Library (PECL). You can install it by running this:

# pear install apc

To configure it for operation, you add the following line to your php.ini file:

extension = /path/to/apc.so

Beyond that, you don't need to perform any additional configuration. When you next start PHP, APC will be active and will cache your scripts in shared memory. Remember that a compiler cache removes the parsing stage of script execution, so it is most effective when used on scripts that contain a good amount of code. As a benchmark, I compared the example template page that comes with Smarty. On my desktop, I could get 26 requests per second out of a stock PHP configuration. With APC loaded, I could get 42 requests per second. This 61% improvement is significant, especially considering that it requires no application code changes. Compiler caches can have especially beneficial effects in environments with a large number of includes. When I worked at Community Connect (where APC was written), it was not unusual for a script to include (through recursive action) 30 or 40 files. This proliferation of include files was due to the highly modular design of the code base, which broke out similar functions into separate libraries. In this environment, APC provided an improvement of over 100% in application performance.

Optimizers

Language optimizers work by taking the compiled intermediate code for a script and performing optimizations on it. Most languages have optimizing compilers that perform operations such as the following:

- Dead-code elimination (removing code that can never be executed)
- Constant folding (evaluating constant expressions at compile time)
- Common subexpression elimination (computing a repeated expression only once)
Optimizing compilers can perform many other operations as well. PHP does not have an internal code optimizer, but several add-on products can perform optimization on the compiled intermediate code.
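As a concrete, hypothetical illustration of what such an optimizer could do, consider a constant expression and an unreachable branch:

```php
<?php
// What the programmer writes:
$seconds_per_week = 60 * 60 * 24 * 7;    // constant expression
if (false) {
    expensive_logging();                 // dead code: never executes
}

// What an optimizer could reduce it to, with identical behavior:
$seconds_per_week = 604800;              // folded at compile time
// (the dead if-block is eliminated entirely)
```

The program's observable behavior is unchanged; only the work done per execution shrinks.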
The main benefits of a code optimizer come when code is compiled and optimized once and then run many times. Thus, in PHP, the benefits of using an optimizer without a compiler cache are minimal. When used in conjunction with a compiler cache, an optimizer can deliver small but noticeable gains over the compiler cache alone.

HTTP Accelerators

Application performance is a complex issue. At first glance, the most common resources an application can be bound by are CPU, memory, disk I/O, and external services such as databases.
The following chapters investigate how to tune applications to minimize the effects of these bottlenecks. Before we get to that, however, we need to examine another bottleneck that is often overlooked: the effect of network latency. When a client makes a request to your site, the data packets must physically cross the Internet from the client location to your server and back. Furthermore, there is an operating system-mandated limit to how much data can be sent over a TCP socket at one time. If the data exceeds this limit, the application blocks, waiting until the remote system confirms that the outstanding data has been received. Thus, in addition to the time spent actually processing a request, the Web server serving the request must also absorb the latency caused by slow network connections. Figure 9.3 shows the network-level effort involved in serving a single request, with approximate timings. While the network packets are being sent and received, the PHP application is completely idle. Note that Figure 9.3 shows 200ms of dead time in which the PHP server is dedicated to serving data but is waiting for a network transmission to complete. In many applications, the network lag time is much longer than the time spent actually executing the script. Figure 9.3. Network transmission times in a typical request.
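A rough back-of-envelope model shows why this dead time adds up. The buffer size and round-trip time below are assumptions chosen for illustration, not numbers taken from the figure, and real TCP windowing is more complicated than this:

```php
<?php
// Assumed numbers: a 128KB page, a 16KB effective send buffer, and a
// 50ms round trip to the client. The server must wait roughly one
// round trip for each buffer-sized burst of data to be acknowledged.
$page_kb   = 128;
$buffer_kb = 16;
$rtt_ms    = 50;

$round_trips = (int) ceil($page_kb / $buffer_kb);  // bursts needed
$wait_ms     = $round_trips * $rtt_ms;             // idle time estimate
```

Doubling the buffer size halves the number of round trips, which is exactly the lever the operating-system buffer tuning later in this chapter pulls.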
This might not seem like a bottleneck at all, but it can be. The problem is that even an idle Web server process consumes resources: memory, persistent database connections, and a slot in the process table. If you can eliminate network latency, you can reduce the amount of time PHP processes spend tied up doing no useful work and thus improve their efficiency.

Reverse Proxies

Unfortunately, eliminating network latency across the Internet is not within our capabilities. (Oh, if only it were!) What we can do, however, is add an additional server that sits between the end user and the PHP application. This server receives all the requests from the clients, passes each complete request to the PHP application, waits for the entire response, and then sends the response back to the remote user. This intervening server is known as a reverse proxy or, occasionally, as an HTTP accelerator. The strategy relies on two facts to work: the proxy processes are much lighter-weight than the Web server processes running PHP, and the link between the proxy and the Web server is fast and low-latency, so the Web server can hand off its response quickly and move on to the next request.
Figure 9.4 shows a typical reverse proxy setup. Note that the remote clients are on high-latency links, whereas the proxy server and the Web server are on the same high-speed network. Also note that the proxy server is sustaining many more client connections than Web server connections. This is because the low-latency link between the Web server and the proxy server permits the Web server to "fire and forget" its content, rather than wasting its time waiting on network lag. Figure 9.4. A typical reverse-proxy setup.
If you are running Apache, there are a number of excellent choices for reverse proxies, including the following:

- mod_proxy, a proxy module that ships with Apache itself
- mod_accel, a third-party Apache proxy and acceleration module
- Squid, a standalone proxy and caching server that can run in front of Apache
With all these solutions, the proxy instance can be on a dedicated machine or simply run as a second server instance on the same machine. Let's look at setting up a reverse proxy server on the same machine by using mod_proxy. By far the easiest way to accomplish this is to build two copies of Apache, one with mod_proxy built in (installed in /opt/apache_proxy) and the other with PHP (installed in /opt/apache_php). We'll use a common trick to allow us to use the same Apache configuration across all machines: We will use the hostname externalether in our Apache configuration file. We will then map externalether to our public/external Ethernet interface in /etc/hosts. Similarly, we will use the hostname localhost in our Apache configuration file to correspond to the loopback address 127.0.0.1. Reproducing an entire Apache configuration here would take significant space. Instead, I've chosen to use just a small fragment of an httpd.conf file to illustrate the critical settings in a bit of context. A mod_proxy-based reverse proxy setup looks like the following:

DocumentRoot /dev/null
Listen externalether:80
MaxClients 256
KeepAlive Off

AddModule mod_proxy.c
ProxyRequests On
ProxyPass / http://localhost/
ProxyPassReverse / http://localhost/
ProxyIOBufferSize 131072

<Directory proxy:*>
    Order Deny,Allow
    Deny from all
</Directory>

You should note the following about this configuration:

- DocumentRoot is set to /dev/null because the proxy serves no local content; every request is forwarded.
- The proxy listens only on the external interface (externalether), while the PHP instance will listen only on the loopback address.
- ProxyPass and ProxyPassReverse forward all requests to the PHP instance on localhost and rewrite any redirects it issues so that they point back at the proxy.
- ProxyIOBufferSize is raised to 128KB so that the proxy can buffer an entire response, letting the back-end server finish quickly and move on.
- The <Directory proxy:*> block denies all forward-proxy requests, preventing the server from being abused as an open relay.
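The externalether hostname trick described above might look like this in /etc/hosts (the 10.0.0.5 address is a placeholder for the machine's real public interface address):

```
# /etc/hosts fragment; 10.0.0.5 is a placeholder address
10.0.0.5     externalether
127.0.0.1    localhost
```

Because every machine maps externalether to its own public address, the same httpd.conf can be deployed unchanged across the whole cluster.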
In contrast to the amount of configuration inside Apache, the PHP setup is very similar to the way it was before. The only change is to add the following to its httpd.conf file:

Listen localhost:80

This binds the PHP instance exclusively to the loopback address. Now if you want to access the Web server, you must go through the proxy server. Benchmarking the effect of these changes is difficult. Because they mainly reduce the overhead associated with handling clients over high-latency links, it is difficult to measure the effects on a local or high-speed network. In a real-world setting, I have seen a reverse-proxy setup cut the number of Apache children necessary to support a site from 100 to 20.

Operating System Tuning for High Performance

There is a strong argument that if you do not want to perform local caching, then using a reverse proxy is overkill. A way to get a similar effect without running a separate server is to allow the operating system itself to buffer all the data. In the discussion of reverse proxies earlier in this chapter, you saw that a major component of the network wait time is the time spent blocking between data packets to the client. The application is forced to send multiple packets because the operating system has a limit on how much information it can buffer to send over a TCP socket at one time. Fortunately, this is a setting that you can tune. On FreeBSD, you can adjust the TCP buffers via the following:

# sysctl -w net.inet.tcp.sendspace=131072
# sysctl -w net.inet.tcp.recvspace=8192

On Linux, you do this:

# echo "131072" > /proc/sys/net/core/wmem_max

These changes set the outbound TCP buffer space to 128KB and (on FreeBSD) the inbound buffer space to 8KB, on the assumption that you receive small inbound requests and send large outbound responses. This also assumes that the maximum page size you will be sending is 128KB.
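Settings made with sysctl or by writing to /proc do not survive a reboot; to make them persistent, the same keys can go in /etc/sysctl.conf (a minimal sketch, using the values above):

```
# /etc/sysctl.conf on FreeBSD
net.inet.tcp.sendspace=131072
net.inet.tcp.recvspace=8192

# /etc/sysctl.conf on Linux
net.core.wmem_max = 131072
```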
If your page sizes differ from that, you need to change the tunings accordingly. In addition, you might need to tune kern.ipc.nmbclusters to allocate sufficient memory for the new, larger buffers. (See your friendly neighborhood systems administrator for details.) After adjusting the operating system limits, you need to instruct Apache to use the large buffers you have provided. For this you just add the following directive to your httpd.conf file:

SendBufferSize 131072

Finally, you can eliminate the network lag on connection close by installing the lingerd patch to Apache. When a network connection is finished, the sender sends the receiver a FIN packet to signify that the connection is complete. The sender must then wait for the receiver to acknowledge the receipt of this FIN packet before closing the socket, to ensure that all data has in fact been transferred successfully. After the FIN packet is sent, Apache does not need to do anything with the socket except wait for the FIN-ACK packet and close the connection. The lingerd process improves the efficiency of this operation by handing the socket off to an exterior daemon (lingerd), which just sits around waiting for FIN-ACKs and closing sockets. For high-volume Web servers, lingerd can provide significant performance benefits, especially when coupled with increased write buffer sizes. lingerd is incredibly simple to set up: it consists of a patch to Apache (which allows Apache to hand off file descriptors for closing) and a daemon that performs those closes. lingerd is in use by a number of major sites, including Sourceforge.com, Slashdot.org, and LiveJournal.com.

Proxy Caches

Even better than having a low-latency connection to a content server is not having to make the request at all. HTTP takes this into account through caching, which exists at many levels:

- in the end user's browser
- in proxy servers between the user and your site (often at the user's ISP)
- in a reverse proxy cache that you run in front of your own Web servers
Figure 9.5 shows a typical reverse proxy cache setup. When a user makes a request to www.example.foo, the DNS lookup actually points the user to the proxy server. If the requested entry exists in the proxy's cache and is not stale, the cached copy of the page is returned to the user, without the Web server ever being contacted at all; otherwise, the connection is proxied to the Web server as in the reverse proxy situation discussed earlier in this chapter. Figure 9.5. A request through a reverse proxy.
Many of the reverse proxy solutions, including Squid, mod_proxy, and mod_accel, support integrated caching. Using a cache that is integrated into the reverse proxy server is an easy way of extracting extra value from the proxy setup. Having a local cache guarantees that all cacheable content will be aggressively cached, reducing the workload on the back-end PHP servers.
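For the proxy cache to help, the application has to mark its output as cacheable. A minimal sketch follows; the five-minute TTL is an arbitrary example value, and real applications need care with pages containing per-user data:

```php
<?php
// Mark this response as cacheable by shared caches (such as a reverse
// proxy) for five minutes. The TTL is an arbitrary example value.
$ttl = 300;
$cache_control = 'Cache-Control: public, max-age=' . $ttl;
$expires = 'Expires: ' . gmdate('D, d M Y H:i:s', time() + $ttl) . ' GMT';

header($cache_control);
header($expires);
```

With headers like these, the proxy can serve repeat requests from its cache without ever contacting the PHP server, which is the ideal outcome described above.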