Language-Level Tunings

Language-level tunings are changes that you can make to PHP itself to enhance performance. PHP has a nice engine-level API (which is examined in depth in Chapter 21, "PHP and Zend Engine Internals" and Chapter 23, "Writing SAPIs and Extending the Zend Engine") that allows you to write extensions that directly affect how the engine processes and executes code. You can use this interface to speed the compilation and execution of PHP scripts.

Compiler Caches

If you could choose only one server modification to make to improve the performance of a PHP application, installing a compiler cache would be the one you should choose. Installing a compiler cache can yield a huge benefit, and unlike many technologies that yield diminishing returns as the size of the application increases, a compiler cache actually yields increasing returns as the size and complexity increase.

So what is a compiler cache? And how can it get such impressive performance gains? To answer these questions, we must take a quick peek into the way the Zend Engine executes PHP scripts. When PHP is called on to run a script, it executes a two-step process:

1. PHP reads the file, parses it, and generates intermediate code that is executable on the Zend Engine virtual machine. Intermediate code is a computer science term that describes the internal representation of a script's source code after it has been compiled by the language.

2. PHP executes the intermediate code.

There are some important things to note about this process:

  • For many scripts, especially those with many includes, it takes more time to parse the script and render it into an intermediate state than it does to execute the intermediate code.

  • Even though the results of step 1 are not fundamentally changed from execution to execution, the entire sequence is played through on every invocation of the script.

  • This sequence occurs not only when the main file is run, but also any time a script is run with require(), include(), or eval().

So you can see that you can reap great benefit from caching the generated intermediate code from step 1 for every script and include. This is what a compiler cache does.

Figure 9.1 shows the work that is involved in executing a script without a compiler cache. Figure 9.2 shows the work with a compiler cache. Note that only on the first access to any script or include is there a cache miss. After that, the compilation step is avoided completely.

Figure 9.1. Executing a script in PHP.


Figure 9.2. Script execution with a compiler cache.


These are the three major compiler caches for PHP:

  • The Zend Accelerator: A commercial, closed-source, for-cost compiler cache produced by Zend Technologies

  • The ionCube Accelerator: A commercial, closed-source, but free compiler cache written by Nick Lindridge and distributed by his company, ionCube

  • APC: A free and open-source compiler cache written by Daniel Cowgill and me

Chapter 23, which looks at how to extend PHP and the Zend Engine, also looks in depth at the inner working of APC.

The APC compiler cache is available through the PEAR Extension Code Library (PECL). You can install it by running this:

# pear install apc

To configure it for operation, you add the following line to your php.ini file:

extension = /path/to/apc.so

Besides doing that, you don't need to perform any additional configuration. When you next start PHP, APC will be active and will cache your scripts in shared memory.

Remember that a compiler cache removes the parsing stage of script execution, so it is most effective when used on scripts that have a good amount of code. As a benchmark, I tested the example template page that comes with Smarty. On my desktop, I could get 26 requests per second out of a stock PHP configuration. With APC loaded, I could get 42 requests per second. This 61% improvement is significant, especially considering that it requires no application code changes.

Compiler caches can have especially beneficial effects in environments with a large number of includes. When I worked at Community Connect (where APC was written), it was not unusual for a script to include (through nested includes) 30 or 40 files. This proliferation of include files was due to the highly modular design of the code base, which broke out similar functions into separate libraries. In this environment, APC provided an improvement of over 100% in application performance.

Optimizers

Language optimizers work by taking the compiled intermediate code for a script and performing optimizations on it. Most languages have optimizing compilers that perform operations such as the following:

  • Dead code elimination: This involves completely removing unreachable code sections such as if(0) { }.

  • Constant folding: If a group of constants is being operated on, you can perform the operation once at compile time. For example, this:

    $seconds_in_day = 24*60*60;
    

    can be internally rendered equivalent to the following faster form:

    $seconds_in_day = 86400;
    

    without having the user change any code.

  • Peephole optimizations: These are local optimizations that can be made to improve code efficiency (for example, converting $count++ to ++$count when the return value is used in a void context). $count++ performs the increment after any expression involving $count is evaluated. For example, $i = $count++; will set $i to the value of $count before it is incremented. Internally, this means that the engine must store the value of $count to use in any expression involving it. In contrast, ++$count increments before any other evaluations, so no temporary value needs to be stored (and thus it is cheaper). If $count++ is used in an expression where its value is not used (called a void context), it can safely be converted to a pre-increment.

Optimizing compilers can perform many other operations as well.
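Constant folding is easy to sketch on a syntax tree. The following minimal illustration uses Python's ast module rather than the Zend Engine's intermediate code, so it shows the technique only and is not how any PHP optimizer is implemented. The class ConstantFolder and the helper fold_constants are hypothetical names, and only addition, subtraction, and multiplication of numeric literals are handled.

```python
import ast
import operator

# Arithmetic operators this sketch knows how to fold.
_FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


class ConstantFolder(ast.NodeTransformer):
    """Replace arithmetic on two literal numbers with the computed literal."""

    def visit_BinOp(self, node):
        # Fold the children first so nested expressions collapse bottom-up.
        self.generic_visit(node)
        fold = _FOLDABLE.get(type(node.op))
        if (
            fold is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            folded = ast.Constant(value=fold(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def fold_constants(source):
    """Parse source, fold constant arithmetic, and return the optimized tree."""
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.fix_missing_locations(tree)


if __name__ == "__main__":
    tree = fold_constants("seconds_in_day = 24 * 60 * 60")
    # The right-hand side is now a single literal node instead of two multiplications.
    print(ast.dump(tree.body[0].value))
```

Expressions that involve a variable, such as y * 2, are left untouched because their value is not known at compile time.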

PHP does not have an internal code optimizer, but several add-ons can optimize code:

  • The Zend Optimizer is a closed-source but freely available optimizer.

  • The ionCube accelerator contains an integrated optimizer.

  • There is a proof-of-concept optimizer in PEAR.

The main benefits of a code optimizer come when code is compiled and optimized once and then run many times. Thus, in PHP, the benefits of using an optimizer without using a compiler cache are minimal. When used in conjunction with a compiler cache, an optimizer can deliver small but noticeable gains over the use of the compiler cache alone.

HTTP Accelerators

Application performance is a complex issue. At first glance, these are the most common ways in which an application is performance bound:

  • Database performance bound

  • CPU bound, for applications that perform intensive computations or manipulations

  • Disk bound, due to intensive input/output (I/O) operations

  • Network bound, for applications that must transfer large amounts of network data

The following chapters investigate how to tune applications to minimize the effects of these bottlenecks. Before we get to that, however, we need to examine another bottleneck that is often overlooked: the effects of network latency. When a client makes a request to your site, the data packets must physically cross the Internet from the client location to your server and back. Furthermore, there is an operating system-mandated limit to how much data can be sent over a TCP socket at a single time. If data exceeds this limit, the application blocks on the data transfer; that is, it simply waits until the remote system confirms that the data has been received. Thus, in addition to the time that is spent actually processing a request, the Web server serving the request must also wait on the latency that is caused by slow network connections.

Figure 9.3 shows the network-level effort involved in serving a single request, along with the time each step takes. While the network packets are being sent and received, the PHP application is completely idle. Note that Figure 9.3 shows 200ms of dead time in which the PHP server is dedicated to serving data but is waiting for a network transmission to complete. In many applications, the network lag time is much longer than the time spent actually executing scripts.

Figure 9.3. Network transmission times in a typical request.


This might not seem like a bottleneck at all, but it can be. The problem is that even an idle Web server process consumes resources: memory, persistent database connections, and a slot in the process table. If you can eliminate network latency, you can reduce the amount of time PHP processes perform unimportant work and thus improve their efficiency.
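A back-of-the-envelope calculation makes the cost concrete. By Little's law, the average number of busy server processes is the request rate multiplied by the time each request holds a process. In the sketch below, the 200ms of network wait comes from Figure 9.3; the request rate of 100 per second and the 50ms of script execution are assumed numbers chosen purely for illustration, and processes_needed is a hypothetical helper.

```python
import math


def processes_needed(requests_per_second, busy_ms_per_request):
    """Little's law: average busy processes = arrival rate * time each is held."""
    return math.ceil(requests_per_second * busy_ms_per_request / 1000)


if __name__ == "__main__":
    rate = 100          # assumed requests per second
    execute_ms = 50     # assumed script execution time
    network_ms = 200    # dead time waiting on the network (Figure 9.3)

    print(processes_needed(rate, execute_ms + network_ms))  # process held during network wait
    print(processes_needed(rate, execute_ms))               # network wait removed
```

Under these assumptions, removing the network wait cuts the number of simultaneously busy processes from 25 to 5, even though no request is executed any faster.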

Blocking Network Connections

Saying that an application has to block network connections is not entirely true. Network sockets can be created in such a way that instead of blocking, control is returned to the application. A number of high-performance Web servers such as thttpd and Tux utilize this methodology. That aside, I am aware of no PHP server APIs (SAPIs; applications that have PHP integrated into them) that allow for a single PHP instance to serve multiple requests simultaneously. Thus, even though the network connection may be nonblocking, these fast servers still require a PHP process to be dedicated for the entire life of every client request.
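The difference between blocking and nonblocking writes can be observed directly. The following minimal sketch uses a local socket pair rather than a TCP connection across the Internet, so the buffer size it reports will not match a real client link; fill_send_buffer is a hypothetical helper. It writes to a nonblocking socket whose peer never reads, and stops at the point where a blocking socket would have stalled.

```python
import socket


def fill_send_buffer(chunk_size=65536):
    """Write to a nonblocking socket until the kernel refuses more data.

    Returns the number of bytes the operating system accepted before the
    send buffer filled up.
    """
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    chunk = b"x" * chunk_size
    accepted = 0
    try:
        while True:
            try:
                # send() may accept only part of the chunk; it reports how much.
                accepted += writer.send(chunk)
            except BlockingIOError:
                # Control returns to the application here. A blocking
                # socket would instead wait until the peer read some data.
                break
    finally:
        writer.close()
        reader.close()
    return accepted


if __name__ == "__main__":
    print(fill_send_buffer(), "bytes buffered before the write would block")
```

An event-based server uses the moment marked by BlockingIOError to go and service another connection instead of waiting.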


Reverse Proxies

Unfortunately, eliminating network latency across the Internet is not within our capabilities. (Oh, if only it were!) What we can do, however, is add an additional server that sits in between the end user and the PHP application. This server receives all the requests from the clients and then passes the complete request to the PHP application, waits for the entire response, and then sends the response back to the remote user. This intervening server is known as a reverse proxy or occasionally as an HTTP accelerator.

This strategy relies on the following facts to work:

  • The proxy server must be lightweight. On a per-client-request basis, the proxy consumes far fewer resources than a PHP application.

  • The proxy server and the PHP application must be on the same local network. Connections between the two thus have extremely low latency.

Figure 9.4 shows a typical reverse proxy setup. Note that the remote clients are on high-latency links, whereas the proxy server and Web server are on the same high-speed network. Also note that the proxy server is sustaining many more client connections than Web server connections. This is because the low-latency link between the Web server and the proxy server permits the Web server to "fire and forget" its content rather than waste its time waiting on network lag.
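To make the "fire and forget" behavior concrete, here is a minimal sketch of a buffering reverse proxy written with Python's standard library. It is an illustration of the idea only, not a substitute for the production proxies discussed in this section: it handles GET requests only, forwards a single header, and runs single-threaded. The names make_proxy_handler and start_server are hypothetical. The key point is the ordering inside do_GET: the complete back-end response is read and the back-end connection closed before the first byte goes to the client.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_proxy_handler(backend_host, backend_port):
    """Build a handler class that forwards GET requests to one back end."""

    class BufferingProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Fetch the complete response over the fast local link first,
            # so the back end is released before the client is served.
            conn = http.client.HTTPConnection(backend_host, backend_port, timeout=10)
            try:
                conn.request("GET", self.path)
                upstream = conn.getresponse()
                status = upstream.status
                content_type = upstream.getheader("Content-Type", "text/html")
                body = upstream.read()
            finally:
                conn.close()
            # Only now talk to the (possibly slow) client.
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return BufferingProxyHandler


def start_server(handler_class):
    """Start an HTTP server on a free loopback port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), handler_class)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling start_server(make_proxy_handler("127.0.0.1", 8080)) would front a back end assumed to be listening on port 8080; the port the proxy was given is available as server.server_address[1].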

Figure 9.4. A typical reverse-proxy setup.


If you are running Apache, there are a number of excellent choices for reverse proxies, including the following:

  • mod_proxy: A "standard" module that ships with Apache

  • mod_accel: A third-party module that is very similar to mod_proxy (large parts actually appear to be rewrites of mod_proxy) and adds features that are specific to reverse proxies

  • mod_backhand: A third-party load-balancing module for Apache that implements reverse proxy functionality

  • Squid: An external caching proxy daemon that performs high-performance forward and reverse proxying

With all these solutions, the proxy instance can be on a dedicated machine or simply run as a second server instance on the same machine. Let's look at setting up a reverse proxy server on the same machine by using mod_proxy. By far the easiest way to accomplish this is to build two copies of Apache, one with mod_proxy built in (installed in /opt/apache_proxy) and the other with PHP (installed in /opt/apache_php).

We'll use a common trick to allow us to use the same Apache configuration across all machines: We will use the hostname externalether in our Apache configuration file. We will then map externalether to our public/external Ethernet interface in /etc/hosts. Similarly, we will use the hostname localhost in our Apache configuration file to correspond to the loopback address 127.0.0.1.

Reproducing an entire Apache configuration here would take significant space. Instead, I've chosen to use just a small fragment of an httpd.conf file to illustrate the critical settings in a bit of context.

A mod_proxy-based reverse proxy setup looks like the following:

DocumentRoot /dev/null
Listen          externalether:80
MaxClients      256
KeepAlive       Off
AddModule mod_proxy.c
ProxyRequests On
ProxyPass        / http://localhost/
ProxyPassReverse / http://localhost/
ProxyIOBufferSize 131072
<Directory proxy:*>
    Order Deny,Allow
    Deny from all
</Directory>

You should note the following about this configuration:

  • DocumentRoot is set to /dev/null because this server has no content of its own.

  • You specifically bind to the external Ethernet address of the server (externalether). You need to bind to it explicitly because you will be running a purely PHP instance on the same machine. Without a Listen statement, the first server to start would bind to all available addresses, prohibiting the second instance from working.

  • Keepalives are off. High-traffic Web servers that use a pre-fork model (such as Apache), or to a lesser extent use threaded models (such as Zeus), generally see a performance degradation if keepalives are on.

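  • ProxyRequests is on, which enables mod_proxy's general proxy request handling. (According to the Apache documentation, ProxyPass works even when ProxyRequests is off, so this setting is what makes the access restriction in the final item necessary.)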

  • ProxyPass / http://localhost/ instructs mod_proxy to internally proxy any requests that start with / (that is, any request at all) to the server that is bound to the localhost IP address (that is, the PHP instance).

  • If the PHP instance issues a Location redirect to foo.php that includes its server name, the client will get a redirect that looks like this:

    Location: http://localhost/foo.php
    

    This won't work for the end user, so ProxyPassReverse rewrites any Location redirects to point to itself.

  • ProxyIOBufferSize 131072 sets the size of the buffer that the reverse proxy uses to collect information handed back by PHP to 131072 bytes. To keep the time the proxy spends blocked on the browser from being passed back to the PHP instance, you need to set this at least as large as the largest page size served to a user. This allows the entire page to be transferred from PHP to the proxy before any data is transferred back to the browser. Then, while the proxy is handling data transfer to the client browser, the PHP instance can continue doing productive work.

  • Finally, you disable all outbound proxy requests to the server. This prevents open proxy abuse.

Pre-Fork, Event-Based, and Threaded Process Architectures

The three main architectures used for Web servers are pre-fork, event-based, and threaded models.

In a pre-fork model, a pool of processes is maintained to handle new requests. When a new request comes in, it is dispatched to one of the child processes for handling. A child process usually serves more than one request before exiting. Apache 1.3 follows this model.

In an event-based model, a single process serves requests in a single thread, utilizing nonblocking or asynchronous I/O to handle multiple requests very quickly. This architecture works very well for handling static files but not terribly well for handling dynamic requests (because you still need a separate process or thread to handle the dynamic part of each request). thttpd, a small, fast Web server written by Jef Poskanzer, utilizes this model.

In a threaded model, a single process uses a pool of threads to service requests. This is very similar to a pre-fork model, except that because it is threaded, some resources can be shared between threads. The Zeus Web server utilizes this model. Even though PHP itself is thread-safe, it is difficult to impossible to guarantee that third-party libraries used in extension code are also thread-safe. This means that even in a threaded Web server, it is often necessary to avoid a threaded PHP and instead use forked process execution via the FastCGI or CGI implementations.

Apache 2 uses a drop-in process architecture that allows it to be configured as a pre-fork, threaded, or hybrid architecture, depending on your needs.


In contrast to the amount of configuration needed for the proxy instance, the PHP instance's setup is very similar to the way it was before. The only change to its configuration is to add the following to its httpd.conf file:

Listen localhost:80

This binds the PHP instance exclusively to the loopback address. Now if you want to access the Web server, you must contact it by going through the proxy server.

Benchmarking the effect of these changes is difficult. Because these changes reduce the overhead mainly associated with handling clients over high-latency links, it is difficult to measure the effects on a local or high-speed network. In a real-world setting, I have seen a reverse-proxy setup cut the number of Apache children necessary to support a site from 100 to 20.

Operating System Tuning for High Performance

There is a strong argument that if you do not want to perform local caching, then using a reverse proxy is overkill. A way to get a similar effect without running a separate server is to allow the operating system itself to buffer all the data. In the discussion of reverse proxies earlier in this chapter, you saw that a major component of the network wait time is the time spent blocking between data packets to the client.

The application is forced to send multiple packets because the operating system has a limit on how much information it can buffer to send over a TCP socket at one time. Fortunately, this is a setting that you can tune.

On FreeBSD, you can adjust the TCP buffers via the following:

# sysctl -w net.inet.tcp.sendspace=131072
# sysctl -w net.inet.tcp.recvspace=8192

On Linux, you do this:

# echo "131072" > /proc/sys/net/core/wmem_max

The FreeBSD settings set the outbound TCP buffer space to 128KB and the inbound buffer space to 8KB (because you receive small inbound requests and make large outbound responses); the Linux setting raises the maximum socket send buffer to 128KB. This assumes that the maximum page size you will be sending is 128KB. If your page sizes differ from that, you need to change the tunings accordingly. In addition, on FreeBSD you might need to tune kern.ipc.nmbclusters to allocate sufficient memory for the new large buffers. (See your friendly neighborhood systems administrator for details.)

After adjusting the operating system limits, you need to instruct Apache to use the large buffers you have provided. For this you just add the following directive to your httpd.conf file:

SendBufferSize 131072
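At the socket level, a per-connection send buffer is requested with the SO_SNDBUF socket option, which is presumably what a directive such as SendBufferSize translates into. The minimal sketch below, with the hypothetical helper request_send_buffer, asks for 128KB and reads back what the kernel actually granted. The granted value is platform dependent: the operating system limit caps it, and Linux is documented to double the requested value for bookkeeping overhead, so do not expect to read back exactly 131072.

```python
import socket


def request_send_buffer(sock, size_bytes):
    """Ask the kernel for a send buffer of size_bytes and return what it granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        granted = request_send_buffer(s, 131072)
        print("requested 131072 bytes, kernel reports", granted)
```

If the reported value is well below what you requested, the operating system limit is the likely cause, and the application-level setting will have no effect until that limit is raised.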

Finally, you can eliminate the network lag on connection close by installing the lingerd patch to Apache. When a network connection is finished, the sender sends the receiver a FIN packet to signify that the connection is complete. The sender must then wait for the receiver to acknowledge the receipt of this FIN packet before closing the socket to ensure that all data has in fact been transferred successfully. After the FIN packet is sent, Apache does not need to do anything with the socket except wait for the FIN-ACK packet and close the connection. The lingerd process improves the efficiency of this operation by handing the socket off to an exterior daemon (lingerd), which just sits around waiting for FIN-ACKs and closing sockets.

For high-volume Web servers, lingerd can provide significant performance benefits, especially when coupled with increased write buffer sizes. lingerd is incredibly simple to compile. It is a patch to Apache (which allows Apache to hand off file descriptors for closing) and a daemon that performs those closes. lingerd is in use by a number of major sites, including Sourceforge.com, Slashdot.org, and LiveJournal.com.

Proxy Caches

Even better than having a low-latency connection to a content server is not having to make the request at all. HTTP takes this into account.

HTTP caching exists at many levels:

  • Caches are built into reverse proxies

  • Proxy caches exist at the end user's ISP

  • Caches are built into the user's Web browser

Figure 9.5 shows a typical reverse proxy cache setup. When a user makes a request to www.example.foo, the DNS lookup actually points the user to the proxy server. If the requested entry exists in the proxy's cache and is not stale, the cached copy of the page is returned to the user, without the Web server ever being contacted at all; otherwise, the connection is proxied to the Web server as in the reverse proxy situation discussed earlier in this chapter.
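The hit-or-proxy decision can be sketched in a few lines. This is a deliberately simplified illustration: real HTTP caches derive freshness from response headers such as Cache-Control and Expires, whereas this sketch assumes a single fixed time-to-live for every entry. The class ProxyCache and its fetch callback are hypothetical names; fetch stands in for proxying the request to the Web server.

```python
import time


class ProxyCache:
    """Tiny in-memory cache keyed by URL, with a fixed time-to-live per entry."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch        # callable that contacts the back end
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so that staleness can be tested
        self._entries = {}         # url -> (expires_at, body)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now < entry[0]:
            # Fresh hit: the Web server is never contacted.
            return entry[1]
        # Miss or stale entry: proxy the request and remember the result.
        body = self._fetch(url)
        self._entries[url] = (now + self._ttl, body)
        return body


if __name__ == "__main__":
    backend_calls = []

    def fetch(url):
        backend_calls.append(url)
        return "<html>page for %s</html>" % url

    cache = ProxyCache(fetch, ttl_seconds=60)
    cache.get("/index.php")
    cache.get("/index.php")
    print("back-end requests:", len(backend_calls))
```

The second request for the same URL is answered from the cache, so the back end sees only one request until the entry goes stale.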

Figure 9.5. A request through a reverse proxy.


Many of the reverse proxy solutions, including Squid, mod_proxy, and mod_accel, support integrated caching. Using a cache that is integrated into the reverse proxy server is an easy way of extracting extra value from the proxy setup. Having a local cache guarantees that all cacheable content will be aggressively cached, reducing the workload on the back-end PHP servers.

