Managing Packaging

Now that you have used change control systems to master your development cycle, you need to be able to distribute your production code. This book is not focused on producing commercially distributed code, so when I say that code needs to be distributed, I'm talking about the production code being moved from your development environment to the live servers that are actually serving the code.

Packaging is an essential step in ensuring that what is live in production is what is supposed to be live in production. I have seen many people opt to manually push changed files out to their Web servers on an individual basis. That is a recipe for failure.

These are just two of the things that can go wrong:

  • It is very easy to lose track of what files you need to copy for a product launch. Debugging a missing include is usually easy, but debugging a non-updated include can be devilishly hard.

  • In a multiserver environment, things get more complicated, and the list of potential problems expands. For example, if a single server is down when you push content, how do you ensure that it receives all the incremental changes it needs when it comes back up? Even if all your machines stay up 100% of the time, human error makes it extremely easy to introduce subtle inconsistencies between machines.

Packaging is important not only for your PHP code but for the versions of all the supporting software you use as well. At a previous job I ran a large (around 100-machine) PHP server cluster that served a number of applications. Between PHP 4.0.2 and 4.0.3, there was a slight change in the semantics of pack(). This broke some core authentication routines on the site, which caused significant and embarrassing downtime. Bugs happen, but a sitewide show-stopper like this should have been detected and addressed before it ever hit production. The following factors made this difficult to diagnose:

  • Nobody read the 4.0.3 change log, so at first PHP itself was not even considered as a possible culprit.

  • PHP versions across the cluster were inconsistent. Some were running 4.0.1, others 4.0.2, still others 4.0.3. We did not have centralized logging running at that point, so it was extremely difficult to associate the errors with a specific machine. They appeared to be completely sporadic.

Like many problems, though, the factors that led to this one were really just symptoms of larger systemic problems. These were the real issues:

  • We had no system for ensuring that Apache, PHP, and all supporting libraries were identical on all the production machines. As machines became repurposed, or as different administrators installed software on them, each developed its own personality. Production machines should not have personalities.

  • Although we had separate trees for development and production code, we did not have a staging environment where we could make sure that the code we were about to run live would work on the production systems. Of course, without a solid system for making sure your systems are all identical, a staging environment is only marginally useful.

  • Not tracking PHP upgrades in the same system as code changes made it difficult to correlate a break to a PHP upgrade. We wasted hours trying to track the problem to a code change. If the fact that PHP had just been upgraded on some of the machines the day before had been logged (preferably in the same change control system as our source code), the bug hunt would have gone much faster.

Solving the pack() Problem

We also took the entirely wrong route in solving our problem with pack(). Instead of fixing our code so that it would be safe across all versions, we chose to undo the semantics change in pack() itself (in the PHP source code). At the time, that seemed like a good idea: It kept us from having to clutter our code with special cases and preserved backward compatibility.

In the end, we could not have made a worse choice. By "fixing" the PHP source code, we had doomed ourselves to backporting that change any time we needed to upgrade PHP. If the patch was ever forgotten, the authentication errors would mysteriously recur.

Unless you have a group of people dedicated to maintaining core infrastructure technologies in your company, you should stay away from making semantics-breaking changes in PHP on your live site.


Packaging and Pushing Code

Pushing code from a staging environment to a production environment isn't hard. The most difficult part is versioning your releases, as you learned to do in the previous section by using CVS tags and branches. What's left is mainly finding an efficient means of physically moving your files from staging to production.

There is one nuance to moving PHP files. PHP parses every file it needs to execute on every request. This has a number of deleterious effects on performance (which you will learn more about in Chapter 9, "External Performance Tunings") and also makes it rather unsafe to change files in a running PHP instance. The problem is simple: If you have a file index.php that includes a library, such as the following:

# index.php
<?php
require_once "hello.inc";
hello();
?>

# hello.inc
<?php
function hello() {
  print "Hello World\n";
}
?>

and then you change both of these files as follows:

# index.php
<?php
require_once "hello.inc";
hello("George");
?>

# hello.inc
<?php
function hello($name) {
  print "Hello $name\n";
}
?>

if someone requests index.php just as the content push ensues, so that index.php is parsed before the push completes and hello.inc is parsed after it completes, you will get an error because, for a split second, the function prototypes will not match.

This is true in the best-case scenario where the pushed content is all updated instantaneously. If the push itself takes a few seconds or minutes to complete, a similar inconsistency can exist for that entire time period.

The best solution to this problem is to do the following:

  1. Make sure your push method is quick.

  2. Shut down your Web server during the period when the files are actually being updated.

The second step may seem drastic, but it is necessary if returning a page-in-error is never acceptable. If that is the case, you should probably be running a cluster of redundant machines and employ the no-downtime syncing methods detailed at the end of Chapter 15, "Building a Distributed Environment."
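A minimal sketch of such a push script follows; it assumes apachectl controls your Web server and uses rsync (discussed later in this section) to perform the copy, and the paths are hypothetical:

#!/bin/sh
# push.sh -- stop the Web server, sync the files quickly, start it again
apachectl stop
rsync -a --delete /staging/htdocs/ /var/www/htdocs/
apachectl start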

Note

Chapter 9 also describes compiler caches that prevent reparsing of PHP files. All the compiler caches have built-in facilities to determine whether files have changed and to reparse them. This means that they suffer from the inconsistent include problem as well.


There are a few choices for moving code between staging and production:

  • tar and ftp/scp

  • PEAR package format

  • cvs update

  • rsync

  • NFS

Using tar is a classic option, and it's simple as well. You can simply use tar to create an archive of your code, copy that file to the destination server, and unpack it. Using tar archives is a fine way to distribute software to remote sites (for example, if you are releasing or selling an application). There are two problems with using tar as the packaging tool in a Web environment, though:

  • It alters files in place, which means you may experience momentarily corrupted reads for files larger than a disk block.

  • It does not perform partial updates, so every push rewrites the entire code tree.
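For reference, a tar-based push might look something like the following sketch; the archive name, hostname, and paths are hypothetical:

# Create the archive on the staging server
tar -czf /tmp/app-release.tar.gz -C /staging/htdocs .
# Copy it to a production server and unpack it in place
scp /tmp/app-release.tar.gz www1:/tmp/
ssh www1 tar -xzf /tmp/app-release.tar.gz -C /var/www/htdocs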

An interesting alternative to using tar for distributing applications is to use the PEAR package format. This does not address either of the problems with tar, but it does allow users to install and manage your package with the PEAR installer. The major benefit of using the PEAR package format is that it makes installation a snap (as you've seen in all the PEAR examples throughout this book). Details on using the PEAR installer are available at http://pear.php.net.

A tempting strategy for distributing code to Web servers is to have a CVS checkout on your production Web servers and use cvs update to update your checkout. This method addresses both of the problems with tar: It only transfers incremental changes, and it uses temporary files and atomic move operations to avoid the problem of updating files in place. The problem with using CVS to update production Web servers directly is that it requires the CVS metadata to be present on the destination system. You need to use Web server access controls to limit access to those files.
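If you do take this route, the update itself is a one-liner run on each production server; the tag name here is hypothetical:

cd /var/www/htdocs
# Update the checkout to the release tag, adding new directories (-d)
# and pruning directories that are now empty (-P)
cvs -q update -d -P -r PROD_RELEASE_TAG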

A better strategy is to use rsync. rsync is specifically designed to efficiently synchronize differences between directory trees: it transfers only incremental changes, and it uses temporary files to guarantee atomic file replacement. rsync also supports a robust include/exclude syntax, allowing you to add or remove classes of files from the data to be synchronized. This means that even if the source tree is a CVS working directory, all the CVS metadata files can be omitted from the sync.
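A typical invocation might look like the following sketch; the hostname and paths are hypothetical:

# Push the staging tree to a production host, skipping CVS metadata
rsync -az --delete --exclude='CVS/' /staging/htdocs/ www1:/var/www/htdocs/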

Another popular method for distributing files to multiple servers is to serve them over NFS. NFS is very convenient for guaranteeing that all servers instantaneously get copies of updated files. Under low to moderate traffic, this method stands up quite well, but under higher throughput it can suffer from the latency inherent in NFS. The problem is that, as discussed earlier, PHP parses every file it runs on every request. This means it can generate significant disk I/O when reading its source files; when those files are served over NFS, the latency and network traffic add up. Using a compiler cache can significantly reduce this problem.

A technique that I've used in the past to avoid overstressing NFS servers is to combine a couple of the methods we've just discussed. All my servers NFS-mount their code but do not directly access the NFS-mounted copy. Instead, each server uses rsync to copy the NFS-mounted files onto a local filesystem (preferably a memory-based filesystem such as Linux's tmpfs or ramfs). A magic semaphore file is updated only when content is to be synced, and the script that runs rsync uses the changing timestamp on that file to know it should actually synchronize the directory trees. This is used to keep rsync from constantly running, which would be stressful to the NFS server.
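A sketch of the sync script each Web server might run from cron follows; the NFS mount point, tmpfs mount point, and semaphore path are all hypothetical:

#!/bin/sh
# sync_code.sh -- run periodically (for example, from cron) on each Web server
SEMAPHORE=/nfs/htdocs/.push_semaphore    # touched on the NFS share when content is pushed
STAMP=/var/run/last_sync_stamp           # records the last semaphore we acted on

# Only run rsync when the semaphore is newer than our local stamp
if [ "$SEMAPHORE" -nt "$STAMP" ]; then
    rsync -a --delete /nfs/htdocs/ /dev/shm/htdocs/
    touch -r "$SEMAPHORE" "$STAMP"
fi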

Packaging Binaries

If you run a multiserver installation, you should also package all the software needed to run your application. This is an often-overlooked facet of PHP application management, especially in environments that have evolved from a single-machine setup.

Allowing divergent machine setups may seem benign. Most of the time your applications will run fine. The problems arise only occasionally, but they are insidious. No one suspects that an occasional failure on a site is due to a differing kernel version, or to an Apache module being compiled as a shared object on one system and statically linked on another, but stranger things happen.

When packaging my system binaries, I almost always use the native packaging format for the operating system I am running on. You can use tar archives or a master server image that can be transferred to hosts with rsync, but neither method incorporates the ease of use and manageability of Red Hat's rpm or FreeBSD's pkg format. In this section I use the term RPM loosely to refer to a packaged piece of software. If you prefer a different format, you can perform a mental substitution; none of the discussions are particular to the RPM format itself.

I recommend not using monolithic packages. You should keep a separate package for PHP, for Apache, and for any other major application you use. I find that this provides a bit more flexibility when you're putting together a new server cluster.

The real value in using your system's packaging system is that it makes it easy to guarantee that you are running identical software on every machine. I've used tar archives to distribute binaries before. They worked okay, but it was very easy to lose track of exactly which tarball I had installed where. Worse still were the places where we installed everything from source on every machine. Despite intentional efforts to keep everything consistent, there were subtle differences across all the machines. In a large environment, that heterogeneity is unacceptable.
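With RPM, for example, verifying that two hosts are running the same software can be as simple as comparing their package lists; the hostnames here are hypothetical:

ssh www1 'rpm -qa | sort' > www1.packages
ssh www2 'rpm -qa | sort' > www2.packages
diff www1.packages www2.packages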

Packaging Apache

In general, the binaries in my Apache builds are standard across most machines I run. I like having Apache modules (including mod_php) be shared objects because I find the plug-and-play functionality that this provides extremely valuable. I also think that the performance penalty of running Apache modules as shared objects is completely exaggerated. I've never been able to reproduce any meaningful difference on production code.

Because I'm a bit of an Apache hacker, I often bundle some custom modules that are not distributed with Apache itself. These include things like mod_backhand, mod_log_spread, and some customized versions of other modules. I recommend two Web server RPMs. One contains the Web server itself (minus the configuration file), built with mod_so, and with all the standard modules built as shared objects. A second RPM contains all the custom modules I use that aren't distributed with the core of Apache. By separating these out, you can easily upgrade your Apache installation without having to track down and rebuild all your nonstandard modules, and vice versa. This is because the Apache Group does an excellent job of ensuring binary compatibility between versions. You usually do not need to rebuild your dynamically loadable modules when upgrading Apache.

With Apache built out in such a modular fashion, the configuration file is critical to making it perform the tasks that you want. Because the Apache server builds are generic and individual services are specific, you will want to package your configuration separately from your binaries. Because Apache is a critical part of my applications, I store my httpd.conf files in the same CVS repository as my application code and copy them into place. One rule of thumb for crafting sound Apache configurations is to use generic language in them. A commonly overlooked feature of Apache configuration is that you can use locally resolvable hostnames instead of IP literals in your configuration file. This means that if every Web server needs a configuration line like the following:

Listen 10.0.0.N:8000

where N is different on every server, then instead of hand-editing the httpd.conf file on every server, you can use a consistent alias in the /etc/hosts file of each server to label such addresses. For example, you can set an externalether alias on every host via the following:

10.0.0.1 externalether

Then you can render your httpd.conf Listen line as follows:

Listen externalether:8000

Because machine IP addresses should change less frequently than their Web server configurations, using aliases allows you to keep every httpd.conf file in a cluster of servers identical. Identical is good.

Also, you should not include modules you don't need. Remember that you are crafting a configuration file for a particular service. If that service does not need mod_rewrite, do not load mod_rewrite.

Packaging PHP

The packaging rules for handling mod_php and any dependent libraries it has are similar to the Apache guidelines. Make a single master distribution that reflects the features and build requirements that every machine you run needs. Then bundle additional packages that provide custom or nonstandard functionality.

Remember that you can also load PHP extensions dynamically by building them shared and loading them with the following php.ini line:

extension = my_extension.so

An interesting (and oft-overlooked) configuration feature in PHP is config-dir support. If you build a PHP installation with the configure option --with-config-file-scan-dir, as shown here:

./configure [ options ] --with-config-file-scan-dir=/path/to/configdir

then at startup, after your main php.ini file is parsed, PHP will scan the specified directory and automatically load any files that end with the extension .ini (in alphabetical order). In practical terms, this means that if you have standard configurations that go with an extension, you can write a config file specifically for that extension and bundle it with the extension itself. This provides an extremely easy way of keeping extension configuration with its extension and not scattered throughout the environment.
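For example, a shared extension could ship with a small .ini file of its own that you drop into the scan directory; the extension name and setting shown here are hypothetical:

; my_extension.ini -- loaded automatically after php.ini is parsed
extension = my_extension.so
; a hypothetical extension-specific setting
my_extension.cache_size = 1024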

Multiple ini Values

Keys can be repeated multiple times in a php.ini file, but the last seen key/value pair will be the one used.
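For example, if a php.ini file (or a scanned .ini file) contains both of the following lines, the later 32M value is the one PHP uses:

memory_limit = 16M
memory_limit = 32M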


