Change Control

Change control software is a tool that allows you to track individual changes to project files and create versions of a project that are associated with specific versions of files. This ability is immensely helpful in the software development process because it allows you to easily track and revert individual changes. You do not need to remember why you made a specific change or what the code looked like before you made a change. By examining the differences between file versions or consulting the commit logs, you can see when a change was made, exactly what the differences were, and (assuming that you enforce a policy of verbose log messages) why the change was made.

In addition, a good change control system allows multiple developers to safely work on copies of the same files simultaneously and supports automatic safe merging of their changes. A common problem when more than one person is accessing a file is having one person's changes accidentally overwritten by another's. Change control software aims to eliminate that risk.

The current open source standard for change control systems is Concurrent Versions System (CVS). CVS grew as an expansion of the capabilities of Revision Control System (RCS). RCS, written by Walter Tichy of Purdue University in 1985, was itself an improvement on Source Code Control System (SCCS), authored at AT&T Bell Labs in 1975. RCS was written to allow multiple people to work on a single set of files via a complex locking system. CVS is built on top of RCS and allows for multi-ownership of files, automatic merging of contents, branching of source trees, and the ability for more than one user to have a writable copy of the source code at a single time.

CVS Basics

The first step in managing files with CVS is to import a project into a CVS repository. To create a local repository, you first make a directory where all the repository files will stay. You can call this path /var/cvs, although any path can do.
Because this is a permanent repository for your project data, you should put the repository someplace that gets backed up on a regular schedule. First, you create the base directory, and then you use cvs init to create the base repository, like this:

    > mkdir /var/cvs
    > cvs -d /var/cvs init

This creates the base administrative files needed by CVS in that directory.
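The administrative files created by cvs init live in a CVSROOT subdirectory of the repository path, so a quick check for that directory can confirm the step worked. The following is a minimal sketch, not part of CVS itself; the function name is_cvs_repo is hypothetical.

```shell
# Hypothetical helper: succeed if the given directory looks like an
# initialized CVS repository, i.e. it contains the CVSROOT directory
# that "cvs init" creates.
is_cvs_repo() {
    [ -d "$1/CVSROOT" ]
}

# Check the repository path used in this chapter.
if is_cvs_repo /var/cvs; then
    echo "repository is initialized"
else
    echo "no CVSROOT directory found; run cvs init first"
fi
```

A check like this is mainly useful in setup scripts, where running an import against a path that was never initialized would otherwise fail later and less clearly.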
To import all the examples for this book, you then use import from the top-level directory that contains your files:

    > cd Advanced_PHP
    > cvs -d /var/cvs import Advanced_PHP advanced_php start
    cvs import: Importing /var/cvs/books/Advanced_PHP/examples
    N books/Advanced_PHP/examples/chapter-10/1.php
    N books/Advanced_PHP/examples/chapter-10/10.php
    N books/Advanced_PHP/examples/chapter-10/11.php
    N books/Advanced_PHP/examples/chapter-10/12.php
    N books/Advanced_PHP/examples/chapter-10/13.php
    N books/Advanced_PHP/examples/chapter-10/14.php
    N books/Advanced_PHP/examples/chapter-10/15.php
    N books/Advanced_PHP/examples/chapter-10/2.php
    ...

    No conflicts created by this import

This indicates that all the files are new imports (not files that were previously in the repository at that location) and that no problems were encountered.

-d /var/cvs specifies the repository location you want to use. You can alternatively set the environment variable CVSROOT, but I like to be explicit about which repository I am using because different projects go into different repositories. Specifying the repository name on the command line helps me make sure I am using the right one.

import is the command you are giving to CVS. The three items that follow (Advanced_PHP advanced_php start) are the location, the vendor tag, and the release tag. Setting the location to Advanced_PHP tells CVS that you want the files for this project stored under /var/cvs/Advanced_PHP. This name does not need to be the same as the current directory that your project was located in, but it should be both the name by which CVS will know the project and the base location where the files are located when you retrieve them from CVS.

When you submit that command, your default editor will be launched, and you will be prompted to enter a message. Whenever you use CVS to modify the master repository, you will be prompted to enter a log message to explain your actions.
Enforcing a policy of good, informative log messages is an easy way to ensure a permanent paper trail on why changes were made in a project. You can avoid having to enter the message interactively by adding -m "message" to your CVS command lines. If you set up strict standards for messages, your commit messages can be used to automatically construct a change log or other project documentation.

The vendor tag (advanced_php) and the release tag (start) specify special branches that your files will be tagged with. Branches allow for a project to have multiple lines of development. When files in one branch are modified, the effects are not propagated into the other branches.

The vendor branch exists because you might be importing sources from a third party. When you initially import the project, the files are tagged into a vendor branch. You can always go back to this branch to find the original, unmodified code. Further, because it is a branch, you can actually commit changes to it, although this is seldom necessary in my experience. CVS requires a vendor tag and a release tag to be specified on import, so you need to specify them here. In most cases, you will never need to touch them again.

Another branch that all projects have is HEAD. HEAD is always the main branch of development for a project. For now, all the examples will be working in the HEAD branch of the project. If a branch is not explicitly specified, HEAD is the branch in which all work takes place.

The act of importing files does not actually check them out; you need to check out the files so that you are working on the CVS-managed copies. Because there is always a chance that an unexpected error occurred during import, I advise that you always move away from your current directory, check out the imported sources from CVS, and visually inspect to make sure you imported everything before removing your original directory.
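Part of that inspection can be automated once the fresh checkout exists. The sketch below is an illustration only: the function name compare_trees is hypothetical, and it assumes file names contain no newlines. It walks the original tree and reports any file that is missing from, or differs in, the checkout. Because it only looks at files present in the original tree, the CVS subdirectories in the new checkout are ignored.

```shell
# Hypothetical helper: compare every regular file under the original tree ($1)
# against the same path in the fresh checkout ($2).
# Prints each missing or differing file; returns nonzero if any were found.
compare_trees() (
    new=$(cd "$2" && pwd) || exit 2
    cd "$1" || exit 2
    find . -type f | {
        status=0
        while IFS= read -r f; do
            if [ ! -f "$new/$f" ]; then
                echo "missing from checkout: $f"
                status=1
            elif ! cmp -s "$f" "$new/$f"; then
                echo "differs: $f"
                status=1
            fi
        done
        exit "$status"
    }
)

# Typical use after the checkout:
#   compare_trees Advanced_PHP.old Advanced_PHP && echo "import looks complete"
```

A clean result here does not replace looking at the checkout yourself, but it catches files that were silently skipped during import.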
Here is the command sequence to check out the freshly imported project files:

    > mv Advanced_PHP Advanced_PHP.old
    > cvs -d /var/cvs checkout Advanced_PHP
    cvs checkout: Updating Advanced_PHP
    cvs checkout: Updating Advanced_PHP/examples
    U Advanced_PHP/examples/chapter-10/1.php
    U Advanced_PHP/examples/chapter-10/10.php
    U Advanced_PHP/examples/chapter-10/11.php
    U Advanced_PHP/examples/chapter-10/12.php
    U Advanced_PHP/examples/chapter-10/13.php
    U Advanced_PHP/examples/chapter-10/14.php
    U Advanced_PHP/examples/chapter-10/15.php
    ...
    # manually inspect your new Advanced_PHP
    > rm -rf Advanced_PHP.old

Your new Advanced_PHP directory should look exactly like the old one, except that every directory will have a new CVS subdirectory. This subdirectory holds administrative files used by CVS, and the best plan is to simply ignore their presence.

Modifying Files

You have imported all your files into CVS, and you have made some changes to them. The modifications seem to be working as you wanted, so you would like to save your changes with CVS, which is largely a manual system. When you alter files in your working directory, no automatic interaction with the master repository happens. When you are sure that you are comfortable with your changes, you can tell CVS to commit them to the master repository by using cvs commit. After you do that, your changes will be permanent inside the repository.

The following was the original version of examples/chapter-7/1.php:

    <?php
    echo "Hello $_GET['name']";
    ?>

You have changed it to take name from any request variable:

    <?php
    echo "Hello $_REQUEST['name']";
    ?>

To commit this change to CVS, you run the following:

    > cvs commit -m "use any method, not just GET" examples/chapter-7/1.php
    Checking in examples/chapter-7/1.php;
    /var/cvs/Advanced_PHP/examples/chapter-7/1.php,v <-- 1.php
    new revision: 1.2; previous revision: 1.1
    done

Note the -m syntax, which specifies the commit message on the command line.
Also note that you do not specify the CVS repository location. When you are in your working directory, CVS knows what repository your files came from.

If you are adding a new file or directory to a project, you need to take an additional step. Before you can commit the initial version, you need to add the file by using cvs add:

    > cvs add 2.php
    cvs add: scheduling file `2.php' for addition
    cvs add: use 'cvs commit' to add this file permanently

As this message indicates, adding the file only informs the repository that the file will be coming; you need to then commit the file in order to have the new file fully saved in CVS.

Examining Differences Between Files

A principal use of any change control software is to be able to find the differences between versions of files. CVS presents a number of options for how to do this.

At the simplest level, you can determine the differences between your working copy and the checked-out version by using this:

    > cvs diff -u3 examples/chapter-7/1.php
    Index: examples/chapter-7/1.php
    ===================================================================
    RCS file: /var/cvs/books/Advanced_PHP/examples/chapter-7/1.php,v
    retrieving revision 1.2
    diff -u -3 -r1.2 1.php
    --- 1.php 2003/08/26 15:40:47 1.2
    +++ 1.php 2003/08/26 16:21:22
    @@ -1,3 +1,4 @@
     <?php
     echo "Hello $_REQUEST['name']";
    +echo "\nHow are you?";
     ?>

The -u3 option specifies a unified diff with three lines of context. The diff itself shows that the version you are comparing against is revision 1.2 (CVS assigns revision numbers automatically) and that a single line was added.

You can also create a diff against a specific revision or between two revisions. To see what the available revision numbers are, you can use cvs log on the file in question.
This command shows all the commits for that file, with dates and commit log messages:

    > cvs log examples/chapter-7/1.php

    RCS file: /var/cvs/Advanced_PHP/examples/chapter-7/1.php,v
    Working file: examples/chapter-7/1.php
    head: 1.2
    branch:
    locks: strict
    access list:
    symbolic names:
    keyword substitution: kv
    total revisions: 2; selected revisions: 2
    description:
    ----------------------------
    revision 1.2
    date: 2003/08/26 15:40:47; author: george; state: Exp; lines: +1 -1
    use any request variable, not just GET
    ----------------------------
    revision 1.1
    date: 2003/08/26 15:37:42; author: george; state: Exp;
    initial import
    =============================================================================

As you can see from this example, there are two revisions on file: 1.1 and 1.2. You can find the difference between 1.1 and 1.2 as follows:

    > cvs diff -u3 -r 1.1 -r 1.2 examples/chapter-7/1.php
    Index: examples/chapter-7/1.php
    ===================================================================
    RCS file: /var/cvs/books/Advanced_PHP/examples/chapter-7/1.php,v
    retrieving revision 1.1
    retrieving revision 1.2
    diff -u -3 -r1.1 -r1.2
    --- 1.php 2003/08/26 15:37:42 1.1
    +++ 1.php 2003/08/26 15:40:47 1.2
    @@ -1,3 +1,3 @@
     <?php
    -echo "Hello $_GET['name']";
    +echo "Hello $_REQUEST['name']";
     ?>

Or you can create a diff of your current working copy against 1.1 by using the following syntax:

    > cvs diff -u3 -r 1.1 examples/chapter-7/1.php
    Index: examples/chapter-7/1.php
    ===================================================================
    RCS file: /var/cvs/books/Advanced_PHP/examples/chapter-7/1.php,v
    retrieving revision 1.1
    diff -u -3 -r1.1 1.php
    --- 1.php 2003/08/26 15:37:42 1.1
    +++ 1.php 2003/08/26 16:21:22
    @@ -1,3 +1,4 @@
     <?php
    -echo "Hello $_GET['name']";
    +echo "Hello $_REQUEST['name']";
    +echo "\nHow are you?";
     ?>

Another incredibly useful diff syntax allows you to create a diff against a date stamp or time period. I call this "the blame finder."
Oftentimes when an error is introduced into a Web site, you do not know exactly when it happened, only that the site definitely worked at a specific time. What you need to know in such a case is what changes had been made since that time period because one of those must be the culprit. CVS has the capability to support this need exactly. For example, if you know that you are looking for a change made in the past 20 minutes, you can use this:

    > cvs diff -u3 -D '20 minutes ago' examples/chapter-7/1.php
    Index: examples/chapter-7/1.php
    ===================================================================
    RCS file: /var/cvs/Advanced_PHP/examples/chapter-7/1.php,v
    retrieving revision 1.2
    diff -u -3 -r1.2 1.php
    --- 1.php 2003/08/26 15:40:47 1.2
    +++ 1.php 2003/08/26 16:21:22
    @@ -1,3 +1,4 @@
     <?php
     echo "Hello $_REQUEST['name']";
    +echo "\nHow are you?";
     ?>

The CVS date parser is quite good, and you can specify both relative and absolute dates in a variety of formats.

CVS also allows you to make recursive diffs of directories, either by specifying the directory or by omitting the diff file, in which case the current directory is recursed. This is useful if you want to look at differences on a number of files simultaneously.

Note: Time-based CVS diffs are the most important troubleshooting tools I have. Whenever a bug is reported on a site I work on, my first two questions are "When are you sure it last worked?" and "When was it first reported broken?" By isolating these two dates, it is often possible to use CVS to immediately track the problem to a single commit.

Helping Multiple Developers Work on the Same Project

One of the major challenges related to allowing multiple people to actively modify the same file is merging their changes together so that one developer's work does not clobber another's. CVS provides the update functionality to allow this. You can use update in a couple of different ways. The simplest is to try to guarantee that a file is up-to-date.
If the version you have checked out is not the most recent in the repository, CVS will attempt to merge the differences. Here is the output that is generated when you update 1.php:

    > cvs update examples/chapter-7/1.php
    M examples/chapter-7/1.php

In this example, M indicates that the revision in your working directory is current but that there are local, uncommitted modifications.

If someone else had been working on the file and committed a change since you started, the message would look like this:

    > cvs update 1.php
    U 1.php

In this example, U indicates that a more recent version than your working copy exists and that CVS has successfully merged those changes into your copy and updated its revision number to be current.

CVS can sometimes make a mess, as well. If two developers are operating on exactly the same section of a file, you can get a conflict when CVS tries to merge them, as in this example:

    > cvs update examples/chapter-7/1.php
    RCS file: /var/cvs/Advanced_PHP/examples/chapter-7/1.php,v
    retrieving revision 1.2
    retrieving revision 1.3
    Merging differences between 1.2 and 1.3 into 1.php
    rcsmerge: warning: conflicts during merge
    cvs update: conflicts found in examples/chapter-7/1.php
    C examples/chapter-7/1.php

You need to carefully look at the output of any CVS command. A C in the output of update indicates a conflict. In such a case, CVS tried to merge the files but was unsuccessful. This often leaves the local copy in an unstable state that needs to be manually rectified. After this type of update, the conflict causes the local file to look like this:

    <?php
    echo "Hello $_REQUEST['name']";
    <<<<<<< 1.php
    echo "\nHow are you?";
    =======
    echo "Goodbye $_REQUEST['name']";
    >>>>>>> 1.3
    ?>

Because the local copy has a change to a line that was also committed elsewhere, CVS requires you to merge the files manually. It has also made a mess of your file, and the file won't be syntactically valid until you fix the merge problems.
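Because a leftover conflict marker makes the file invalid PHP, it can help to scan for markers before committing or pushing. The following is a minimal sketch under stated assumptions: the function names has_conflict_markers and conflicted_files are hypothetical, and the patterns simply match the marker and status-line shapes shown in the listings above.

```shell
# Hypothetical helper: succeed if the file still contains the opening or
# closing conflict markers that a failed merge leaves behind.
has_conflict_markers() {
    grep -q -e '^<<<<<<< ' -e '^>>>>>>> ' "$1"
}

# Hypothetical helper: read update output on stdin and print only the
# paths that were flagged with C (conflict).
conflicted_files() {
    sed -n 's/^C //p'
}

# Typical use (requires a CVS working directory):
#   cvs update 2>/dev/null | conflicted_files
#   has_conflict_markers examples/chapter-7/1.php && echo "resolve before commit"
```

The marker check deliberately ignores the ======= separator line on its own, because a line of equals signs can appear in legitimate content.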
If you want to recover the original copy you attempted to update, you can: CVS has saved it into the same directory as .#filename.revision.

To prevent messes like these, it is often advisable to first run your update as follows:

    > cvs -nq update

-n instructs CVS to not actually make any changes. This way, CVS inspects to see what work it needs to do, but it does not actually alter any files.

Normally, CVS provides informational messages for every directory it checks. If you are looking to find the differences between a tree and the tip of a branch, these messages can often be annoying. -q instructs CVS to be quiet and not emit any informational messages.

Like commit, update also works recursively. If you want CVS to be able to add any newly added directories to a tree, you need to add the -d flag to update. When you suspect that a directory may have been added to your tree (or if you are paranoid, on every update), run your update as follows:

    > cvs update -d

Symbolic Tags

Using symbolic tags is a way to assign a single version to multiple files in a repository. Symbolic tags are extremely useful for versioning. When you push a version of a project to your production servers, or when you release a library to other users, it is convenient to be able to associate with that version the specific revisions of every file that the application comprises. Consider, for example, the Text_Statistics package implemented in Chapter 6. That package is managed with CVS in PEAR. These are the current versions of its files:

    > cvs status
    cvs server: Examining .
    ===================================================================
    File: Statistics.php Status: Up-to-date

    Working revision: 1.4
    Repository revision: 1.4 /repository/pear/Text_Statistics/Text/Statistics.php,v
    Sticky Tag: (none)
    Sticky Date: (none)
    Sticky Options: (none)

    ===================================================================
    File: Word.php Status: Up-to-date

    Working revision: 1.3
    Repository revision: 1.3 /repository/pear/Text_Statistics/Text/Word.php,v
    Sticky Tag: (none)
    Sticky Date: (none)
    Sticky Options: (none)

Instead of having users simply use the latest version, it is much easier to version the package so that people know they are using a stable version. If you wanted to release version 1.1 of Text_Statistics, you would want a way of codifying that it consists of CVS revision 1.4 of Statistics.php and revision 1.3 of Word.php so that anyone could check out version 1.1 by name. Tagging allows you to do exactly that. To tag the current versions of all files in your checkout with the symbolic tag RELEASE_1_1, you use the following command:

    > cvs tag RELEASE_1_1

You can also tag specific files. You can then retrieve a file's associated tag in one of two ways. To update your checked-out copy, you can update to the tag name exactly as you would to a specific revision number. For example, to return your checkout to version 1.0, you can run the following update:

    > cvs update -r RELEASE_1_0

Be aware that, as with updating to specific revision numbers for files, updating to a symbolic tag associates a sticky tag with that checked-out file.

Sometimes you might not want a full checkout, which includes all the CVS administrative files for your project (for example, when you are preparing a release for distribution). CVS supports this with the export command. export creates a copy of all your files, minus any CVS metadata.
Exporting is also ideal for preparing a copy for distribution to your production Web servers, where you do not want CVS metadata lying around for strangers to peruse. To export RELEASE_1_1, you can issue the following export command:

    > cvs -d cvs.php.net:/repository export -r RELEASE_1_1 \
        -d Text_Statistics-1.1 pear/Text/Statistics

This exports the tag RELEASE_1_1 of the CVS module pear/Text/Statistics (which is the location of Text_Statistics in PEAR) into the local directory Text_Statistics-1.1.

Branches

CVS supports the concept of branching. When you branch a CVS tree, you effectively take a snapshot of the tree at a particular point in time. From that point, each branch can progress independently of the others. This is useful, for example, if you release versioned software. When you roll out version 1.0, you create a new branch for it. Then, if you need to perform any bug fixes for that version, you can perform them in that branch, without having to back out any changes made in the development branch after version 1.0 was released.

Branches have names that identify them. To create a branch, you use the cvs tag -b syntax. Here is the command to create the PROD branch of your repository:

    > cvs tag -b PROD

Note, though, that branches are very different from symbolic tags. Whereas a symbolic tag simply marks a point in time across files in the repository, a branch actually creates a new copy of the project that acts like a new repository. Files can be added, removed, modified, tagged, and committed in one branch of a project without affecting any of the other branches.

All CVS projects have a default branch called HEAD. This is the main trunk of the tree and cannot be removed.

Because a branch behaves like a complete repository, you will most often create a completely new working directory to hold it.
To check out the PROD branch of the Advanced_PHP repository, you use the following command:

    > cvs checkout -r PROD Advanced_PHP

To signify that this is a specific branch of the project, it is common to rename the top-level directory to reflect the branch name, as follows:

    > mv Advanced_PHP Advanced_PHP-PROD

Alternatively, if you already have a checked-out copy of a project and want to update it to a particular branch, you can use update -r, as you did with symbolic tags, as follows:

    > cvs update -r PROD

There are times when you want to merge two branches. For example, say PROD is your live production code and HEAD is your development tree. You have discovered a critical bug in both branches and for expediency you fix it in the PROD branch. You then need to merge this change back into the main tree. To do this, you can use the following command, which merges all the changes from the specified branch into your working copy:

    > cvs update -j PROD

When you execute a merge, CVS looks back in the revision tree to find the closest common ancestor of your working copy and the tip of the specified branch. A diff between the tip of the specified branch and that ancestor is calculated and applied to your working copy. As with any update, if conflicts arise, you should resolve them before completing the change.

Maintaining Development and Production Environments

The CVS techniques developed so far should carry you through managing your own personal site, or anything where performing all development on the live site is acceptable. The problems with using a single tree for development and production should be pretty obvious:
To address these issues you need to build a development environment that allows developers to operate independently and coalesce their changes cleanly and safely. In the ideal case, I suggest the following setup:
Figure 7.1 shows one implementation of this setup, using two CVS branches, PROD for production-ready code and HEAD for development code. Although there are only two CVS branches in use, there are four tiers to this progression.

Figure 7.1. A production and staging environment that uses two CVS branches.
At one end, developers implementing new code work on their own private checkout of the HEAD branch. Changes are not committed into HEAD until they are stable enough not to break the functionality of the HEAD branch. By giving every developer his or her own Web server (which is best done on the developers' local workstations), you allow them to test major functionality-breaking changes without jeopardizing anyone else's work. In a code base where everything is highly self-contained, this is likely not much of a worry, but in larger environments where there is a web of dependencies between user libraries, the ability to make changes without affecting others is very beneficial.

When a developer is satisfied that his or her changes are complete, they are committed into the HEAD branch and evaluated on dev.example.com, which always runs HEAD. The development environment is where whole projects are evaluated and finalized. Here incompatibilities are rectified and code is made production ready.

When a project is ready for release into production, its relevant parts are merged into the PROD branch, which is served by the stage.example.com Web server. In theory, it should then be ready for release. In reality, however, there is often fine-tuning and subtle problem resolution that needs to happen. This is the purpose of the staging environment. The staging environment is an exact-as-possible copy of the production environment. PHP versions, Web server and operating system configurations: everything should be identical to what is in the live systems. The idea behind staging content is to ensure that there are no surprises. Staged content should then be reviewed, verified to work correctly, and propagated to the live machines.

The extent of testing varies greatly from organization to organization.
Although it would be ideal if all projects would go through a complete quality assurance (QA) cycle and be verified against all the use cases that specified how the project should work, most environments have neither QA teams nor use cases for their projects. In general, more review is always better. At a minimum, I always try to get a nontechnical person who wasn't involved in the development cycle of a project to review it before I launch it live. Having an outside party check your work works well for identifying bugs that you miss because you know the application should not be used in a particular fashion. The inability of people to effectively critique their own work is hardly limited to programming: It is the same reason that books have editors. After testing on stage.example.com has been successful, all the code is pushed live to www.example.com. No changes are ever made to the live code directly; any emergency fixes are made on the staging server and backported into the HEAD branch, and the entire staged content is pushed live. Making incremental changes directly in production makes your code extremely hard to effectively manage and encourages changes to be made outside your change control system. You might have noticed a flaw in this system: Because the code in the live environment is a particular point-in-time snapshot of the PROD branch, it can be difficult to revert to a previous consistent version without knowing the exact time it was committed and pushed. These are two possible solutions to this problem:
The former option is very common in the realm of shrink-wrapped software, where version releases occur relatively infrequently and may need to have different changes applied to different versions of the code. In this scheme, whenever the stage environment is ready to go live, a new branch (for example, VERSION_1_0_0) is created based on that point-in-time image. That version can then evolve independently from the main staging branch PROD, allowing bug fixes to be implemented in differing ways in that version and in the main tree. I find this system largely unworkable for Web applications for a couple reasons:
The other solution is to use symbolic tags to mark releases. As discussed earlier in this chapter, in the section "Symbolic Tags," using a symbolic tag is really just a way to assign a single marker to a collection of files in CVS. It associates a name with the then-current version of all the specified files, which in a nonbranching tree is a perfect way to take a snapshot of the repository. Symbolic tags are relatively inexpensive in CVS, so there is no problem with having hundreds of them. For regular updates of Web sites, I usually name my tags by the date on which they are made, so in one of my projects, the tag might be PROD_2004_01_23_01, signifying Tag 1 on January 23, 2004. More meaningful names are also useful if you are associating them with particular events, such as a new product launch. Using symbolic tags works well if you do a production push once or twice a week. If your production environment requires more frequent code updates on a regular basis, you should consider doing the following:
Note: One of the rules that I try to get clients to agree to is no production pushes after 3 p.m. and no pushes at all on Friday. Bugs will inevitably be present in code, and pushing code at the end of the day or before a weekend is an invitation to find a critical bug just as your developers have left the office. Daytime pushes mean that any unexpected errors can be tackled by a fresh set of developers who aren't watching the clock, trying to figure out if they are going to get dinner on time.
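The date-based tag names and the push-time rule described above are both easy to encode in a small wrapper around your push procedure. The following is a minimal sketch, assuming the PROD_YYYY_MM_DD_NN naming shown earlier; the function names release_tag and push_allowed are hypothetical, and the script only prints the tag it would use rather than running any CVS command.

```shell
# Hypothetical helper: build a date-based tag name such as PROD_2004_01_23_01.
# $1 is the date as YYYY-MM-DD, $2 is that day's push number (plain integer).
release_tag() {
    printf 'PROD_%s_%02d\n' "$(printf '%s\n' "$1" | sed 's/-/_/g')" "$2"
}

# Hypothetical helper: succeed only when a push is allowed under the rule
# "no pushes after 3 p.m. and no pushes at all on Friday".
# $1 is the ISO weekday (1 = Monday ... 7 = Sunday), $2 is the hour (0-23).
push_allowed() {
    [ "$1" -ne 5 ] && [ "$2" -lt 15 ]
}

# Decide based on the current local time.
if push_allowed "$(date +%u)" "$(date +%H)"; then
    echo "would tag with: $(release_tag "$(date +%Y-%m-%d)" 1)"
else
    echo "outside the push window; wait for the next allowed morning"
fi
```

Passing the weekday and hour as arguments, rather than reading the clock inside push_allowed, keeps the rule itself easy to test and to adjust if your team agrees on a different cutoff.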