
Hack 60. Make Usable URLs with mod_rewrite

Use Apache's mod_rewrite module to create URLs that are easy to understand and use.

The Apache server's mod_rewrite module gives you the ability to redirect one URL to another transparently, without the user's knowledge. This opens up all sorts of possibilities, from simply redirecting old URLs to new addresses to cleaning up the "dirty" URLs (filled with extra parameters and data your application will never use) coming from a poor publishing system, turning them into URLs that are friendlier to both readers and search engines.

6.11.1. An Introduction to Rewriting

Readable URLs are nice. A well-designed web site will have a logical filesystem layout with smart folder names and filenames and as many implementation details left out as possible. In the better-designed sites, readers can even guess at filenames with a high level of success.

However, sometimes the best possible design still can't stop your site's URLs from being nigh impossible to use. For instance, you might be using a content management system that serves out URLs that look something like http://www.site.com/viewcatalog.php?category=hats&prodID=53.

This is a horrible URL, but it and its brethren are becoming increasingly prevalent in these days of dynamically generated pages. There are a number of problems with a URL of this kind:

  • It exposes the underlying technology of the web site (in this case, PHP). This can give potential hackers clues as to what type of data they should send, along with the query string needed, to perform a front-door attack on the site. You shouldn't give away information like this if you can help it.

    Even if you're not overly concerned with the security of your site, the technology you're using is at best irrelevant, and at worst a source of confusion, to your readers, so you should hide it from them if at all possible.

    If at some point in the future, you decide to change the language that your site is based on (to ASP, for instance), all of your old URLs will stop working. This is a pretty serious problem, as anyone who has tackled a full-on site rewrite will attest.


  • The URL is littered with awkward punctuation, such as the question mark and ampersand. Those & characters, in particular, are problematic; if another webmaster links to this page using that URL, the unescaped ampersands will invalidate his XHTML.

  • Some search engines won't index pages that they think are generated dynamically. Because of the danger of finding infinite pages by changing the query string of a URL, many search engine spiders are designed to avoid adding pages like this to their index.

Luckily, using rewriting, it's easy to clean up this URL to something far more manageable. For example, you can map the URL to http://www.site.com/catalog/hats/53/.

Much better, isn't it? This URL is more logical, readable, and memorable, and will be picked up by all search engines. The faux directories are short and descriptive. As an added benefit, it looks more permanent.

To use mod_rewrite, you supply it with the URLs you want the server to match (typically the clean URLs your readers will see) and the real URLs that these will be redirected to (often the dirty URLs mentioned earlier). The URLs to be matched can be normal file addresses, which will match one file, or they can be regular expressions, which can match many files at the same time.

6.11.2. Basic Rewriting

Some servers will not have mod_rewrite enabled by default. As long as the module is present in an Apache installation, though, you can enable it simply by creating an Apache configuration file. Call this file .htaccess (or open one if it already exists), and place it in your site's root directory so that rewriting is enabled throughout your site. Once you have created the file, add this line:

	RewriteEngine on
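If you are not sure whether the module is loaded on your host, one common precaution is to wrap the directive in an IfModule block. This is a minimal sketch rather than a required step; it assumes your host permits .htaccess overrides for rewriting (AllowOverride FileInfo or broader).

```apache
# Only read the rewriting directives if mod_rewrite is actually loaded.
# Without this guard, an .htaccess file that uses RewriteEngine on a server
# lacking the module typically produces a 500 Internal Server Error.
<IfModule mod_rewrite.c>
    RewriteEngine on
</IfModule>
```

The trade-off is that your rules are silently skipped when the module is missing, so requests for clean URLs will return 404 errors instead of failing loudly.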

6.11.2.1. Basic redirects.

We'll start off with a straight redirect; this is as if you had moved a file to a new location and want all links to the old location to be forwarded to the new location. Here's the code for a simple file redirect:

	RewriteEngine on
	RewriteRule ^old\.html$ new.html

Though this is the simplest example possible, it might still throw a few people off. The structure of the "old" URL is the only difficult part in RewriteRule. There are three special characters in there:

  • The caret, ^, signifies the start of a URL to be matched under the directory the .htaccess file is in. If you had omitted the caret, the preceding code would also match a file called cold.html. Because of the unintended matches that this can cause, you should start almost all of your matches with the caret.

  • The dollar sign, $, signifies the end of the string to be matched. You should add this to stop your rules from matching the first part of longer URLs.

  • The period, ., placed before the file extension, is a special character in regular expressions and would mean something special if we didn't escape it with the backslash, telling Apache to treat it as a normal character.
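To see why each of the three characters matters, compare the rule with two looser variants. The variants are shown commented out for illustration only, and the extra filenames mentioned in the comments are hypothetical.

```apache
RewriteEngine on

# No anchors: the pattern may match anywhere in the path, so requests for
# cold.html or old.html.bak would be rewritten as well.
# RewriteRule old\.html new.html

# Unescaped period: "." matches any single character, so a request for
# old-html or oldxhtml would be rewritten as well.
# RewriteRule ^old.html$ new.html

# Anchored and escaped: matches old.html and nothing else.
RewriteRule ^old\.html$ new.html
```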

So, this rule will make your server transparently redirect from the old.html page to the new.html page. Your reader will have no idea that it happened, and it's pretty much instantaneous.

6.11.2.2. Forcing new requests.

Sometimes you do want your readers to know a redirect has occurred, and you can do this by forcing a new HTTP request for the new page. This will make the browser load the new page as if it were the page originally requested, and the location bar will change to show the URL of the new page. All you need to do is turn on the [R] flag by appending it to the rule:

	RewriteRule ^old\.html$ new.html [R]

Figure 6-25 shows this redirect in action.
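By default, [R] sends a temporary (302) redirect. If the page has moved for good, the flag also accepts a status code. The following is a sketch that assumes new.html lives in the site's root directory; the leading slash is an assumption made so that the target does not depend on where the .htaccess file sits.

```apache
RewriteEngine on

# R=301 tells browsers and search engines that the move is permanent.
# Plain [R] is equivalent to [R=302], a temporary redirect.
RewriteRule ^old\.html$ /new.html [R=301]
```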

6.11.3. Using Regular Expressions

Now we get on to the really useful stuff. The power of mod_rewrite comes at the expense of complexity. If this is your first encounter with regular expressions [Hack #87], you might find them to be a tough nut to crack, but the options they afford you are well worth the work it takes to learn and master them.

Figure 6-25. old.html redirecting the user to new.html

With regular expressions, you can have your rules match a set of URLs at a time and mass-redirect them to actual pages. This is very useful when building a large site with many pages that are generated from a single PHP file. For example, you might design your site to have the URL structure http://www.example.com/articles/<article_id>.

This is a nice, clean structure. However, if all of your articles are generated from one show_article.php script, as is often the case, you're going to want to set up redirects from each URL to its real location. Take this rule:

	RewriteRule ^articles/([0-9][0-9])/$ show_article.php?articleID=$1

This will match any URLs that start with articles/, followed by any two digits, followed by a forward slash. For example, this rule will match a URL such as articles/12/ or articles/99/ and redirect it to the PHP script (see Figure 6-26 for an example).

Figure 6-26. A redirect from a clean URL to the article's location in the underlying PHP system

The parts in square brackets are called ranges. In this case, we're allowing anything in the range 0-9, which is any digit. Other ranges might be [A-Z], for any uppercase letter, [a-z] for any lowercase letter, and [A-Za-z] for any letter in either case.

We have encased the regular-expression part of the URL in parentheses because we want to store whatever value was found here for later use. In this case, we're sending this value to a PHP script as an argument. Once we have a value in parentheses, we can use it through a back-reference. Each part you've placed in parentheses is given an index, starting with 1. So, the first back-reference is $1, the third is $3, etc. Thus, once the redirect is done, the page loaded in the reader's browser will be something like show_article.php?articleID=12.
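The same technique handles the catalog URL from the introduction, this time with two back-references. This is a sketch that assumes category names are lowercase letters and product IDs are digits; the plus sign means "one or more" and is covered under match modifiers later in this hack.

```apache
RewriteEngine on

# catalog/hats/53/  ->  viewcatalog.php?category=hats&prodID=53
# $1 is the first parenthesized group (the category),
# $2 is the second (the product ID).
RewriteRule ^catalog/([a-z]+)/([0-9]+)/$ viewcatalog.php?category=$1&prodID=$2
```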

6.11.3.1. Adding trailing slashes.

If your site visitor enters something like articles/12 into his browser's location bar, the preceding rule won't match, as the slash at the end of the URL is missing. To promote good URL writing, take care of this by doing a direct redirect to the same URL with the slash appended:

	RewriteRule ^articles/([0-9][0-9])$ articles/$1/ [R]

Multiple redirects in the same .htaccess file can be applied in sequence, which is what we're doing here. This rule is added before the one we did earlier, like so:

	RewriteRule ^articles/([0-9][0-9])$ articles/$1/ [R]
	RewriteRule ^articles/([0-9][0-9])/$ show_article.php?articleID=$1

Thus, if the user types in the URL articles/12, the first rule kicks in, rewriting the URL to include the trailing slash and doing a new request for articles/12/. Then the second rule has something to match and transparently redirects this URL to show_article.php?articleID=12. Pretty slick, huh?
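When a redirect rule is followed by other rules, the Apache documentation suggests pairing [R] with the [L] ("last") flag, so that processing stops as soon as the redirect is decided instead of passing the redirected URL on to later rules. A hedged variant of the pair above:

```apache
RewriteEngine on

# Missing slash: redirect the browser and stop processing rules for this request.
RewriteRule ^articles/([0-9][0-9])$ articles/$1/ [R,L]

# The browser's new request for articles/12/ arrives here and is
# rewritten internally to the PHP script.
RewriteRule ^articles/([0-9][0-9])/$ show_article.php?articleID=$1
```

What the reader sees is unchanged: one visible redirect to the URL with the slash, then the article.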

6.11.3.2. Match modifiers.

You can expand your regular-expression patterns by adding some modifier characters, which allow you to match URLs with an indefinite number of characters. In the earlier examples, we were allowing only two digits for each article's ID number. This isn't the most expandable solution: if the number of articles published ever grew beyond these initial confines of 99 articles, a URL such as articles/100/ (which should map to show_article.php?articleID=100) would no longer match our rules.

So, instead of hardcoding a set number of characters to look for, we'll work in some room to grow by allowing any number of digits to be entered. The following rule does just that:

	RewriteRule ^articles/([0-9]+)$ articles/$1/ [R]

Note the plus sign (+) that has sneaked in there. This modifier changes whatever comes directly before it by saying "one or more of the preceding character or range." In this case, it means that the rule will match any URL that consists of articles/ followed by one or more digits. So this'll match both articles/1 and articles/1000.
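The rule that maps the clean URL to the script needs the same change, or three-digit IDs will be redirected to a URL that nothing matches. Put together, the pair might look like this (a sketch using the same show_article.php script as earlier):

```apache
RewriteEngine on

# articles/100  ->  redirect to articles/100/
RewriteRule ^articles/([0-9]+)$ articles/$1/ [R]

# articles/100/  ->  show_article.php?articleID=100 (internal rewrite)
RewriteRule ^articles/([0-9]+)/$ show_article.php?articleID=$1
```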

Other match modifiers you can use in the same way are the asterisk, *, which means "zero or more of the preceding character or range," and the question mark, ?, which means "zero or only one of the preceding character or range."

Using URL rewriting means fewer confusing 404 errors for your readers, and a site that seems to run a whole lot smoother all around.

Ross Shannon

