Appendix B. Regular Expression Basics
Behind the innocuous and generic phrase
"regular expression" lives an intricate and
powerful world of text pattern
matching. With regular expressions, you can make sure that a user
really entered a ZIP Code or an email address in a form field, or
find all the HTML <a> tags in a page. If
your web site relies on data feeds that come in text files, such as
sports scores, news articles, or frequently updated headlines,
regular expressions can help you make sense of these.
This appendix provides an overview of the most useful and commonly
encountered parts of the regular expression menagerie. By learning
the special meanings of 5 or 10 symbols and 2 or 3 PHP functions, you
can use regular expressions to solve most of the text-processing
problems you run into when building a web site with PHP. There are
some dark corners and steep ravines of the regular expression
landscape that are not covered here, however, such as locale support,
lookahead and lookbehind assertions, and conditional subpatterns. To learn more
about regular expressions, see the PCRE section of the PHP Manual, at
http://www.php.net/pcre, or read
the comprehensive Mastering Regular
Expressions by Jeffrey E.F. Friedl (O'Reilly).
To work with regular expressions in PHP, use the functions in the
PCRE (Perl-compatible regular
expressions) extension. These functions are
included with PHP by default and are described in the online manual
at http://www.php.net/pcre. Section B.6, later in this appendix,
gives an overview of the PCRE functions. If you're
already familiar with regular expression basics, read that section to
learn the language-specific details of using regular expressions in
PHP.
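As a rough sketch of what calling a PCRE function looks like, the snippet below hands a pattern and a subject string to preg_match(). The pattern /PHP/ and the sample text are made up for illustration. The slashes around the pattern are delimiters that the PCRE functions expect.

```php
<?php
// The pattern is an ordinary PHP string wrapped in delimiters (slashes here).
// Both the pattern and the subject string are illustrative assumptions.
$result = preg_match('/PHP/', 'Learning PHP');

// preg_match() returns 1 if the pattern matches, 0 if it does not,
// and false if an error occurred.
var_dump($result);
```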
A regular expression is a string. That string defines a pattern that
matches other strings. For example, the regular expression
\d{5}(-\d{4})? matches U.S. ZIP or ZIP+4 Codes:
- \d
  A digit (0-9)
- {5}
  A total of five of the previous item (a digit)
- -
  A literal - character
- \d
  A digit
- {4}
  A total of four of the previous item (a digit)
- ( )?
  Makes what's inside the parentheses optional
So, the regular expression \d{5}(-\d{4})? matches
"five digits, optionally followed by a hyphen and
four digits."
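To try the ZIP Code pattern out, here is a minimal sketch that wraps it in a helper function. The function name looks_like_zip() and the sample inputs are hypothetical. The sketch also adds the anchors ^ and $, which are not part of the pattern as written above, so that the input must be a ZIP Code from start to end instead of merely containing one.

```php
<?php
// Hypothetical helper, not a built-in PHP function.
// ^ and $ are added here (an assumption beyond the pattern in the text)
// so that the match must span the whole input. Note that $ still allows
// one trailing newline at the end of the input.
function looks_like_zip($input)
{
    return preg_match('/^\d{5}(-\d{4})?$/', $input) === 1;
}

// Made-up sample values: the first two should match, the rest should not.
$candidates = ['12345', '12345-6789', '1234', '12345-678', 'abcde'];
foreach ($candidates as $candidate) {
    $verdict = looks_like_zip($candidate) ? 'matches' : 'does not match';
    echo $candidate, ': ', $verdict, "\n";
}
```

Without the anchors, a longer string such as "Order 1234567" would also match, because five digits in a row appear somewhere inside it.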
Here's another regular expression:
</?[bBiI]>. This one matches opening or
closing HTML <b> or
<i> tags:
- <
  A literal < character
- /
  A literal / character
- ?
  Makes the previous item (the /) optional
- [bBiI]
  One of anything inside the square brackets: b,
  B, i, or I
- >
  A literal > character
The regular expression </?[bBiI]> means
"A less-than sign, followed by an optional forward
slash, followed by a b, B, i, or I, followed by a greater-than
sign." This matches eight HTML tags:
<b>, <B>,
</b>, </B>,
<i>, <I>,
</i>, and </I>.
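The following sketch shows one way to find these tags in a piece of text with preg_match_all(). The sample HTML string is made up for illustration. Because the pattern itself contains a slash, the sketch uses # as the pattern delimiter, which is a choice made here and not something the pattern requires.

```php
<?php
// Made-up sample text with bold and italic tags plus an unrelated <br> tag.
$html = 'Some <b>bold</b>, some <I>italic</I>, and a <br> that is ignored.';

// Using # as the delimiter avoids escaping the / inside the pattern.
// preg_match_all() returns the number of matches and fills $matches[0]
// with the matched text, in the order it was found.
$count = preg_match_all('#</?[bBiI]>#', $html, $matches);

echo $count, "\n";
echo implode(' ', $matches[0]), "\n";
```

The <br> tag is skipped because the pattern requires the > to come immediately after the single letter.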