15.3. Regexp Special Characters

The metacharacters +, *, ?, and { } affect the number of times a pattern should be matched, ( ) allows you to create subpatterns, and $ and ^ affect the position. + means "Match one or more of the previous expression," * means "Match zero or more of the previous expression," and ? means "Match zero or one of the previous expression." For example:

    preg_match("/[A-Za-z ]*/", $string);
    // matches "", "a", "aaaa", "The sun has got his hat on", etc.

    preg_match("/-?[0-9]+/", $string);
    // matches 1, 100, 324343995, and also -1, -234011, etc.
    // The "-?" means "match zero or one minus symbols"

This next regexp shows two character classes, with the first being required and the second optional. As mentioned before, $ is a regexp symbol in its own right; however, here we precede it with a backslash, which works as an escape character, turning the $ into a standard character and not a regexp symbol. The pattern is written in single quotes so that the backslash reaches the regexp engine; inside double quotes, PHP itself would consume \$ and the engine would see a bare $. We match precisely one symbol from the range A-Z, a-z, and _, then match zero or more symbols from the range A-Z, a-z, underscore, and 0-9. If you're able to parse this in your head, you will see that this regexp will match PHP variable names:

    preg_match('/\$[A-Za-z_][A-Za-z_0-9]*/', $string);

Table 15-3 shows a list of regular expressions using +, *, and ?, and whether or not a match is made.
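As a rough illustration of ? and + and * in practice, the sketch below runs two patterns against a handful of strings. The variable names are hypothetical, and the ^ and $ anchors are an added assumption so that the whole string has to fit the pattern instead of just some part of it.

```php
<?php
// Hypothetical patterns for illustration. Single quotes keep PHP from
// touching the backslash before the regexp engine sees it.
$integerPattern  = '/^-?[0-9]+$/';                 // optional minus, then one or more digits
$variablePattern = '/^\$[A-Za-z_][A-Za-z_0-9]*$/'; // literal $, one start character, zero or more others

// preg_match() returns 1 when the pattern matches and 0 when it does not.
$plainInt    = preg_match($integerPattern, '100');        // 1
$negativeInt = preg_match($integerPattern, '-234011');    // 1: "-?" accepts one minus
$doubleMinus = preg_match($integerPattern, '--5');        // 0: "-?" allows at most one minus
$noDigits    = preg_match($integerPattern, '-');          // 0: "+" needs at least one digit

$goodName    = preg_match($variablePattern, '$user_id2'); // 1
$badName     = preg_match($variablePattern, '$2fast');    // 0: a digit cannot come first
$bareDollar  = preg_match($variablePattern, '$');         // 0: one start character is required

printf("integers: %d %d %d %d\n", $plainInt, $negativeInt, $doubleMinus, $noDigits);
printf("variables: %d %d %d\n", $goodName, $badName, $bareDollar);
```

Without the anchors, "--5" would still produce a match, because the engine is free to match just the "-5" part of the string.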
Opening braces { and closing braces } can be used to define specific repeat counts in three different ways. First, {n}, where n is a positive number, will match n instances of the previous expression. Second, {n,} will match a minimum of n instances of the previous expression. Third, {m,n} will match a minimum of m instances and a maximum of n instances of the previous expression. Note that there are no spaces inside the braces. Table 15-4 shows a list of regular expressions using braces, and whether or not a match is made.
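The three brace forms can be compared side by side with a small sketch like the following. The helper function repeatResults() and the digit patterns are hypothetical, chosen only for illustration, and the ^ and $ anchors are an added assumption so that the count applies to the whole string.

```php
<?php
// Hypothetical helper: reports whether $subject fits each brace form.
// Returns [matches {3}, matches {3,}, matches {2,4}] as booleans.
function repeatResults(string $subject): array
{
    return [
        preg_match('/^[0-9]{3}$/', $subject) === 1,   // exactly three digits
        preg_match('/^[0-9]{3,}$/', $subject) === 1,  // three or more digits
        preg_match('/^[0-9]{2,4}$/', $subject) === 1, // between two and four digits
    ];
}

foreach (['12', '123', '12345'] as $subject) {
    // Cast each boolean to 0 or 1 for compact output.
    [$exact, $atLeast, $range] = array_map('intval', repeatResults($subject));
    printf("%-6s {3}=%d {3,}=%d {2,4}=%d\n", $subject, $exact, $atLeast, $range);
}
```

Here "12" only satisfies {2,4}, "123" satisfies all three forms, and "12345" only satisfies {3,}.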
Parentheses inside regular expressions allow you to define subpatterns that should be matched individually. The most common use for these is to specify groups of alternatives for matches, allowing you to match very specific criteria. For example, "the (cat|car) sat on the (mat|drive)" would match "the cat sat on the mat", "the car sat on the mat", "the cat sat on the drive", and "the car sat on the drive". You can use as many alternatives as you want, so "the (car|cat|bat|bull|wool|white paint) sat on the (mat|drive)" could match many sentences. Table 15-5 shows a list of regular expressions using parentheses, and whether or not a match is made.
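A minimal sketch of the first alternation example follows. It also passes preg_match() its optional third argument, which the text above does not cover: PHP fills that array with the whole match followed by the text matched by each parenthesized subpattern. The variable names are hypothetical.

```php
<?php
$pattern = '/the (cat|car) sat on the (mat|drive)/';

// $matches receives the full match at index 0, then one entry per subpattern.
$matched = preg_match($pattern, 'the car sat on the drive', $matches);

// "cow" is not one of the listed alternatives, so this does not match.
$rejected = preg_match($pattern, 'the cow sat on the mat');

printf("matched=%d rejected=%d\n", $matched, $rejected);
print_r($matches); // full match, then "car", then "drive"
```

Reading $matches[1] and $matches[2] is a convenient way to find out which alternative was actually chosen in each group.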
Finally, we have the dollar $ and caret ^ symbols, which mean "end of line" and "start of line," respectively. Consider the following string:

    $multitest = "This is\na long test\nto see whether\nthe dollar\nSymbol\nand the\ncaret symbol\nwork as planned";

As you know, \n means "new line," so that is a string containing the following text:

    This is
    a long test
    to see whether
    the dollar
    Symbol
    and the
    caret symbol
    work as planned
In order to parse multiline strings, we need the m modifier, which goes after the final slash. Without m, our multiline string is treated as a single line, with "This" at the start of the line and "planned" at the end. By adding m to the regexp, we're asking PHP to match ^ and $ against the start and end of each line, wherever a newline (\n) character appears. All of these code snippets return true (preg_match() returns 1 when it finds a match):

    preg_match("/is$/m", $multitest); // returns true if 'is' is at the end of a line
    preg_match("/the$/m", $multitest); // returns true if 'the' is at the end of a line
    preg_match("/^the/m", $multitest); // returns true if 'the' is at the start of a line
    preg_match("/^Symbol/m", $multitest); // returns true if 'Symbol' is at the start of a line
    preg_match("/^[A-Z][a-z]{1,}/m", $multitest); // returns true if there's a capital and one or more lowercase letters at line start

As explained, without the m modifier, the ^ and $ metacharacters only match the start and end of the entire string. With m, ^ and $ match the start and end of each line. If you want to match the start and end of the whole string when m is enabled, you should use \A and \z, like this:

    preg_match("/\AThis/m", $multitest); // returns true if the string starts with "This" (true)
    preg_match("/symbol\z/m", $multitest); // returns true if the string ends with "symbol" (false)
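To see the effect of the m modifier directly, the sketch below runs the same pattern with and without it, then collects the first word of every line. It uses preg_match_all(), which this section does not otherwise discuss, purely as a convenient way to list every match; the variable names are hypothetical.

```php
<?php
$multitest = "This is\na long test\nto see whether\nthe dollar\nSymbol\nand the\ncaret symbol\nwork as planned";

// Without m, ^ only matches at the very start of the string ("This ...").
$withoutM = preg_match('/^the/', $multitest);  // 0
// With m, ^ also matches after each newline, so "the dollar" qualifies.
$withM    = preg_match('/^the/m', $multitest); // 1

// Collect the first word of every line; $found[0] holds the full matches.
preg_match_all('/^\w+/m', $multitest, $found);
$firstWords = $found[0];

// \z still means "end of the whole string", even with m enabled.
$endsWithPlanned = preg_match('/planned\z/m', $multitest); // 1
$endsWithSymbol  = preg_match('/symbol\z/m', $multitest);  // 0

printf("without m: %d, with m: %d\n", $withoutM, $withM);
echo implode(', ', $firstWords), "\n";
```

The patterns are single-quoted here so that PHP leaves $ and the backslash sequences alone and hands them to the regexp engine unchanged.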