Other Regular Expression Functions

You have thus far seen how to use the ereg function to match strings against regular expressions and find out what those matches were. There are a few other functions that we would like to mention, however, because they provide some additional functionality you might find useful in your web applications.

ereg_replace

A very powerful application of regular expressions is finding and replacing items within a string, which is what the ereg_replace function does. It takes three parameters, in this order:

  1. The regular expression to match

  2. The text to replace any matches with

  3. The string on which to apply the operation

The function returns the third parameter with any applicable replacements applied.

A very simple usage is just to specify which string or pattern to replace with another, such as the following:

<?php

// replace all instances of "shoe" with "cat"
echo ereg_replace('shoe', 'cat',
                  'I like shoes and shoes like me.');
echo "<br/>\n";

// replace any USD monetary value with "(lots of money)"
echo ereg_replace('\$[0-9]+(\.[0-9]{1,2})?', '(lots of money)',
                  'John is paid $150453.44 each year!');
?>

The output of this is as follows:

I like cats and cats like me.
John is paid (lots of money) each year! 
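ereg_replace and the other ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0, so the ereg examples in this section run only on older PHP versions. As a sketch, assuming a modern PHP installation, the same two replacements can be written with the PCRE function preg_replace. It takes its arguments in the same order, but its patterns need delimiters (here /):

```php
<?php
// preg_replace(pattern, replacement, subject): same argument order as
// ereg_replace, but the pattern is wrapped in / delimiters.
$cats = preg_replace('/shoe/', 'cat', 'I like shoes and shoes like me.');

// \$ matches a literal dollar sign; the cents part (\.[0-9]{1,2}) is optional.
$salary = preg_replace('/\$[0-9]+(\.[0-9]{1,2})?/', '(lots of money)',
                       'John is paid $150453.44 each year!');

echo $cats, "<br/>\n";
echo $salary, "\n";
```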

However, ereg_replace can perform much more powerful replacements than these. To use them, we need one more piece of regular-expression knowledge so that we can tell it exactly what to put in place of each match.

Regular expressions have a feature known as back references. Each group in a regular expression (a part of the pattern delimited by parentheses, ( and )) is assigned a number, and a back reference to that number stands for whatever text the group matched. Used in the replacement string, back references tell ereg_replace what to insert for each match. In the POSIX regular expressions in PHP5, the first group is referred to as \1, the second as \2, and so on, with groups counted from left to right by their opening parentheses. This implementation allows only \1 through \9, and \0 refers to the entire matched text (not the entire input string).
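To make the numbering concrete, here is a small sketch using preg_replace as a stand-in for ereg_replace, which modern PHP no longer ships. preg_replace numbers groups the same way and also uses \0 for the whole match; unlike ereg_replace, it accepts group numbers above 9. The date string is made-up sample input.

```php
<?php
$date = '2024-05-17';

// Group 1 = year, group 2 = month, group 3 = day (left to right).
$swapped = preg_replace('/([0-9]{4})-([0-9]{2})-([0-9]{2})/', '\3/\2/\1', $date);

// \0 stands for the entire matched text (each two-digit run),
// not the entire input string.
$bracketed = preg_replace('/[0-9]{2}/', '[\0]', 'a 12 b 34');

echo $swapped, "\n";    // 17/05/2024
echo $bracketed, "\n";  // a [12] b [34]
```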

For example, suppose we want to put a backslash in front of every % and ; character in an input string. A regular expression that matches either character is as follows:

[%;]

Unfortunately, this pattern contains no group, so there is nothing for a numbered back reference such as \1 to refer to. To solve that problem, we just wrap it in parentheses to create a group, as follows:

([%;])

Now we can refer to the text matched by this group as \1. We then use ereg_replace to replace each match of the group ([%;]) with a backslash character followed by the matched character:

$replaced = ereg_replace('([%;])', '\\\1', $in_string);

The first parameter instructs ereg_replace to match (as a group) any % or ; character. The second parameter tells it to replace each match with a backslash (written as \\ because a backslash must itself be escaped inside a PHP string) followed by the contents of that match (\1). We would thus see the input

Horatio %; DELETE FROM Users;

replaced with the following:

Horatio \%\; DELETE FROM Users\;
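On PHP 7 and later, where ereg_replace is no longer available, a sketch of the same escaping with preg_replace needs more backslashes. preg_replace itself treats \\ in the replacement as one literal backslash, so the replacement must contain \\ followed by the group reference. Written in a single-quoted PHP string, that is '\\\\$1'. The $1 form of the back reference is used here only to keep the backslashes easier to count; \1 would work as well.

```php
<?php
$in_string = 'Horatio %; DELETE FROM Users;';

// Single-quoted '\\\\$1' is the four characters \\$1:
//   \\  -> one literal backslash in the output
//   $1  -> the text matched by group 1 (the % or ;)
$replaced = preg_replace('/([%;])/', '\\\\$1', $in_string);

echo $replaced, "\n";  // Horatio \%\; DELETE FROM Users\;
```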

As a second example, if we want to clean up a phone number for output, we can write an extremely tolerant phone number pattern that wraps each of the three groups of digits (three, three, and four digits long) in grouping parentheses, as follows:

.*([0-9]{3,3}).*([0-9]{3,3}).*([0-9]{4,4})

The three groups of digits then have the back references \1, \2, and \3, from left to right. So, to clean up our phone numbers, we could write the following code:

$pn = '       123-    456 -   - 7890';
$pn_regex = '.*([0-9]{3,3}).*([0-9]{3,3}).*([0-9]{4,4})';

$str = ereg_replace($pn_regex, '(\1)\2-\3', $pn);

After this code runs, $str holds the following very lovely phone number:

(123)456-7890
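With PHP 7 or later, a sketch of the phone number cleanup can use preg_replace with the same pattern inside / delimiters. PCRE resolves the .* parts by backtracking rather than POSIX leftmost-longest matching. For this input, which contains exactly ten digits, the three groups can only match 123, 456, and 7890, so the result is the same.

```php
<?php
$pn = '       123-    456 -   - 7890';

// Same tolerant pattern: anything, 3 digits, anything, 3 digits,
// anything, 4 digits. {3} is shorthand for {3,3}.
$pn_regex = '/.*([0-9]{3}).*([0-9]{3}).*([0-9]{4})/';

$str = preg_replace($pn_regex, '(\1)\2-\3', $pn);
echo $str, "\n";  // (123)456-7890
```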

Note that we always use single quotes when writing the regular expressions or the replacement strings. If we use double quotes, we have to put an extra backslash in front of each back reference. Otherwise, PHP treats sequences such as \1 as octal escape sequences and turns them into control characters before ereg_replace ever sees them:

$str = ereg_replace($pn_regex, "(\\1)\\2-\\3", $pn);
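This short sketch shows the difference. Inside double quotes, PHP reads \1 as an octal escape sequence, so a regular-expression function would receive the character with code 1 instead of a back reference:

```php
<?php
// In single quotes, \1 stays as two characters: a backslash and "1".
$single = '\1';

// In double quotes, \1 is an octal escape: the single character chr(1).
$double = "\1";

// Doubling the backslash inside double quotes gives back the two characters.
$escaped = "\\1";

var_dump(strlen($single), strlen($double), $single === $escaped);
```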

split

The split function breaks a string apart, using a regular expression to describe the separators at which the string is split. The simplest usage is to give a single separator character:

$array = split(':', 'One:Two:Three');

The preceding code returns an array with three strings in it, namely One, Two, and Three. Regular expressions, however, let us be more flexible:

$array = split('[|:, ]', 'One|Two:Three Four,Five');

The first parameter in the preceding code says that any one of the characters inside the brackets (a vertical bar, a colon, a comma, or a space) is a valid separator, so after this code runs, the array contains five strings:

Array
( 
  [0] => One
  [1] => Two
  [2] => Three
  [3] => Four
  [4] => Five
)
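split was removed along with the other ereg functions in PHP 7.0. A sketch of the two split calls above using preg_split, the PCRE counterpart, which likewise takes the pattern first and the string second:

```php
<?php
// Split on a single literal character.
$simple = preg_split('/:/', 'One:Two:Three');

// Split on any one of |, :, comma, or space (a character class, as with split).
$array = preg_split('/[|:, ]/', 'One|Two:Three Four,Five');

print_r($array);
```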

If we have a very long string that represents a number of lines of text, each separated by a newline character, we can write the following code to split them up:

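// strpos() finds 'Windows' inside the OS name; isset() avoids an
// undefined-index notice where $_SERVER['OS'] is not set
if (isset($_SERVER['OS']) && strpos($_SERVER['OS'], 'Windows') !== FALSE)
  $nl = "\r\n";  // double quotes, so these are real CR/LF characters

else // Unix
  $nl = "\n";

$array = split($nl, $extremely_long_text);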

The $array variable now contains each of the lines in $extremely_long_text as a separate element (without the newline characters).
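Choosing the separator from the server's operating system assumes the text was created on the same kind of system; text that arrives from a browser or another machine may use either line-ending convention. As an alternative technique, and assuming PHP 7 or later, a sketch that accepts both endings in one pattern with preg_split (the sample text is made up):

```php
<?php
// Hypothetical sample text mixing Windows (\r\n) and Unix (\n) line endings.
$extremely_long_text = "first line\r\nsecond line\nthird line";

// \r? makes the carriage return optional, so both conventions act as separators.
$array = preg_split('/\r?\n/', $extremely_long_text);

print_r($array);
```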

PHP includes another function that does something similar to split, called explode. It splits on a fixed separator string rather than a regular expression, and it is not multi-byte character safe, but it is significantly faster than split. We will therefore use both functions, choosing between them by the kind of input: explode when we are absolutely guaranteed that our input is in a single-byte character set such as ISO-8859-1, and split when we are dealing with user input.
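One practical difference is worth seeing: explode treats its first argument as a fixed string, not a regular expression. In this sketch, the bracketed expression that works as a pattern for split is taken literally by explode and therefore never matches:

```php
<?php
// explode(separator, string): the separator is a literal string.
$parts = explode(':', 'One:Two:Three');

// Here "[|:, ]" is searched for as an exact six-character string,
// so nothing is split and the whole input comes back as one element.
$unsplit = explode('[|:, ]', 'One|Two:Three Four,Five');

print_r($parts);
print_r($unsplit);
```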
