Hack 38. Read XML on the Cheap with Regular Expressions

Use regular-expression hacks to read XML without paying the expense of firing up the XML parser functions.

PHP can read XML with very little library support. That matters because XML support is an extension that might or might not be installed on the server your code runs on. To avoid relying on an optional extension, it is sometimes easier and more portable to extract data from XML with a few regular expressions than to fire up the XML parser.
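One way to act on this is to check for the extension at run time before deciding which route to take. The following is a minimal sketch, assuming the parser functions live in the extension named xml; the helper names xml_parser_available() and book_names_by_regex() are hypothetical and exist only for illustration.

```php
<?php
// Hypothetical helper: reports whether PHP's XML parser functions are usable.
function xml_parser_available()
{
    return extension_loaded( 'xml' ) && function_exists( 'xml_parser_create' );
}

// Hypothetical helper: the regular-expression route described in this hack.
function book_names_by_regex( $xml )
{
    preg_match_all( "/<book\s+.*?name=[\"'](.*?)[\"'].*?\/>/is", $xml, $found );
    return $found[1];
}

$xml = '<books><book id="8951234" name="Podcasting Hacks" /></books>';

if ( xml_parser_available() ) {
    print( "XML extension is loaded; a real parser is an option.\n" );
} else {
    print( "XML extension is missing; regular expressions still work.\n" );
}

// The regex route runs either way, because it needs only the PCRE functions.
print_r( book_names_by_regex( $xml ) );
```

The regex helper does not depend on the result of the check, which is the portability argument made above.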

5.6.1. The Code

Save the XML in Example 5-20 as books.xml.

Example 5-20. Some simple XML code, serving as a demonstration
	<books>
			<book name="Pragmatic Programmer" />
			<book
				name="Code Generation in Action" />
			<book id="8951234" name="Podcasting Hacks" />
	</books>

Now save the code in Example 5-21 as bookread.php.

Example 5-21. A simple script that uses regular expressions to read XML
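<?php
// Read the whole XML document from standard input.
$xml = "";
while( !feof(STDIN) ) { $xml .= fgets( STDIN ); }

// Capture the value of the name attribute of every self-closing book tag.
// i: case-insensitive; s: lets . match newlines inside a tag.
preg_match_all( "/\<book\s+.*?name=[\"'](.*?)[\"'].*?\/\>/is", $xml,
	$found );

// $found[1] holds the first capture group of each match.
foreach( $found[1] as $name ) { print( "$name\n" ); }
?>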

This script uses preg_match_all() to find every occurrence of the book tag in the XML. It captures the value of the name attribute in a group, so the names can be pulled out of $found[1]. The s modifier is critical because it makes the dot (.) match newline characters, which lets the pattern match a tag whose attributes are spread across multiple lines; the i modifier makes the match case-insensitive. All of the variants of the book tag in books.xml are valid, so the regular expression needs to be flexible enough to handle them.
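The effect of the s modifier is easiest to see by running the same pattern with and without it. This is a small sketch; the input tag is a hypothetical one (not taken from books.xml) in which a newline falls between two attributes, so the dot has to cross a line break to reach name.

```php
<?php
// Hypothetical input: a newline separates the id and name attributes.
$xml = "<book id=\"42\"\n\tname=\"Example Title\" />";

$pattern_s  = "/<book\s+.*?name=[\"'](.*?)[\"'].*?\/>/is"; // dot matches newlines
$pattern_no = "/<book\s+.*?name=[\"'](.*?)[\"'].*?\/>/i";  // dot stops at newlines

$count_s  = preg_match_all( $pattern_s, $xml, $found_s );
$count_no = preg_match_all( $pattern_no, $xml, $found_no );

print( "with s: $count_s match(es)\n" );     // with s: 1 match(es)
print( "without s: $count_no match(es)\n" ); // without s: 0 match(es)
```

Without the modifier, .*? cannot consume the line break after the id attribute, so the tag is silently skipped rather than reported as an error.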

5.6.2. Running the Hack

Run the bookread.php file from the command line using the php command:

	% php bookread.php < books.xml
	Pragmatic Programmer
	Code Generation in Action
	Podcasting Hacks

5.6.3. Hacking the Hack

Another common XML situation is to have a complex nested structure that you need to parse as units. Take a list of people such as this one:

	<people>
			<person>
				<first>Jack</first>
				<last>Herrington</last>
			</person>
			<person>
				<last>Katzen</last>
				<first>Molly</first>
			</person>
	</people>

Ideally, you want an array with one entry per person, each holding that person's first and last names. Example 5-22 is a script that uses successive regular-expression calls, first to find the person tags and then to search within each one for the first and last names.

Example 5-22. Regular expressions dealing with complex XML
<?php
$text = "";
while( !feof( STDIN ) ) { $text .= fgets( STDIN ); }

preg_match_all( "/\<person\>(.*?)\<\/person\>/si", $text, $people );

$list = array( );

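foreach( $people[1] as $person )
{
		// Look up each tag separately so the order of first and last does not matter.
		// A missing tag yields an empty string instead of an undefined-index notice.
		$first = preg_match( "/\<first\>(.*?)\<\/first\>/is", $person, $res ) ? $res[1] : "";
		$last = preg_match( "/\<last\>(.*?)\<\/last\>/is", $person, $res ) ? $res[1] : "";
		$list []= array(
				'first' => $first,
				'last' => $last
		);
}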
print_r( $list );
?>

Because the first and last tags can appear in any order, this script is more robust than one that uses a single regular expression to try to get first and last simultaneously.
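To make the difference concrete, here is a sketch of the single-expression approach run against the same two people. The pattern in $single is a hypothetical illustration that assumes first always comes before last; it is not part of the original hack.

```php
<?php
$text = "<people>"
      . "<person><first>Jack</first><last>Herrington</last></person>"
      . "<person><last>Katzen</last><first>Molly</first></person>"
      . "</people>";

// Hypothetical single-pass pattern: assumes <first> always precedes <last>.
$single = "/<person>\s*<first>(.*?)<\/first>\s*<last>(.*?)<\/last>\s*<\/person>/is";
$single_count = preg_match_all( $single, $text, $m );

// Two-step approach: find each person first, then look inside it.
$person_count = preg_match_all( "/<person>(.*?)<\/person>/is", $text, $people );

print( "single pattern found: $single_count\n" ); // single pattern found: 1
print( "two-step found: $person_count\n" );       // two-step found: 2
```

The single pattern finds Jack Herrington but drops Molly Katzen, whose tags are in the opposite order, while the two-step search sees both people.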

Save the script as peopleread.php and the XML as people.xml, then run it with the command-line version of PHP like this:

	% php peopleread.php < people.xml
	Array
	(
		[0] => Array
			(
				[first] => Jack
				[last] => Herrington
			)

		[1] => Array
			(
				[first] => Molly
				[last] => Katzen
			)

	)

The print_r() function shows the contents of the array in an "easy for a programmer to read" manner. At this point, you can use this information however you want, without having to rely on PHP's XML extension libraries.
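As one example of using the result, the sketch below sorts an array of the same shape by last name and prints each person as "Last, First". The array is hard-coded so the snippet runs on its own, and the comparison function compare_by_last() is a hypothetical name introduced here.

```php
<?php
// Same shape as the $list array built in Example 5-22.
$list = array(
    array( 'first' => 'Molly', 'last' => 'Katzen' ),
    array( 'first' => 'Jack',  'last' => 'Herrington' )
);

// Hypothetical comparison helper: orders two people by last name.
function compare_by_last( $a, $b )
{
    return strcmp( $a['last'], $b['last'] );
}

usort( $list, 'compare_by_last' );

$lines = array();
foreach ( $list as $person ) {
    $lines[] = $person['last'] . ', ' . $person['first'];
}
print( implode( "\n", $lines ) . "\n" );
```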

5.6.4. See Also

  • "Create a Simple XML Query Handler for Database Access" [Hack #40]

  • "Create XML the Right Way" [Hack #54]

