Приглашаем посетить
Орловка (orlovka.niv.ru)

Hack 46. Load Your Database from Excel

Previous
Table of Contents
Next

Hack 46. Load Your Database from Excel

Hack 46. Load Your Database from Excel Hack 46. Load Your Database from Excel

Use Excel 2003's XML capability to load your SQL database from an Excel spreadsheet.

More than a few times, I have had an Excel spreadsheet full of data that I needed to load into my database. Before Office 2003, I had to export each sheet as a CSV, and then use a custom loader to insert the records into the database. With Excel 2003's ability to save spreadsheets, macros, and even formatting as XML, that custom loader can go the way of eight-track tapes. The script in this hack turns Excel XML data into SQL that you can feed to your database. Figure 5-17 shows how the Excel-generated XML, taken as input to the gen.php script, is converted to SQL, which is then fed into the database.

Figure 5-17. The flow between the Excel XML and the database
Hack 46. Load Your Database from Excel


5.14.1. The Code

Save the code in Example 5-36 as gen.php.

Example 5-36. The code to generate SQL from an Excel XML file
<?php
$tables = array();
$indata = 0;

function encode( $text )

{ 
 $text = preg_replace( "/'/", "''", $text ); 
 return "'".$text."'";
}
function start_element( $parser, $name, $attribs )
{ 
  global $tables, $indata; 
  if ( $name == "WORKSHEET" ) 
  {
		$tables []= array( 
			    'name' => $attribs['SS:NAME'],                
				'data' => array()
		);
	}
	if ( $name == "ROW" )
	{
			$tables[count($tables)-1]['data'] []= array();
	}
	if ( $name == "DATA" )
	{
		$indata = 1;
	} 
}
function text( $parser, $text ) 
{
global $tables, $indata;
if ( $indata )
{

		$data =& $tables[count($tables)-1]['data'];
		$data[count($data)-1] []= $text;
	}
}
function end_element( $parser, $name )
{
	global $indata;
	if ( $name == "DATA" )
	$indata = 0;
}
$parser = xml_parser_create( );
xml_set_element_handler( $parser, "start_element", "end_element" );
xml_set_character_data_handler( $parser, "text" );
while( !feof( STDIN ) ) {
  $text = fgets( STDIN );
  xml_parse( $parser, $text );
}
xml_parser_free( $parser );
foreach( $tables as $table ) {
	$name = $table['name'];
	$data =& $table['data'];
	$cols = implode( ", ", $data[0] );
	for( $in = 1; $in < count( $data ); $in++ ) {
		$sqldata = implode( ", ", array_map( "encode", $data[$in] ) );
?>
INSERT INTO <?php echo( $name )?> ( <?php echo( $cols ) ?> ) VALUES ( <?php echo(
$sqldata ); ?> );
<?php } } ?>

This script is primarily an XML parser. Most of the code is in the XML parsing where the script looks for <Data> tags, which hold the spreadsheet's data. When the data is found, it's stored in an in-memory list of data tables. Then, for each table, there is a set of <Data> rows. These <Data> rows contain the data for each cell in the spreadsheet.

The second half of the script formats the fields and data within the $tables array into INSERT INTO commands (for simplicity, these are just output to the console). You can easily redirect that output into the mysql command, loading the data directly into the database, or you can store the output in a file for later (or repeated) use.

5.14.2. Running the Hack

First create a spreadsheet in Excel 2003. I created the simple sheet shown in Figure 5-18 to preload the publisher table of my database.

Figure 5-18. A sample Excel spreadsheet with data for the database
Hack 46. Load Your Database from Excel


I removed the worksheets that I didn't needempty sheets just bloat the XMLand then renamed the first worksheet publisher.

Hack 46. Load Your Database from Excel

The script in Example 5-36 uses the name of the sheet containing the data as the name of the table into which to load the data.


The first row of the worksheet contains the names of the fields. All subsequent rows contain the data to load into the table.

Next, save the file as XML using the Save As command in Excel's File menu. Here is a portion of the XML data from that file:

	<?xml version="1.0"?>
	<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
	  xmlns:o="urn:schemas-microsoft-com:office:office"
	  xmlns:x="urn:schemas-microsoft-com:office:excel"
	  xmlns:html="http://www.w3.org/TR/REC-html40"
	  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
	  <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
	    <Author>Jack Herrington</Author>
		<LastAuthor>Jack Herrington</LastAuthor>
		<Created>2005-05-23T03:36:24Z</Created>
		<Company>MM</Company>
		<Version>11.257</Version>
	  </DocumentProperties>

		<OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
		<AllowPNG/>
		</OfficeDocumentSettings>
		<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
		<WindowHeight>15080</WindowHeight>
		<WindowWidth>24840</WindowWidth>
		<WindowTopX>80</WindowTopX>
		<WindowTopY>-20</WindowTopY>
		<Date1904/>
		<AcceptLabelsInFormulas/>
		<ProtectStructure>False</ProtectStructure>
		<ProtectWindows>False</ProtectWindows>
		</ExcelWorkbook>
		<Styles>
		<Style ss:ID="Default" ss:Name="Normal">
		<Alignment ss:Vertical="Bottom"/>
		<Borders/>
		<Font ss:FontName="Verdana"/>
		<Interior/>
		<NumberFormat/>
		<Protection/>
		</Style>
		</Styles>
		<Worksheet ss:Name="Publisher">
		<Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="7" x:
		FullColumns="1"
		x:FullRows="1">
		<Row>
		<Cell><Data ss:Type="String">ID</Data></Cell>
		<Cell><Data ss:Type="String">Name</Data></Cell>
		</Row>
		<Row>
		<Cell ss:Formula="=&quot;0&quot;"><Data ss:Type="String">0</Data></Cell>
		<Cell><Data ss:Type="String">O'Reilly</Data></Cell>
		</Row>
		<Row>
		<Cell ss:Formula="=&quot;0&quot;"><Data ss:Type="String">0</Data></Cell>
		<Cell><Data ss:Type="String">Manning</Data></Cell>
		</Row>

From this XML, I want the name of the worksheet, as well as the rows of data.

Use the command-line version of PHP to run the SQL generator script on the Excel XML data:

	% php gen.php < data.xml
	INSERT INTO Publisher ( ID, Name ) VALUES ( '0', 'O''Reilly' );
	INSERT INTO Publisher ( ID, Name ) VALUES ( '0', 'Manning' );
	INSERT INTO Publisher ( ID, Name ) VALUES ( '0', 'Wiley' );
	INSERT INTO Publisher ( ID, Name ) VALUES ( '0', 'Addison-Wesley' );
	INSERT INTO Publisher ( ID, Name ) VALUES ( '0', 'Pragmatic Press' );
	INSERT INTO Publisher ( ID, Name ) VALUES ( '0', 'APress' );

You can also pipe this right into a file, for example:

	% php gen.php < data.xml > publishers.sql

5.14.3. See Also


Previous
Table of Contents
Next