Sample: Working with the Google APIs

Many corporations now offer functionality via XML Web Services to the public. One such company is Google, Inc., which offers the use of its search engine as a service. You can learn more about this at http://www.google.com/apis.

This section demonstrates a small add-on to a web application that uses keywords to show "related links" to the user. We will supply the keywords by hand and focus on using the Google APIs.

Setting Up to Use the Google APIs

Before you can begin using the Google search functionality from within web applications, you need to perform three steps, outlined at http://www.google.com/apis.

For the first step, download the developer kit from the Google site. This contains a number of files and classes for Java and Microsoft's .NET platform. We will not need any of these. Pay attention only to the .wsdl file included in the kit.

In the second step, you need to set up an account with Google. Because Google graciously offers this service free of charge, it places some restrictions on its use:

  • You are only allowed to make 1,000 queries per day.

  • You will only receive (at most) 10 search results per query.

  • The service may be used only for noncommercial, private purposes. If you want to make money or increase traffic to your site through the use of Google's services, Google (reasonably) expects you to enter into a commercial agreement with the company.

  • You agree not to violate the spirit of the free service and not to use it to attempt to manipulate page rankings or otherwise do more than straightforward querying.

For our demonstration purposes, these terms are entirely reasonable and not at all restrictive.

In the final step, we unpack the developer kit .zip file and place the GoogleSearch.wsdl in a location where we can access it from our web application. For this sample, we placed it in the same directory as our scripts.

After you have created an account, Google will send you an e-mail with your license key, which must be passed along with queries sent to Google.

Learning More About the Service

First, we will learn more about the functionality offered by the Google APIs and the GoogleSearch.wsdl file included in the developer kit. Although we could look through the WSDL document and try to figure out what the methods are, we have another means at our disposal: the __getFunctions method on SoapClient. This lets us verify that everything is working properly with SOAP and saves us from looking through some potentially complicated XML.

To demonstrate, we write this simple script to list all of the methods available to us through the APIs:

<?php

try
{
  //
  // first load the .wsdl file that Google provides with
  // its API download.
  //
  $sc = @new SoapClient('GoogleSearch.wsdl');

  //
  // next, we'll show a list of all the API functions that
  // this WSDL file contains:
  //
  $fns = @$sc->__getFunctions();
  foreach ($fns as $fn)
  {
    //
    // these three lines just extract the appropriate
    // parts from the API string. (preg_match is used because
    // the ereg functions were removed in PHP 7.)
    //
    preg_match('/ [[:alnum:]]*\(/', $fn, $res);
    $api = substr($res[0], 0, strlen($res[0]) - 1);
    preg_match('/\(.*\)/', $fn, $res);
    echo "<b>$api</b>: $res[0]<br/><br/>\n";
  }
}
catch (SoapFault $sf)
{
  echo "SOAP Error: <b>$sf->faultstring</b><br/>\n";
}
catch (Exception $e)
{
  $msg = $e->getMessage();
  echo "Unknown Exception: <b>$msg</b><br/>\n";
}

?>

The output of this script looks like this. (The preg_match calls exist strictly to help us extract portions of the function signature for formatted output. See whether you can figure out how they work.)

doGetCachedPage: (string $key, string $url)

doSpellingSuggestion: (string $key, string $phrase)

doGoogleSearch: (string $key, string $q, int $start, 
                 int $maxResults, boolean $filter,
                 string $restrict, boolean $safeSearch,
                 string $lr, string $ie, string $oe)

We can see that the XML Web Service exposes three methods. We will concern ourselves with the doGoogleSearch method and leave learning about the others (at http://www.google.com/apis/reference.html) as an exercise for you.
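The signature extraction can be tried on its own, without a SOAP connection. The following is a minimal sketch using preg_match with capture groups. The helper name splitSignature is hypothetical, and the sample string is an assumption modeled on the output shown above (a return type, a space, then the method name and its parameter list).

```php
<?php
// Hypothetical helper (not part of the sample): splits one signature
// string into the method name and its parenthesized parameter list.
function splitSignature(string $signature): ?array
{
    // A space, then the method name, then everything from the
    // opening parenthesis to the last closing parenthesis.
    if (!preg_match('/ ([[:alnum:]_]+)(\(.*\))/', $signature, $m)) {
        return null; // the string did not look like a signature
    }
    return ['name' => $m[1], 'params' => $m[2]];
}

// Assumed sample input; single quotes keep $key and $phrase literal.
$sample = 'string doSpellingSuggestion(string $key, string $phrase)';
$parts = splitSignature($sample);

echo "{$parts['name']}: {$parts['params']}\n";
```

Capture groups remove the need to trim the trailing parenthesis with substr, and the null return makes a non-matching string explicit.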

How the Search Works

The doGoogleSearch web method has a reasonably large function signature, requiring 10 parameters, as listed in the following table.

Parameter Name   Description

$key             The license key that Google has given you to use the APIs.
$q               The query string to use for the search.
$start           The (zero-based) starting index of the results.
$maxResults      The maximum number of results to return. This cannot exceed 10.
$filter          Controls whether results are filtered to eliminate closely
                 related results or those originating from the same site.
$restrict        Controls from which countries results should originate
                 ('' means no filtering).
$safeSearch      Controls whether Google SafeSearch, which eliminates adult
                 content, is turned on.
$lr              Controls in which languages results should be returned
                 ('' means no restrictions).
$ie              A parameter that is deprecated and ignored.
$oe              A parameter that is deprecated and ignored.
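Because all ten arguments are positional, it can help to assemble them in one place. The following is a minimal sketch under that assumption. The helper name buildSearchArgs and its defaults are hypothetical; the limit of 10 and the empty-string defaults come from the parameter descriptions above.

```php
<?php
// Hypothetical helper: returns the ten positional arguments for
// doGoogleSearch, in the order listed in the parameter table.
function buildSearchArgs(
    string $key,
    string $query,
    int $start = 0,
    int $maxResults = 10
): array {
    return [
        $key,                          // $key
        trim($query),                  // $q
        max(0, $start),                // $start (zero-based)
        min(10, max(1, $maxResults)),  // $maxResults, capped at 10
        false,                         // $filter
        '',                            // $restrict ('' = no filtering)
        false,                         // $safeSearch
        '',                            // $lr ('' = no restrictions)
        '',                            // $ie (ignored)
        '',                            // $oe (ignored)
    ];
}

// 'secret' is a placeholder, not a real license key.
$args = buildSearchArgs('secret', '  Egyptian Mau cats ', 0, 25);
echo count($args), " arguments, maxResults = {$args[3]}\n";
```

The resulting array could then be handed to the client, for example with SoapClient::__soapCall('doGoogleSearch', $args).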


The function returns an object with the following structure:

class stdClass
{
  public $documentFiltering;          // true or false
  public $searchComments;             // comments from Google
  public $estimatedTotalResultsCount; // total num. of results
  public $estimateIsExact;            // estimated or actual
  public $resultElements;             // array of result objs
  public $searchQuery;                // the submitted query
  public $startIndex;                 // start index of results
  public $endIndex;                   // end index of results
  public $searchTips;                 // tips from Google
  public $directoryCategories;        // ODP category
  public $searchTime;                 // how long it took
}

Most of the members are intuitive except for the $resultElements member (and $directoryCategories, which we will not use). The result elements are returned in an array of objects, each of which is as follows:

class stdClass
{
  public $summary;                   // summary from ODP dir
  public $URL;                       // URL of result
  public $snippet;                   // quick desc of result
  public $title;                     // title of the page
  public $cachedSize;                // if not 0, cache avail.
  public $relatedInformationPresent; // true means available
  public $hostName;                  // returned when filtering
  public $directoryCategory;         // ODP category
  public $directoryTitle;            // ODP category title
}

In both of these objects, ODP refers to the Open Directory Project, an attempt to create a global directory of the Internet. Google uses this in its searches whenever possible.
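To get a feel for these structures before calling the service, the shape can be imitated with plain stdClass objects. This is only a sketch: the helper names makeFakeResult and listResults are hypothetical, the values are invented, and only a few of the members listed above are filled in.

```php
<?php
// Hypothetical stand-in for one entry of $resultElements.
function makeFakeResult(string $title, string $url): stdClass
{
    $r = new stdClass();
    $r->title = $title;
    $r->URL = $url;
    $r->snippet = '';
    return $r;
}

// Hypothetical helper: one "title <URL>" line per result element.
function listResults(stdClass $results): array
{
    $lines = [];
    foreach ($results->resultElements as $element) {
        $lines[] = "{$element->title} <{$element->URL}>";
    }
    return $lines;
}

// Invented data with the same member names as the real return value.
$results = new stdClass();
$results->estimatedTotalResultsCount = 2;
$results->resultElements = [
    makeFakeResult('First page', 'http://example.com/one'),
    makeFakeResult('Second page', 'http://example.com/two'),
];

foreach (listResults($results) as $line) {
    echo $line, "\n";
}
```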

With an idea of how to use the doGoogleSearch function and an idea of what it is going to return to us, we can write the main portion of our sample.

Searching for Keywords

To do our work, we will write a GoogleKeywords class, with a public static method called findAndPrintRelatedPages.

This class and this first method are as follows:

define('GOOGLE_LICENSE_KEY', 'secret');  // from Google
define('RESULTS_PER_PAGE', 10);          // Google's limit

class GoogleKeywords
{
  //
  // this function takes a string containing keywords to
  // search for through Google and prints out the top
  // 10 results as returned by Google.
  //
  public static function findAndPrintRelatedPages($in_keywords)
  {
    try
    {
      // we need the .wsdl file to make this work!
      $sc = @new SoapClient('GoogleSearch.wsdl');

      // full documentation for this method can be found
      // at http://www.google.com/apis/reference.html
      $results = @$sc->doGoogleSearch(
          GOOGLE_LICENSE_KEY,                 // Google key
          trim($in_keywords),                 // query string
          0,                                  // starting index
          RESULTS_PER_PAGE,                   // max # results
          FALSE,                              // filter output?
          '',                                 // pref. country
          FALSE,                              // SafeSearch on?
          '',                                 // preferred lang
          '',                                 // ignored
          ''                                  // ignored
      );

      // start the page and summarize the results:
      self::emitSearchSummary($results);

      // now show the results:
      foreach ($results->resultElements as $resultObject)
        self::emitSearchResult($resultObject);

    }
    catch (SoapFault $sf)
    {
      echo "SOAP Fault Occurred: {$sf->faultstring}<br/>\n";
    }
    catch (Exception $e)
    {
      echo "Exception Occurred: {$e->getMessage()}<br/>\n";
    }

  }
}

This method calls two others: the emitSearchSummary function

  private static function emitSearchSummary($in_results)
  {
    echo <<<EOHEADER
  <br/>
  Google found approximately
  <em>$in_results->estimatedTotalResultsCount</em>
  pages related to this one.<br/><br/>

  Showing the first ten:<br/>

EOHEADER;
  }

and the emitSearchResult function:

  private static function emitSearchResult($in_result)
  {
      echo <<<EORESULT
  <table width='70%' border='0' cellspacing='0'
         cellpadding='0'>
  <tr>
    <td width='100%' bgcolor='#ebecca'>
      <a href='$in_result->URL'>
          <b>$in_result->title</b>
      </a>
    </td>
  </tr>
  <tr>
    <td>
      $in_result->snippet<br/>
    </td>
  </tr>
  <tr>
    <td bgcolor='#fbfcda'>
      <a href='$in_result->URL'>$in_result->URL</a>
    </td>
  </tr>
  </table>
  <br/><br/>

EORESULT;
  }
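One possible refinement, sketched here as an assumption rather than as part of the sample, is to return the markup as a string and escape the URL before placing it in an attribute. The function name renderSearchResult is hypothetical, and the title and snippet are left untouched, as in the listing above.

```php
<?php
// Hypothetical variant of emitSearchResult: builds the markup as a
// string and escapes the URL before it goes into the href attribute.
function renderSearchResult(stdClass $result): string
{
    $url = htmlspecialchars($result->URL, ENT_QUOTES, 'UTF-8');

    return "<a href='{$url}'><b>{$result->title}</b></a><br/>\n"
         . "{$result->snippet}<br/>\n"
         . "<a href='{$url}'>{$url}</a><br/><br/>\n";
}

// Invented result object for illustration only.
$fake = new stdClass();
$fake->URL = "http://example.com/?a=1&b='x'";
$fake->title = 'Example';
$fake->snippet = 'A made-up snippet.';

$html = renderSearchResult($fake);
echo $html;
```

Returning a string rather than echoing makes the function easy to check in isolation, and the caller can still echo the result.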

With all this ready to go, we just need to write the page to use it. We have written a small script called showarticle.php, which has three "dummy" articles including keywords. It randomly selects one of these, prints the (single-sentence) article, and then tells the GoogleKeywords class to print the related pages:

<?php
ob_start();

// this will let us show keywords for this article.
include('google_keywords.inc');

?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US"
      xml:lang="en-US">
<head>
  <title>Display Article</title>
  <meta http-equiv="content-type"
        content="text/html; charset=utf-8"/>
</head>
<body>

<?php
//
// to keep this sample simple, we're going to use some fake
// article placeholders here and just associate some keywords
// with them.  We will randomly select one of these articles
// to display.
//
$articles = array(
  array('keywords' => 'Jose Maria Aznar biography',
        'article' => 'All about Jose Maria Aznar, former prime
                      minister of Spain.'),
  array('keywords' => 'Egyptian Mau cats',
        'article' => 'Egyptian Mau cats are adorable, but quite
                      expensive, and surprisingly annoying at
                      6.00 in the morning!'),
  array('keywords' => 'uralo altaic hypothesis',
        'article' => 'The Uralo-Altaic Hypothesis suggests that
                      languages such as Turkish and Japanese
                      are genetically related, but is losing
                      favour.')
);

//
// randomly select and display an article.
//
$use = rand(0, count($articles) - 1);

echo <<<EOT
  <h2>The Article</h2>
  <hr size='1'/>
  <p align='left'>
    {$articles[$use]['article']}
  </p>
  <hr size='1'/>
  <br/><br/>
EOT;

//
// now display the related matches against their keywords.
//
GoogleKeywords::findAndPrintRelatedPages(
    $articles[$use]['keywords']);

?>

</body>
</html>
<?php ob_flush(); ?>
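The selection step needs an index between 0 and one less than the number of articles. As a sketch of one alternative, array_rand returns a valid key directly, which avoids computing the upper bound by hand. The helper name pickArticle and the shortened article texts are assumptions for illustration.

```php
<?php
// Shortened stand-ins for the placeholder articles in the script.
$articles = [
    ['keywords' => 'Jose Maria Aznar biography',
     'article'  => 'About a former prime minister.'],
    ['keywords' => 'Egyptian Mau cats',
     'article'  => 'About cats.'],
    ['keywords' => 'uralo altaic hypothesis',
     'article'  => 'About a language hypothesis.'],
];

// Hypothetical helper: returns one randomly chosen entry.
// array_rand() with a single argument returns one valid key.
function pickArticle(array $articles): array
{
    return $articles[array_rand($articles)];
}

$chosen = pickArticle($articles);
echo $chosen['article'], "\n";
```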

The output of this page might look something like that shown in Figure 27-3.

Figure 27-3. Running our keywords XML Web Service sample.

With this sample, you should have a good idea how easy it is to integrate XML Web Services into your applications and how powerful they can be.

