Hack 24. Create Link Graphs



Use the font size of links to express the importance of certain terms.

Flickr (http://flickr.com/) is a site that lets users upload images and then tag those images with single-word terms. You can come back later and search the Flickr database by those terms, and you can view a link graph [Hack #91] that shows the most-used terms in a larger font than the less frequently used ones.

In this hack, I show you how to create a link graph by analyzing an article from the Cable News Network web site (http://cnn.com/) for keywords. The words are counted, and the font size of each word is scaled according to how often it appears.
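Stripped of the HTML, the counting and scaling steps come down to a few lines. The sketch below assumes a linear mapping from a word's count onto a 9pt minimum plus up to 18pt extra, the same two numbers linkgraph.php uses; the function name font_size_for() is hypothetical.

```php
<?php
// Hypothetical helper: map a word count linearly onto a font size.
// The least frequent word gets $minPt; the most frequent gets $minPt + $rangePt.
function font_size_for( int $count, int $min, int $max,
                        float $minPt = 9.0, float $rangePt = 18.0 ): int
{
    // If every word occurs equally often, there is nothing to scale.
    if ( $max === $min ) {
        return (int) $minPt;
    }
    return (int) round( $minPt + ( $count - $min ) * $rangePt / ( $max - $min ) );
}

// Count the words in a short sample and print the size each one would get.
$counts = array_count_values( explode( ' ', 'bush poll bush percent percent percent' ) );
$min = min( $counts );
$max = max( $counts );
foreach ( $counts as $word => $n ) {
    printf( "%s: %d -> %dpt\n", $word, $n, font_size_for( $n, $min, $max ) );
}
// Prints, one per line: bush: 2 -> 18pt, poll: 1 -> 9pt, percent: 3 -> 27pt
```

Subtracting the minimum count first means the rarest word always lands exactly on the base size and the most common word on the base size plus the full range, however many words the article contains.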

3.15.1. The Code

Save the code in Example 3-19 as linkgraph.php.

Example 3-19. Link graph code
<?php
// Keywords from a CNN article on presidential approval ratings,
// with common words such as "the" and "of" already removed.
$text = "CNN number Americans disapproving President Bush job
perance risen highest level presidency according CNN USA Today Gallup poll
released Monday According poll percent respondents disapproved Bush performance
compared percent approved margerror plus minus percentage points percent
figure highest disapproval rating recorded CNN USA Today Gallup poll Bush
president January approval percentage percent matches low point late March
point gap between those disapproved approved largest recorded during Bush tenure
As Bush prepares address nation Tuesday defend Iraq policy just percent those
responding poll approved handling war percent disapproved Full story approval
rating Iraq unchanged poll late May disapproval figure marked increase
percentage points But poll found issues other Iraq war dragging down Bush numbers
Respondents expressed stronger disapproval handling economy energy policy health
care Social Security lone bright spot president poll handling terrorism which
scored percent approval rating compared just percent disapproved presidents
worst numbers latest poll came issue Social Security respondents disapproving
performance margmore percent percent Bush made changing Social Security system
signature issue second term He proposed creating voluntary government sponsored
personal retirement accounts workers younger Under proposal workers could invest
portion their Social Security taxes range government selected funds exchange
guaranteed benefits retirement plan run instiff opposition Democrats accounts are
too risky undermine Social Security system Some Republicans are wary taking such
politically risky economy only percent poll respondents approved Bush
performance compared percent disapproved On energy policy percent approved
percent disapproved health care percent approved percent disapproved poll results
based interviews Friday Sunday American adults";

// Split on any run of whitespace (spaces and line breaks alike) so that
// words on either side of a line break are counted separately.
$words = preg_split( '/\s+/', strtolower( $text ), -1, PREG_SPLIT_NO_EMPTY );

// Count how many times each word occurs: word => count.
$wordcounts = array_count_values( $words );

// Map the range of counts onto 18 points of font size (9pt to 27pt).
// Guard against division by zero when every word occurs equally often.
$min = min( $wordcounts );
$max = max( $wordcounts );
$ratio = ( $max > $min ) ? 18.0 / ( $max - $min ) : 0.0;
?>
<html>
<head>

<style type="text/css">
body { font-family: arial, verdana, sans-serif; }
.link { line-height: 20pt; }
</style>
</head>
<body>
<div style="width:600px;">
<?php
// List the words alphabetically; only the font size reflects frequency.
$wc = array_keys( $wordcounts );
sort( $wc );
foreach( $wc as $word )
{
  // The least-used words get 9pt, the most-used get 27pt.
  $fs = (int)( 9 + ( ( $wordcounts[ $word ] - $min ) * $ratio ) );
?>
<a class="link" href="http://en.wikipedia.org/wiki/<?php echo( rawurlencode( $word ) );
  ?>" style="font-size:<?php echo( $fs ); ?>pt;">
<?php echo( htmlspecialchars( $word ) ); ?></a> &nbsp;
<?php } ?>
</div>
</body>
</html>


I've hardcoded the keywords of an article, with common words such as "the" and "of" already removed; you could just as easily fetch an article from the Web programmatically.
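Fetching the article yourself means turning raw HTML into a keyword list. The following is a minimal sketch, assuming a short, hypothetical stop-word list (a real one would be much longer) and a hypothetical function name, extract_keywords(). Note that strip_tags() leaves the contents of script and style elements behind, so a real page may need more cleanup.

```php
<?php
// Hypothetical helper: turn an HTML page into a lowercase keyword list.
// It strips the markup, decodes entities, keeps only runs of letters,
// and drops words found in the (illustrative) stop-word list.
function extract_keywords( string $html,
                           array $stopWords = array( 'a', 'an', 'and', 'are', 'as',
                               'at', 'by', 'for', 'has', 'his', 'in', 'is', 'of',
                               'on', 'that', 'the', 'to', 'with' ) ): array
{
    $text  = strtolower( html_entity_decode( strip_tags( $html ) ) );
    $words = preg_split( '/[^a-z]+/', $text, -1, PREG_SPLIT_NO_EMPTY );
    // array_diff keeps the original keys, so reindex the result.
    return array_values( array_diff( $words, $stopWords ) );
}

// With allow_url_fopen enabled, a live page could be fetched like this,
// where $url is the address of the article you want to graph:
// $words = extract_keywords( file_get_contents( $url ) );

$sample = '<p>The poll <b>found</b> that the poll &amp; the survey agree.</p>';
print_r( extract_keywords( $sample ) );
```

Passing implode( ' ', $words ) in place of the hardcoded string in linkgraph.php would then graph the fetched article instead.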


3.15.2. Running the Hack

Upload the file to your web server and navigate your browser to linkgraph.php. You should see something like Figure 3-20.

As you can see, terms like percent, bush, approved, disapproved, security, and social stand out from the rest because they appear more often. From these clues alone, it's clear that this CNN article is about recent polling numbers and Bush's second-term efforts on Social Security. The word disapproved is slightly larger than approved, which could indicate something negative or could simply reflect the article's writing style. Regardless, even on this simple data set, a link graph makes some interesting features of the data stand out.
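To confirm which terms stand out without judging font sizes by eye, you can list the highest counts directly. This is a small sketch; the helper name top_terms() is hypothetical, and it expects a word => count array like the $wordcounts array that linkgraph.php builds.

```php
<?php
// Hypothetical helper: return the $n most frequent words from a
// word => count array, most frequent first, keeping the counts.
function top_terms( array $wordcounts, int $n ): array
{
    arsort( $wordcounts );  // sorts this local copy by count, descending
    return array_slice( $wordcounts, 0, $n, true );
}

$sample = array( 'bush' => 3, 'poll' => 1, 'percent' => 5 );
print_r( top_terms( $sample, 2 ) );  // percent => 5, then bush => 3
```

Calling print_r( top_terms( $wordcounts, 6 ) ); at the end of the first PHP section of linkgraph.php would print the six most frequent words along with their counts.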

Figure 3-20. The link graph of the article

3.15.3. See Also


