Handling Input/Output (I/O)

A central aspect of the Unix design philosophy is that a number of small and independent programs can be chained together to perform complicated tasks. This chaining is traditionally accomplished by having each program read its input from the terminal and send its output back to the terminal; the shell can then redirect those streams so that one program's output becomes the next program's input. The Unix environment provides three special file handles that can be used to send and receive data between an application and the invoking user's terminal (also known as a tty):

  • stdin Pronounced "standard in" or "standard input," standard input captures any data that is input through the terminal.

  • stdout Pronounced "standard out" or "standard output," standard output goes directly to your screen (and if you are redirecting the output to another program, it is received on its stdin). When you use print or echo in the PHP CGI or CLI, the data is sent to stdout.

  • stderr Pronounced "standard error," this is also directed to the user's terminal, but over a different file handle than stdout. stderr generated by a program will not be read into another application's stdin file handle without the use of output redirection. (See the man page for your shell to learn how to do this; the syntax differs between shells, for example 2>&1 in Bourne-style shells versus >& or |& in csh-style shells.)

In the PHP CLI, the special file handles can be accessed by using the following constants:

  • STDIN

  • STDOUT

  • STDERR

Using these constants is equivalent to opening the streams manually. The constants are defined only in the CLI, so if you are running the PHP CGI version, you need to open the streams yourself. You explicitly open those streams as follows:

$stdin = fopen("php://stdin", "r");
$stdout = fopen("php://stdout", "w");
$stderr = fopen("php://stderr", "w");
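One way to write a script that runs unchanged under both the CLI and CGI versions is to use the constants when they exist and fall back to the manual fopen() calls otherwise. The sketch below assumes that approach; the variable names are illustrative only.

```php
<?php
// Use the CLI-provided constants when available; otherwise (for example
// under the CGI SAPI) open the php:// streams manually.
$stdin  = defined('STDIN')  ? STDIN  : fopen('php://stdin', 'r');
$stdout = defined('STDOUT') ? STDOUT : fopen('php://stdout', 'w');
$stderr = defined('STDERR') ? STDERR : fopen('php://stderr', 'w');

// Regular results go to stdout, diagnostics go to stderr.
fwrite($stdout, "normal output\n");
fwrite($stderr, "diagnostic output\n");
```

If you redirect this script's stdout to a file, only "normal output" lands in the file. "diagnostic output" still appears on the terminal.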

Why Use STDOUT?

Although it might seem pointless to use STDOUT as a file handle when you can print directly with print/echo, it is actually quite convenient. STDOUT allows you to write output functions that simply take stream resources, so that you can easily switch between sending your output to the user's terminal, to a remote server over a network stream, or anywhere else via any other writable stream.
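As an illustration, the function below accepts any writable stream resource, so the caller decides where the output goes. It is a sketch that assumes the CLI (so STDOUT is defined) and PHP 7.1 or later. The function name write_report and the sample values are hypothetical.

```php
<?php
// Hypothetical helper: writes one "key: value" line per entry to
// whatever stream it is handed.
function write_report($stream, array $rows): void
{
    foreach ($rows as $key => $value) {
        fwrite($stream, "$key: $value\n");
    }
}

// Made-up sample data for illustration.
$rows = ['requests' => 42, 'unique_ips' => 7];

// Send the report to the terminal...
write_report(STDOUT, $rows);

// ...or capture it in memory; any writable stream works the same way.
$buffer = fopen('php://memory', 'w+');
write_report($buffer, $rows);
rewind($buffer);
$captured = stream_get_contents($buffer);
```

Because write_report() never calls print or echo directly, the same code can also be pointed at a file handle or a socket without modification.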

The downside is that output written to STDOUT bypasses PHP's output filters and output buffering, but you can register your own stream filters with stream_filter_register() and attach them to a stream with stream_filter_append().
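A minimal sketch of such a user-space filter follows. It upper-cases everything written through the stream it is attached to. The class name UpperCaseFilter and the filter name example.upper are hypothetical, and the sketch assumes PHP 7.1 or later.

```php
<?php
// Hypothetical filter: converts all data passing through it to upper case.
class UpperCaseFilter extends php_user_filter
{
    public function filter($in, $out, &$consumed, $closing): int
    {
        // Take each bucket from the input brigade, transform it,
        // and pass it on to the output brigade.
        while ($bucket = stream_bucket_make_writeable($in)) {
            $bucket->data = strtoupper($bucket->data);
            $consumed += $bucket->datalen;
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }
}

stream_filter_register('example.upper', 'UpperCaseFilter');

// Apply the filter only to data written to STDOUT.
stream_filter_append(STDOUT, 'example.upper', STREAM_FILTER_WRITE);
fwrite(STDOUT, "hello from php\n"); // appears as HELLO FROM PHP
```

Unlike an output-buffering callback, the filter is bound to one stream, so output sent to STDERR or to other handles is left untouched.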


Here is a quick script that reads in a file on stdin, numbers each line, and outputs the result to stdout:

#!/usr/bin/env php
<?php

$lineno = 1;
while(($line = fgets(STDIN)) !== false) {
        fputs(STDOUT, "$lineno $line");
        $lineno++;
}
?>

When you run this script on its own source code, you get the following output:

1 #!/usr/bin/env php
2 <?php
3
4 $lineno = 1;
5 while(($line = fgets(STDIN)) !== false) {
6       fputs(STDOUT, "$lineno $line");
7       $lineno++;
8 }
9 ?>

stderr is convenient to use for error notifications and debugging because it will not be read in by a receiving program's stdin. The following is a program that reads in an Apache combined-format log and reports on the number of unique IP addresses and browser types seen in the file:

<?php
# This regex matches a combined log format line field-by-field.
$regex = '/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] '.
         '"(\S+) (.*?) (\S+)" (\S+) (\S+) "([^"]*)" "([^"]*)"$/';
$counts = array('ip' => array(), 'user_agent' => array());
$lineno = 0;
while(($line = fgets(STDIN)) !== false) {
  # Print a '.' to STDERR every thousand lines processed.
  if((++$lineno % 1000) == 0) {
    fwrite(STDERR, ".");
  }
  # Report and skip lines that are not in combined format.
  if(!preg_match($regex, $line, $matches)) {
    fwrite(STDERR, "Skipping malformed line $lineno\n");
    continue;
  }
  list(, $ip, $ident_name, $remote_user, $date, $time,
       $gmt_off, $method, $url, $protocol, $code,
       $bytes, $referrer, $user_agent) = $matches;
  # Start each counter at 0 the first time a key is seen.
  $counts['ip'][$ip] = ($counts['ip'][$ip] ?? 0) + 1;
  $counts['user_agent'][$user_agent] =
    ($counts['user_agent'][$user_agent] ?? 0) + 1;
}
# End the line of progress dots.
fwrite(STDERR, "\n");

arsort($counts['ip'], SORT_NUMERIC);
arsort($counts['user_agent'], SORT_NUMERIC);

foreach(array('ip', 'user_agent') as $field) {
  $i = 0;
  print "Top number of requests by $field\n";
  print "--------------------------------\n";
  foreach($counts[$field] as $k => $v) {
    print "$v\t\t$k\n";
    # Stop after the ten most frequent entries.
    if(++$i == 10) {
      break;
    }
  }
  print "\n\n";
}
?>

The script works by reading a logfile on STDIN and matching each line against $regex to extract the individual fields. It then computes summary statistics, counting the number of requests per unique IP address and per unique Web browser user agent. Because combined-format logfiles can be large, the script writes a . to stderr every 1,000 lines to show parsing progress. If the script's output is redirected to a file, the final report ends up in the file, but the dots appear only on the user's screen.
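If you want to reuse the parsing step elsewhere, one option is to wrap the same regex in a function and use PCRE named subpatterns, (?<name>...), so that fields are looked up by name rather than by position. The sketch below assumes PHP 7.1 or later. The function name parse_combined_line and the group names are hypothetical. The sample line follows the combined-format example in the Apache documentation.

```php
<?php
// Hypothetical helper: parse one Apache combined-format line into an
// associative array of named fields, or return null if it does not match.
function parse_combined_line(string $line): ?array
{
    $regex = '/^(?<ip>\S+) (?<ident>\S+) (?<user>\S+) '
           . '\[(?<date>[^:]+):(?<time>\d+:\d+:\d+) (?<gmt_off>[^\]]+)\] '
           . '"(?<method>\S+) (?<url>.*?) (?<protocol>\S+)" '
           . '(?<code>\S+) (?<bytes>\S+) "(?<referrer>[^"]*)" "(?<user_agent>[^"]*)"$/';
    if (!preg_match($regex, $line, $m)) {
        return null;
    }
    // preg_match() stores each group under both its name and its number;
    // keep only the named (string) keys.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

$sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" '
        . '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';
$fields = parse_combined_line($sample);
print $fields['ip'] . "\n"; // 127.0.0.1
```

Returning null for non-matching lines leaves the decision to the caller. The caller can skip such lines silently or report them on STDERR, as the script above does.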
