The Streams API


The streams API is a very exciting development for PHP. It wraps all I/O access and all the PHP I/O functions in an abstraction layer. The goal of the streams project is to wrap all I/O in PHP in a generic wrapper, so that regardless of how a file is accessed (via the local filesystem, HTTP, or FTP), the basic I/O functions fopen(), fread(), fwrite(), fclose(), and fstat() all work. Providing an API for this allows you to register a named protocol type, specify how certain primitive operations work, and have the base PHP I/O functions work for that protocol as well.

From an extension-author point of view, streams is nice because it allows you to access streams-compatible protocols from C almost as you would in PHP. The following snippet of C implements this PHP statement:

return file_get_contents("http://www.advanced-php.com/");

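php_stream *stream;
char *buffer;
int alloced = 1024;
int len = 0;

stream = php_stream_open_wrapper("http://www.advanced-php.com/", "rb",
                                 REPORT_ERRORS, NULL);
if(!stream) {
  return;
}
buffer = emalloc(alloced);
while(!php_stream_eof(stream)) {
  if(alloced == len + 1) {
    /* buffer is full (one byte is reserved for the terminator): double it */
    alloced *= 2;
    buffer = erealloc(buffer, alloced);
  }
  /* append to the buffer and track how many bytes have been read so far */
  len += php_stream_read(stream, buffer + len, alloced - len - 1);
}
php_stream_close(stream);
buffer[len] = '\0';
RETURN_STRINGL(buffer, len, 0);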

This might seem like a lot of code, but realize that this function itself knows nothing about how to open an HTTP connection or read from a network socket. All that logic is hidden in the streams API, and the necessary protocol wrapper is automatically inferred from the URL protocol in the string passed to php_stream_open_wrapper().

Further, you can create stream zvals for passing a stream resource between functions. Here is a reimplementation of fopen() that you might use if you wanted to turn off allow_url_fopen to prevent accidental opening of network file handles but still allow them if you were sure the user was requesting that facility:

PHP_FUNCTION(url_fopen)
{
  php_stream *stream;
  char *url;
  int url_length;  /* the "s" format descriptor stores string lengths as int */
  char *flags;
  int flags_length;
  if(zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "ss",
                           &url, &url_length,
                           &flags, &flags_length) == FAILURE) {
    return;
  }
  stream = php_stream_open_wrapper(url, flags, REPORT_ERRORS, NULL);
  if(!stream) {
    RETURN_FALSE;
  }
  php_stream_to_zval(stream, return_value);
}

Similarly, you can pass streams into a function. Streams are stored as resources, so you use the "r" format descriptor to extract them and php_stream_from_zval() to convert them into a php_stream structure. Here is a simple version of fgets():

Note

Note that this example is for informational purposes only. Because the stream opened by url_fopen() is a standard stream, the resource it returns can be acted on with fgets() as well.


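PHP_FUNCTION(url_fgets)
{
  php_stream *stream;
  zval *stream_z;
  size_t l = 0;
  char buffer[1024];

  /* "r" extracts a resource, which is how streams are stored */
  if(zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC,
                           "r", &stream_z) == FAILURE) {
    return;
  }
  php_stream_from_zval(stream, &stream_z);
  /* php_stream_get_line() returns NULL at end-of-file or on error
     and stores the number of bytes read in l */
  if(php_stream_get_line(stream, buffer, sizeof(buffer), &l) == NULL) {
    RETURN_FALSE;
  }
  RETURN_STRINGL(buffer, l, 1);
}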

The real power of streams, though, is that you can implement your own streams types. Implementing your own custom streams is extremely useful if you need to access a storage type or protocol that is not internally supported by PHP. As in many things, reinventing the wheel is not a good path to take: The built-in stream handlers for normal files and network protocols are well vetted and have been coded to handle the idiosyncrasies of many specific platforms.

The basic idea of the streams API is that I/O operations can be represented by six primitive operations:

  • open() Determines how a data stream is created.

  • write() Determines how data is written to a stream.

  • read() Determines how data is read from the stream.

  • close() Determines how shutdown/destruction of the stream is handled.

  • flush() Ensures that stream data is in storage.

  • seek() Moves to an offset in the stream.

You can think of these operations as defining an interface. If a wrapper fully implements the interface, then the PHP standard I/O functions will know how to interact with it. To me, the streams interface is an incredible example of object-oriented programming techniques. By writing a small suite of functions corresponding to a specific API, you can make your protocols natively understood by PHP and leverage the entire PHP standard I/O function library.
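
The interface idea can be sketched in plain C, without any PHP headers. In the minimal sketch below, every name (toy_stream, toy_ops, toy_write(), membuf, and so on) is hypothetical and is not part of the PHP streams API. It only illustrates, under those assumptions, how a table of function pointers lets generic I/O code drive any backend that fills in the table:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout: this is NOT the PHP streams API. */
struct toy_stream;

/* The "interface": one function pointer per primitive operation. */
struct toy_ops {
  size_t (*write)(struct toy_stream *s, const char *buf, size_t count);
  size_t (*read)(struct toy_stream *s, char *buf, size_t count);
};

struct toy_stream {
  const struct toy_ops *ops; /* how to operate on this stream */
  void *abstract;            /* backend-specific state */
};

/* Generic entry points: they know nothing about the backend. */
size_t toy_write(struct toy_stream *s, const char *buf, size_t count)
{
  return s->ops->write(s, buf, count);
}

size_t toy_read(struct toy_stream *s, char *buf, size_t count)
{
  return s->ops->read(s, buf, count);
}

/* One concrete backend: a fixed-size in-memory buffer. */
struct membuf {
  char bytes[64];
  size_t used; /* bytes written so far */
  size_t rpos; /* next byte to read */
};

static size_t membuf_write(struct toy_stream *s, const char *buf, size_t count)
{
  struct membuf *m = s->abstract;
  size_t room = sizeof(m->bytes) - m->used;
  if (count > room) count = room; /* never overrun the buffer */
  memcpy(m->bytes + m->used, buf, count);
  m->used += count;
  return count;
}

static size_t membuf_read(struct toy_stream *s, char *buf, size_t count)
{
  struct membuf *m = s->abstract;
  size_t avail = m->used - m->rpos;
  if (count > avail) count = avail; /* never read past what was written */
  memcpy(buf, m->bytes + m->rpos, count);
  m->rpos += count;
  return count;
}

const struct toy_ops membuf_ops = { membuf_write, membuf_read };
```

A caller that holds a toy_stream whose ops field points at membuf_ops can call toy_write() and toy_read() without knowing that the data lives in memory, which is the same division of labor that the streams API sets up between the PHP I/O functions and a wrapper.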

As a simple example, this section describes an implementation of a streams wrapper around memory-mapped files. Memory-mapped files allow multiple processes to use a single file as a shared "scratch pad," and they provide a fast implementation of a temporary data store. The goal of the initial implementation is to allow code that looks like this:

<?php
$mm = mmap_open("/dev/zero", 65536);
fwrite($mm, "Hello World\n");
rewind($mm);
echo fgets($mm);
?>

You need to correctly open the device /dev/zero, map it with mmap(), and then be able to access it as a normal file.
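
Before any streams code is involved, the underlying system calls can be tried on their own. The following is a minimal sketch in plain C, assuming a POSIX system that provides /dev/zero; map_scratch() is a hypothetical helper name, not part of PHP or of any library:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* map_scratch() is a hypothetical helper, not part of any library.
 * It returns a writable mapping of len bytes, or NULL on failure. */
void *map_scratch(const char *path, size_t len)
{
  void *base;
  int fd = open(path, O_RDWR);
  if (fd < 0) {
    return NULL;
  }
  base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
  close(fd); /* the mapping stays valid after the descriptor is closed */
  return base == MAP_FAILED ? NULL : base;
}
```

The returned pointer can be used like ordinary memory and should eventually be released with munmap().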

Inside the php_stream data type is an attribute named abstract. As you'd guess, abstract is an abstract pointer that is used to hold any implementation-specific data about the stream. The first step in implementing the stream is to define an appropriate data type to represent the memory-mapped file. Because mmap() is passed a file descriptor and a fixed length, and it returns a memory address for accessing it, you minimally need to know the starting address for the memory segment and how long it is. Segments allocated with mmap() are always of a fixed length and must not be overrun. Streams also need to know their current position in a buffer (to support multiple reads, writes, and seeks), so you should also track the current position in the memory-mapped buffer. The structure mmap_stream_data contains these elements and can be the abstract stream data type in this example. It is shown here:

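struct mmap_stream_data {
  char *base_pos;     /* start of the mapped segment */
  char *current_pos;  /* current read/write position (char * keeps pointer arithmetic portable) */
  int len;            /* length of the mapped segment in bytes */
};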

Next, you need to implement the interface. You can start with the write interface. The write function is passed the following arguments:

  • php_stream *stream The stream.

  • const char *buf The buffer holding the data to be written.

  • size_t count The size of the buffer and the amount of data to be written.

The write function is expected to return the number of bytes successfully written. The following is the mmap implementation mmap_write():

size_t mmap_write(php_stream *stream, const char *buf, size_t count TSRMLS_DC)
{
  size_t wrote;
  struct mmap_stream_data *data = stream->abstract;
  /* never write past the end of the mapped segment */
  wrote = MIN((size_t)(data->base_pos + data->len - data->current_pos), count);
  if(wrote == 0) {
    return 0;
  }
  memcpy(data->current_pos, buf, wrote);
  data->current_pos += wrote;
  return wrote;
}

Notice that you extract the mmap_stream_data structure directly from the stream's abstract element. Then you just ensure that the amount of data won't overwrite the buffer, perform the maximal write possible, and return the number of bytes.

mmap_read() is almost identical to mmap_write():

size_t mmap_read(php_stream *stream, char *buf, size_t count TSRMLS_DC)
{
  size_t to_read;
  struct mmap_stream_data *data = stream->abstract;
  /* never read past the end of the mapped segment */
  to_read = MIN((size_t)(data->base_pos + data->len - data->current_pos), count);
  if(to_read == 0) {
    return 0;
  }
  memcpy(buf, data->current_pos, to_read);
  data->current_pos += to_read;
  return to_read;
}

mmap_read() takes the same arguments as mmap_write(), but now the buffer is to be read into. mmap_read() returns the number of bytes read.

mmap_flush() provides a stream-specific interpretation of the fsync() operation on files. It is shown here:

int mmap_flush(php_stream *stream TSRMLS_DC)
{
  struct mmap_stream_data *data = stream->abstract;
  return msync(data->base_pos, data->len, MS_SYNC | MS_INVALIDATE);
}

Any data that is potentially buffered should be flushed to its backing store. The mmap_flush() function accepts a single argument, the php_stream pointer for the stream in question, and it returns 0 on success.
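
To see msync() actually push data to a backing store, the mapping has to be shared: changes made through a MAP_PRIVATE mapping are not carried through to the underlying file. The following plain C sketch therefore assumes a regular file and MAP_SHARED, which differs from the /dev/zero mapping used in this section. write_through_mapping() is a hypothetical helper name, and len is assumed to be greater than zero:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* write_through_mapping() is a hypothetical helper for illustration.
 * It creates (or truncates) the file at path, sizes it to len bytes,
 * copies msg into it through a shared mapping, and flushes with msync().
 * Returns 0 on success and -1 on failure. */
int write_through_mapping(const char *path, const char *msg, size_t len)
{
  char *base;
  int rc;
  int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd < 0) {
    return -1;
  }
  if (ftruncate(fd, (off_t) len) != 0) {
    close(fd);
    return -1;
  }
  /* MAP_SHARED: stores through the mapping reach the file itself */
  base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);
  if (base == MAP_FAILED) {
    return -1;
  }
  memcpy(base, msg, len);
  rc = msync(base, len, MS_SYNC); /* block until the data is written out */
  munmap(base, len);
  return rc;
}
```

After the call returns 0, reading the file with ordinary read() calls returns the bytes that were stored through the mapping.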

Next, you need to implement the seek functionality. The seek interface is adapted from the C function lseek(), so it accepts the following four parameters:

  • php_stream *stream The stream.

  • off_t offset The offset to seek to.

  • int whence Where the offset is from, either SEEK_SET, SEEK_CUR, or SEEK_END.

  • off_t *newoffset An out-variable specifying what the new offset is, in relationship to the start of the stream.

mmap_seek() is a bit longer than the other functions, mainly to handle the three whence settings. As usual, it checks that the requested seek does not overrun or underrun the buffer, and it returns 0 on success and -1 on failure. Here is its implementation:

int mmap_seek(php_stream *stream, off_t offset, int whence,
              off_t *newoffset TSRMLS_DC)
{
  struct mmap_stream_data *data = stream->abstract;
  switch(whence) {
    case SEEK_SET:
      if(offset < 0 || offset > data->len) {
        *newoffset = (off_t) -1;
        return -1;
      }
      data->current_pos = data->base_pos + offset;
      *newoffset = offset;
      return 0;
      break;
    case SEEK_CUR:
      if(data->current_pos + offset < data->base_pos ||
         data->current_pos + offset > data->base_pos + data->len) {
        *newoffset = (off_t) -1;
        return -1;
      }
      data->current_pos += offset;
      *newoffset = data->current_pos - data->base_pos;
      return 0;
      break;
    case SEEK_END:
      if(offset > 0 || -1 * offset > data->len) {
        *newoffset = (off_t) -1;
        return -1;
      }
      /* SEEK_END is relative to the end of the segment, not the current position */
      data->current_pos = data->base_pos + data->len + offset;
      *newoffset = data->current_pos - data->base_pos;
      return 0;
      break;
    default:
      *newoffset = (off_t) -1;
      return -1;
  }
}
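
The bounds logic for the three whence values can also be condensed into a single computation, which makes it easy to check in isolation. This is a minimal sketch in plain C; bounded_seek() is a hypothetical helper that works on plain offsets rather than on a php_stream, and it treats SEEK_END as relative to the end of the buffer, as lseek() does:

```c
#include <stdio.h> /* SEEK_SET, SEEK_CUR, SEEK_END */

/* bounded_seek() is a hypothetical helper for illustration only.
 * cur and the result are offsets from the start of a buffer of len bytes.
 * Returns the new offset, or -1 if the target lies outside [0, len]
 * or whence is not recognized. */
long bounded_seek(long len, long cur, long offset, int whence)
{
  long target;
  switch (whence) {
    case SEEK_SET: target = offset;       break;
    case SEEK_CUR: target = cur + offset; break;
    case SEEK_END: target = len + offset; break;
    default:       return -1;
  }
  if (target < 0 || target > len) {
    return -1;
  }
  return target;
}
```

For a 100-byte buffer with the position at offset 10, seeking -20 from SEEK_END lands on offset 80, while seeking -11 from SEEK_CUR is rejected because it would underrun the buffer.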

Finally, here is the close function:

int mmap_close(php_stream *stream, int close_handle TSRMLS_DC)
{
  struct mmap_stream_data *data = stream->abstract;

  if(close_handle) {
    munmap(data->base_pos, data->len);
  }
  efree(data);
  return 0;
}

The close function must close any open resources and free the mmap_stream_data pointer. Because streams may be closed both by automatic garbage collection and by user request, the close function may sometimes not be responsible for closing the actual resource. To account for this, it is passed not only the php_stream for the stream but an integer flag close_handle, which indicates whether the call to close the connection should be performed.

We have not yet covered opening this stream, but all of the stream's internal operations have been implemented, meaning that once you have an opener function, fread(), fgets(), fwrite(), and so on will all work as you have defined them to work.

To register a stream in the opener, you first need to create a php_stream_ops structure, which specifies the names of the hooks you just implemented. For the mmap stream, this looks as follows:

php_stream_ops mmap_ops = {
  mmap_write,    /* write */
  mmap_read,     /* read */
  mmap_close,    /* close */
  mmap_flush,    /* flush */
  "mmap stream", /* stream type name */
  mmap_seek,     /* seek */
  NULL,          /* cast */
  NULL,          /* stat */
  NULL          /* set option */
};

You have not implemented the cast(), stat(), and set_option() hooks. These are defined in the streams API documentation but are not necessary for this wrapper.

Now that you have the interface defined, you can register it in a custom opener function. The following is the function mmap_open(), which takes a filename and a length, uses mmap on it, and returns a stream:

PHP_FUNCTION(mmap_open)
{
  char *filename;
  int filename_len;  /* the "s" format descriptor stores string lengths as int */
  long file_length;
  int fd;
  php_stream *stream;
  void *mpos;

  struct mmap_stream_data *data;

  if(zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "sl",
                           &filename, &filename_len, &file_length) == FAILURE)
  {
    return;
  }
  /* open() returns -1 on failure */
  if((fd = open(filename, O_RDWR)) < 0) {
    RETURN_FALSE;
  }
  if((mpos = mmap(NULL, file_length, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0))
       == MAP_FAILED) {
    close(fd);
    RETURN_FALSE;
  }
  data = emalloc(sizeof(struct mmap_stream_data));
  data->base_pos = mpos;
  data->current_pos = mpos;
  data->len = file_length;
  /* the mapping remains valid after the descriptor is closed */
  close(fd);
  stream = php_stream_alloc(&mmap_ops, data, NULL, "r+");
  php_stream_to_zval(stream, return_value);
}

After performing all the lead-up work of calling open() and mmap() on the file, you allocate a mmap_stream_data structure, fill in its fields, and then register it as a stream with the mmap implementation, like this:

stream = php_stream_alloc(&mmap_ops, data, NULL, "r+");

This creates a new stream with that abstract data container and registers the operations specified by mmap_ops.

With the extension loaded, you can now execute the following code:

<?php
$mm = mmap_open("/dev/zero", 1024);
fwrite($mm, "Hello World\n");
rewind($mm);
echo fgets($mm);
?>

At the beginning of this section, the following code was used to open a URL:

php_stream_open_wrapper("http://www.advanced-php.com","rb",REPORT_ERRORS,NULL);

You can also execute similar code from PHP:

$fp = fopen("http://www.advanced-php.com", "rb");

The streams subsystem is aware of HTTP and can thus automatically dispatch the open request to the appropriate stream wrapper. Registering such a wrapper is also available in extensions (and, in fact, in userspace PHP code). In this case, it would allow you to open an mmap file, via a mmap URL, like this:

<?php
$mm = fopen("mmap:///dev/zero:65536", "r+");
fwrite($mm, "Hello World\n");
rewind($mm);
echo fgets($mm);
?>

Implementing this on top of your existing interface is surprisingly simple. First, you need to create a php_stream_wrapper_ops struct. This structure defines the opener, closer, stream stat, URL stat, directory opener, and unlink actions. The php_stream_ops operations described earlier in this chapter all act on open streams. The php_stream_wrapper_ops operations, by contrast, act on raw URLs/files that may or may not have been opened yet.

The following is a minimal wrapper to allow fopen():

php_stream_wrapper_ops mmap_wops = {
  mmap_open,
  NULL, NULL, NULL, NULL,
  "mmap wrapper"
};

Now that you have the wrapper operations defined, you need to define the wrapper itself. You do this with a php_stream_wrapper structure:

php_stream_wrapper mmap_wrapper = {
  &mmap_wops,      /* operations the wrapper can perform */
  NULL,            /* abstract context for the wrapper */
  0                /* is this a network URL? (for allow_url_fopen) */
};

Then you need to define the mmap_open() function. This is not the same as the PHP_FUNCTION(mmap_open); it is a function that complies with the required interface for php_stream_wrapper_ops. It takes the following arguments:

Argument                        Description
php_stream_wrapper *wrapper     The calling wrapper structure.
char *filename                  The URI/filename passed to fopen().
char *mode                      The mode passed to fopen().
int options                     Option flags passed to fopen().
char **opened_path              A buffer that may be passed in from the caller to hold the opened file's path.
php_stream_context *context     An external context you can pass in.


The mmap_open() function should return a php_stream pointer. mmap_open() looks very much like PHP_FUNCTION(mmap_open). These are some critical differences:

  • filename will be the complete URI, so you need to strip off the leading mmap://.

  • You also want to parse a size in the form mmap:///path:size. Alternatively, if a size is not passed, you should use stat() on the underlying file to get the desired length.
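
The URI handling described in these two points can be sketched and checked on its own, outside the extension. In the plain C sketch below, split_mmap_uri() is a hypothetical helper introduced only for illustration. It assumes the caller owns a writable copy of the URI, because the ':' separator is overwritten in place:

```c
#include <stdlib.h>
#include <string.h>

/* split_mmap_uri() is a hypothetical helper for illustration only.
 * It modifies uri in place: the ':' before the size becomes '\0'.
 * On success *path points at the file path inside uri and *size holds
 * the parsed size, or 0 if no size was given (the caller would then
 * fall back to stat()). Returns 0 on success and -1 if uri does not
 * start with "mmap://". */
int split_mmap_uri(char *uri, char **path, long *size)
{
  static const char prefix[] = "mmap://";
  char *colon;

  if (strncmp(uri, prefix, sizeof(prefix) - 1) != 0) {
    return -1;
  }
  *path = uri + sizeof(prefix) - 1; /* strip the leading mmap:// */
  *size = 0;
  colon = strchr(*path, ':');
  if (colon != NULL) {
    *colon = '\0';                  /* terminate the path */
    *size = strtol(colon + 1, NULL, 10);
  }
  return 0;
}
```

Given a writable buffer holding mmap:///dev/zero:65536, the helper yields the path /dev/zero and the size 65536. Given mmap:///tmp/scratch, it yields the path /tmp/scratch and a size of 0.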

Here is the full code for mmap_open():

php_stream *mmap_open(php_stream_wrapper *wrapper, char *filename, char *mode,
                      int options, char **opened_path,
                      php_stream_context *context STREAMS_DC TSRMLS_DC)
{
  php_stream *stream;
  struct mmap_stream_data *data;
  char *tmp;
  int file_length = 0;
  struct stat sb;
  int fd;
  void *mpos;

  /* skip the leading "mmap://" */
  filename += sizeof("mmap://") - 1;
  if((tmp = strchr(filename, ':')) != NULL) {
    /* null terminate where the ':' was and read the remainder as the length */
    *tmp = '\0';
    tmp++;
    if(*tmp) {
      file_length = atoi(tmp);
    }
  }
  /* open() returns -1 on failure */
  if((fd = open(filename, O_RDWR)) < 0) {
    return NULL;
  }
  if(!file_length) {
    /* no size was given in the URI: use the size of the underlying file */
    if(fstat(fd, &sb) == -1) {
      close(fd);
      return NULL;
    }
    file_length = sb.st_size;
  }
  if((mpos = mmap(NULL, file_length, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0))
     == MAP_FAILED) {
    close(fd);
    return NULL;
  }
  data = emalloc(sizeof(struct mmap_stream_data));
  data->base_pos = mpos;
  data->current_pos = mpos;
  data->len = file_length;
  close(fd);
  /* pass through the mode the caller gave to fopen() */
  stream = php_stream_alloc(&mmap_ops, data, NULL, mode);
  if(opened_path) {
    *opened_path = estrdup(filename);
  }
  return stream;
}

Now you only need to register this function with the engine. To do so, you add a registration hook to the MINIT function, as follows:

PHP_MINIT_FUNCTION(mmap_session)
{
  php_register_url_stream_wrapper("mmap", &mmap_wrapper TSRMLS_CC);
  return SUCCESS;
}

Here the first argument, "mmap", instructs the streams subsystem to dispatch to the wrapper any URLs with the protocol mmap. You also need to unregister the wrapper in MSHUTDOWN:

PHP_MSHUTDOWN_FUNCTION(mmap_session)
{
  php_unregister_url_stream_wrapper("mmap" TSRMLS_CC);
  return SUCCESS;
}

This section provides only a brief treatment of the streams API. Another of its cool features is the ability to write stacked stream filters. These stream filters allow you to transparently modify data read from or written to a stream. PHP 5 features a number of stock stream filters, including the following:

  • Content compression

  • HTTP 1.1 chunked encoding/decoding

  • Streaming cryptographic ciphers via mcrypt

  • Whitespace folding

The streams API's ability to allow you to transparently affect all the internal I/O functions in PHP is extremely powerful. It is only beginning to be fully explored, but I expect some very ingenious uses of its capabilities over the coming years.

