Modifying and Introspecting the Zend Engine

One of the most exciting design aspects of the Zend Engine is that its behavior is open to extension and modification. As discussed in Chapter 20, there are two ways to modify Zend Engine behavior: by using alterable function pointers and by using the Zend extension API.

Ironically, modification of engine-internal function pointers is not only the most effective way of making many changes, but it can also be done in regular PHP extensions. As a reminder, these are the four major function pointers used inside the Zend Engine:

  • zend_compile_file(): The wrapper for the lexer, parser, and code generator. It compiles a file and returns a zend_op_array.

  • zend_execute(): After a file is compiled, its zend_op_array is executed by zend_execute(). There is also a companion zend_execute_internal() function, which executes internal functions.

  • zend_error_cb: This function is called when any error is generated in PHP.

  • zend_fopen: This function implements the open call that is used internally whenever a file needs to be opened.

The following sections present four different engine modifications that use function pointer reassignment. Then a brief section covers parts of the Zend Engine extension API.
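
Before looking at the real extensions, it may help to see the bare reassignment pattern outside PHP. The following is a minimal standalone sketch, not engine code: demo_error_cb, default_error_cb, hooked_error_cb, and the DEMO_* constants are hypothetical stand-ins chosen for illustration. It shows the three steps every example in this chapter relies on: save the current pointer, install a replacement, and have the replacement delegate to the saved pointer for cases it does not handle.

```c
#include <stdio.h>

/* Hypothetical error levels; these are NOT the real Zend E_* values. */
enum { DEMO_NOTICE = 1, DEMO_WARNING = 2 };

/* Last message seen by each handler, kept so the effect is observable. */
char default_seen[64];
char hooked_seen[64];

/* Stand-in for the engine's built-in implementation. */
void default_error_cb(int type, const char *msg)
{
  (void) type;
  snprintf(default_seen, sizeof(default_seen), "%s", msg);
}

/* The alterable function pointer, analogous in spirit to zend_error_cb. */
void (*demo_error_cb)(int type, const char *msg) = default_error_cb;

/* Whatever was installed before the hook, saved for delegation. */
void (*old_error_cb)(int type, const char *msg);

/* Replacement: handle warnings itself and delegate everything else. */
void hooked_error_cb(int type, const char *msg)
{
  if (type == DEMO_WARNING) {
    snprintf(hooked_seen, sizeof(hooked_seen), "%s", msg);
  } else {
    old_error_cb(type, msg);
  }
}

/* Install the hook: save the current pointer, then overwrite it. */
void install_hook(void)
{
  old_error_cb = demo_error_cb;
  demo_error_cb = hooked_error_cb;
}

/* Undo the hook, for example at shutdown. */
void remove_hook(void)
{
  demo_error_cb = old_error_cb;
}
```

After install_hook(), callers that go through demo_error_cb reach hooked_error_cb first. Saving the previous pointer rather than assuming the default also lets several hooks chain if more than one component overrides the same pointer.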

Warnings as Exceptions

A much-requested feature that is likely to never appear in a default PHP build is the ability to automatically throw exceptions on E_WARNING class errors. This feature allows object orientation fans to convert all their error checking into exception-based checking.

The reason this feature will never get implemented as an INI-toggleable value is that it makes it nearly impossible to write portable code. If E_WARNING is a nonfatal error on some systems and requires a try{}/catch{} block in other configurations, you have a nightmare on your hands if you distribute code.

It's a neat feature, though, and by overloading zend_error_cb, you can easily implement it as an extension. The idea is to reset zend_error_cb to a function that throws exceptions instead.

First, you need an extension framework. Here is the base code:

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "php.h"
#include "php_ini.h"
#include "ext/standard/info.h"
#include "zend.h"
#include "zend_default_classes.h"

ZEND_BEGIN_MODULE_GLOBALS(warn_as_except)
  ZEND_API void (*old_error_cb)(int type, const char *error_filename,
                                const uint error_lineno, const char *format,
                                va_list args);
ZEND_END_MODULE_GLOBALS(warn_as_except)
ZEND_DECLARE_MODULE_GLOBALS(warn_as_except)

#ifdef ZTS
#define EEG(v) TSRMG(warn_as_except_globals_id,zend_warn_as_except_globals *,v)
#else
#define EEG(v) (warn_as_except_globals.v)
#endif

void exception_error_cb(int type, const char *error_filename,
                        const uint error_lineno, const char *format,
                        va_list args);

PHP_MINIT_FUNCTION(warn_as_except)
{
  EEG(old_error_cb) = zend_error_cb;
  zend_error_cb = exception_error_cb;
  return SUCCESS;
}

PHP_MSHUTDOWN_FUNCTION(warn_as_except)
{
  return SUCCESS;
}

PHP_MINFO_FUNCTION(warn_as_except)
{
}

function_entry no_functions[] = { {NULL, NULL, NULL} };

zend_module_entry warn_as_except_module_entry = {
  STANDARD_MODULE_HEADER,
  "warn_as_except",
  no_functions,
  PHP_MINIT(warn_as_except),
  PHP_MSHUTDOWN(warn_as_except),
  NULL,
  NULL,
  PHP_MINFO(warn_as_except),
  "1.0",
  STANDARD_MODULE_PROPERTIES
};

#ifdef COMPILE_DL_WARN_AS_EXCEPT
ZEND_GET_MODULE(warn_as_except)
#endif

All the work happens in PHP_MINIT_FUNCTION(warn_as_except). There the old error callback is stored in old_error_cb, and zend_error_cb is set to the new error function exception_error_cb. You learned how to throw exceptions in C code in Chapter 22, "Extending PHP: Part II," so the code for exception_error_cb should look familiar. Here it is:

void exception_error_cb(int type, const char *error_filename,
                        const uint error_lineno, const char *format,
                        va_list args)
{
  char *buffer;
  int buffer_len;
  TSRMLS_FETCH();

  if(type == E_WARNING || type == E_USER_WARNING) {
    buffer_len = vspprintf(&buffer, PG(log_errors_max_len), format, args);
    zend_throw_exception(zend_exception_get_default(), buffer, type);
    /* vspprintf() allocates with the engine allocator, so use efree() */
    efree(buffer);
  }
  else {
    EEG(old_error_cb)(type, error_filename, error_lineno, format, args);
  }
  return;
}

If you compile and load this extension, the following script:

<?php
try {
  trigger_error("Testing Exception", E_USER_WARNING);
}
catch(Exception $e) {
  print "Caught this error\n";
}
?>

yields the following output:

> php test.php
Caught this error

An Opcode Dumper

Chapter 20 uses an opcode dumper to dump the Zend Engine intermediate code into human-readable assembly language. In this section you will see how to write it. The idea is to capture the zend_op_array returned from zend_compile_file() and format it. You could write an extension function to parse a file and dump the output, but it would be more clever to write a standalone application using the embed SAPI.

You learned in Chapter 20 that a zend_op_array contains an array of zend_ops in this form:

struct _zend_op {
  opcode_handler_t handler;
  znode result;
  znode op1;
  znode op2;
  ulong extended_value;
  uint lineno;
  zend_uchar opcode;
};

To break these down into assembly language, you need to identify the name of the operation associated with the opcode and then dump the contents of the znodes op1, op2, and result.

The mapping from opcode to operation name must be performed by hand. In zend_compile.h in the Zend source tree is a set of defines that lists all the operations. It is simple to write a script that parses them all into a function. Here's an example of such a function:

char *opname(zend_uchar opcode)
{
  switch(opcode) {
    case ZEND_NOP: return "ZEND_NOP"; break;
    case ZEND_ADD: return "ZEND_ADD"; break;
    case ZEND_SUB: return "ZEND_SUB"; break;
    case ZEND_MUL: return "ZEND_MUL"; break;
    case ZEND_DIV: return "ZEND_DIV"; break;
    case ZEND_MOD: return "ZEND_MOD"; break;
    /* ... */
    default: return "UNKNOWN"; break;
  }
}
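
The generator script mentioned above is not shown in this chapter. As a rough sketch of one way to write it, here in C to match the rest of the listings, the following reads header text and emits the body of opname(). The names emit_case() and generate_opname() are hypothetical. The filter is deliberately crude: it accepts any numeric ZEND_* define, so it assumes you feed it only the opcode section of the header, because other numeric ZEND_* defines would otherwise produce duplicate case values.

```c
#include <stdio.h>
#include <string.h>

/*
 * Turn one line such as "#define ZEND_ADD 1" into a case statement for
 * opname(). Returns 1 if a case line was written to out, 0 if the line
 * is not a numeric ZEND_* define. Assumes names fit in 63 characters.
 */
int emit_case(const char *line, char *out, size_t out_len)
{
  char name[64];
  int value;

  if (sscanf(line, " #define %63s %d", name, &value) != 2) {
    return 0;
  }
  if (strncmp(name, "ZEND_", 5) != 0) {
    return 0;
  }
  snprintf(out, out_len, "    case %s: return \"%s\";", name, name);
  return 1;
}

/* Read header text from in and write a complete opname() to out. */
void generate_opname(FILE *in, FILE *out)
{
  char line[256];
  char case_line[192];

  fputs("char *opname(zend_uchar opcode)\n{\n  switch(opcode) {\n", out);
  while (fgets(line, sizeof(line), in)) {
    if (emit_case(line, case_line, sizeof(case_line))) {
      fprintf(out, "%s\n", case_line);
    }
  }
  fputs("    default: return \"UNKNOWN\";\n  }\n}\n", out);
}
```

Calling generate_opname(stdin, stdout) from a small main() and piping the opcode defines into it would produce a function of the same shape as the hand-written one above.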

Then you need functions to dump the znodes and their zvals. Here's an example:

#define BUFFER_LEN 40

char *format_zval(zval *z)
{

  static char buffer[BUFFER_LEN];
  int len;

  switch(z->type) {
    case IS_NULL:
      return "NULL";
    case IS_LONG:
    case IS_BOOL:
      snprintf(buffer, BUFFER_LEN, "%ld", z->value.lval);
      return buffer;
    case IS_DOUBLE:
      snprintf(buffer, BUFFER_LEN, "%f", z->value.dval);
      return buffer;
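    case IS_STRING: {
      /* php_url_encode() returns an allocated copy; release it after use */
      char *encoded = php_url_encode(z->value.str.val, z->value.str.len, &len);
      snprintf(buffer, BUFFER_LEN, "\"%s\"", encoded);
      efree(encoded);
      return buffer;
    }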
    case IS_ARRAY:
    case IS_OBJECT:
    case IS_RESOURCE:
    case IS_CONSTANT:
    case IS_CONSTANT_ARRAY:
      return "";
    default:
      return "unknown";
  }
}

char *format_znode(znode *n)
{
  static char buffer[BUFFER_LEN];

  switch (n->op_type) {
    case IS_CONST:
      return format_zval(&n->u.constant);
      break;
    case IS_VAR:
      /* cast: the division yields a size_t, but %d expects an int */
      snprintf(buffer, BUFFER_LEN, "$%d", (int) (n->u.var/sizeof(temp_variable)));
      return buffer;
      break;
    case IS_TMP_VAR:
      snprintf(buffer, BUFFER_LEN, "~%d", (int) (n->u.var/sizeof(temp_variable)));
      return buffer;
      break;
    default:
      return "";
      break;
  }
}

In format_zval(), you can safely ignore the array, object, and constant types because they do not appear in znodes. To tie these helper functions together, here is a function that dumps an entire zend_op:

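void dump_op(zend_op *op, int num)
{
  /* format_znode() returns a shared static buffer, so copy each result
     before formatting the next operand */
  char op1[BUFFER_LEN], op2[BUFFER_LEN], result[BUFFER_LEN];

  snprintf(op1, BUFFER_LEN, "%s", format_znode(&op->op1));
  snprintf(op2, BUFFER_LEN, "%s", format_znode(&op->op2));
  snprintf(result, BUFFER_LEN, "%s", format_znode(&op->result));
  printf("%5d  %5d %30s %40s %40s %40s\n", num, op->lineno,
    opname(op->opcode), op1, op2, result);
}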

Then you need a function to iterate through a zend_op_array and dump the opcodes in order, as shown here:

void dump_op_array(zend_op_array *op_array)
{
  if(op_array) {
    int i;
    printf("%5s  %5s %30s %40s %40s %40s\n", "opnum", "line",
      "opcode", "op1", "op2", "result");
    for(i = 0; i < op_array->last; i++) {
      dump_op(&op_array->opcodes[i], i);
    }
  }
}

Finally, you tie them all together with a main() routine that compiles the script in question and dumps its contents. Here is a routine that does that:

int main(int argc, char **argv)
{
  zend_op_array *op_array;
  zend_file_handle file_handle;

  if(argc != 2) {
    printf("usage:  op_dumper <script>\n");
    return 1;
  }
  PHP_EMBED_START_BLOCK(argc,argv);
  printf("Script: %s\n", argv[1]);
  file_handle.filename = argv[1];
  file_handle.free_filename = 0;
  file_handle.type = ZEND_HANDLE_FILENAME;
  file_handle.opened_path = NULL;
  op_array =  zend_compile_file(&file_handle, ZEND_INCLUDE TSRMLS_CC);
  if(!op_array) {
    printf("Error parsing script: %s\n", file_handle.filename);
    return 1;
  }
  dump_op_array((void *) op_array);
  PHP_EMBED_END_BLOCK();
  return 0;
}

When you compile this as you did psh earlier in this chapter, you can generate full opcode dumps for scripts.

APD

In Chapter 18, "Profiling," you learned how to use APD for profiling PHP code. APD is a Zend extension that wraps zend_execute() to provide timings around function calls.

In its MINIT section, APD overrides both zend_execute() and zend_execute_internal() and replaces them with its own apd_execute() and apd_execute_internal(). Here is APD's initialization function:

PHP_MINIT_FUNCTION(apd)
{
  ZEND_INIT_MODULE_GLOBALS(apd, php_apd_init_globals, php_apd_free_globals);
  old_execute = zend_execute;
  zend_execute = apd_execute;
  zend_execute_internal = apd_execute_internal;
  return SUCCESS;
}

apd_execute() and apd_execute_internal() both record the name, location, and time of the function being called. Then they use the saved execution functions to complete execution. Here is the code for both of these functions:

ZEND_API void apd_execute(zend_op_array *op_array TSRMLS_DC)
{
  char *fname = NULL;
  fname = apd_get_active_function_name(op_array TSRMLS_CC);
  trace_function_entry(fname, ZEND_USER_FUNCTION,
            zend_get_executed_filename(TSRMLS_C),
            zend_get_executed_lineno(TSRMLS_C));
  old_execute(op_array TSRMLS_CC);
  trace_function_exit(fname);
  efree(fname);
}

ZEND_API void apd_execute_internal(zend_execute_data *execute_data_ptr,
                                   int return_value_used TSRMLS_DC)
{
  char *fname = NULL;
  fname =
    apd_get_active_function_name(EG(current_execute_data)->op_array TSRMLS_CC);
  trace_function_entry(fname, ZEND_INTERNAL_FUNCTION,
                       zend_get_executed_filename(TSRMLS_C),
                       zend_get_executed_lineno(TSRMLS_C));
  execute_internal(execute_data_ptr, return_value_used TSRMLS_CC);
  trace_function_exit(fname);
  efree(fname);
}

Both of these functions perform the same core logic. First, they use the helper function apd_get_active_function_name() to identify the name of the executing function. Next, the APD function trace_function_entry() is called. This function calls APD's logging mechanism to record entry into the function, including the file and line number the function call occurred on.

Next, APD uses PHP's default execution function to call the passed function. After the function call completes and the execution call returns, APD calls trace_function_exit(). This uses APD's logging mechanism to record the function call exit. In addition, this method records the elapsed time since the last function call, which is how APD compiles the information necessary for profiling.
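
The entry/exit bracketing described above can be tried outside PHP. The sketch below is a standalone approximation under stated assumptions, not APD's code: struct demo_func stands in for a zend_op_array, demo_execute for zend_execute, and trace_record() for APD's logging functions. It uses clock(), which measures CPU time, where a real profiler would more likely use a wall-clock timer.

```c
#include <stdio.h>
#include <time.h>

#define MAX_EVENTS 32

/* One trace record: function name, entry or exit, and a clock() reading. */
struct trace_event {
  const char *name;
  int is_exit;
  clock_t at;
};

struct trace_event trace_log[MAX_EVENTS];
int trace_count = 0;

/* Append an event to the log; events beyond MAX_EVENTS are dropped. */
void trace_record(const char *name, int is_exit)
{
  if (trace_count < MAX_EVENTS) {
    trace_log[trace_count].name = name;
    trace_log[trace_count].is_exit = is_exit;
    trace_log[trace_count].at = clock();
    trace_count++;
  }
}

/* Hypothetical unit of work standing in for a zend_op_array. */
struct demo_func {
  const char *name;
  void (*body)(void);
};

/* Stand-in for the engine's default executor: just runs the body. */
void default_execute(struct demo_func *f)
{
  if (f->body) {
    f->body();
  }
}

void (*demo_execute)(struct demo_func *f) = default_execute;
void (*old_execute)(struct demo_func *f);

/* Wrapper in the style of apd_execute(): log entry, run, log exit. */
void traced_execute(struct demo_func *f)
{
  trace_record(f->name, 0);
  old_execute(f);
  trace_record(f->name, 1);
}

void install_tracer(void)
{
  old_execute = demo_execute;
  demo_execute = traced_execute;
}

/* Sample call tree: outer calls inner through the hooked pointer. */
struct demo_func inner = { "inner", NULL };
void outer_body(void) { demo_execute(&inner); }
struct demo_func outer = { "outer", outer_body };

/* Print the log, one event per line, with the raw clock() value. */
void print_trace(FILE *out)
{
  int i;
  for (i = 0; i < trace_count; i++) {
    fprintf(out, "%s %s %ld\n", trace_log[i].is_exit ? "exit " : "enter",
            trace_log[i].name, (long) trace_log[i].at);
  }
}
```

Calling install_tracer() and then demo_execute(&outer) logs entry and exit events for both functions, with the inner pair nested inside the outer pair. Because nested calls go back through the same pointer, the wrapper sees every level of the call tree, which is what makes per-function timings possible.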

You now know the heart of the APD extension. As they say, everything else is just the details.

APC

APC follows the same pattern as APD but is a bit more complex. The core functionality in APC is overriding zend_compile_file() with an alternative that can remap, store, and retrieve the resulting zend_op_array in a shared memory cache.
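
To make the idea concrete, here is a heavily simplified standalone sketch of a compile hook that consults a cache before calling the original compiler. It is not APC's implementation: struct demo_op_array, demo_compile_file, and the fixed-size in-process table are hypothetical stand-ins, and the sketch omits the remapping and shared-memory storage that make the real extension complex. It also never invalidates entries when a file changes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 16

/* Hypothetical stand-in for a compiled zend_op_array. */
struct demo_op_array {
  char source[64];
};

int compile_calls = 0;   /* how often the underlying compiler ran */

/* Stand-in for the default compiler: "compiles" by copying the name. */
struct demo_op_array *default_compile(const char *filename)
{
  struct demo_op_array *ops = malloc(sizeof(*ops));
  if (!ops) {
    return NULL;
  }
  snprintf(ops->source, sizeof(ops->source), "%s", filename);
  compile_calls++;
  return ops;
}

struct demo_op_array *(*demo_compile_file)(const char *) = default_compile;
struct demo_op_array *(*old_compile_file)(const char *);

/* A tiny in-process cache keyed by filename. */
struct cache_entry {
  char filename[64];
  struct demo_op_array *ops;
};
struct cache_entry cache[CACHE_SLOTS];
int cache_used = 0;

/* Replacement compiler: return a cached result, or compile and store. */
struct demo_op_array *cached_compile(const char *filename)
{
  int i;
  struct demo_op_array *ops;

  for (i = 0; i < cache_used; i++) {
    if (strcmp(cache[i].filename, filename) == 0) {
      return cache[i].ops;             /* cache hit: skip compilation */
    }
  }
  ops = old_compile_file(filename);    /* cache miss: delegate */
  if (ops && cache_used < CACHE_SLOTS &&
      strlen(filename) < sizeof(cache[0].filename)) {
    strcpy(cache[cache_used].filename, filename);
    cache[cache_used].ops = ops;
    cache_used++;
  }
  return ops;
}

void install_cache(void)
{
  old_compile_file = demo_compile_file;
  demo_compile_file = cached_compile;
}
```

After install_cache(), a second request for the same filename returns the stored result without invoking the underlying compiler, which is the saving an opcode cache is after.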

Using Zend Extension Callbacks

A Zend extension is similar to a regular extension except that it implements the following defining struct:

struct _zend_extension {
        char *name;
        char *version;
        char *author;
        char *URL;
        char *copyright;
        startup_func_t startup;
        shutdown_func_t shutdown;
        activate_func_t activate;
        deactivate_func_t deactivate;
        message_handler_func_t message_handler;
        op_array_handler_func_t op_array_handler;
        statement_handler_func_t statement_handler;
        fcall_begin_handler_func_t fcall_begin_handler;
        fcall_end_handler_func_t fcall_end_handler;
        op_array_ctor_func_t op_array_ctor;
        op_array_dtor_func_t op_array_dtor;
        int (*api_no_check)(int api_no);
        void *reserved2;
        void *reserved3;
        void *reserved4;
        void *reserved5;
        void *reserved6;
        void *reserved7;
        void *reserved8;
        DL_HANDLE handle;
        int resource_number;
};

The startup, shutdown, activate, and deactivate functions behave identically to the MINIT, MSHUTDOWN, RINIT, and RSHUTDOWN functions. If a handler of a given type is registered, then at script compile time the engine inserts extra opcodes at appropriate places, and it calls out to the handler when those opcodes are reached during execution.

Of all the Zend extension callbacks, the statement handler is by far the most useful. When a statement handler is registered, the engine inserts an additional opcode at the end of every statement in a script, and the callback is invoked each time one of those opcodes executes. The primary uses for this sort of callback are per-line profiling, stepping debuggers, and code-coverage utilities. All these applications require information to be collected and acted on at every statement that PHP executes.

The following statement handler prints the filename and line number of every executed statement in a script to stderr:

void statement_handler(zend_op_array *op_array)
{
  TSRMLS_FETCH();   /* makes TSRMLS_C valid in thread-safe builds */
  fprintf(stderr, "%s:%d\n", zend_get_executed_filename(TSRMLS_C),
          zend_get_executed_lineno(TSRMLS_C));
}

To then register it, you wrap it in this framework:

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "php.h"
#include "php_ini.h"
#include "ext/standard/info.h"
#include "zend.h"
#include "zend_extensions.h"
void statement_handler(zend_op_array *op_array)
{
  TSRMLS_FETCH();   /* makes TSRMLS_C valid in thread-safe builds */
  fprintf(stderr, "%s:%d\n", zend_get_executed_filename(TSRMLS_C),
          zend_get_executed_lineno(TSRMLS_C));
}

int call_coverage_zend_startup(zend_extension *extension)
{
  TSRMLS_FETCH();
  CG(extended_info) = 1;
  return SUCCESS;
}
#ifndef ZEND_EXT_API
#define ZEND_EXT_API  ZEND_DLEXPORT
#endif
ZEND_EXTENSION();

ZEND_DLEXPORT zend_extension zend_extension_entry = {
  "Simple Call Coverage",
  "1.0",
  "George Schlossnagle",
  "http://www.schlossnagle.org/~george",
  "",
  call_coverage_zend_startup,
  NULL,
  NULL,
  NULL,
  NULL,   // message_handler_func_t
  NULL,   // op_array_handler_func_t
  statement_handler,   // statement_handler_func_t
  NULL,   // fcall_begin_handler_func_t
  NULL,   // fcall_end_handler_func_t
  NULL,   // op_array_ctor_func_t
  NULL,   // op_array_dtor_func_t
  STANDARD_ZEND_EXTENSION_PROPERTIES
};

You compile it as you would a regular PHP extension. Note the startup function, which sets CG(extended_info). Without that set, the engine does not generate the extended opcodes necessary for the handlers to work.

Then you register the extension in the php.ini file, as follows:

zend_extension=/full/path/to/call_coverage.so

Now if you execute the following script:

<?php
$test = 1;
if($test) {
  $counter++;
}
else {
  $counter--;
}
?>

you get the following output:

/Users/george/Advanced_PHP/examples/chapter-23/call_coverage/test.php:2
/Users/george/Advanced_PHP/examples/chapter-23/call_coverage/test.php:3
/Users/george/Advanced_PHP/examples/chapter-23/call_coverage/test.php:4
/Users/george/Advanced_PHP/examples/chapter-23/call_coverage/test.php:10
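
A raw stream of file:line pairs like this becomes a code-coverage report once the hits are counted. The following standalone sketch shows one way that aggregation might look. The names record_hit() and dump_hits() and the fixed-size table are hypothetical and chosen for brevity; the sketch assumes paths shorter than 128 characters and a linear search, which would be too slow for large scripts.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINES 256

/* One counter per distinct (file, line) pair. */
struct line_hit {
  char file[128];
  unsigned int line;
  unsigned long count;
};

struct line_hit hits[MAX_LINES];
int hits_used = 0;

/* Record one execution of file:line. Returns the updated count for that
   line, or 0 if the table is full and the hit could not be stored. */
unsigned long record_hit(const char *file, unsigned int line)
{
  int i;

  for (i = 0; i < hits_used; i++) {
    if (hits[i].line == line && strcmp(hits[i].file, file) == 0) {
      return ++hits[i].count;
    }
  }
  if (hits_used == MAX_LINES) {
    return 0;
  }
  snprintf(hits[hits_used].file, sizeof(hits[hits_used].file), "%s", file);
  hits[hits_used].line = line;
  hits[hits_used].count = 1;
  hits_used++;
  return 1;
}

/* Write "file:line count" for every recorded line, e.g. at shutdown. */
void dump_hits(FILE *out)
{
  int i;

  for (i = 0; i < hits_used; i++) {
    fprintf(out, "%s:%u %lu\n", hits[i].file, hits[i].line, hits[i].count);
  }
}
```

In an extension, the statement handler could pass the values from zend_get_executed_filename() and zend_get_executed_lineno() to record_hit() instead of printing them, and the shutdown callback could call dump_hits() once.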

