
Design for Refactoring and Extensibility



It is counterintuitive to many programmers that it is better to have poorly implemented code with a solid API design than to have well-implemented code with poor API design. It is a fact that your code will live on, be reused in other projects, and take on a life of its own. If your API design is good, then the code itself can always be refactored to improve its performance. In contrast, if the API design is poor, any changes you make require cascading changes to all the code that uses it.

Writing code that is easy to refactor is central to having reusable and maintainable code. So how do you design code to be easily refactored? These are some of the keys:

  • Encapsulate logic in functions.

  • Keep classes and functions simple, using them as building blocks to create a cohesive whole.

  • Use namespacing techniques to compartmentalize your code.

  • Reduce interdependencies in your code.

Encapsulating Logic in Functions

A key way to increase code reusability and manageability is to compartmentalize logic in functions. To illustrate why this is necessary, consider the following story.

A storefront operation located in Maryland decides to start offering products online. Residents of Maryland have to pay state tax on items they purchase from the store (because the store has a sales nexus there), so the code is peppered with code blocks like this:

$tax = ($user->state == 'MD') ? 0.05*$price : 0;

This is a one-liner, hardly even more characters than passing all the data into a helper function.

Although originally tax is only calculated on the order page, over time it creeps into advertisements and specials pages, as a truth-in-advertising effort.

I'm sure you can see the writing on the wall. One of two things is bound to happen:

  • Maryland legislates a new tax rate.

  • The store decides to open a Pennsylvania branch and has to start charging sales tax to Pennsylvania residents as well.

When either of these things happens, the developer is forced into a mad rush to find all the places in the code where tax is calculated and change them to reflect the new rules. Missing a single location can have serious (even legal) repercussions.

This could all be avoided by encapsulating the tiny bit of tax logic into a function. Here is a simple example:

function Commerce_calculateStateTax($state, $price)
{
  // Return the state sales tax owed on $price.
  // States with no rule listed here owe no tax.
  switch($state) {
    case 'MD':
      return 0.05 * $price;
    case 'PA':
      return 0.06 * $price;
    default:
      return 0;
  }
}

However, this solution is rather short-sighted as well: It assumes that tax is based only on the user's state. In reality there are additional factors (such as tax-exempt status). A better solution is to create a function that takes an entire user record as its input, so that if special status needs to be taken into account, an API redesign won't be required. Here is a more general function that calculates taxes on a user's purchase:

function Commerce_calculateTax(User $user, $price)
{
  return Commerce_calculateStateTax($user->state, $price);
}
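
As a rough sketch of why passing the whole user record pays off, suppose tax-exempt status later has to be honored. The minimal User class and its tax_exempt property below are assumptions made for illustration and are not part of the original code. The point is that the signature of Commerce_calculateTax() stays the same, so no caller has to change:

```php
<?php
// Hypothetical minimal user record; the tax_exempt field is an assumption.
class User
{
    public $state;
    public $tax_exempt;

    public function __construct($state, $tax_exempt = false)
    {
        $this->state = $state;
        $this->tax_exempt = $tax_exempt;
    }
}

function Commerce_calculateStateTax($state, $price)
{
    switch ($state) {
        case 'MD':
            return 0.05 * $price;
        case 'PA':
            return 0.06 * $price;
        default:
            return 0;
    }
}

// Same signature as before; only the body learns about the new rule.
function Commerce_calculateTax(User $user, $price)
{
    if ($user->tax_exempt) {
        return 0;
    }
    return Commerce_calculateStateTax($user->state, $price);
}

$shopper = new User('MD');
$charity = new User('MD', true);
printf("%.2f\n", Commerce_calculateTax($shopper, 100));
printf("%.2f\n", Commerce_calculateTax($charity, 100));
```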

Functions and Performance in PHP

As you read this book, or if you read performance tuning guides on the Web, you will read that calling functions in PHP is "slow." This means that there is overhead in calling functions. It is not a large overhead, but if you are trying to serve hundreds or thousands of pages per second, you can notice this effect, particularly when the function is called in a looping construct.

Does this mean that functions should be avoided? Absolutely not! Donald Knuth, one of the patriarchs of computer science, said "Premature optimization is the root of all evil." Optimizations and tunings often incur a maintainability cost. You should not force yourself to swallow this cost unless the trade-off is really worth it. Write your code to be as maintainable as possible. Encapsulate your logic in classes and functions. Make sure it is easily refactorable. When your project is working, analyze the efficiency of your code (using techniques described in Part IV, "Performance"), and refactor the parts that are unacceptably expensive.

Avoiding organizational techniques at an early stage may give you code that is fast, but it all but guarantees code that is not extensible or maintainable.
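
If you want a feel for the size of the call overhead on your own system, a crude timing loop is enough. This is only a sketch: add_tax() is a hypothetical helper, the iteration count is arbitrary, and the numbers will vary with PHP version and hardware, so treat the result as a rough indication and not as a benchmark:

```php
<?php
// Hypothetical helper used only for this measurement.
function add_tax($price)
{
    return $price * 1.05;
}

$iterations = 1000000;

// Version 1: the logic is inlined in the loop body.
$start = microtime(true);
$inline_total = 0.0;
for ($i = 0; $i < $iterations; $i++) {
    $inline_total += $i * 1.05;
}
$inline_seconds = microtime(true) - $start;

// Version 2: the same logic behind a function call.
$start = microtime(true);
$call_total = 0.0;
for ($i = 0; $i < $iterations; $i++) {
    $call_total += add_tax($i);
}
$call_seconds = microtime(true) - $start;

printf("inline: %.4f s, function call: %.4f s\n", $inline_seconds, $call_seconds);
```

Whatever the difference turns out to be, compare it with the total time a real request takes before deciding that a function is worth inlining.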


Keeping Classes and Functions Simple

In general, an individual function or method should perform a single simple task. Simple functions are then used by other functions, which is how complex tasks are completed. This methodology is preferred over writing monolithic functions because it promotes reuse.

In the tax-calculation code example, notice how I split the routine into two functions: Commerce_calculateTax() and the helper function it called, Commerce_calculateStateTax(). Keeping the routine split out as such means that Commerce_calculateStateTax() can be used to calculate state taxes in any context. If its logic were inlined into Commerce_calculateTax(), the code would have to be duplicated if you wanted to use it outside the context of calculating tax for a user purchase.
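
For instance, a specials page that knows only a visitor's state, and has no user record at all, could reuse the helper directly. The function name Specials_priceWithTax() is hypothetical and is used here only to illustrate the reuse:

```php
<?php
function Commerce_calculateStateTax($state, $price)
{
    switch ($state) {
        case 'MD':
            return 0.05 * $price;
        case 'PA':
            return 0.06 * $price;
        default:
            return 0;
    }
}

// Hypothetical caller: shows a tax-inclusive price without needing a User.
function Specials_priceWithTax($state, $price)
{
    return $price + Commerce_calculateStateTax($state, $price);
}

printf("%.2f\n", Specials_priceWithTax('PA', 20.00));
```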

Namespacing

Namespacing is absolutely critical in any large code base. Unlike many other scripting languages (for example, Perl, Python, Ruby), PHP does not possess real namespaces or a formal packaging system. The absence of these built-in tools makes it all the more critical that you as a developer establish consistent namespacing conventions. Consider the following snippet of awful code:

$number = $_GET['number'];
$valid = validate($number);
if($valid) {
        //  ....
}

Looking at this code, it's impossible to guess what it might do. By looking into the body of the if block (elided here), some contextual clues could probably be gleaned, but the code still has a couple of problems:

  • You don't know where the validate() function is defined. If it isn't in this page (and you should almost never put function definitions in a page, as it means they are not reusable), how do you know what library it is defined in?

  • The variable names are horrible. $number gives no contextual clues as to the purpose of the variable, and $valid is not much better.

Here is the same code with an improved naming scheme:

$cc_number = $_GET['cc_number'];
$cc_is_valid = CreditCard_IsValidCCNumber($cc_number);
if($cc_is_valid) {
  // ...
}

This code is much better than the earlier code. $cc_number indicates that the number is a credit card number, and the function name CreditCard_IsValidCCNumber() tells you where the function is (CreditCard.inc, in my naming scheme) and what it does (determines whether the credit card number is valid).

Using namespacing provides the following benefits:

  • It encourages descriptive naming of functions.

  • It provides a way to find the physical location of a function based on its name.

  • It helps avoid naming conflicts. You can authenticate many things: site members, administrative users, and credit cards, for instance. Member_Authenticate(), Admin_User_Authenticate(), and CreditCard_Authenticate() make it clear what you mean.

Although PHP does not provide a formal namespacing language construct, you can use classes to emulate namespaces, as in the following example:

class CreditCard {
  static public function IsValidCCNumber()
  {
    // ...
  }
  static public function Authenticate()
  {
    // ...
  }
}
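
To make the class-as-namespace idea concrete, here is a minimal sketch with one method filled in. The text does not say how validity is determined; the Luhn checksum and the 12 to 19 digit length check used below are assumptions chosen for illustration:

```php
<?php
class CreditCard
{
    // Assumption: "valid" means 12 to 19 digits that pass the Luhn checksum.
    public static function IsValidCCNumber($cc_number)
    {
        $cc_number = (string) $cc_number;
        if (!preg_match('/^\d{12,19}$/', $cc_number)) {
            return false;
        }
        $sum = 0;
        $double = false;
        // Walk the digits right to left, doubling every second one.
        for ($i = strlen($cc_number) - 1; $i >= 0; $i--) {
            $digit = (int) $cc_number[$i];
            if ($double) {
                $digit *= 2;
                if ($digit > 9) {
                    $digit -= 9;
                }
            }
            $sum += $digit;
            $double = !$double;
        }
        return $sum % 10 === 0;
    }
}

// The class name plays the role of the namespace at the call site.
$cc_is_valid = CreditCard::IsValidCCNumber('4111111111111111');
var_dump($cc_is_valid);
```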

Whether you choose a pure function approach or a namespace-emulating class approach, you should always have a well-defined mapping of namespace names to file locations. My preference is to append .inc to the namespace name and to treat each underscore in the name as a directory separator. This creates a natural filesystem hierarchy, like this:

API_ROOT/
        CreditCard.inc
        DB.inc
        DB/
           Mysql.inc
           Oracle.inc
        ...

In this representation, the DB_Mysql classes are in API_ROOT/DB/Mysql.inc.
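
One way to keep that mapping well defined is to put it in code. The following is a sketch under stated assumptions: Api_pathForName() is a hypothetical helper, and the API_ROOT directory is assumed to sit next to the current script. spl_autoload_register() is a standard PHP function, used here so that classes are located through the same rule:

```php
<?php
// Hypothetical helper: map a name such as DB_Mysql to API_ROOT/DB/Mysql.inc.
function Api_pathForName($name, $api_root)
{
    return rtrim($api_root, '/') . '/' . str_replace('_', '/', $name) . '.inc';
}

// Optional: load classes on demand using the same naming convention.
// The location of API_ROOT is an assumption; adjust it to your layout.
spl_autoload_register(function ($class) {
    $path = Api_pathForName($class, __DIR__ . '/API_ROOT');
    if (is_file($path)) {
        require_once $path;
    }
});

echo Api_pathForName('DB_Mysql', 'API_ROOT'), "\n";   // API_ROOT/DB/Mysql.inc
echo Api_pathForName('CreditCard', 'API_ROOT'), "\n"; // API_ROOT/CreditCard.inc
```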

Deep include Trees

A serious conflict between writing modular code and writing fast code in PHP is the handling of include files. PHP is a fully runtime language, meaning that both compilation and execution of scripts happen at runtime. If you include 50 files in a script (whether directly or through nested inclusion), those are 50 files that will need to be opened, read, parsed, compiled, and executed on every request. That can be quite an overhead. Even if you use a compiler cache (see Chapter 9, "External Performance Tunings"), the file must still be accessed on every request to ensure that it has not been changed since the cached copy was stored. In an environment where you are serving tens or hundreds of pages per second, this can be a serious problem.

There is a range of opinions regarding how many files are reasonable to include on a given page. Some people have suggested that three is the right number (although no explanation of the logic behind that has ever been produced); others suggest inlining all the includes before moving from development to production. I think both these views are misguided. While having hundreds of includes per page is ridiculous, being able to separate code into files is an important management tool. Code is pretty useless unless it is manageable, and very rarely are the costs of includes a serious bottleneck.

You should write your code first to be maintainable and reusable. If this means 10 or 20 included files per page, then so be it. When you need to make the code faster, profile it, using the techniques in Chapter 18, "Profiling." Only when profiling shows you that a significant bottleneck exists in the use of include() and require() should you purposefully trim your include tree.
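
Before reaching for a profiler, it can help to know how many files a page actually pulls in. The built-in get_included_files() returns that list; the helper name Debug_countIncludedFiles() is hypothetical. The count by itself says nothing about cost, so treat it as a quick sanity check and not as a substitute for profiling:

```php
<?php
// Hypothetical development helper: call it at the end of a page.
function Debug_countIncludedFiles()
{
    // get_included_files() lists every file loaded so far via
    // include, include_once, require, or require_once.
    return count(get_included_files());
}

printf("%d files included\n", Debug_countIncludedFiles());
```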


Reducing Coupling

Coupling occurs when one function, class, or code entity depends on another to function correctly. Coupling is bad because it creates a web of dependencies between what should be disparate pieces of code.

Consider Figure 8.1, which shows a partial function call graph for the Serendipity Web log system. (The full call graph is too complicated to display here.) Notice in particular the nodes that have a large number of edges coming into them. These functions are considered highly coupled and by necessity are almost impossible to alter; any change to such a function's API or behavior could potentially require changes in every caller.

Figure 8.1. A partial call graph for the Serendipity Web log system.



This is not necessarily a bad thing. In any system, there must be base functions and classes that are stable elements on which the rest of the system is built. You need to be conscious of the causality: Stable code is not necessarily highly coupled, but highly coupled code must be stable. If you have classes that you know will be core or foundation classes (for example, database abstraction layers or classes that describe core functionality), make sure you invest time in getting their APIs right early, before you have so much code referencing them that a redesign is impossible.
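
To find the functions that deserve this early attention, you can count incoming edges in a call graph. The sketch below uses a small made-up graph; the function names are hypothetical and are not taken from Serendipity, and CallGraph_fanIn() is an illustrative helper, not an existing tool:

```php
<?php
// Hypothetical call graph: caller => list of functions it calls.
$calls = array(
    'show_entry'   => array('db_query', 'format_date', 'escape_html'),
    'show_archive' => array('db_query', 'format_date'),
    'show_comment' => array('db_query', 'escape_html'),
    'format_date'  => array(),
);

// Count how many distinct callers each function has (its fan-in).
function CallGraph_fanIn(array $calls)
{
    $fan_in = array();
    foreach ($calls as $caller => $callees) {
        foreach (array_unique($callees) as $callee) {
            if (!isset($fan_in[$callee])) {
                $fan_in[$callee] = 0;
            }
            $fan_in[$callee]++;
        }
    }
    arsort($fan_in); // most heavily depended-on functions first
    return $fan_in;
}

foreach (CallGraph_fanIn($calls) as $function => $callers) {
    printf("%-12s %d callers\n", $function, $callers);
}
```

Functions at the top of this list are the ones whose APIs are most expensive to change later.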

