Design for Refactoring and Extensibility

It is counterintuitive to many programmers that it is better to have poorly implemented code with a solid API design than to have well-implemented code with poor API design. It is a fact that your code will live on, be reused in other projects, and take on a life of its own. If your API design is good, then the code itself can always be refactored to improve its performance. In contrast, if the API design is poor, any changes you make require cascading changes to all the code that uses it.

Writing code that is easy to refactor is central to having reusable and maintainable code. So how do you design code to be easily refactored? These are some of the keys:
Encapsulating Logic in Functions

A key way to increase code reusability and manageability is to compartmentalize logic in functions. To illustrate why this is necessary, consider the following story.

A storefront operation located in Maryland decides to start offering products online. Residents of Maryland have to pay state tax on items they purchase from the store (because the store has a sales nexus there), so the code is peppered with code blocks like this:

    $tax = ($user->state == 'MD') ? 0.05*$price : 0;

This is a one-liner, hardly more characters than passing all the data into a helper function. Although originally tax is only calculated on the order page, over time it creeps into advertisements and specials pages, as a truth-in-advertising effort.

I'm sure you can see the writing on the wall. One of two things is bound to happen:
When either of these things happens, the developer is forced into a mad rush to find all the places in the code where tax is calculated and change them to reflect the new rules. Missing a single location can have serious (even legal) repercussions.

This could all be avoided by encapsulating the tiny bit of tax logic into a function. Here is a simple example:

    function Commerce_calculateStateTax($state, $price)
    {
      switch($state) {
        case 'MD':
          return 0.05 * $price;
        case 'PA':
          return 0.06 * $price;
        default:
          return 0;
      }
    }

However, this solution is rather short-sighted as well: It assumes that tax is based only on the state in which the user is located. In reality there are additional factors (such as tax-exempt status). A better solution is to create a function that takes an entire user record as its input, so that if special status needs to be taken into account, an API redesign won't be required. Here is a more general function that calculates taxes on a user's purchase:

    function Commerce_calculateTax(User $user, $price)
    {
      return Commerce_calculateStateTax($user->state, $price);
    }
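To see why passing the whole user record pays off, here is a minimal sketch of how tax-exempt status might later be added without touching any caller. The User class shown here and its tax_exempt property are assumptions made for illustration; they are not part of the original example.

```php
<?php
// Hypothetical User record. The tax_exempt flag is an assumption
// introduced for this sketch only.
class User {
    public $state;
    public $tax_exempt;

    public function __construct($state, $tax_exempt = false) {
        $this->state = $state;
        $this->tax_exempt = $tax_exempt;
    }
}

// State-level helper, as in the text: knows only about states and rates.
function Commerce_calculateStateTax($state, $price) {
    switch ($state) {
        case 'MD':
            return 0.05 * $price;
        case 'PA':
            return 0.06 * $price;
        default:
            return 0;
    }
}

// Same signature as before. The new rule lives entirely inside the
// function, so existing callers do not change.
function Commerce_calculateTax(User $user, $price) {
    if ($user->tax_exempt) {
        return 0;
    }
    return Commerce_calculateStateTax($user->state, $price);
}
```

A caller that already wrote Commerce_calculateTax($user, $price) picks up the new rule automatically. Had the API taken only $state, every call site would have needed a new argument.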
Keeping Classes and Functions Simple

In general, an individual function or method should perform a single simple task. Simple functions are then used by other functions, which is how complex tasks are completed. This methodology is preferred over writing monolithic functions because it promotes reuse.

In the tax-calculation code example, notice how I split the routine into two functions: Commerce_calculateTax() and the helper function it called, Commerce_calculateStateTax(). Keeping the routine split out as such means that Commerce_calculateStateTax() can be used to calculate state taxes in any context. If its logic were inlined into Commerce_calculateTax(), the code would have to be duplicated if you wanted to use it outside the context of calculating tax for a user purchase.

Namespacing

Namespacing is absolutely critical in any large code base. Unlike many other scripting languages (for example, Perl, Python, Ruby), PHP does not possess real namespaces or a formal packaging system. The absence of these built-in tools makes it all the more critical that you as a developer establish consistent namespacing conventions. Consider the following snippet of awful code:

    $number = $_GET['number'];
    $valid = validate($number);
    if($valid) {
      // ....
    }

Looking at this code, it's impossible to guess what it might do. By looking into the body of the if block (elided here), some contextual clues could probably be gleaned, but the code still has a couple of problems:
Here is the same code with an improved naming scheme:

    $cc_number = $_GET['cc_number'];
    $cc_is_valid = CreditCard_IsValidCCNumber($cc_number);
    if($cc_is_valid) {
      // ...
    }

This code is much better than the earlier code. $cc_number indicates that the number is a credit card number, and the function name CreditCard_IsValidCCNumber() tells you where the function is (CreditCard.inc, in my naming scheme) and what it does (determines whether the credit card number is valid).

Using namespacing provides the following benefits:
Although PHP does not provide a formal namespacing language construct, you can use classes to emulate namespaces, as in the following example:

    class CreditCard {
      static public function IsValidCCNumber()
      {
        // ...
      }
      static public function Authenticate()
      {
        // ...
      }
    }

Whether you choose a pure function approach or a namespace-emulating class approach, you should always have a well-defined mapping of namespace names to file locations. My preference is to append .inc to the namespace name. This creates a natural filesystem hierarchy, like this:

    API_ROOT/
        CreditCard.inc
        DB.inc
        DB/
            Mysql.inc
            Oracle.inc
            ...

In this representation, the DB_Mysql classes are in API_ROOT/DB/Mysql.inc.
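The mapping from namespace names to file locations can be captured in a few lines. The following is a sketch only: the helper name Namespace_classToPath() and the value given to API_ROOT are hypothetical, and it assumes that each underscore in a name corresponds to one directory level, as in the DB_Mysql example.

```php
<?php
// Hypothetical install location; adjust to wherever your API tree lives.
define('API_ROOT', '/usr/local/lib/api');

// Hypothetical helper: turns a namespace or class name such as DB_Mysql
// into the file that should contain it, following the ".inc" convention.
function Namespace_classToPath($class_name) {
    // Each underscore becomes a directory separator.
    $relative = str_replace('_', '/', $class_name);
    return API_ROOT . '/' . $relative . '.inc';
}

echo Namespace_classToPath('DB_Mysql'), "\n";   // /usr/local/lib/api/DB/Mysql.inc
echo Namespace_classToPath('CreditCard'), "\n"; // /usr/local/lib/api/CreditCard.inc
```

Because the rule is mechanical, anyone reading a name such as DB_Mysql can find the defining file without searching, and a loader routine could use the same function to include files on demand.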
Reducing Coupling

Coupling occurs when one function, class, or code entity depends on another to function correctly. Coupling is bad because it creates a web of dependencies between what should be disparate pieces of code.

Consider Figure 8.1, which shows a partial function call graph for the Serendipity Web log system. (The full call graph is too complicated to display here.) Notice in particular the nodes that have a large number of edges coming into them. These functions are considered highly coupled and by necessity are almost impossible to alter; any change to such a function's API or behavior could potentially require changes in every caller.

Figure 8.1. A partial call graph for the Serendipity Web log system.

This is not necessarily a bad thing. In any system, there must be base functions and classes that are stable elements on which the rest of the system is built. You need to be conscious of the causality: Stable code is not necessarily highly coupled, but highly coupled code must be stable. If you have classes that you know will be core or foundation classes (for example, database abstraction layers or classes that describe core functionality), make sure you invest time in getting their APIs right early, before you have so much code referencing them that a redesign is impossible.
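One way to keep a foundation class stable is to have callers depend on a small, fixed surface. The following sketch illustrates the idea with a toy database layer. Every name in it (DB_Connection, DB_InMemory, Report_countRows) is hypothetical and chosen for illustration; it is not taken from Serendipity or from any real library.

```php
<?php
// Hypothetical foundation API: many callers will depend on this, so it
// is kept deliberately small and should change as rarely as possible.
interface DB_Connection {
    public function query($sql);
}

// Hypothetical stand-in implementation that returns canned rows.
// A real implementation would talk to an actual database.
class DB_InMemory implements DB_Connection {
    private $rows;

    public function __construct(array $rows) {
        $this->rows = $rows;
    }

    public function query($sql) {
        return $this->rows;
    }
}

// A caller coupled only to the interface. Swapping the implementation
// behind DB_Connection requires no change here.
function Report_countRows(DB_Connection $dbh, $sql) {
    return count($dbh->query($sql));
}

$dbh = new DB_InMemory(array(array('id' => 1), array('id' => 2)));
echo Report_countRows($dbh, 'SELECT id FROM users'), "\n"; // 2
```

The many incoming edges in the call graph now point at one method signature. That signature is the part worth getting right early; the code behind it remains free to be refactored.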