Previous | Table of Contents | Next |
8.2 Issues in Writing Data to DatabasesIn this section, we discuss issues that emerge in web database applications when multiple users access an application. Typically, a few users are inserting, updating, or deleting data, while most are running queries. This environment requires careful code design: without it, data can unexpectedly or unreliably change. This may lead to database inconsistencies and confused users. Some of the problems we describe in this section can be solved with restrictive system requirements, knowledge of how the DBMS behaves, and careful script development. Others solutions require an understanding of database theory. We discuss both types of solution in the next section. 8.2.1 Transactions and ConcurrencyProblems can occur when users read and write to a web database at the same time, that is, concurrently. The management of groups of SQL statements that read and write, or transactions , is one important area of the theory and practice of relational databases. Here are four of the more common problems of concurrent read and write transactions:
Fortunately, most of these problems can be solved through locking or careful design of scripts that carry out database transactions. However, you might choose not to solve some problems because they restrict the system requirements or add unnecessary complexity. We discuss locking in the next section. 8.2.2 Locking to Achieve Concurrency in MySQLIt has been shown that a simple scheme called locking (actually, it's correctly known as two-phase locking ) solves the four transaction problems identified in the last section. 8.2.2.1 When and how to lock tablesLocking is needed only when multiple steps must be performed together, and when two or more operations can be going on at the same time. If scripts are being implemented that write to the database but aren't multi-step operations susceptible to the problems described in the previous section, locks aren't needed. Specifically, the following situations do not require a lock:
The following situations do require locks:
Locking may not be required for all parts of a web database application: parts of the application can still be safely used without violating any locking conditions. With its default settings, each MySQL table has two associated lock variables. If a user sets or holds a lock variable for a particular table, no other user can perform particular actions on that table. There are two kinds of locks for each table: read locks, when a user is only reading from a table, and write locks, when a user is both reading and writing to a table. Having locks in a DBMS leads to four rules of use:
When a user holds a write lock on a table, no other users can read or write to that table. When a user holds a read lock on a table, other users can also read or hold a read lock, but no user can hold a write lock on that table, or write to that table.
The following segment of an interaction with the MySQL command interpreter illustrates the use of locks in a summarization task that requires locking: mysql> LOCK TABLES items READ, temp_report WRITE; mysql> SELECT sum(price) FROM items WHERE cust_id=1; +------------+ | sum(price) | +------------+ | 438.65 | +------------+ 1 row in set (0.04 sec) mysql> UPDATE temp_report SET purchases=438.65 WHERE cust_id=1; mysql> UNLOCK TABLES; In this example, a temporary table called temp_report is updated with the result of a SELECT operation on an items table. If locks aren't used, the items table can be modified by another user, possibly altering the summary value of $438.65 used as input to the UPDATE operation. There are two locks obtained for this transaction: first, a read lock on items because we don't need to change items but we don't want another user to make a change to it; and, second, a write lock on temp_report because we want to change the table, and we don't want other users to read or write to the report while we make changes. The UNLOCK TABLES operation releases all locks held; locks can't be progressively released. MySQL doesn't permit us to lock only one of the two tables used in the transaction above. The following rules apply to locks:
In many cases, locking can be avoided through careful query design:
8.2.2.2 The LOCK TABLES and UNLOCK TABLES statements in MySQLThe LOCK TABLES statement is used to lock the listed tables in either READ or WRITE mode. As discussed earlier, all tables that are accessed in the transaction must be locked in either READ or WRITE mode, and must be listed in a single LOCK TABLES statement. A script that issues a LOCK TABLES statement is suspended until all locks listed are successfully obtained. There is no time limit in waiting for locks. If the lock is held by another user or an operation is running on the table already, a request is placed at the back of either the write- or read-lock queue for the table, depending on the lock required. The write-lock queue has priority over the read-lock queue, so a user who wants a write lock obtains it when it becomes available, regardless of how long another user has been waiting in the read-lock queue. MySQL gives priority to database modifications over read queries. This can lead to a problem called starvation , where a transaction never completes because it can't obtain its required read locks. However, most web database applications read from databases much more than they write, and locks are required in only a few situations, so starvation is very uncommon in practice. If low-priority writing is essential to an application, a LOW_PRIORITY option can be prefixed before the WRITE clause. If a transaction is queued for a LOW_PRIORITY WRITE , it receives the lock only when the read lock queue is empty and no other users are reading from the table. Again, consideration of possible starvation is important. Locks can't be progressively obtained through several LOCK TABLES statements. Indeed, issuing a second LOCK TABLES is the same as issuing an UNLOCK TABLES to release all locks and then issuing the second LOCK TABLES. There are good reasons for this strict rule, related to a locking problem called deadlock, which we don't discuss here. However, MySQL is deadlock-free because it enforces the risk-free use of the LOCK TABLES and UNLOCK TABLES statements. If an unlocked table needs to be accessed or locking must be avoided for a particular table, a second server connection can be opened and used. 8.2.2.3 Locking for performanceLocking is primarily designed to ensure that concurrent transactions can execute safely. However, locking is also a useful performance tool to optimize the performance of important transactions. Consider, for example, a situation where we urgently require a complex report that uses a slow query. With other users running queries and using system resources, this query may run even slower. A solution is to use LOCK TABLES with the WRITE option to stop other users running queries or database updates, and to have exclusive access to the database for the query duration. This permits better optimization of the query processing by the server, dedication of all the system resources to the query, and faster disk access. The downside of locking for performance is the reduction in concurrent access to the database. Users may be inconvenienced by slow responses or timeouts from the web database application. Locking for performance should be used sparingly. 8.2.3 Locking Tables in Web Database ApplicationsExample 8-13 shows a PHP script that requires locking to ensure that the value returned from the SELECT query can't change before the INSERT operation. The script adds a row to the phonebook table and does exactly same thing as Example 8-9. However, it doesn't use the MySQL proprietary auto_increment modifier and so it needs to read the maximum primary key value that's in use and then write a new row based on that value. Without the auto_increment modifier and with no locking, it's possible that two rows could be created with the same phonebook_id. This can happen if two or more users run the script at the same time and get the same result from the SELECT query. Both users would then attempt to INSERT a new row with the same primary key value and, if this happens, MySQL will report an error because the primary key value must be unique. Locking solves the problem because it stops users running the queries in the script at the same time. Example 8-13. Creating a phonebook entry using locking<?php require 'db.inc'; require_once "HTML/Template/ITX.php"; function formerror(&$template, $message, &$errors) { $errors = true; $template->setCurrentBlock("error"); $template->setVariable("ERROR", $message); $template->parseCurrentBlock("error"); } if (!($connection = @ mysql_connect("localhost", "fred", "shhh"))) die("Could not connect to database"); $firstname = mysqlclean($_POST, "firstname", 50, $connection); $surname = mysqlclean($_POST, "surname", 50, $connection); $phone = mysqlclean($_POST, "phone", 20, $connection); $template = new HTML_Template_ITX("./templates"); $template->loadTemplatefile("example.8-10.tpl", true, true); $errors = false; if (empty($firstname)) formerror($template, "The first name field cannot be blank.", $errors); if (empty($surname)) formerror($template, "The surname field cannot be blank.", $errors); if (empty($phone)) formerror($template, "The phone field cannot be blank", $errors); // Now the script has finished the validation, show any errors if ($errors) { $template->show( ); exit; } // If we made it here, then the data is valid if (!mysql_select_db("telephone", $connection)) showerror( ); // Lock the table $query = "LOCK TABLES phonebook WRITE"; if (!(@ mysql_query ($query, $connection))) showerror( ); // Find the maximum phonebook_id value that's in use $query = "SELECT max(phonebook_id) FROM phonebook"; if (!($result = @ mysql_query ($query, $connection))) showerror( ); $row = @ mysql_fetch_array($result); // Set the new value for the primary key $phonebook_id = $row["max(phonebook_id)"] + 1; // Insert the new phonebook entry $query = "INSERT INTO phonebook VALUES ({$phonebook_id}, '{$surname}', '{$firstname}', '{$phone}')"; if (!(@ mysql_query ($query, $connection))) showerror( ); // Unlock the table $query = "UNLOCK TABLES"; if (!(@ mysql_query ($query, $connection))) showerror( ); // Show the phonebook receipt header("Location: example.8-5.php?status=T&phonebook_id={$phonebook_id}"); ?> The locking of the phonebook table is performed before the SELECT query, and the UNLOCK TABLES statement is issued after the INSERT. As you can see, the lock and unlock statements are executed just like any other query using mysql_query( ). 8.2.3.1 Locking methods that don't work in web database applicationsThere are several locking paradigms that don't work in a web database application because of the statelessness of HTTP. Each approach fails because there is either no guarantee or no possibility that the locked tables will be unlocked. If tables are locked indefinitely, other transactions can't proceed, and the DBMS will most likely need to be shut down and restarted.
The following must be avoided in web database applications:
8.2.3.2 Locking with an auxiliary tableLocking limits concurrency in your web database application. If tables are locked, then other users won't be able to run the same script at the same time and other scripts may also not be able to proceed. For example, suppose you write lock the phonebook table we've used in our examples throughout this chapter. With the table locked, any other query on the phonebook table in any script won't proceed until you unlock the table; this means, for example, while you insert one row, no other users can search for a phone number. Sometimes, you want to avoid this and this section shows you how. One technique you can use to minimize locking of your frequently used tables is to add an additional table to the database. This additional table stores and manages the next available primary key values for all other tables in the database. The additional table is then locked, queried, updated, and unlocked each time a new primary key value is needed; the main tables in the database are then never locked when data is inserted. In the remainder of this section, we show you how to do this using the MySQL function library; the next section shows you how to do the same thing using PEAR DB. Let's consider an example. Suppose you want to add new rows to the phonebook table without locking it and without using the proprietary MySQL auto_increment modifier. You first create an additional table in the telephone database using the following CREATE TABLE statement: CREATE TABLE identifiers (phonebook_id int(5)); As we show you next, this table only contains one row and therefore there's no need to declare or use a primary key. The new identifiers table stores one row that contains the next available value of the phonebook_id primary key attribute from the phonebook table. To set this up, you add the row to the table and set the phonebook_id attribute to the next available value. Let's suppose your phonebook table is empty, and so the next primary key value for phonebook_id is 1. Here's the INSERT statement you use to set up the table: INSERT INTO identifiers VALUES (1); Now you can use the identifiers table to read and write a primary key value for the phonebook table. Having done this, you use the primary key value to create a new row without locking the phonebook table. Here's how you do it using the MySQL command interpreter: mysql> LOCK TABLES identifiers WRITE; Query OK, 0 rows affected (0.00 sec) mysql> SELECT phonebook_id FROM identifiers; +--------------+ | phonebook_id | +--------------+ | 1 | +--------------+ 1 row in set (0.00 sec) mysql> UPDATE identifiers SET phonebook_id = phonebook_id + 1; Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) mysql> INSERT INTO phonebook VALUES (1, "Williams", "Lucy", "61388763452"); Query OK, 1 row affected (0.01 sec) The locking, querying, modifying, and unlocking process proceeds similarly to our example in the previous section, except that it doesn't use the phonebook table. Instead, the new row is inserted into the phonebook table without a lock using the value discovered with the SELECT query from the identifiers table, thereby maximizing concurrency (but requiring three SQL queries instead of two). Example 8-14 shows a rewritten version of Example 8-13 that uses this approach. Example 8-14. Maintaining the phonebook table using an external identifiers table<?php require 'db.inc'; require_once "HTML/Template/ITX.php"; function formerror(&$template, $message, &$errors) { $errors = true; $template->setCurrentBlock("error"); $template->setVariable("ERROR", $message); $template->parseCurrentBlock("error"); } if (!($connection = @ mysql_connect("localhost", "fred", "shhh"))) die("Could not connect to database"); $firstname = mysqlclean($_POST, "firstname", 50, $connection); $surname = mysqlclean($_POST, "surname", 50, $connection); $phone = mysqlclean($_POST, "phone", 20, $connection); $template = new HTML_Template_ITX("./templates"); $template->loadTemplatefile("example.8-10.tpl", true, true); $errors = false; if (empty($firstname)) formerror($template, "The first name field cannot be blank.", $errors); if (empty($surname)) formerror($template, "The surname field cannot be blank.", $errors); if (empty($phone)) formerror($template, "The phone field cannot be blank", $errors); // Now the script has finished the validation, show any errors if ($errors) { $template->show( ); exit; } // If we made it here, then the data is valid if (!mysql_select_db("telephone", $connection)) showerror( ); // Lock the identifiers table $query = "LOCK TABLES identifiers WRITE"; if (!(@ mysql_query ($query, $connection))) showerror( ); // Find the maximum phonebook_id value that's in use $query = "SELECT phonebook_id FROM identifiers"; if (!($result = @ mysql_query ($query, $connection))) showerror( ); $row = @ mysql_fetch_array($result); $phonebook_id = $row["phonebook_id"]; // Update the phonebook_id identifier $query = "UPDATE identifiers SET phonebook_id = phonebook_id + 1"; if (!($result = @ mysql_query ($query, $connection))) showerror( ); // Unlock the table $query = "UNLOCK TABLES"; if (!(@ mysql_query ($query, $connection))) showerror( ); // Insert the new phonebook entry $query = "INSERT INTO phonebook VALUES ({$phonebook_id}, '{$surname}', '{$firstname}', '{$phone}')"; if (!(@ mysql_query ($query, $connection))) showerror( ); // Show the phonebook receipt header("Location: example.8-5.php?status=T&phonebook_id={$phonebook_id}"); ?> To extend this scheme for a database containing several tables, there are two possible approaches: first, add an additional attribute (or more than one attribute if the primary key isn't on only one attribute) to the identifiers table for each additional table; or, second, add an additional identifier table for each additional table. The first approach is the simplest (and the one we recommend) but it does have the potential disadvantage that concurrency could be limited by excessive locking of the identifiers table if too many tables are maintained by using it. The second approach maximizes concurrency but is probably only necessary for high-throughput applications. 8.2.3.3 Managing identifiers with PEAR DBIn the previous section, we showed you how to maintain identifiers using an additional table. PEAR DB allows you to do the same thing using its DB::nextId( ) method and this is useful if you want to write database independent code. We show you how to use it in this section. The PEAR DB sequence methods are also briefly described in Chapter 7. A sequence is a value associated with a name and it's typically used to create primary key values. A sequence is always initialized to 1, and increments each time you access it with DB::nextId( ). For example, suppose you want to maintain the primary key value for the phonebook_id from the phonebook table that we've used in our examples in this chapter. To do this, you can use the DB::nextID( ) method as shown in Example 8-15: // Get a new primary key value for phonebook_id $phonebook_id = $connection->nextId("phonebook_id"); When this is called for the first time, DB::nextId( ) creates a new sequence named phonebook_id, assigns it the value 1, and returns the value. When you call it for the second time, it returns 2, and so on. It performs exactly the same function as our identifiers table approach in Example 8-14. Example 8-15. Using PEAR DB to maintain primary key values<?php require "db.inc"; require_once "HTML/Template/ITX.php"; require_once "DB.php"; function formerror(&$template, $message, &$errors) { $errors = true; $template->setCurrentBlock("error"); $template->setVariable("ERROR", $message); $template->parseCurrentBlock("error"); } $dsn = "mysql://fred:shhh@localhost/telephone"; $connection = DB::connect($dsn, false); if (DB::isError($connection)) die($connection->getMessage( )); $firstname = mysqlclean($_POST["firstname"], 50, $connection); $surname = mysqlclean($_POST["surname"], 50, $connection); $phone = mysqlclean($_POST["phone"], 20, $connection); $template = new HTML_Template_ITX("./templates"); $template->loadTemplatefile("example.8-10.tpl", true, true); $errors = false; if (empty($firstname)) formerror($template, "The first name field cannot be blank.", $errors); if (empty($surname)) formerror($template, "The surname field cannot be blank.", $errors); if (empty($phone)) formerror($template, "The phone field cannot be blank", $errors); // Now the script has finished the validation, show any errors if ($errors) { $template->show( ); exit; } // Get a new primary key value for phonebook_id $phonebook_id = $connection->nextId("phonebook_id"); if (DB::isError($connection)) die($connection->getMessage( )); // Insert the new phonebook entry $query = "INSERT INTO phonebook VALUES ({$phonebook_id}, {$surname}, {$firstname}, {$phone})"; $result = $connection->query($query); if (DB::isError($result)) die($result->getMessage( )); // Show the phonebook receipt header("Location: example.8-5.php?status=T&phonebook_id={$phonebook_id}"); ?> Behind the scenes, PEAR DB maintains a sequence in a table of the same name. When you create a sequence, it creates a table and an attribute and initializes the attribute to 1. When you call DB::nextId( ), it adds 1 and returns the value. PEAR DB correctly looks after safe concurrent access. If you call DB::nextID( ) without its optional second parameter or with the second parameter set to true, a sequence with the name supplied as the first parameter is created if it doesn't exist. You can also manually create a sequence using DB::createSequence( ) and you can remove it using DB::dropSequence( ). |
Previous | Table of Contents | Next |