Приглашаем посетить
Хомяков (homyakov.lit-info.ru)

Transactions

Previous
Table of Contents
Next

Transactions

One problem with which we must always be concerned is that of the consistency and integrity of our data. While the database management server can make sure that the data makes it in and out of our tables safely, it cannot always guarantee that it makes sense. For example, if we had marked in our inventory table that we had sold three books to a customer, but the power went off on our PHP server before we put the order into the orders table, we would have inconsistent inventory information. We can also run into "race conditions," where two people are trying to buy the last copy of a particular book in our warehouse at the same time. In the worst-case scenario, we could promise it to both users!

We can write code to get around this problem by designing various schemes to "lock" the tables and prevent others from accessing them. At the same time, we can come up with various schemes to detect inconsistencies and incomplete operations. However, this is expensive, complicated, and error-prone. We can get better results by letting the database do it for us through the use of transactions.

The Problem

Let us look closer at the example of the online bookstore. Imagine that we have a primitive table with all the products we have for sale and a table describing an individual order. To keep things simple, you can only order one type of book. We will work with the following MySQL table descriptions and ignore many of the details for orders, such as shipping information, costs, and payment information:

CREATE TABLE Products
(
  pid INTEGER AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200) NOT NULL,
  isb n VARCHAR(200) NOT NULL,
  price NUMERIC(10,2) NOT NULL,
  number_in_stock INTEGER NOT NULL
)
ENGINE = InnoDB;

CREATE TABLE Orders
(
  order_id INTEGER AUTO_INCREMENT PRIMARY KEY,
  order_date DATETIME NOT NULL,
  user_id INTEGER NOT NULL,
  product INTEGER NOT NULL,
  num_units INTEGER NOT NULL,
  FOREIGN KEY (user_id) REFERENCES Users (user_id),
  FOREIGN KEY (product) REFERENCES Products (pid)
)
ENGINE = InnoDB;

Now, when we sell a copy of a book to somebody, we need to execute two queries:

UPDATE Products SET number_in_stock = 10 WHERE pid = 343;

INSERT INTO Orders (order_date, user_id, product, num_units)
   VALUES (NOW(), 4538, 343, 1);

The problem arises when the first query executes successfully but something happens before the second is completed. This can occur for any number of reasons:

  • We lose our connection to the database.

  • The database server crashes or the hard disk fills up, causing the second query to fail.

  • The server loses power and shuts off.

There is a tendency to shrug and state that "it won't happen to me," but when you look at large web applications processing thousands, if not millions, of transactions a day, these things do happen, and your customers or clients expect you to handle these failures in a reasonable and graceful way. In the former case, we are fortunate that our client has not been charged for something he is not going to receive, but we would prefer if our application had a way of automatically detecting this failure and restoring the missing item to the number_in_stock field of our Products table.

The Solution

With this in mind, we come to transactions. They are simply a way to group multiple SQL queries and statements into a single atomic action on the database. Either all of them succeed and are actively accepted (committed) to the database, or they all fail and are undone (rolled back). In reality, it turns out to be more complicated, and any relational database management system that wants to claim that it supports transactions must conform to the so-called ACID rules:

  • Atomicity Transactions are an all-or-nothing proposition. All of the actions within the transaction are to be treated as an indivisible unit, and in the event of failure, the net effect should be that none of these actions has run.

  • Consistency After a transaction is run, our database must be in a consistent state with all constraints and data integrity rules satisfied.

  • Isolation Changes being made by a transaction should not be visible from other transactions while it is in progress. Their operation should be kept separate.

  • Durability The server has to be able to handle abnormal termination. This could mean that there are pending changes after a successfully completed transaction, or that there are half-executed changes after an interrupted or aborted transaction. Either way, the server must be prepared for a sudden shutdown and must be able to reconstruct a consistent state on startup (whether it is the fully committed or fully aborted transaction).

All of the databases we discuss fully support transactions, except you must be careful with the table type you choose in MySQL. If your tables are created with the MyISAM table storage engine, then transactions are not supported. In this book, we use the InnoDB storage engine whenever we want to use transactions on a table.

Inconveniently, various servers support varying degrees of isolation through transaction isolation levels. Most support four levels, which range from letting transactions see the progress and current state of other transactions (the lowest level of isolation, but fastest) to truly and completely isolating transactions from each other (highest degree of isolation, but much slower). The existence of these levels is an admission of the need to balance degrees of isolation and performance in our applications. The default level (repeatable read) for most of the database servers we mention in this book provides us with adequate isolation. (Oracle defaults to a slightly lower level of isolation, but it does not prove problematic for our needs.) We will leave it to you to pick up a more robust database programming book to get familiar with the more advanced transaction concepts.

Writing Transactions

The way you begin a transaction varies from server to server, but the most common syntax is a simple query:

BEGIN;

In some of the database servers (MySQL included) there is a property called autocommit, which controls how the database server treats queries. When set to trUE (or 1, the default), any query you enter is automatically committed to the database. When set to FALSE (or 0), any queries are batched up as if in a transaction, and they are only committed to the database when instructed. Thus you can also begin a transaction on MySQL by simply turning off autocommit:

SET AUTOCOMMIT=0;

When you have successfully completed all of the statements and queries in a transaction, the results are committed to the database with the following statement:

COMMIT;

When you want to abort the transaction and make sure that no results are committed, you execute the following statement:

ROLLBACK;

If your connection to the server is automatically dropped, your server crashes, or your transaction is otherwise interrupted before the COMMIT statement completes execution, ROLLBACK is automatically executed and the transaction is discarded.

Revisiting our problem of selling a book, our sequence of SQL statements now becomes the following:

BEGIN;

UPDATE Products SET number_in_stock = 10 WHERE pid = 343;

INSERT INTO Orders (order_date, user_id, product, num_units)
   VALUES (NOW(), 4538, 343, 1);

If we make it this far without any errors:
COMMIT;

Otherwise, in case of any errors whatsoever:
ROLLBACK;

A More Complicated Problem

We are loathe to admit it, but there is still a problem with our system. We have solved the problem of consistency in the event of crash or failure during the execution of our sales program, but before we even get this far, we could run into a problem of multiple people trying to purchase books at the same time.

We will look the situation where two people are trying to purchase the same book with only one left in stock.

+-----+------------+------+-------+-----------------+
| pid | title      | isb n | price | number_in_stock |
+-----+------------+------+-------+-----------------+
| 343 | 'So Happy' | 'xx' | 19.99 |               1 |
+-----+------------+------+-------+-----------------+

The code we have to affect a purchase is roughly as follows:

BEGIN;

SELECT number_in_stock FROM Products WHERE pid = 343;

Subtract 1 from the number_in_stock, call this "new"

UPDATE Products SET number_in_stock = new WHERE pid = 343;

INSERT INTO Orders (order_date, user_id, product, num_units)
   VALUES (NOW(), 4538, 343, 1);

If we make it this far without any errors:
COMMIT;

Otherwise, in case of any errors whatsoever:
ROLLBACK;

Our newest problem arises when we have two users trying to purchase this book in our web application at the same time. We will now show the sequence of actions for the two users running at nearly the same time.

User 1 begins the purchase process, and the following is executed:

[User 1]

  BEGIN;

  SELECT number_in_stock FROM Products WHERE pid = 343;

  Subtract 1 from the number_in_stock, call this "new"

The code for User 1 sees that there is one of this book left and gets ready to purchase it. However, at the same time, User 2 has been shopping, and that process executes the following code:

[User 2]

  BEGIN;

  SELECT number_in_stock FROM Products WHERE pid = 343;

  Subtract 1 from the number_in_stock, call this "new"

The code for User 2 also sees that there is one book left and gets ready to purchase it. Because the default transaction isolation level in our database server is not the most restrictive, parallel transactions can see the values of the same row. However, the code for User 1 now executes the following:

[User 1]

  UPDATE Products SET number_in_stock = 0 WHERE pid = 343;

  INSERT INTO Orders (order_date, user_id, product, num_units)
     VALUES (NOW(), 4538, 343, 1);

  COMMIT;

User 1 has successfully purchased this book. When User 2 tries to purchase it with the following code

[User 2]

  UPDATE Products SET number_in_stock = 0 WHERE pid = 343;

  INSERT INTO Orders (order_date, user_id, product, num_units)
     VALUES (NOW(), 4538, 343, 1);

  COMMIT;

the UPDATE query succeeds but does not update rows in the table! This is because the transaction processing code in the database server realizes that the underlying data has changed and does not want to let this second process change it. However, it does not signal an error condition, and our code now has to add extra logic to detect if the underlying row value has been changed and otherwise abort or try again when it sees that this did not happen.

A far more elegant solution exists in the form of an updated version of the SELECT querythe SELECT ... FOR UPDATE query. When you ask to see the value of a row in a table, you are indicating with this query that you plan to change the data for this row, and any other transactions or people trying to access this data will block until the current transaction is completed. Thus, we can rewrite our logic as follows:

BEGIN;

SELECT number_in_stock FROM Products
 WHERE pid = 343 FOR UPDATE;

Subtract 1 from the number_in_stock, call this "new"

UPDATE Products SET number_in_stock = new WHERE pid = 343;

INSERT INTO Orders (order_date, user_id, product, num_units)
   VALUES (NOW(), 4538, 343, 1);

If we make it this far without any errors:
COMMIT;

Otherwise, in case of any errors whatsoever:
ROLLBACK;

The addition of the FOR UPDATE means that any other code trying to see the same value has to wait until we call COMMIT or ROLLBACK, which eliminates the possibility of our problematic "race" to purchase the book.

There are more advanced locking options available for transactions, but we will not be writing anything complex enough to require them in this book. We will see examples of transactions in Part V, "Sample Projects and Further Ideas."


Previous
Table of Contents
Next