More Advanced Queries

Table of Contents

More Advanced Queries

So far, we have shown you the basics of SQL queries, but the language is far more powerful and flexible than the simple queries demonstrate. We will now introduce some of the more powerful features of the language.

Combining Expressions

The WHERE keyword lets us associate expressions to restrict the function of several SQL queries, including SELECT, UPDATE, and DELETE. While we have shown basic expressions of the form

ColumnName operator value

there are means by which expressions can be combined to form more elaborate expressions. The ones we will use are the AND and OR keywords. These behave similarly to AND and OR operators in other programming languages. The former is considered trUE if both sides of the condition are met, while the latter evaluates to trUE if either side of the condition is met. These let us refine our queries as needed. For example, to find all users with the first name Gaia who were born after 1980, we can enter this:

SELECT user_name, user_email FROM Users
  WHERE full_name LIKE 'Gaia%'
        AND birthdate >= '1980-01-01';

Similarly, we could look for all users with the name John or Jon:

SELECT user_name, birthdate FROM Users
  WHERE full_name LIKE 'John%'
        OR full_name LIKE 'Jon%';

To combine multiple expressions of this sort, parentheses can and should be used to clarify the order of evaluation:

SELECT user_name, full_name, birthdate FROM Users
  WHERE (full_name LIKE 'John%' OR full_name LIKE 'Jon%')
        AND birthdate >= '1980-01-01';

Specifying Sets and Ranges of Values

When we want to specify a range of values in our expressions, SQL provides two useful keywords for thisIN and BETWEEN.

The IN keyword allows us to specify a set of scalar values (no wildcards are permitted) against which the value of a column should be evaluated. There is a match if the column value is in the set specified:

SELECT * FROM Messages
  WHERE forum_id IN (1, 3, 7, 4);

The BETWEEN keyword lets us specify a range of values against which a value should match. The functionality is reasonably transparent with numbers and dates:

  WHERE birthdate BETWEEN '1970-01-01' AND '1970-12-31';

SELECT * FROM Messages
  WHERE message_id BETWEEN 1000 AND 5000;

With strings, however, it can be less so. As we mentioned before, the database uses the information it has on sorting order (collation) to determine whether a string value falls within a range of strings. Depending on what collation has been specified, this tells it not only how to deal with English language ASCII letters, but also those from other languages. Thus, we can write a query

  WHERE user_name BETWEEN 'a' AND 'm';

that returns the set of usernames between a and m. When first entering this query, you might receive a strange error about character sets and collation orders, such as the following one returned by MySQL:

ERROR 1270 (HY000): Illegal mix of collations
   (utf8_general_ci,IMPLICIT), (latin1_swedish_ci,COERCIBLE),
   (latin1_swedish_ci,COERCIBLE) for operation 'BETWEEN'

The most likely explanation for this error is that the client program that you are connecting to the server with is using one character set and collation (see Chapter 9, "Designing and Creating Your Database") while the database is using another. (We typically set them to be Unicode/utf-8 on the server.) To rectify this, you need to set the client program to use the same character set and collation as the database. In the MySQL client program mysql, this is done as follows:

mysql> set character_set_connection = @@character_set_database;

This setting of the character set also sets the collation to the default collation for that character set. If you have chosen something different, you can also set the collation of the connection as follows:

mysql> set collation_connection = @@collation_database;

One final complication is whether various database engines treat the range specified with BETWEEN as inclusive or exclusive. Some databases include in the set of valid matches values that are equal to the boundaries of the BETWEEN range, while others exclude values that match the boundaries. You should double check with any given database server before you rely on this functionality. (All of the DBMSes we discuss are inclusive.)

Fetching Unique Values

There are times when we only want to know the possible set of values in a column instead of all of the values (including any duplicates). In our Messages table, if we were to run the query

SELECT forum_id FROM Messages;

we would find ourselves with many potential duplicate entries for any forums with more than one posting. However, if we wanted to see a list of forums in which there was at least one message posted, we could use the SELECT DISTINCT query in SQL:

SELECT DISTINCT forum_id FROM Messages;

This query returns only individual forum_id values with the duplicates eliminated for us.

SQL Functions in Queries

In addition to scalar values and columns in your queries, SQL allows you to use various functions, including those provided by the server and those written by users on systems that allow stored procedures (user-defined functions compiled and kept on the server).

There are two classes of functions in SQL:

  • Aggregate functions Functions that operate on a set of values (for instance, a column of data from a table) and return a single scalar value. An example of this would be functions to compute the sum of all the values in a column, or functions to compute the average.

  • Scalar functions Functions that operate on a single scalar value (such as the value of a particular column in a particular row) and return a single scalar value. When given a column of data on which to operate, these functions return a column where the individual values are the scalar values as run through the function. Examples of this would be functions to convert strings in a field to lowercase, functions to convert currency, or functions that format data for output.

A function needs to be scalar to be used in an expression. Aggregate functions can be used to process the result of a query before it is finally returned to us, and thus are most commonly used with the results of a SELECT statement.

Unfortunately, these functions are another area where the various SQL servers vary greatly in implementation. While they usually have the same functionality, the name or way in which a particular function is used can differ significantly. We will endeavor to cover the most common ones here. When there is too much variation, we will show the MySQL method and refer you to Appendix B, "Database Function Equivalents," where we will briefly show how other servers might do the same thing.

When you call functions in SQL queries, no spaces are allowed between the function name and the parentheses used for arguments.

AVG(daily_precip)        OK
AVG (daily_precip)       Not OK

Numeric Functions

We will start by introducing a few numeric functions since they are the easiest to understand and use. All of the following functions are aggregate functions that will be used to operate on a set of rows in a query and will return a single value.


The COUNT function can be used in two ways: COUNT(ColumnName) or COUNT(*). The former counts the number of values in the result set from the specified column that do not have NULL values. The second version simply counts the number of rows in the result set. For example


counts the number of users in our Users table. To count the number of users who do not have NULL full names (which is permitted by our table schema), we could execute the following query:

mysql> SELECT COUNT(full_name) FROM Users;
| count(full_name) |
|                4 |
1 row in set (0.00 sec)


To return a sum of the values in the result set for a given column, we will use the SUM function. If we had a table with the daily weather information for a particular place and wanted to compute the total rainfall for the year 2002, we could execute the following query:

SELECT SUM(daily_precip) FROM DailyWeatherReports
  WHERE date BETWEEN '2002-01-01' AND '2002-12-31';


To compute the maximum or minimum value of a column in a result set, you can use the MAX and MIN functions. To continue our weather example from the previous function, we could find the days with the highest and lowest precipitation in 2002 with the following queries:

SELECT MAX(daily_precip) FROM DailyWeatherReports
  WHERE date BETWEEN '2002-01-01' AND '2002-12-31';

SELECT MIN(daily_precip) FROM DailyWeatherReports
  WHERE date BETWEEN '2002-01-01' AND '2002-12-31';

We can combine these queries into a single query if we desire. The result set for this query would contain two columns:

SELECT MAX(daily_precip), MIN(daily_precip)
  FROM DailyWeatherReports
 WHERE date BETWEEN '2002-01-01' AND '2002-12-31';


To compute the average value of a column in a result set, the AVG function is provided. It is used as follows:

SELECT AVG(daily_precip) FROM DailyWeatherReports
  WHERE date BETWEEN '2002-01-01' AND '2002-12-31';

String Functions

There are many useful functions for operating on strings, most of which are scalar functions that can be used in a number of places.

Substring Extraction

To extract a portion of a string in SQL, you use the SUBSTRING function (SUBSTR in Oracle). It takes three argumentsthe value (or column name) on which to operate, the index of the first character to extract, and the number of characters to extract.


Unlike PHP's zero-based indexing, the index of the character is 1-based. This means that the first character in the string is at index 1.

If we had a table with the states and provinces of the USA and Canada, we might execute the following to get the first 5 characters of each:

SELECT SUBSTRING(name, 1, 5) FROM states_provinces;

To find the states and provinces that begin with New, we could execute the following:

SELECT * FROM states_provinces
  WHERE SUBSTRING(name, 1, 3) = 'New';

Case Conversion

To convert strings to lower- or uppercase, you use the LOWER and UPPER functions.

SELECT LOWER(user_name) FROM Users;

SELECT UPPER(last_name) FROM customers;

It is unlikely you will use LOWER and UPPER as part of your expressions in a WHERE clause since, as we mentioned in the previous chapter, most of the collations and comparisons are case-insensitive.

Finding a String

We need a function to find the position of a particular string within a value (or column). While all of the servers with which we will work have such functionality, the exact name and usage differs. MySQL and Oracle both use the INSTR function:

INSTR(look_in_here, find_me)

For example, we could use the following to get just the first names of our message board users from the Users table, assuming it was the first word in the name:

SELECT user_name,
       SUBSTRING(full_name, 1, INSTR(full_name, ' '))
  FROM Users;

If the specified string is not found by the INSTR function, it returns 0.

String Concatenation

The ability to concatenate strings together exists in SQL and varies widely from server to server. Thus, we will cover the MySQL syntax here:

CONCAT(value1, value2, ..., valuen)

If we wanted to return a formatted string with a user's username and e-mail address from our Users table, we could execute the following query:

SELECT CONCAT('User Name: ',
              '\tUser Email:',
  FROM Users;

Trimming Your Strings

trIM is a function in SQL that removes leadings and trailing whitespace from your strings and behaves much like its PHP counterpart:

SELECT user_name, TRIM(full_name), user_email
  FROM Users
  WHERE user_name LIKE 'F%';

Date/Time Functions

There are a number of helpful functions for date and time data types in SQL. They are not standard across implementations, but most of the basic functionality exists in all the major servers.


To find out what the time and date is right now in MySQL and PostgreSQL, you use a function called Now.


INSERT INTO Orders (prodid, user_id, when)
  VALUES(445455423, 32345, Now());

Year, Month, Day

There are a number of functions to extract values of months from date values. Again, these differ significantly between implementations. In MySQL, the YEAR function takes a date and returns the four-digit year that date represents. The MONTH function returns the month for a date, and the DAYOFMONTH, DAYOFWEEK, and DAYNAME functions return the day in a slightly different format.

Formatting Dates and Time

While all databases provide functionality to format date and time values, the implementations are extremely different. In MySQL, the function used for this is the DATE_FORMAT function. It takes both the value to format and a string with the format to use.

DATE_FORMAT(format_me, format_instructions)

The instructions are a sequence of characters to be substituted by the database engine, as in the following example:

mysql> SELECT full_name, DATE_FORMAT(birthdate, '%W %D %M %Y')
         FROM Users
         WHERE birthdate <> '0000-00-00';
| full_name      | DATE_FORMAT(birthdate, '%W %D %M %Y') |
| Klaus Mueller  | Tuesday 16th December 1980            |
| Akira Tanaka   | Wednesday 13th September 0000         |
| Petunia Durbin | Monday 31st March 1975                |
3 rows in set (0.00 sec)

Some of the more common and interesting format codes are presented in Table 11-1. For a complete set of values, consult the MySQL documentation for date and time functions.

Table 11-1. Format Codes for the DATE_FORMAT Function




The day of the week, in the language of the database server (often English)


The day of the week in numeric format (0 = Sunday, 6 = Saturday)


The calendar year, in Julian format, with 4 digits (for instance, 1999)


The Julian calendar year, in 2-digit format (for instance, 33)


The name of the month, in the language of the database server (January, March, and so on)


The numeric value of the month (00 to 12)


The day of the month with an English numeric suffix (1st, 2nd, 3rd, and so on)


The day of the month in numeric format (00 to 31)


The hour in 24-hour format (00 to 23)


The hour in 12-hour format (01 to 12)


AM or PM


Minutes, in numeric format (00 to 59)

%S or %s

Seconds, in numeric format (00 to 59)

Grouping Aggregate Function Data

Aggregate functions would be even more useful if SQL could group the values in a table before executing the functions on them. To continue the example of our weather database, we saw previously that we could use the AVG function to compute the average rainfall in a year. If we wanted to see the average rainfall for a few years, we would have to execute a number of queries on the different year values.

It would be nicer if there was a way for SQL to do this for us. Fortunately, the GROUP BY clause allows us to tell SQL to group the values from a certain column before executing the aggregate function on each of these groups:

SELECT YEAR(date), AVG(daily_precip)
  FROM DailyWeatherReports
  GROUP BY YEAR(date);

The GROUP BY clause, along with the scalar function YEAR, allows us to group the table by common year values and then execute the AVG function on all the row groups with common date values. We can further refine our query by using the HAVING clause, which is a bit like a WHERE clause but operates to restrict which rows the GROUP BY sub-clause sees. For example, to return the list of years in which the average rainfall exceeded 50mm, we might write our query as follows:

SELECT YEAR(date), AVG(daily_precip)
  FROM DailyWeatherReports
  HAVING AVG(daily_precip) > 50;

The ability to combine all of these features into statements can let us write extremely powerful queries. For example, to get the average rainfall between the years 1990 and 2000 for all cities with an average annual rainfall exceeding 50mm

SELECT YEAR(date), AVG(daily_precip)
  FROM DailyWeatherReports
  WHERE YEAR(date) BETWEEN 1990 AND 2000
  HAVING AVG(daily_precip) > 50;

Table of Contents
© 2000- NIV