Registering Users

Before you can go about authenticating users, you need to know who the users are. Minimally, you need a username and a password for a user, although it is often useful to collect more information than that. Many people concentrate on the nuances of good password generation (which, as we discuss in the next section, is difficult but necessary) without ever considering the selection of unique identifiers.

I've personally had very good success using email addresses as unique identifiers for users in Web applications. The vast majority of users (computer geeks aside) use a single address, and that address is usually used exclusively by that user. This makes it a perfect unique identifier for a user. If you use a closed-loop confirmation process for registration (meaning that you send the user an email message that he or she must act on to complete registration), you can ensure that the email address is valid and belongs to the registering user.
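The closed-loop confirmation step can be sketched roughly as follows. This is a minimal illustration rather than a full registration flow: the function names make_confirmation_token and confirmation_matches are hypothetical, and storing the token and mailing the link are left to the application.

```php
<?php
// Hypothetical helper: create an unguessable token to embed in the
// confirmation link mailed to the registering address.
function make_confirmation_token($bytes = 16) {
  // random_bytes() draws from the system CSPRNG (PHP 7+).
  return bin2hex(random_bytes($bytes));
}

// Hypothetical helper: compare the token from the clicked link with the
// one stored when the user registered.
function confirmation_matches($stored, $supplied) {
  // hash_equals() compares in constant time to avoid timing leaks.
  return is_string($supplied) && hash_equals($stored, $supplied);
}

// The application would store $token with the pending account and mail a
// link containing it; the URL layout is up to the application.
$token = make_confirmation_token();
```

Registration completes only when the token presented by the user matches the stored one, which shows that the user can read mail sent to that address.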

Collecting email addresses also allows you to communicate more effectively with your users. If they opt in to receive mail from you, you can send them periodic updates on what is happening with your sites, and being able to send a freshly generated password to a user is critical for password recovery. All these tasks are cleanest if there is a one-to-one correspondence of users and email addresses.

Protecting Passwords

Users choose bad passwords. It's part of human nature. Numerous studies have confirmed that if they are allowed to, most users will create a password that can be guessed in short order.

A dictionary attack is an automated attack against an authentication system. The cracker commonly uses a large file of potential passwords (say all two-word combinations of words in the English language) and tries to log in to a given user account with each in succession. This sort of attack does not work against random passwords, but it is incredibly effective against accounts where users can choose their own passwords.

Ironically, a tuned system makes dictionary attacks even easier for the cracker. At a previous job, I was astounded to discover a cracker executing a dictionary attack at more than 100 attempts per second. At that rate, he could attempt an entire 50,000-word dictionary in under 10 minutes.

There are two solutions to protecting against password attacks, although neither is terribly effective:

Create "good" passwords.
Limit the effectiveness of dictionary attacks.

What is a "good" password? A good password is one that cannot be guessed easily by using automated techniques. A "good" password generator might look like this:

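function random_password($length=8) {
  $str = '';
  for($i=0; $i<$length; $i++) {
    // ASCII 48-122 covers digits, letters, and some punctuation.
    // random_int() (PHP 7+) is cryptographically secure; rand() is not.
    $str .= chr(random_int(48,122));
  }
  return $str;
}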

This generates passwords that consist of random printable ASCII characters. They are also very difficult to remember. This is the key problem with truly random password generators: People hate the passwords they generate. The more difficult a password is to remember, the more likely a person is to put it on a sticky note on his or her monitor or in a text file or an email message.

A common approach to this problem is to put the burden of good password generation on the user and enforce it with simple rules. You can allow the user to select his or her own password but require that password to pass certain tests. The following is a simple password validator for this scenario:

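function good_password($password) {
  // Require at least eight characters.
  if(strlen($password) < 8) {
    return 0;
  }
  // Require at least one digit.
  if(!preg_match("/\d/", $password)) {
    return 0;
  }
  // Require at least one letter.
  if(!preg_match("/[a-z]/i", $password)) {
    return 0;
  }
  // All tests passed.
  return 1;
}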

This function requires a password to be at least eight characters long and contain both letters and numbers.

A more robust function might check to ensure that when the numeric characters are removed, what is left is not a single dictionary word, or that the user's name or address is not contained in the password. This approach to the problem is one of the key tenets of consulting work: When a problem is difficult, make it someone else's problem.
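Such a stricter check might look something like the following sketch. The function name stricter_password, the caller-supplied word list, and the list of personal strings are all assumptions made for illustration; a real deployment would load a much larger dictionary.

```php
<?php
// Sketch of a stricter validator. $words is a caller-supplied list of
// lowercase dictionary words; $personal holds strings such as the user's
// name or address. Both parameters are assumptions of this example.
function stricter_password($password, array $words, array $personal = array()) {
  if(strlen($password) < 8) {
    return false;
  }
  if(!preg_match('/\d/', $password) || !preg_match('/[a-z]/i', $password)) {
    return false;
  }
  $lower = strtolower($password);
  // Reject passwords that contain the user's name or address.
  foreach($personal as $item) {
    $item = strtolower(trim($item));
    if($item !== '' && strpos($lower, $item) !== false) {
      return false;
    }
  }
  // Reject passwords that are a single dictionary word once the
  // digits are removed.
  $stripped = preg_replace('/\d/', '', $lower);
  if(in_array($stripped, $words, true)) {
    return false;
  }
  return true;
}

$words = array('password', 'monkey', 'dragon');
var_dump(stricter_password('password12', $words));                // false
var_dump(stricter_password('george1999x', $words, array('George'))); // false
var_dump(stricter_password('tr4ctor-b1ue-9', $words));            // true
```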

Generating a secure password that a user can be happy with is difficult. It is much easier to detect a bad password and prevent the user from choosing it.

The next challenge is to prevent dictionary attacks against the authentication system. Given free rein, a cracker running a dictionary attack will always compromise users. No matter how good your rules for preventing bad passwords, the space of human-comprehensible passwords is small.

One solution is to lock down an account if it has a number of consecutive failures against it. This solution is easy enough to implement. You can modify the original check_credentials function to only allow for a fixed number of failures before the account is locked:

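function check_credentials($name, $password) {
  $dbh = new DB_Mysql_Prod();
  // NOTE: $name is interpolated directly into the SQL, so it must be
  // escaped or bound before this point to prevent SQL injection.
  // Locked accounts (three or more failures) match no row at all.
  $cur = $dbh->execute("
    SELECT
      userid, password
    FROM
      users
    WHERE
      username = '$name'
    AND failures < 3");
  $row = $cur->fetch_assoc();
  if($row) {
    // Strict comparison avoids PHP's loose numeric-string matching.
    if($password === $row['password']) {
      return $row['userid'];
    }
    else {
      // Record the failure so that repeated misses lock the account.
      $cur = $dbh->execute("
        UPDATE
          users
        SET
          failures = failures + 1,
          last_failure = now()
        WHERE
          username = '$name'");
    }
  }
  throw new AuthException("user is not authorized");
}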

Clearing these locks can be done either manually or through a cron job that resets the failure count on any row whose last failure is more than an hour old.
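The cron-driven reset could be as small as a single UPDATE statement. The sketch below only builds that statement; it assumes MySQL and the failures and last_failure columns of the users table used above, and the function name stale_lock_reset_sql is hypothetical.

```php
<?php
// Hypothetical helper: build the statement a cron-driven script could run
// to clear stale locks. Assumes MySQL and the users table used above.
function stale_lock_reset_sql($minutes = 60) {
  $minutes = max(1, (int)$minutes);  // force a positive integer
  return "UPDATE users
          SET failures = 0
          WHERE failures > 0
          AND last_failure < NOW() - INTERVAL $minutes MINUTE";
}

echo stale_lock_reset_sql(), "\n";
```

The resulting string would be passed to whatever database handle the application already uses.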

The major drawback of this method is that it allows a cracker to disable access to a person's account by intentionally logging in with bad passwords. You can attempt to tie login failures to IP addresses to partially rectify this concern. Login security is an endless battle. There is no such thing as an exploit-free system. It's important to weigh the potential risks against the time and resources necessary to handle a potential exploit.

The particular strategy you use can be as complex as you like. For example, you might allow no more than three login attempts in one minute and no more than 20 login attempts in a day.
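A policy of that kind can be sketched as a pure function over the timestamps of earlier attempts. The function name attempt_allowed and its parameters are assumptions for illustration; how the attempt history is stored (per account, per IP address, or both) is left open.

```php
<?php
// Sketch: decide whether another login attempt is allowed, given the Unix
// timestamps of earlier attempts for one account (or one IP address).
// The default limits mirror the examples in the text.
function attempt_allowed(array $attempts, $now, $perMinute = 3, $perDay = 20) {
  $lastMinute = 0;
  $lastDay = 0;
  foreach($attempts as $ts) {
    $age = $now - $ts;
    if($age < 0) {
      continue;  // ignore timestamps from the future
    }
    if($age < 60) {
      $lastMinute++;
    }
    if($age < 86400) {
      $lastDay++;
    }
  }
  // Allow the attempt only while both windows are under their limits.
  return $lastMinute < $perMinute && $lastDay < $perDay;
}

$now = time();
var_dump(attempt_allowed(array($now - 5, $now - 20), $now));            // true
var_dump(attempt_allowed(array($now - 5, $now - 20, $now - 40), $now)); // false
```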

Protecting Passwords Against Social Engineering

Although it's not really a technical issue, we would be remiss to talk about login security without mentioning social engineering attacks. Social engineering involves tricking a user into giving you information, often by posing as a trusted figure. Common social engineering exploits include the following:

Posing as a systems administrator for the site and sending email messages that ask users for their passwords for "security reasons"
Creating a mirror image of the site login page and tricking users into attempting to log in
Trying some combination of the two

It might seem implausible that users would fall for these techniques, but they are very common. Searching Google for scams involving eBay turns up a plethora of such exploits.

It is very hard to protect against social engineering attacks. The crux of the problem is that they are really not technical attacks at all; they are simply attacks that involve duping users into making stupid choices. The only options are to educate users on how and why you might contact them and to try to instill in users a healthy skepticism about relinquishing their personal information.

Good luck; you'll need it.

JavaScript Is a Tool of Evil

The following sections talk about a number of session security methods that involve cookies. Be aware that client-side scripting languages such as JavaScript have access to users' cookies. If you run a site that allows users to embed arbitrary JavaScript or CSS in a page that is being served by your domain (that is, a domain that has access to your cookies), your cookies can easily be hijacked. JavaScript is a community-site cracker's dream because it allows for easy manipulation of all the data you send to the client.

This category of attack is known as cross-site scripting. In a cross-site scripting attack, a malicious user uses some sort of client-side technology (most commonly JavaScript, Flash, and CSS) to cause you to download malicious code from a site other than the one you think you are visiting.
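One common mitigation, offered here as a sketch rather than as something the text above prescribes, is to escape user-supplied text before echoing it into a page so that embedded markup is displayed instead of executed. The wrapper name escape_for_html is hypothetical; htmlspecialchars() is the standard PHP function it relies on.

```php
<?php
// Sketch: escape user-supplied text before echoing it into an HTML page so
// that embedded markup is shown as text rather than run by the browser.
function escape_for_html($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// A comment that tries to smuggle in a script tag.
$comment = '<script>alert(document.cookie)</script>';
echo escape_for_html($comment), "\n";
// prints: &lt;script&gt;alert(document.cookie)&lt;/script&gt;
```

Escaping covers text placed in HTML content and attribute values; user-supplied CSS or script contexts need separate handling.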
