In order to deepen our understanding of password security, we need to first discuss how password verification takes place (e.g., on PCs and on remote servers). Since the authenticating machine—whether it be a PC or remote server—needs to be able to verify if the user sent the correct password, it seemingly needs to store the original password. However, this is very dangerous since it means that the passwords can be stolen. In the case of a laptop, this is obvious: if the password is stored on the hard disk in order to enable login verification, then an attacker who steals a laptop can remove the hard drive, find the password, and then login using the correct password. (If the drive is not encrypted, then the password may not be needed since all the files can just be read. However, since the user’s local password is most likely their enterprise password, finding the password can often also give the attacker access to the enterprise network.) Likewise, if the user’s passwords are stored in the clear by the remote server, then they are vulnerable any time a network breach occurs.
The solution to this problem is to use a cryptographic hash function. The property that we need from such a function in this context is that it is one-way. This means that it is easy to compute the function but hard (infeasible) to invert it. Now, it is possible to store the hash of the user password rather than the password itself. When the user logs in and sends their password in the clear, the authenticating machine just computes the hash of the password sent by the user and compares it to the stored hash. In this way, it is never necessary to invert in order to verify the validity of the logon, and the passwords are never stored in the clear. At first sight, this seems to be an excellent solution to the problem.
Unfortunately, however, reality is far more complex. In particular, no one said that an attacker has to invert the hash in order to steal the password. Rather, it is possible to simply try all possible passwords, compute their hash, and compare the result. It may seem that this would take a long time, but on not very expensive machinery today (e.g., a $10,000 ASIC), it is possible to compute about 1.5 billion hashes per second! Since most users are simply not capable of choosing very long random passwords, such brute force attacks are extremely effective. Sophisticated password crackers use information about most likely passwords, and how most passwords are formed, in order to successfully crack most passwords in very little time.
Before proceeding, I would just like to note that if a simple hash of the password is stored, then it is possible to carry out even more sophisticated attacks that require significant preprocessing but can then find all passwords very quickly. These are called “time-space tradeoffs” and the most famous of them is the Rainbow table technique. I will describe these more advanced attacks in a future blog. In order to prevent these attacks, random salt is added to the hash. This salt is public, and so instead of storing H(pwd), the pair (salt, H(salt,pwd)) is stored. Since the random salt is long (e.g., 128 bits), it prevents the time-space tradeoff attacks from working. (This will become clear after we describe how these attacks work.) Note that the salt need not be secret, and indeed cannot be secret since then it would be impossible to verify if the password sent by the user is correct. In summary, by adding random salt, the attacker is forced to carry out a separate brute force attack on every password, as described above.
Since, as we have mentioned, brute force attacks are very effective, the common best practice is to not compute the hash function once over the password, but to compute many iterations. For example, denote by H1000(salt,pwd) the result of computing the hash 1000 times, each time over the previous result. This will slow down the verification time by a factor of 1000. However, adding an additional few milliseconds is not significant to user experience, and so is tolerated. In contrast, slowing the attacker down by a factor of 1000 is very significant. This is the standard best practice for password verification today: use random salt and as large a number of iterations as possible. This is the idea underlying standards like PBKDF.
Unfortunately, due to the speed of machines that attackers have at their disposal, this best practice just isn’t good enough. A quick Internet search on data breaches, password breaches and so on shows that passwords are stolen en masse, and salt and iterations are just not enough to prevent it.
In the previous blog post we differentiated between online and offline attacks. The method of salt and iterations is designed to slow down offline attacks, and indeed it does this (if salt is not used, then the result is devastating, as one very large social network found out in 2012). However, the fact is that if the hashed password file is stolen (from a local disk or from a remote server), then the attacker can carry out an offline dictionary attack. Even if the attack is slowed down by a factor of one thousand or even ten thousand, it will still be devastating and will reveal the vast majority of the user’s passwords very quickly.
In the next post, we will present some improvements that exist on this basic method, and will discuss what is known regarding the statistics of user passwords.