Hash Functions Part 1

This is the first of a series of blogs which will discuss the use of hash functions within cryptographic systems. The first blog sets the scene, whereas future blogs will discuss issues such as how to define what a secure cryptographic hash function actually is, how hash functions are used in other schemes, and how they are constructed.

A ‘hash function’ (without the adjective ‘cryptographic’) is a function which takes an arbitrary length input and produces a short fixed length output. As a simple example we could take a string of letters, convert each character to its ASCII equivalent, write out the associated number in base 256 and then take the result modulo a prime p. Such a function would map arbitrary length strings into integers modulo p. Hash functions are used a lot in computer science; for example to implement fast lookups in database systems, for tagging files to see whether they have been seen before, and so on. The output of the hash function is often called the ‘digest’ or ‘hash code’ of the input. In almost all applications we require that hash functions produce digests which look like they have been pulled uniformly at random from the range of the hash function.

Cryptographic hash functions are on the other hand much more complicated beasts; and as can be expected from something cryptographic they have to satisfy a number of security considerations, as well as mapping large inputs to fixed length random looking outputs. To understand the type of security considerations one might impose on a cryptographic hash function it is perhaps worth looking at a specific application example.

In standard password systems (see our previous blog on this), a password file consists of the following triples of data

For each username (e.g. ‘John Doe’) a random value is given called the ‘salt’ (e.g. ‘1234’). To obtain the hash we take the users password (e.g. ‘PassW0rd’) and then apply a cryptographic hash function to the string ‘1234PassW0rd’ so as to obtain the hash digest, e.g.
H(‘1234PassW0rd’)=473128
When a user logs into the system with their password, the system then looks up their username and given salt value and compute the associated digest and compares this with
the value stored in the file.

The salt is used to protect against users who have the same password. For example if user ‘Fred Blogs’ also used the password ‘PassW0rd’, the two hash digests would (hopefully) be different as the hash function would be applied to two different salt values. For example if the salt for Fred was ‘4321’ then we hope it would be unlikely that H(‘4321PassW0rd’)=473128. Using a salt makes it harder for an attacker who compromises Fred’s password to learn that it is also the password for Joe.

In our example let us suppose that H(‘4321PassW0rd’)=985211. In which case our password file could look something like
John Doe :1234:473128
Fred Blogs:4321:985211
Alice Doe :5287:521984
and so on, where Alice may have the password ‘1L0veB0b’.

So let us consider why a normal hash function, such as the one above based on the base 256 expansion of a string modulo p, is not suitable in this application. Suppose the attacker learned that the digest for Alice’s password was 521984, and that the salt was 5287. With our simple ‘normal’ hash function it is easy to come up with a password, maybe not Alice’s actual password, which produces the same digest. Thus for a cryptographic hash function we need it to be hard to find ANY string xxxxxx such that
H(5287xxxxxx)=521984
More generally, given a digest value D it should be hard to find an input value M such that H(M)=D. We say that the cryptographic hash function should be one-way, or preimage-resistant.

But cryptographic hash functions need other properties. To understand these, informally, we need to consider another application. Cryptographic hash functions are often used to produce a digest of a message, and then the digest is passed to a digital signature algorithm for signing. Thus a digital signature algorithm does not sign the message, but the digest. Suppose our signing and verification algorithms are S(m,sk) and V(m,s,pk). Then we sign a large message M by computing
s = S(H(M),sk)
and we verify the signature by checking whether
V(H(M),s,pk)=true
Now suppose we could come up with two messages M and M’ for which H(M)=H(M’), then a signature on M would also be a signature on M’. This would clearly not be a good idea for applications using digital signatures. Thus we also require that our cryptographic hash function be such that finding two messages which hash to the same value should be hard. Such a hash function is said to be collision resistant; and again our earlier example is not collision resistant (an easy exercise).

So in summary at the bare minimum we require cryptographic hash functions to be preimage-resistant and collision resistant. In the next blog we show defining security for cryptographic hash functions is slightly more complex than the above intuitive definitions would imply.