In this second blog we discuss the security properties of cryptographic hash functions in more detail. Before we do this let us first examine a more well understood case of factoring. When we say factoring is hard, what do we mean? What the cryptographer means is that there is no efficient algorithm A which on input of an integer N will find the factors of N with high probability, i.e. there is no efficient A which satisfies with high probability

A(N)=p where p divides N and p<>1 or N

when N is composite.

However, this comes with some caveats. For example if we select N to be any random integer then such an algorithm does exist; since 50% of all integers are even then with probability 1/2 we can always factor N in polynomial time. Also if N is too small then it will be easy to factor.

Thus restrict the factoring problem to be for those N which we believe are hard to factor. For example we might select N from the set

S = { N : N = p*q, p and q primes of 1024 bits in length }.

Then our factoring problem is that there is no efficient algorithm which satisfies with high probability

A(N)=p where p divides N and p<>1 or N

when N is selected from S. A key part of this definition is efficient, there is always an algorithm which solves this problem; namely the one which internally lists all elements of S along with their factors. Such an algorithm only has to look up its input in the internal list and output the corresponding factor. However, such a list is exponentially long and hence such an algorithm A could not even be written down efficiently.

Suppose the probability that A factors an element from S is pr, and that this probability is over the random choices made by A (and not over the choice of input from S), then by repeating the algorithm 1/pr times we will indeed factor the input. Now if each run of A takes t time steps, then we can factor with work effort w=t/pr.

With this understanding of what we mean by factoring is hard, let us examine the security of hash functions. We first consider the one-way property, or preimage resistance. Suppose we are given a surjective function

H: D —> C

with domain D and codomain (a.k.a. range) C. A function is said to be preimage resistant if given an element c in C, there is no efficient algorithm A which on input of c will produce a value d in D, with high probability, such that H(d)=c. Just like when we listed the elements of S above, such an algorithm can be given by just listing the elements of C. Thus a requirement for the non-existance of algorithm A is that the size of C should be exponential. Indeed for a suitably defined function H we hope that the work effect (run time of A divided by the probability of success in finding a preimage) is given by a constant times |C|.

Now consider the case of collision resistance. Here we are trying to find distinct m and m’ such that H(m)=H(m’), where we assume (since H is a hash function) that |D|>|C|, so collisions do indeed exist. Intuitively we would like to say a function is collision resistant if there no efficient algorithm which will output m and m’. However, this is impossible. Since we know H has collisions, there will be such an algorithm; namely, the one which has encoded into one specific collision. The problem is that we are talking about a fixed function f. In our factoring example this would have been like talking about an algorithm which has to factor one specific integer, and that one alone.

To get around this technical issue theoretical cryptography talks about keyed hash functions. Here we assume we have a family of hash functions H_k, indexed by a key k. Then the idea is that we define hardness of collision resistance for the family as there being no algorithm A which, on input of k, will output two distinct values m and m’ such that H_k(m)=H_k(m’). Now, assuming the family is large enough, we could not encode the collision for each k into the algorithm A and so we can meaningfully ask whether a function family is collision resistant.

The problem is with this theoretical slight of hand is that it does not correspond to cryptographic practice. Practical cryptographic hash functions, like SHA-1, SHA-2 and SHA-3, are fixed; they are not hash function families. So in practice when we say a cryptographic hash function is collision resistant we mean that we do not know how to write down an algorithm, despite knowing one exists, which will find collisions with high probability. Such a security definition is said to be security via human ignorance.

In the next blog, we will discuss how hard it is to find collisions in a hash function. But before ending today’s blog we note a related concept to collision resistance called second preimage resistance. Here we take a fixed hash function f as above, and we give the algorithm A a value m from the domain. We ask that A outputs distinct value m’, such that H(m)=H(m’). Whilst this looks like collision resistance, it is actually more closely related to preimage resistance. As an exercise, you should think about why theoreticians do not need to take keyed hash functions for such a definition.

Second preimage resistance is needed when one discusses signature algorithms. Recall our signature algorithm from the previous post; given by S(m,sk) and V(m,s,pk). We sign a large message M by computing

s = S(H(M),sk)

and we verify the signature by checking whether

V(H(M),s,pk)=true

Now if the adversary obtains a pair (M,s), and he can find a second preimage M’ on M, then s is also a valid signature on the message M’. You should compare this to the attack in the previous blog where we used a break on collision resistance of the hash function to break the signature scheme. That attack required us to obtain a signature from the legitimate signer on a message of our choice, whereas this attack requires us to just obtain a signature on any message