The random oracle model has been used time and time again in security proofs within cryptography. However, no real implementation of a random oracle exists, and so are modeled by hash functions. Here I illustrate yet another paper, showing that modeling protocols with random oracles has theoretical limitations.
Background: The Random Oracle Model
Typically, cryptographers would like to prove their protocols secure in some fashion. For example, consider the following signature scheme:
Definition (Trapdoor Permutation).
A family of trapdoor permutations is a set of three algorithms:
$G$: A random algorithm which generates two keys, $pk$ and $sk$ where $pk$ is the public key and $sk$ is the secret key
$f$: An algorithm that takes in a key $pk$ and value $x$ such that $x \mapsto f_{pk}(x)$ is a bijection.
$f^{-1}$: An algorithm such that for every possible pair $(pk, sk)$, we have $$f^{-1}_{sk}(f_{pk}(x)) = x$$.
With this trapdoor permutation, we can construct the following signature scheme:
Signature Scheme. Let $(G, f, f^{-1})$ be a trapdoor permutation, and let $H$ be a random oracle.
- $\text{Gen}$: We run $G$ to get $(pk, sk)$.
- $\text{Sign}(m)$: We output $\sigma = f^{-1}_{sk}(H(m))$.
- $\text{Verify}(\sigma, m)$: We can verify $f_{pk}(\sigma) = H(m)$.
Here, $H$ is what we call our random oracle. The random oracle will choose a random output for any input, with no way to predict what the output is. This Random Oracle is the ideal functionality we want, however it is impossible to implement in real life. Therefore, we use standard hash functions such as SHA256 or BLAKE3, which are modeled by the random oracle in the ideal scenario.
This way, we can prove security of this protocol quite easily because an adversary can now assume that $H$ is completely random and uniform.
When the ROM Breaks
Now clearly, there are differences between hash functions and random oracles. For example, hash functions build from Merkle-Damgard suffer from length extension attacks. Knowing $\text{SHA256}(m)$ allows us to find $\text{SHA256}(m || x)$ for a specific $x$. This is distinctly unlike a random oracle, which will output a new independent random output for $m || x$.
Now, the length extension problem with SHA256 and other Merkle-Damgard based hash functions is not necessarily an issue. It can lead to issues, but in say encryption or password storage, this will not cause problems. So can we still "approximate" a random oracle using a hash function, and where exactly does this break? Interestingly, it was shown in [CGH98] that every hash function has a specific issue that distinguishes it from the random oracle model.
So here, our goal is given some hash function $H$ and a random oracle $O$, to be able to provide some kind of input which can distinguish between these two. Through this, we can then build a protocol which is secure in the random oracle model, but completely insecure with any ROM implementation.
Correlation Intractability
When discussing hash functions, typically hash functions (or keyed hash functions) are implemented as some sort of family of functions which can be evaluated in polynomial time.
This is made precise by the definition of a function ensemble, which is a group of functions $$ f_s: \{0, 1\}^{*} \to \{0, 1\}^{\ell} $$
such that these functions can be evaluated in polynomial time. Note that the group of functions varies over $s$, which is the "description" or "seed" of the function.
Every hash function that can be implemented can be described as a function ensemble: note that the seed is the program of the hash function.
Now, we want our function ensemble based hash function to satisfy random oracle like properties. For example, it should be hard to find a preimage (so given $y = H(x)$ can you find $x$?).
To do this, we can talk about evasive binary relations.
Definition (Evasive Binary Relation). A relation $R$ is evasive if for any probabilistic polynomial time oracle machine $M$, there is a negligible function such that $$\text{Pr}[x \gets M^\mathcal{O}\text{, and }(x, \mathcal{O}(x)) \in R] = \text{negl}$$ where $\mathcal{O}$ is a uniformly chosen function.
In essence, an evasive binary relation captures relations which are "hard to find". For example, the relation $\{(x, 0^{\ell})\}$ is evasive, as it is hard to find a string $x$ such that $(x, \mathcal{O}(x))$ is in the relation. It is hard to find $\mathcal{O}(x)$ equal to all zeroes.
Definition (Correlation Intractable). We say that a function ensemble is correlation intractable if for every PPT machine $M$, the following statement holds for all evasive relations $R$:
$$ \text{Pr}_{s}[x \gets M, (x, f_s(x)) \in R] = \text{negl} $$
A hash function is correlation intractable if it is hard to find a preimage $x$ such that $f_s(x)$ is in an evasive relation. This is one of the key points of a hash function, namely that it is always hard to find a preimage for "rare" occurring events, such as a zero string.
Impossibility
However, this property is impossible for any hash function to have, even if a random oracle has this property. Why?
Consider the evasive relation $R^\mathcal{F}$ defined by the set of all $(s, f_s(s))$ for a function ensemble $\mathcal{F}$. So here, we define a relation by evaluating the hash on its description or program. For example, we could take the code used to implement SHA256, and pass it into the SHA256 function.
Now notice that this relation is evasive, as for every $s$, there is at most one output $f_s(s)$ satisfying $(s, f_s(s)) \in R^\mathcal{F}$, and so intuitively, getting $f_s(s)$ is "rare" (see the paper for details).
However, consider the polynomial time machine $I: x \mapsto x$. We can see that evaluating this PPT machine $I(s)$ gives us $s$. Therefore, this violates correlation intractability, since
$$ \text{Pr}[(I(s), f_s(I(s))) \in R^\mathcal{F}] = 1 $$
Therefore we see that evaluating $f_s(s)$ gives us a unique value that breaks this correlation intractability property which random oracles possess.
Protocols
From this, we can build quite a fun protocol, which is secure in the random oracle model, but not secure when the oracle $\mathcal{O}$ is replaced with a function ensemble $\mathcal{F}$.
Let us consider a standard signature scheme $(\text{Gen}, \text{Sign}, \text{Verify})$, which is secure and does not need to use the random oracle model.
Then consider the new signing process $\text{Sign}'$:
- If $(m, f_s(m)) \in R^\mathcal{F}$, output the secret key.
- Else, output $\text{Sign}(m)$.
Notice that if this is implemented by a function ensemble, then this is not secure as an input of $s$ will satisfy the first check, and the signer outputs the secret key.
We can also construct a new verification process $\text{Verify}'$:
- If $(m, f_s(m)) \in R^\mathcal{F}$, accept
- Else, output $\text{Verify}(\sigma, m)$
Note that in both of these protocols, we are inserting a "backdoor", where if our input is the program of the hash, we immediately return and accept. Now, if this is analyzed under the random oracle model, that check occurs a negligible amount, and so the protocol is still secure. However, since no function ensemble can be correlation intractable, this check is insecure when implemented with a function ensemble.
Conclusion
What does this mean? Well, it doesn't mean the random oracle model is broken, but does mean it might need some patching. Passing the program string into a hash is very contrived, and does not seem great.
However, this result is quite important. It shows the theoretical limitations of random oracle proofs. In a sense, it reminds me of Turing's proof that no Turing machine can solve the halting problem, displaying the limits of computation. This also seems somewhat similar to Godel's incompleteness theorems, which showed the limits of the axiomatic system of mathematics.
Though cryptography is mainly about formalizing security in terms of models and mathematics, the random oracle model is quite controversial. However, cryptography relies on a number of hardness problems and assumptions. For example, we have relied for quite a while on the hardness of factorizing, and the fact that there are algorithms in $NP$ which are not in $P$. Quite a number of cryptographic protocols use very fundamental assumptions in complexity theory. And though the random oracle model differs from the real world, there are fundamental limitations which we may have to leave as an assumption.