Cryptography Lecture 10–11
1. Lecture 10: Hash Functions
1.1. Motivation: fingerprinting large objects
The lecture starts with a motivating example from large-object comparison.
Suppose we have a very large object, for example a DNA segment or gene snippet. The professor gives an example where each object may be around one gigabyte, and there may be hundreds of thousands of possible database entries or mutations.
A direct comparison is expensive:
- sending the entire object to the server is expensive;
- comparing the whole object against every database entry is expensive;
- if there are many database entries, the communication and computation cost becomes huge.
The idea is to compute a short fingerprint, also called a digest, of the large object.
Instead of comparing
\[ x \]
directly with every database object, we compute
\[ H(x) \]
and compare the short digest with stored digests.
A hash function therefore takes a potentially very large input and compresses it to a short output, for example \(128\) bits or \(256\) bits.
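The fingerprinting idea can be sketched in a few lines of Python. This is only an illustration: SHA-256 from the standard library stands in for \(H\), and the tiny `database` and the names `digest` and `is_known` are hypothetical, not from the lecture.

```python
import hashlib

def digest(obj: bytes) -> bytes:
    """Return a 256-bit fingerprint of obj (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(obj).digest()

# Hypothetical database: the server keeps only the short digests.
database = [b"ACGT" * 1000, b"TTGA" * 1000, b"GATTACA" * 500]
stored_digests = {digest(entry) for entry in database}

def is_known(obj: bytes) -> bool:
    """Membership test that compares 32-byte digests instead of whole objects."""
    return digest(obj) in stored_digests

query_hit = b"TTGA" * 1000
query_miss = b"TTGA" * 999 + b"TTGC"   # differs from the stored entry in one byte
print(is_known(query_hit), is_known(query_miss))
```

The client would only need to send the 32-byte digest rather than the full object, which is where the communication saving comes from.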
The professor emphasizes that this is useful even outside cryptography:
- speeding up comparisons;
- membership tests in large databases;
- reducing communication cost;
- obtaining a short handle for a large object.
The important intuition is:
\[ \boxed{ \text{Hashing compresses large objects into short fingerprints.} } \]
However, since the digest is much shorter than the input, it cannot be a statistically unique identifier for every possible input. Collisions must exist. Cryptography only asks that collisions should be hard to find.
1.2. Tangent: approximate matching and locality-preserving hashing
The professor briefly discusses a related but different problem.
Sometimes we do not want exact equality. For example, for text or images, we may want to know whether two objects are close:
- a sentence with one comma missing;
- a sentence with one word changed;
- an image shifted or rotated slightly;
- images that are semantically similar.
For this, ordinary cryptographic hash functions are not suitable, because a tiny change in the input should normally make the hash value look unrelated.
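A small experiment makes this concrete. The sketch below hashes two sentences that differ by one comma and counts how many digest bits differ; SHA-256 and the example sentences are illustrative assumptions. For a good cryptographic hash one typically sees roughly half of the bits change, so digest distance says nothing about input distance.

```python
import hashlib

def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

s1 = b"The quick brown fox jumps over the lazy dog, then rests."
s2 = b"The quick brown fox jumps over the lazy dog then rests."  # comma removed

d1 = hashlib.sha256(s1).digest()
d2 = hashlib.sha256(s2).digest()
print(hamming_distance(d1, d2), "of 256 digest bits differ")
```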
Instead, one can use locality-preserving hashing.
The rough idea is:
\[ d(x,y) \text{ small} \quad \Longrightarrow \quad d(H(x), H(y)) \text{ small}. \]
That is, if two objects are close in the source space, then their hashes should also be close.
This is useful for approximate matching, but it is not the same as a cryptographic hash function.
The professor mentions that images live in some metric space. A naive metric could compare images as large pixel vectors, but more semantic metrics such as optical flow can better capture shifts, rotations, or visually similar images.
The lecture then returns to cryptographic hash functions.
1.3. Syntax of hash functions
A hash function family is described by two algorithms:
\[ (\mathsf{Gen}, H). \]
1.3.1. Key generation / seed generation
\[ \mathsf{Gen}(1^\lambda) \]
is a probabilistic algorithm.
It takes the security parameter \(1^\lambda\) and outputs a key or seed
\[ s. \]
The professor prefers the word seed here, because unlike private encryption keys, this value is usually public.
So one should not think of \(s\) as a secret key in the same sense as in private key encryption.
1.3.2. Hash evaluation
For a fixed seed \(s\),
\[ H_s(x) \]
is a deterministic algorithm.
It takes:
\[ x \in \{0,1\}^* \]
and outputs
\[ h \in \{0,1\}^{\ell}. \]
Usually,
\[ \ell = \ell(\lambda). \]
The output length depends only on the security parameter, not on the length of the input.
So a hash function has the type:
\[ H_s : \{0,1\}^* \to \{0,1\}^{\ell(\lambda)}. \]
The key feature is that arbitrary-length inputs are mapped to fixed-length outputs.
1.3.3. Fixed-length hash functions / compression functions
If the function is only defined on inputs of a fixed length \(\ell'\), for example
\[ H_s : \{0,1\}^{\ell'} \to \{0,1\}^{\ell}, \qquad \ell' > \ell, \]
then it is called a fixed-length hash function or a compression function.
The lecture later uses this distinction when discussing the Merkle–Damgård paradigm, where one starts from a compression function and builds a hash function for longer inputs.
1.3.4. If \(\mathsf{Gen}\) is omitted
Sometimes the generation algorithm is not explicitly specified.
In that case, we assume
\[ s \leftarrow_{\$} \{0,1\}^{\lambda}. \]
That is, \(s\) is chosen uniformly at random.
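The syntax \((\mathsf{Gen}, H)\) can be mirrored in code as follows. This is a minimal sketch under assumptions: \(\lambda = 256\) bits, and \(H_s(x)\) is instantiated as SHA-256 applied to \(s \parallel x\), which is a stand-in and not a construction given in the lecture.

```python
import hashlib
import secrets

LAMBDA_BYTES = 32  # illustrative: lambda = 256 bits

def gen() -> bytes:
    """Gen(1^lambda): sample a public seed s uniformly from {0,1}^lambda."""
    return secrets.token_bytes(LAMBDA_BYTES)

def H(s: bytes, x: bytes) -> bytes:
    """H_s(x): deterministic, arbitrary-length input, fixed-length output."""
    return hashlib.sha256(s + x).digest()

s = gen()                               # the seed is public, not a secret key
short_digest = H(s, b"a")
long_digest = H(s, b"a" * 1_000_000)    # output length does not grow with the input
```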
1.4. Collision resistance
The main security notion introduced in this lecture is collision resistance.
A collision is a pair of distinct inputs
\[ x \neq x' \]
such that
\[ H_s(x) = H_s(x'). \]
Because the output space \(\{0,1\}^{\ell}\) is much smaller than the input space, collisions always exist by the pigeonhole principle. The security requirement is only that they are computationally hard to find.
1.4.1. The collision-finding experiment
The experiment is called
\[ \mathsf{Hash\text{-}Col}_{\mathcal A}(\lambda). \]
It proceeds as follows:
The challenger samples
\[ s \leftarrow \mathsf{Gen}(1^\lambda). \]
The challenger gives \(s\) to the adversary \(\mathcal A\).
The adversary outputs two strings
\[ x, x'. \]
The experiment outputs \(1\) if
\[ x \neq x' \quad\text{and}\quad H_s(x) = H_s(x'). \]
Otherwise it outputs \(0\).
So the adversary wins exactly when it finds a real hash collision.
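The experiment translates almost line by line into code. In the sketch below, `weak_hash` is a deliberately weak toy hash with a 16-bit output (an assumption made so that a brute-force adversary terminates quickly); the function and adversary names are hypothetical.

```python
import hashlib
import secrets
from typing import Callable, Tuple

def gen() -> bytes:
    """Sample a public seed."""
    return secrets.token_bytes(16)

def weak_hash(s: bytes, x: bytes) -> bytes:
    """Toy hash with a 16-bit output, so that collisions are easy to find."""
    return hashlib.sha256(s + x).digest()[:2]

Adversary = Callable[[bytes], Tuple[bytes, bytes]]

def hash_col(adversary: Adversary, gen_fn=gen, hash_fn=weak_hash) -> int:
    """Run Hash-Col_A: output 1 iff the adversary returns a real collision."""
    s = gen_fn()                   # challenger samples the seed
    x, x_prime = adversary(s)      # adversary sees s and outputs two strings
    return int(x != x_prime and hash_fn(s, x) == hash_fn(s, x_prime))

def brute_force_adversary(s: bytes) -> Tuple[bytes, bytes]:
    """Hash distinct inputs until two of them share a digest."""
    seen = {}
    counter = 0
    while True:
        x = counter.to_bytes(4, "big")
        h = weak_hash(s, x)
        if h in seen:
            return seen[h], x
        seen[h] = x
        counter += 1

def lazy_adversary(s: bytes) -> Tuple[bytes, bytes]:
    """Outputs the same string twice, which never counts as a collision."""
    return b"same", b"same"
```

With a 16-bit output the brute-force adversary must succeed after at most \(2^{16}+1\) distinct inputs, which is why the toy hash is not collision-resistant.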
1.4.2. Definition
A hash function \((\mathsf{Gen}, H)\) is collision-resistant if for every PPT adversary \(\mathcal A\), there exists a negligible function \(\nu\) such that for all \(\lambda \in \mathbb N\),
\[ \Pr[\mathsf{Hash\text{-}Col}_{\mathcal A}(\lambda)=1] < \nu(\lambda). \]
Conversely, the hash function is insecure (not collision-resistant) if there exists a PPT adversary \(\mathcal A\) whose success probability is bounded below by some non-negligible function \(\varepsilon\):
\[ \Pr[\mathsf{Hash\text{-}Col}_{\mathcal A}(\lambda)=1] \geq \varepsilon(\lambda). \]
The professor emphasizes that this experiment captures what it means for a hash function to be insecure: an adversary can produce a verifiable solution to a hard problem, namely a collision.
This differs from earlier indistinguishability games, where the adversary mainly had to guess a hidden bit.
Here the adversary outputs an object that can be directly checked.
1.5. Relation to PRGs, PRFs, and PRPs
The professor compares hash functions with earlier primitives.
1.5.1. PRGs are not automatically collision-resistant hash functions
A pseudorandom generator is not automatically a collision-resistant hash function.
A PRG has a different purpose: it expands a short seed into a longer pseudorandom-looking output.
Collision resistance asks for something different: it asks that, even after seeing the public seed \(s\), it should be hard to find
\[ x \neq x' \]
with
\[ H_s(x)=H_s(x'). \]
1.5.2. PRPs do not have the right syntax
A pseudorandom permutation maps fixed-length strings to fixed-length strings of the same length:
\[ F_k : \{0,1\}^{\lambda} \to \{0,1\}^{\lambda}. \]
A hash function usually compresses arbitrary-length inputs to short fixed-length outputs:
\[ H_s : \{0,1\}^* \to \{0,1\}^{\ell}. \]
So a PRP is not immediately a hash function.
1.5.3. PRFs only give security when the key is hidden
A pseudorandom function is secure only when its key is hidden.
In the PRF security experiment, the adversary only gets oracle access to the function. It does not see the key.
Hash functions are different. The seed \(s\) is public.
Therefore, for hash functions we need security even when the adversary knows \(s\).
The professor states the key contrast as:
\[ \boxed{ \text{PRF/PRG security usually assumes the key is hidden,} } \]
but
\[ \boxed{ \text{hash functions must remain secure even when the seed is public.} } \]
This is one reason why the word seed is more appropriate than key for hash functions.
1.6. Example 1: appending zeros preserves collision resistance
Assume \(H\) is collision-resistant.
Define
\[ H'_s(x) = H_s(x \parallel 0^\lambda). \]
Question:
\[ \text{Is } H' \text{ collision-resistant?} \]
Answer: \[ \boxed{\text{Yes.}} \]
1.6.1. Proof idea
Suppose \(H'\) is not collision-resistant.
Then there exists an adversary that finds
\[ x \neq x' \]
such that
\[ H'_s(x) = H'_s(x'). \]
By definition of \(H'\), this means
\[ H_s(x \parallel 0^\lambda) = H_s(x' \parallel 0^\lambda). \]
Since
\[ x \neq x', \]
we also have
\[ x \parallel 0^\lambda \neq x' \parallel 0^\lambda. \]
Therefore,
\[ (x \parallel 0^\lambda,\; x' \parallel 0^\lambda) \]
is a collision for \(H\).
So any collision for \(H'\) immediately gives a collision for \(H\).
Thus, if \(H\) is collision-resistant, then \(H'\) is also collision-resistant.
This is a standard reduction-by-contraposition argument:
\[ \text{collision for } H' \quad\Longrightarrow\quad \text{collision for } H. \]
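The reduction can be exercised on a toy instance. In this sketch, `H` is truncated to 8 bits so that a collision for `H_prime` can actually be found by brute force, and `PAD` stands in for \(0^\lambda\) at byte granularity; both are illustrative assumptions.

```python
import hashlib

PAD = b"\x00" * 16  # stands in for 0^lambda (byte-granular for simplicity)

def H(s: bytes, x: bytes) -> bytes:
    """Toy stand-in for H_s with an 8-bit output so collisions can be found."""
    return hashlib.sha256(s + x).digest()[:1]

def H_prime(s: bytes, x: bytes) -> bytes:
    """H'_s(x) = H_s(x || 0^lambda)."""
    return H(s, x + PAD)

def lift_collision(x: bytes, x_prime: bytes):
    """Map a collision (x, x') for H' to a collision for H by appending the pad."""
    return x + PAD, x_prime + PAD

def find_collision(hash_fn, s: bytes):
    """Brute-force collision finder; feasible only because the toy output is tiny."""
    seen = {}
    i = 0
    while True:
        x = i.to_bytes(2, "big")
        h = hash_fn(s, x)
        if h in seen:
            return seen[h], x
        seen[h] = x
        i += 1

s = b"public-seed"
x, x_prime = find_collision(H_prime, s)      # plays the role of the adversary
y, y_prime = lift_collision(x, x_prime)      # the reduction's output
```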
1.7. Example 2: dropping the last bit destroys collision resistance
Now define
\[ H''_s(x_1,\ldots,x_n) = H_s(x_1,\ldots,x_{n-1}). \]
That is, \(H''\) ignores the last input bit.
Question:
\[ \text{Is } H'' \text{ collision-resistant?} \]
Answer: \[ \boxed{\text{No.}} \]
For example, take two inputs that differ only in the last bit:
\[ 0\ldots 00 \]
and
\[ 0\ldots 01. \]
They are distinct, but \(H''\) drops the last bit, so both are hashed as
\[ 0\ldots 0. \]
Therefore,
\[ H''_s(0\ldots 00) = H''_s(0\ldots 01). \]
This gives an easy collision.
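The collision can be checked directly. In this sketch, bit strings are represented as Python strings of `0` and `1`, and SHA-256 stands in for \(H_s\); both choices are illustrative assumptions.

```python
import hashlib

def H(s: bytes, bits: str) -> bytes:
    """Stand-in for H_s on a bit string given as text such as '0101'."""
    return hashlib.sha256(s + bits.encode("ascii")).digest()

def H_dropped(s: bytes, bits: str) -> bytes:
    """H''_s(x_1..x_n) = H_s(x_1..x_{n-1}): the last bit is ignored."""
    return H(s, bits[:-1])

s = b"public-seed"
x = "0" * 8
x_prime = "0" * 7 + "1"     # differs from x only in the last bit
print(H_dropped(s, x) == H_dropped(s, x_prime))
```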
The lesson is:
\[ \boxed{ \text{A cryptographic hash value must depend on every input bit.} } \]
The professor briefly adds that there are some very recent advanced research ideas where encodings allow hash computations to depend only on a logarithmic number of bits of an encoded message. But this is bleeding-edge research and not the standard notion used in this lecture.
For the classical cryptographic hash functions discussed here, ignoring even one input bit is fatal for collision resistance.
1.8. Beyond collision resistance: idealized hash functions
Collision resistance is useful, but sometimes it is not enough.
There are situations where we want stronger properties. For example, we may want it to be hard to find arbitrary correlations between the hash function and some other function.
One example discussed in the lecture is:
Given a fixed function \(f\), it should be hard to find \(x\) such that
\[ H_s(x) = f(x). \]
This is not merely a collision between two hash inputs. It is a more general correlation between \(H_s\) and another function.
The professor notes that the details matter. If the target value or function is chosen independently before the seed \(s\), then collision resistance may already imply some such properties. But if the adversary can choose the target after seeing \(s\), then some formulations become unachievable.
For example, if the adversary is allowed to choose a target \(T\) after seeing \(s\), it can simply choose
\[ T = H_s(x) \]
for some \(x\).
So single-target correlation notions can easily become either trivial or impossible.
The broader concept mentioned is correlation intractability, which may appear later in connection with the Fiat–Shamir paradigm.
1.9. Random oracle model
To model an ideal hash function, practitioners often use the random oracle model.
A random oracle is a uniformly random function
\[ \mathcal H : \{0,1\}^* \to \{0,1\}^{\lambda} \]
that can only be accessed by oracle or black-box access.
This means algorithms can query the oracle on inputs \(x\) and receive \(\mathcal H(x)\), but they cannot inspect the code or implementation.
The professor gives a programming analogy:
\[ \boxed{ \text{Think of a random oracle as a library function whose implementation is hidden.} } \]
1.9.1. Important property
If \(\mathcal H\) has not been queried at a point \(x\), then from the adversary’s point of view,
\[ \mathcal H(x) \]
is still uniformly random.
That is, a random oracle is uniform and independent on all unqueried points.
1.9.2. Why this helps in proofs
In security proofs, reductions can simulate and control the random oracle.
This often makes proofs much easier.
Instead of reasoning about the internal structure of a real hash function, the proof can use the fact that every fresh oracle answer is uniformly random.
This allows many proofs to become combinatorial or statistical.
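A common way to picture how a reduction simulates a random oracle is lazy sampling: answers are drawn uniformly at random the first time a point is queried and stored for consistency. The class below is a minimal sketch of that idea; the name `RandomOracle` and the 32-byte output length are assumptions.

```python
import secrets

class RandomOracle:
    """Lazily sampled random function {0,1}* -> {0,1}^lambda."""

    def __init__(self, out_bytes: int = 32):
        self.out_bytes = out_bytes
        self.table = {}   # the answers fixed so far

    def query(self, x: bytes) -> bytes:
        # A fresh point gets a fresh uniform answer; repeated queries are consistent.
        if x not in self.table:
            self.table[x] = secrets.token_bytes(self.out_bytes)
        return self.table[x]

oracle = RandomOracle()
first = oracle.query(b"hello")
second = oracle.query(b"hello")
```

Points that are not yet in `table` have no value at all until they are queried, which matches the statement that unqueried points are still uniformly random from the adversary's point of view.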
1.10. Problems with the random oracle model
The professor strongly emphasizes that the random oracle model is conceptually problematic.
Real hash functions are not random oracles.
A real hash function has:
- a small description;
- concrete code;
- a concrete implementation;
- public algorithms.
A uniformly random function on a huge domain has, with overwhelming probability, no small circuit or short implementation. To store it exactly, one would need its entire function table.
So the random oracle model does not literally describe real hash functions.
1.10.1. The two-step heuristic
In practice, people often use a two-step approach:
- Prove a construction secure assuming the hash function is a random oracle.
- In the real world, replace the random oracle with a concrete hash function, such as a member of the SHA family.
This is only a heuristic.
There is a gap:
\[ \text{secure with random oracle} \;\not\Rightarrow\; \text{secure with every real hash function}. \]
The professor states that proofs in the random oracle model are better than no proof at all, but they do not give the same kind of guarantee as a proof in the standard model.
1.10.2. Counterexamples
There exist protocols that are secure in the random oracle model but become insecure when the random oracle is replaced by any real-world hash function.
The professor says that many theoreticians dislike or frown upon the random oracle model because of this conceptual flaw.
However, the model remains popular because it leads to efficient constructions, especially efficient signature schemes.
1.10.3. What a random-oracle proof still tells us
If a scheme is secure in the random oracle model, then it is not trivially insecure.
If it becomes insecure with a real hash function, then the real-world adversary must do something non-trivial and interesting with the actual hash function implementation.
So the heuristic value is:
\[ \boxed{ \text{RO security rules out many simple attacks, but not all real-world attacks.} } \]
1.11. Example: random oracle gives a collision-resistant hash function
Let \(\mathcal H\) be a random oracle.
Define
\[ H_s(x) = \mathcal H(s \parallel x). \]
This is a salted hash construction. The seed \(s\) is prefixed to the input.
The claim is that \(H_s\) is collision-resistant when \(\mathcal H\) is modeled as a random oracle.
1.11.1. Proof sketch
The adversary has oracle access to \(\mathcal H\).
Since the adversary is PPT, it can make at most polynomially many oracle queries. Let the number of queries be
\[ q = \operatorname{poly}(\lambda). \]
Suppose the queried inputs are
\[ x_1,\ldots,x_q. \]
For any two distinct queries \(x_i \neq x_j\), the values
\[ \mathcal H(s \parallel x_i) \]
and
\[ \mathcal H(s \parallel x_j) \]
are independent uniformly random \(\lambda\)-bit strings.
Therefore,
\[ \Pr[ \mathcal H(s \parallel x_i) = \mathcal H(s \parallel x_j) ] = 2^{-\lambda}. \]
Now use the union bound over all pairs:
\[ \Pr[ \exists\, i \neq j : \mathcal H(s \parallel x_i) = \mathcal H(s \parallel x_j) ] \leq q^2 \cdot 2^{-\lambda}. \]
Since \(q\) is polynomial and \(2^{-\lambda}\) is negligible,
\[ q^2 \cdot 2^{-\lambda} \]
is negligible.
Thus the adversary only finds a collision with negligible probability.
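To get a feeling for the numbers, the bound \(q^2 \cdot 2^{-\lambda}\) can be evaluated exactly. The helper below is a small sketch; the sample values \(q = 2^{40}\) and \(\lambda \in \{64, 256\}\) are illustrative, not from the lecture.

```python
from fractions import Fraction

def collision_bound(q: int, lam: int) -> Fraction:
    """Union bound q^2 * 2^(-lambda) from the proof sketch, capped at 1."""
    return min(Fraction(q * q, 2 ** lam), Fraction(1))

bound_256 = collision_bound(2 ** 40, 256)   # equals 2^-176
bound_64 = collision_bound(2 ** 40, 64)     # the bound is vacuous here
```

The second value shows that the bound says nothing once \(q\) approaches \(2^{\lambda/2}\), which is why the output length must be chosen with the number of queries in mind.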
The professor emphasizes that proofs in the random oracle model are often easy because the adversary must essentially get lucky.
1.12. Merkle–Damgård paradigm
The lecture then discusses how to extend a fixed-length compression function to a hash function with a larger domain.
Suppose we have a compression function
\[ H_K : \{0,1\}^{2\lambda} \to \{0,1\}^{\lambda}. \]
Can we build a hash function
\[ H'_K : \{0,1\}^{4\lambda} \to \{0,1\}^{\lambda}? \]
The idea is to iterate the compression function.
For input
\[ x = x_1 \parallel x_2, \]
where
\[ x_1,x_2 \in \{0,1\}^{2\lambda}, \]
define
\[ H'_K(x_1,x_2) = H_K\bigl(H_K(x_1) \parallel H_K(x_2)\bigr). \]
So the construction first hashes the two halves separately, then concatenates the two intermediate hash values and hashes them again.
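This is a tree-like iteration of the compression function, in the style of a Merkle tree; the classical Merkle–Damgård construction instead chains the compression function block by block.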
The professor says that one can recursively extend this idea to longer strings. If necessary, one can pad the input length to a power of two. Padding to the next power of two increases the length by at most a factor of two.
1.12.1. Simplified Merkle–Damgård theorem
If
\[ H_K : \{0,1\}^{2\lambda} \to \{0,1\}^{\lambda} \]
is collision-resistant, then
\[ H'_K : \{0,1\}^{4\lambda} \to \{0,1\}^{\lambda} \]
defined by
\[ H'_K(x_1,x_2) = H_K\bigl(H_K(x_1) \parallel H_K(x_2)\bigr) \]
is also collision-resistant.
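The construction can be written down directly. In this sketch, \(\lambda\) is 32 bytes and `compress` uses SHA-256 over the key and a \(2\lambda\)-byte block as a stand-in for \(H_K\); these are illustrative assumptions.

```python
import hashlib

LAM = 32  # lambda in bytes (256 bits), an illustrative choice

def compress(key: bytes, block: bytes) -> bytes:
    """Stand-in for H_K : {0,1}^{2 lambda} -> {0,1}^lambda."""
    if len(block) != 2 * LAM:
        raise ValueError("compression function expects exactly 2*lambda bytes")
    return hashlib.sha256(key + block).digest()

def hash_4lam(key: bytes, x: bytes) -> bytes:
    """H'_K(x1, x2) = H_K(H_K(x1) || H_K(x2)) for inputs of 4*lambda bytes."""
    if len(x) != 4 * LAM:
        raise ValueError("expects exactly 4*lambda bytes")
    x1, x2 = x[:2 * LAM], x[2 * LAM:]
    return compress(key, compress(key, x1) + compress(key, x2))

key = b"K" * LAM
message = bytes(range(128))        # 128 bytes = 4 * lambda
tag = hash_4lam(key, message)
```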
1.12.2. Proof sketch
Use contraposition.
Assume there is a PPT adversary \(\mathcal A\) that finds a collision for \(H'_K\).
So \(\mathcal A\) outputs
\[ x = x_1 \parallel x_2 \]
and
\[ x' = x'_1 \parallel x'_2 \]
such that
\[ x \neq x' \]
but
\[ H'_K(x) = H'_K(x'). \]
Expanding the definition:
\[ H_K\bigl(H_K(x_1) \parallel H_K(x_2)\bigr) = H_K\bigl(H_K(x'_1) \parallel H_K(x'_2)\bigr). \]
There are three relevant cases.
- Case 1: left subcollision
If
\[ x_1 \neq x'_1 \]
and
\[ H_K(x_1)=H_K(x'_1), \]
then
\[ (x_1,x'_1) \]
is already a collision for \(H_K\).
- Case 2: right subcollision
If
\[ x_2 \neq x'_2 \]
and
\[ H_K(x_2)=H_K(x'_2), \]
then
\[ (x_2,x'_2) \]
is already a collision for \(H_K\).
- Case 3: no subcollision, but top-level collision
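Otherwise, neither subcollision occurs. Since \(x \neq x'\), at least one half differs, say \(x_b \neq x'_b\), and because the corresponding subcollision case does not apply, \(H_K(x_b) \neq H_K(x'_b)\). So at least one of the intermediate values differs.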
That is, the pair
\[ H_K(x_1) \parallel H_K(x_2) \]
is different from
\[ H_K(x'_1) \parallel H_K(x'_2). \]
But the top-level hashes are equal:
\[ H_K\bigl(H_K(x_1) \parallel H_K(x_2)\bigr) = H_K\bigl(H_K(x'_1) \parallel H_K(x'_2)\bigr). \]
Therefore,
\[ \Bigl( H_K(x_1) \parallel H_K(x_2), \; H_K(x'_1) \parallel H_K(x'_2) \Bigr) \]
is a collision for \(H_K\).
Thus, any collision for \(H'_K\) yields a collision for \(H_K\).
Therefore, if \(H_K\) is collision-resistant, then \(H'_K\) is also collision-resistant.
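The case analysis is itself an algorithm: given a collision for \(H'_K\), it outputs a collision for \(H_K\). The sketch below runs it on a toy compression function from 2 bytes to 1 byte (an assumption, chosen so that a collision for `h_prime` can be found by brute force); the key is omitted for brevity.

```python
import hashlib
from itertools import product

def h(block: bytes) -> bytes:
    """Toy compression function: 2 bytes -> 1 byte (so collisions are findable)."""
    assert len(block) == 2
    return hashlib.sha256(block).digest()[:1]

def h_prime(x: bytes) -> bytes:
    """H'(x1, x2) = H(H(x1) || H(x2)) on 4-byte inputs."""
    return h(h(x[:2]) + h(x[2:]))

def extract_collision(x: bytes, xp: bytes):
    """Turn a collision (x, x') for h_prime into a collision for h."""
    x1, x2, xp1, xp2 = x[:2], x[2:], xp[:2], xp[2:]
    if x1 != xp1 and h(x1) == h(xp1):     # Case 1: left subcollision
        return x1, xp1
    if x2 != xp2 and h(x2) == h(xp2):     # Case 2: right subcollision
        return x2, xp2
    # Case 3: the intermediate strings differ but collide at the top level
    return h(x1) + h(x2), h(xp1) + h(xp2)

def find_h_prime_collision():
    """Brute force over 4-byte inputs; stops at the first repeated digest."""
    seen = {}
    for candidate in product(range(256), repeat=4):
        x = bytes(candidate)
        d = h_prime(x)
        if d in seen:
            return seen[d], x
        seen[d] = x
    raise RuntimeError("unreachable: there are more inputs than outputs")

x, xp = find_h_prime_collision()
a, b = extract_collision(x, xp)
```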
1.13. Lecture 10 takeaways
The lecture ends with the following takeaways:
- Hash functions compress large objects into short digests.
- Since the digest is short, collisions must exist.
- The digest is only computationally unique.
- Collision resistance captures the idea that collisions are hard to find.
- Hash functions differ from PRFs because their seed is public.
- Random oracles model ideal hash functions with stronger behavior.
- Random oracle proofs are often easy, but the model is only heuristic.
- Proofs in the random oracle model do not automatically hold in the real world.
- Merkle–Damgård-style iteration can extend a compression function to longer inputs while preserving collision resistance.
2. Lecture 11: Authentication
2.1. Addendum to hash functions: practical hashing and SHA-3
Lecture 11 begins with an addendum to the previous lecture.
The professor notes that for earlier primitives, such as PRGs, PRFs, and PRPs, he had usually mentioned how they are implemented in the real world. For hash functions, he had forgotten to include such a slide, so he adds a practical hashing discussion.
The main point is that practical cryptographic base primitives must be extremely efficient.
For primitives like AES and hash functions, the bottleneck is usually not whether we believe the primitive is secure. From a cryptanalytic perspective, standard constructions are believed to be very secure. The important practical issue is speed.
These primitives may be called billions of times on servers, so efficiency is critical.
2.2. Sponge construction and Keccak / SHA-3
The practical construction discussed is the sponge construction.
The professor identifies Keccak as the NIST-standardized construction behind SHA-3.
The sponge construction has two internal registers or parts of state:
- \(r\), often called the rate;
- \(c\), often called the capacity.
Both are initialized to zero.
The input message is split into blocks:
\[ p_0,p_1,\ldots,p_{n-1}. \]
Each block has length roughly corresponding to the rate \(r\). For illustration, the professor mentions an order of magnitude such as \(64\) bits.
2.2.1. Absorbing phase
The sponge first absorbs or soaks the message.
For each block \(p_i\):
- XOR \(p_i\) into the \(r\)-part of the state.
- Apply a state transformation \(f\) to the whole internal state.
Schematically:
\[ (r,c) \leftarrow f(r \oplus p_i, c). \]
This is repeated until the whole message has been absorbed.
2.2.2. Squeezing phase
After absorption, the construction squeezes output blocks.
Instead of XORing in more message blocks, it repeatedly applies \(f\) and reads output from the \(r\)-part of the state.
This allows the construction to output a digest of the desired length, such as \(128\) or \(256\) bits.
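The two phases can be sketched with a toy sponge. Everything concrete here is an assumption made for illustration: the 8-byte rate, the 24-byte capacity, the simplified padding, and above all `toy_permutation`, which is a simple bijection on the state and neither the Keccak permutation nor cryptographically vetted.

```python
RATE = 8        # r: rate in bytes (64 bits, matching the lecture's illustration)
CAPACITY = 24   # c: capacity in bytes (illustrative choice)
WIDTH_BITS = 8 * (RATE + CAPACITY)
MASK = (1 << WIDTH_BITS) - 1

def toy_permutation(state: int) -> int:
    """Toy bijection on the state; NOT Keccak-f and not secure."""
    for round_constant in range(1, 13):
        # Multiplying by an odd constant and adding is invertible modulo 2^b.
        state = (state * 0x9E3779B97F4A7C15 + round_constant) & MASK
        # Bit rotation is also a bijection.
        state = ((state << 29) | (state >> (WIDTH_BITS - 29))) & MASK
    return state

def toy_sponge(message: bytes, out_len: int = 32) -> bytes:
    # Simplified padding (assumption): append 0x80, then zeros up to a block boundary.
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % RATE)
    state = 0  # both the r-part and the c-part start at zero
    # Absorbing phase: XOR each block into the r-part, then apply f.
    for i in range(0, len(padded), RATE):
        block = int.from_bytes(padded[i:i + RATE], "big")
        state ^= block << (8 * CAPACITY)   # the r-part is the top RATE bytes
        state = toy_permutation(state)
    # Squeezing phase: read the r-part, apply f, repeat until enough output.
    out = b""
    while len(out) < out_len:
        out += (state >> (8 * CAPACITY)).to_bytes(RATE, "big")
        state = toy_permutation(state)
    return out[:out_len]

short_out = toy_sponge(b"abc")
long_out = toy_sponge(b"abc", 100)
```

Note that the message only ever touches the \(r\)-part directly, while the \(c\)-part is changed only through the permutation. For real use, Python's standard library exposes the standardized function as `hashlib.sha3_256`.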
2.2.3. The role of \(f\)
The internal function \(f\) is typically a permutation on the internal state.
The professor emphasizes that \(f\) is not the hash function itself.
The full sponge construction is the hash function.
Also, \(f\) is not a pseudorandom permutation in the formal keyed sense used earlier in the course. It is an unkeyed state transition function, chosen heuristically and analyzed cryptographically.
A useful mental model is that \(f\) should be a sufficiently complex permutation that preserves the entropy of the state while mixing it rapidly.
2.2.4. Security status
The professor says that one can show this construction secure in idealized models, for example when \(f\) is modeled as an ideal primitive such as a random permutation or a random oracle.
But in the real world, the best one can do is cryptanalysis.
So for SHA-3 / Keccak, confidence comes from standardization, design analysis, and cryptanalytic effort, not from a clean standard-model theorem saying that the concrete function is collision-resistant.
2.3. Transition to authentication
The lecture then moves to the new topic: authentication.
In the course-topic picture, this is still symmetric-key cryptography, but now the goal is authenticity rather than privacy.
Previously, the course studied symmetric-key encryption, whose goal is secrecy. Now the course studies message authentication codes, whose goal is message integrity.
The main message is:
\[ \boxed{ \text{Secrecy and integrity are different security goals.} } \]
Encryption by itself does not necessarily protect integrity.
2.4. Attack that does not break secrecy
The professor recalls an earlier example with two armies.
One army wants to send an encrypted command to another army. The possible messages are:
\[ m_0 = \operatorname{Binary}(\text{"Attack"}) \]
and
\[ m_1 = \operatorname{Binary}(\text{"Wait"}). \]
Suppose the message is encrypted with a one-time pad:
\[ c = K \oplus m_b. \]
An adversary intercepts \(c\). The adversary may not know whether \(b=0\) or \(b=1\), so secrecy is not necessarily broken.
But the adversary can compute
\[ c' = c \oplus m_0 \oplus m_1. \]
Then:
- if \(c\) encrypted \(m_0\), then \(c'\) decrypts to \(m_1\);
- if \(c\) encrypted \(m_1\), then \(c'\) decrypts to \(m_0\).
So the adversary can flip the meaning of the command without learning the original message.
This attack does not break secrecy, but it breaks integrity.
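The attack is short enough to run. In this sketch the two messages are padded to equal length with spaces (an assumption, since the one-time pad needs messages of the same length), and the variable names are hypothetical.

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))

m0 = b"Attack"
m1 = b"Wait  "     # padded with spaces so both messages have equal length

key = secrets.token_bytes(len(m0))
b_secret = secrets.randbelow(2)
c = xor(key, (m0, m1)[b_secret])       # honest one-time-pad encryption

c_tampered = xor(c, xor(m0, m1))       # the adversary uses neither key nor b
decrypted = xor(key, c_tampered)
expected_flip = (m1, m0)[b_secret]
```

The adversary's computation involves only `c`, `m0`, and `m1`, yet the receiver decrypts the opposite command.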
The professor notes that the same style of malleability attack also applies to the encryption schemes seen so far where encryption is essentially XORing a pseudorandom mask onto the message, such as:
- one-time pad;
- PRG-based private-key encryption;
- PRF-based CPA-secure encryption.
Some block-cipher modes may already provide limited structure that makes this specific attack less direct, but encryption alone should not be relied on for authenticity.
2.5. Message integrity: real-world examples
The lecture gives two main real-world examples.
2.5.1. Bank transactions
Suppose a user sends an instruction:
\[ \text{send } 100 \text{ euros}. \]
An attacker may try to change it to:
\[ \text{send } 1000 \text{ euros}. \]
Encryption may hide the message, but it does not necessarily prevent tampering.
2.5.2. Cookies
HTTP is stateless. Cookies allow web services to store state on the client side.
For example, a shopping cart or game state might be stored locally.
If cookies are not authenticated, the user may modify local state:
- changing a shopping cart;
- changing a game balance;
- giving themselves more coins;
- tampering with stored state.
The solution is to authenticate the cookie.
The server holds a secret key \(K\) and stores or checks something like:
\[ \mathsf{Mac}_K(\text{cookie}). \]
The general idea is:
\[ \boxed{ \text{Only someone who knows the secret key should be able to authenticate messages.} } \]
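A cookie check along these lines might look as follows. The lecture does not fix a concrete MAC at this point, so HMAC-SHA256 from Python's standard library is used as a stand-in; the cookie contents and function names are hypothetical.

```python
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)   # secret key K, known only to the server

def issue_cookie(state: bytes):
    """Return the client-side state together with its authentication tag."""
    tag = hmac.new(SERVER_KEY, state, hashlib.sha256).digest()
    return state, tag

def accept_cookie(state: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(SERVER_KEY, state, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

cookie, tag = issue_cookie(b"coins=10")
honest_ok = accept_cookie(cookie, tag)
tampered_ok = accept_cookie(b"coins=9999", tag)   # user edits the local state
```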
2.6. Syntax of message authentication codes
A message authentication code, or MAC, consists of three PPT algorithms:
\[ (\mathsf{Gen}, \mathsf{Mac}, \mathsf{Verify}). \]
2.6.1. Key generation
\[ \mathsf{Gen}(1^\lambda) \]
is randomized.
It takes the security parameter and outputs a key
\[ K. \]
2.6.2. Tag generation
\[ \mathsf{Mac}(K,m) \]
takes a key \(K\) and a message \(m\), and outputs an authentication tag
\[ t. \]
The MAC algorithm may be randomized.
2.6.3. Verification
\[ \mathsf{Verify}(K,m,t) \]
takes a key, a message, and a tag, and outputs a bit
\[ b \in \{0,1\}. \]
The verification algorithm is deterministic.
The meaning is:
\[ b=1 \]
means accept, and
\[ b=0 \]
means reject.
2.7. Correctness of MACs
Correctness says that honestly generated tags verify.
For all security parameters \(\lambda\), all messages \(m\), and keys
\[ K \leftarrow \mathsf{Gen}(1^\lambda), \]
we require
\[ \Pr[ \mathsf{Verify}(K,m,\mathsf{Mac}(K,m))=1 ] = 1. \]
This is only a functionality requirement.
The professor addresses a student question: why do we not also require that \(\mathsf{Verify}\) is not the constant function that always outputs \(1\)?
The answer is that correctness only says honest behavior works. Abuse resistance is captured by the security definition.
A verification algorithm that always accepts will be ruled out by security, because then an adversary could easily forge tags.
2.8. Security of MACs: adversary resources and goal
To define security, the professor asks two questions.
2.8.1. What resources should the adversary get?
In the real world, an adversary may observe authenticated messages.
The conservative modeling choice is to let the adversary choose the messages whose tags it sees.
So the adversary receives oracle access to
\[ \mathsf{Mac}(K,\cdot). \]
It can ask for tags on messages
\[ m_1,\ldots,m_q \]
of its choice and receive
\[ t_i = \mathsf{Mac}(K,m_i). \]
This is analogous to chosen-plaintext attacks for encryption.
2.8.2. What is the adversary’s goal?
The real-world goal would be to authenticate a new meaningful message.
But “meaningful” is hard to formalize.
As in encryption security definitions, the formal definition avoids defining meaningfulness. Instead, it asks the adversary to authenticate any new message.
So the adversary wins if it outputs a valid tag for a message it did not previously query.
The cryptographic term is:
\[ \boxed{ \text{forge an authentication tag.} } \]
2.9. EUF-CMA security
The standard security notion is existential unforgeability under adaptive chosen message attacks.
Abbreviation:
\[ \mathsf{EUF\text{-}CMA}. \]
The name decomposes as follows:
- EUF = existential unforgeability, describing the adversary’s goal;
- CMA = chosen message attack, describing the adversary’s resource.
2.9.1. The experiment
The experiment is
\[ \mathsf{EUF\text{-}CMA}_{\mathcal A}(\lambda). \]
The challenger samples
\[ K \leftarrow \mathsf{Gen}(1^\lambda). \]
The adversary gets oracle access to
\[ \mathsf{Mac}(K,\cdot). \]
The adversary adaptively queries messages
\[ m_1,\ldots,m_q \]
and receives tags
\[ t_i = \mathsf{Mac}(K,m_i). \]
The adversary outputs
\[ (m^*,t^*). \]
The adversary wins if
\[ m^* \notin \{m_1,\ldots,m_q\} \]
and
\[ \mathsf{Verify}(K,m^*,t^*)=1. \]
2.9.2. Definition
A MAC is EUF-CMA secure if for every PPT adversary \(\mathcal A\), there exists a negligible function \(\nu\) such that for all \(\lambda\),
\[ \Pr[ \mathsf{EUF\text{-}CMA}_{\mathcal A}(\lambda)=1 ] < \nu(\lambda). \]
The professor notes that a random guess for a valid tag should succeed only with negligible probability, such as \(2^{-\lambda}\). This guessing probability is absorbed into the negligible term.
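The experiment can be written as a small test harness. In this sketch HMAC-SHA256 stands in for the MAC (an assumption), and `always_accept` models the constant verification function from the student question in the correctness discussion; all names are hypothetical.

```python
import hashlib
import hmac
import secrets

def gen() -> bytes:
    return secrets.token_bytes(32)

def mac(key: bytes, m: bytes) -> bytes:
    return hmac.new(key, m, hashlib.sha256).digest()

def verify(key: bytes, m: bytes, t: bytes) -> bool:
    return hmac.compare_digest(mac(key, m), t)

def always_accept(key: bytes, m: bytes, t: bytes) -> bool:
    """Broken verification: satisfies correctness, but accepts every tag."""
    return True

def euf_cma(adversary, verify_fn) -> int:
    """Return 1 iff the adversary forges a tag on a message it never queried."""
    key = gen()
    queried = []

    def mac_oracle(m: bytes) -> bytes:
        queried.append(m)          # the challenger records every query
        return mac(key, m)

    m_star, t_star = adversary(mac_oracle)
    return int(m_star not in queried and verify_fn(key, m_star, t_star))

def trivial_forger(mac_oracle):
    """Makes no queries and outputs an arbitrary tag."""
    return b"send 1000 euros", b"garbage tag"

def copycat_adversary(mac_oracle):
    """Outputs a queried message with its tag, which does not count as a forgery."""
    m = b"send 100 euros"
    return m, mac_oracle(m)
```

Running `trivial_forger` against `always_accept` shows why the security definition, rather than correctness, rules out a verifier that accepts everything.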
2.10. Strong unforgeability: sEUF-CMA
The lecture then defines a stronger notion:
\[ \mathsf{sEUF\text{-}CMA}. \]
In ordinary EUF-CMA, the adversary must output a tag for a new message:
\[ m^* \notin \{m_1,\ldots,m_q\}. \]
In strong EUF-CMA, the adversary is allowed to reuse an old message, but not an old message-tag pair.
The adversary wins if
\[ (m^*,t^*) \notin \{(m_1,t_1),\ldots,(m_q,t_q)\} \]
and
\[ \mathsf{Verify}(K,m^*,t^*)=1. \]
So if the adversary reuses a previously queried message \(m_i\), it must produce a different valid tag \(t^* \neq t_i\).
2.10.1. Relation between the notions
Strong unforgeability is stricter:
\[ \boxed{ \mathsf{sEUF\text{-}CMA} \Longrightarrow \mathsf{EUF\text{-}CMA}. } \]
Equivalently, if an adversary breaks EUF-CMA, then it also breaks sEUF-CMA, because a new-message forgery is automatically a new message-tag pair.
The professor spends time clarifying the implication direction.
The reason the strong notion is stricter is that it forbids more attacks. It also forbids producing a second valid tag for an already-authenticated message.
For many natural applications, this may seem unnecessary, but later cryptographic transformations may require strong unforgeability.
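The gap between the two notions can be illustrated with an artificial MAC whose tag carries one byte that verification ignores. This example is hypothetical and not from the lecture; HMAC-SHA256 is assumed as the underlying MAC.

```python
import hashlib
import hmac
import secrets

def mac_padded(key: bytes, m: bytes) -> bytes:
    """Hypothetical MAC whose tag ends with a byte that verification ignores."""
    return hmac.new(key, m, hashlib.sha256).digest() + b"\x00"

def verify_padded(key: bytes, m: bytes, t: bytes) -> bool:
    core = hmac.new(key, m, hashlib.sha256).digest()
    return len(t) == len(core) + 1 and hmac.compare_digest(core, t[:-1])

key = secrets.token_bytes(32)
m = b"send 100 euros"
t = mac_padded(key, m)           # the single oracle query (m_1, t_1)
t_star = t[:-1] + b"\x01"        # same message, different tag that still verifies

queried_messages = [m]
queried_pairs = [(m, t)]
wins_euf = m not in queried_messages and verify_padded(key, m, t_star)
wins_seuf = (m, t_star) not in queried_pairs and verify_padded(key, m, t_star)
```

The pair `(m, t_star)` is a win only under the strong winning condition, since the message itself was already queried.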
2.11. Using MACs and replay attacks
The professor then discusses using MACs in practice.
Suppose Bob wants to send Alice \(100\) euros.
A natural authenticated message might be:
\[ ((100\text{€}, \text{Bob}, \text{Alice}), t). \]
The tag \(t\) authenticates the tuple.
Now an attacker cannot simply change Alice to the attacker’s name or change \(100\) to \(1000\), because that would require forging a valid tag for a new message.
Such tampering is captured by EUF-CMA security.
2.11.1. Replay attack
However, the attacker can still replay the same authenticated message:
\[ ((100\text{€}, \text{Bob}, \text{Alice}), t) \]
again and again.
Each copy is still valid, because the tag is valid for that exact message.
So the bank might process the transfer multiple times.
This is called a replay attack.
MACs do not prevent replay attacks by themselves.
This is not a contradiction to EUF-CMA security, because replaying an old authenticated message is not a forgery under the security experiment.
The adversary does not produce a new message or a new message-tag pair. It simply reuses an old one.
2.11.2. Preventing replay attacks
To prevent replay attacks, the protocol needs additional mechanisms, such as a timestamp:
\[ ((100\text{€}, \text{Bob}, \text{Alice}, \text{time}), t). \]
The verifier can check whether the timestamp is fresh.
Other possible mechanisms include nonces, sequence numbers, or stateful replay protection, although the transcript mainly discusses timestamps.
The key lesson is:
\[ \boxed{ \text{MACs protect integrity, but they are not a catch-all solution.} } \]
Replay protection is an additional protocol-level requirement.
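One possible shape of such a protocol-level check is sketched below. The 30-second freshness window, the `|`-separated encoding (which assumes that names contain no `|`), the set of already seen tags, and HMAC-SHA256 as the MAC are all illustrative assumptions.

```python
import hashlib
import hmac
import secrets

KEY = secrets.token_bytes(32)
MAX_AGE_SECONDS = 30     # hypothetical freshness window
seen_tags = set()        # state kept by the verifier

def authenticate(amount: int, sender: str, receiver: str, time: int):
    """Authenticate the tuple (amount, sender, receiver, time)."""
    message = f"{amount}|{sender}|{receiver}|{time}".encode()
    return message, hmac.new(KEY, message, hashlib.sha256).digest()

def bank_accepts(message: bytes, tag: bytes, now: int) -> bool:
    expected = hmac.new(KEY, message, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return False                     # forged or tampered message
    timestamp = int(message.rsplit(b"|", 1)[1])
    if not 0 <= now - timestamp <= MAX_AGE_SECONDS:
        return False                     # stale timestamp
    if tag in seen_tags:
        return False                     # replay inside the freshness window
    seen_tags.add(tag)
    return True

msg, tag = authenticate(100, "Bob", "Alice", time=1_000)
first = bank_accepts(msg, tag, now=1_005)
replay_soon = bank_accepts(msg, tag, now=1_010)
replay_late = bank_accepts(msg, tag, now=2_000)
```

The timestamp alone only rejects late replays; the set of seen tags is what rejects a replay that arrives within the window.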
2.12. Constructing MACs from PRFs
The next question is how to construct MACs.
The security definition says it should be hard to find a valid tag for a new message.
This suggests using a function that is unpredictable on new inputs.
The primitive from earlier lectures with exactly this flavor is a pseudorandom function.
So the construction uses a PRF.
2.13. MACs for fixed-length messages
Assume for now that messages have fixed bit-length \(n\).
Let
\[ \mathsf{PRF} : \{0,1\}^{\lambda} \times \{0,1\}^{n} \to \{0,1\}^{\lambda} \]
be a pseudorandom function.
Define a MAC as follows.
2.13.1. Key generation
\[ \mathsf{Gen}(1^\lambda): \qquad K \leftarrow_{\$} \{0,1\}^{\lambda}. \]
2.13.2. Tag generation
\[ \mathsf{Mac}(K,m): \qquad t := \mathsf{PRF}_K(m). \]
2.13.3. Verification
\[ \mathsf{Verify}(K,m,t): \]
output \(1\) iff
\[ t = \mathsf{PRF}_K(m). \]
Otherwise output \(0\).
2.13.4. Correctness
Correctness is immediate:
\[ \mathsf{Verify}(K,m,\mathsf{Mac}(K,m))=1 \]
because
\[ \mathsf{Mac}(K,m)=\mathsf{PRF}_K(m). \]
So verification recomputes exactly the same value.
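The three algorithms can be sketched in a few lines. The lecture leaves the PRF abstract; in this sketch HMAC-SHA256 is assumed to behave like a PRF, and the constants `LAMBDA_BYTES` and `N_BYTES` are illustrative choices.

```python
import hmac
import hashlib
import secrets

LAMBDA_BYTES = 32  # assumed key and tag length (lambda = 256 bits)
N_BYTES = 16       # assumed fixed message length (n = 128 bits)


def prf(key: bytes, message: bytes) -> bytes:
    # Assumption: HMAC-SHA256 is used as a stand-in for PRF_K.
    return hmac.new(key, message, hashlib.sha256).digest()


def gen() -> bytes:
    # Gen(1^lambda): sample a uniformly random key.
    return secrets.token_bytes(LAMBDA_BYTES)


def mac(key: bytes, message: bytes) -> bytes:
    # Mac(K, m): the tag is the PRF value on the message.
    if len(message) != N_BYTES:
        raise ValueError("this MAC only handles fixed-length messages")
    return prf(key, message)


def verify(key: bytes, message: bytes, tag: bytes) -> int:
    # Verify(K, m, t): output 1 iff t equals the recomputed PRF value.
    if len(message) != N_BYTES:
        return 0
    return 1 if hmac.compare_digest(prf(key, message), tag) else 0


key = gen()
message = b"0123456789abcdef"
tag = mac(key, message)
```

Comparing with `hmac.compare_digest` rather than `==` avoids leaking timing information about the tag; this is an implementation detail that is not part of the abstract definition.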
2.14. Theorem: PRF-based MAC is strongly secure
The theorem stated in the lecture is:
If \(\mathsf{PRF}\) is a pseudorandom function, then the above construction is a strongly EUF-CMA secure MAC.
Formally:
\[ \boxed{ \mathsf{PRF}\text{ secure} \Longrightarrow \mathsf{Mac}_K(m)=\mathsf{PRF}_K(m) \text{ is sEUF-CMA secure.} } \]
The professor proves this by a hybrid argument and reduction to PRF security.
2.15. Proof idea: replace PRF by a truly random function
Assume toward contradiction that the MAC is not sEUF-CMA secure.
Then there is a PPT adversary \(\mathcal A\) that wins the sEUF-CMA experiment with non-negligible probability \(\varepsilon\).
In the real experiment, tags are computed as
\[ t_i = \mathsf{PRF}_K(m_i). \]
Now define a hybrid experiment \(X'\), where the PRF is replaced by a truly random function
\[ H : \{0,1\}^n \to \{0,1\}^{\lambda}. \]
So in \(X'\), tags are
\[ t_i = H(m_i). \]
Verification checks whether
\[ t^* = H(m^*). \]
The proof has two parts:
- The adversary’s winning probability in the real experiment and in \(X'\) differs only negligibly, otherwise we could distinguish the PRF from a random function.
- The adversary’s winning probability in \(X'\) is negligible, because on a new input the random function value is uniformly random and unknown.
2.16. Inner reduction: distinguishing PRF from random function
Suppose the adversary’s success probability differs non-negligibly between:
- the real sEUF-CMA experiment using \(\mathsf{PRF}_K\);
- the hybrid experiment \(X'\) using a random function \(H\).
Then build a distinguisher \(\mathcal D\) against PRF security.
The distinguisher \(\mathcal D\) gets oracle access to a function \(O\), where \(O\) is either:
\[ O = \mathsf{PRF}_K \]
or
\[ O = H \]
for a truly random function \(H\).
\(\mathcal D\) simulates the MAC experiment for \(\mathcal A\).
When \(\mathcal A\) queries a message \(m_i\), \(\mathcal D\) forwards it to its own oracle:
\[ t_i := O(m_i). \]
Then \(\mathcal D\) returns \(t_i\) to \(\mathcal A\).
\(\mathcal D\) stores the list
\[ L = \{(m_1,t_1),\ldots,(m_q,t_q)\}. \]
Eventually \(\mathcal A\) outputs
\[ (m^*,t^*). \]
The distinguisher checks whether
\[ (m^*,t^*) \notin L \]
and
\[ t^* = O(m^*). \]
If yes, \(\mathcal D\) outputs \(1\); otherwise it outputs \(0\).
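The distinguisher described above can be written as a short sketch. The adversary is modelled as a callable that receives a MAC oracle and returns a pair \((m^*,t^*)\); the toy oracle and the two toy adversaries below are hypothetical and serve only to exercise the code.

```python
from typing import Callable, List, Tuple

Oracle = Callable[[bytes], bytes]
Adversary = Callable[[Oracle], Tuple[bytes, bytes]]


def distinguisher(oracle: Oracle, adversary: Adversary) -> int:
    queries: List[Tuple[bytes, bytes]] = []   # the list L

    def mac_oracle(message: bytes) -> bytes:
        tag = oracle(message)                 # t_i := O(m_i)
        queries.append((message, tag))
        return tag

    m_star, t_star = adversary(mac_oracle)
    is_new = (m_star, t_star) not in queries  # (m*, t*) not in L
    is_valid = t_star == oracle(m_star)       # t* = O(m*)
    return 1 if (is_new and is_valid) else 0


def weak_oracle(message: bytes) -> bytes:
    # Toy function that is clearly not pseudorandom: the tag is the first byte.
    return message[:1]


def prefix_adversary(mac_oracle: Oracle) -> Tuple[bytes, bytes]:
    # Reuses a tag on a different message with the same first byte.
    tag = mac_oracle(b"a-first")
    return b"a-second", tag


def replay_adversary(mac_oracle: Oracle) -> Tuple[bytes, bytes]:
    # Outputs a pair that was already answered by the oracle.
    tag = mac_oracle(b"a-first")
    return b"a-first", tag


forged = distinguisher(weak_oracle, prefix_adversary)
replayed = distinguisher(weak_oracle, replay_adversary)
```

Against the toy oracle the first adversary forges successfully, so the distinguisher outputs 1; the second adversary only repeats a queried pair, so the output is 0.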
2.16.1. Case 1: \(O=\mathsf{PRF}_K\)
Then \(\mathcal D\) simulates the real sEUF-CMA experiment exactly.
Therefore,
\[ \Pr[\mathcal D^{\mathsf{PRF}_K}=1] = \Pr[\mathsf{sEUF\text{-}CMA}_{\mathcal A}=1]. \]
2.16.2. Case 2: \(O=H\)
Then \(\mathcal D\) simulates the hybrid experiment \(X'\) exactly.
Therefore,
\[ \Pr[\mathcal D^{H}=1] = \Pr[X'_{\mathcal A}=1]. \]
If these probabilities differ non-negligibly, then \(\mathcal D\) distinguishes \(\mathsf{PRF}_K\) from \(H\) with non-negligible advantage, contradicting PRF security.
Thus,
\[ \Pr[\mathsf{sEUF\text{-}CMA}_{\mathcal A}=1] \leq \Pr[X'_{\mathcal A}=1] + \operatorname{negl}(\lambda). \]
It remains to show that
\[ \Pr[X'_{\mathcal A}=1] \]
is negligible.
2.17. Bounding the adversary’s success in the random-function hybrid
In \(X'\), the adversary sees
\[ (m_1,H(m_1)),\ldots,(m_q,H(m_q)). \]
Then it outputs
\[ (m^*,t^*). \]
There are two cases.
2.17.1. Case 1: \(m^*=m_i\) for some queried message
Since \(H\) is a function, there is only one valid tag for \(m_i\), namely
\[ H(m_i)=t_i. \]
If the adversary outputs
\[ t^*=t_i, \]
then
\[ (m^*,t^*)=(m_i,t_i) \]
is already in the query list, so the sEUF-CMA experiment outputs \(0\).
If the adversary outputs
\[ t^* \neq t_i, \]
then verification fails, because the only valid tag is \(H(m_i)\).
So in this case the adversary cannot win.
2.17.2. Case 2: \(m^*\notin \{m_1,\ldots,m_q\}\)
Then \(H(m^*)\) has never been queried.
Since \(H\) is a truly random function,
\[ H(m^*) \]
is uniformly random from the adversary’s point of view.
The adversary’s chosen \(t^*\) is independent of this fresh random value.
Thus,
\[ \Pr[t^*=H(m^*)] = 2^{-\lambda}. \]
So
\[ \Pr[X'_{\mathcal A}=1] \leq 2^{-\lambda}. \]
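Since \(2^{-\lambda}\) is negligible and the winning probabilities in \(X'\) and in the real experiment differ only negligibly, the adversary’s real winning probability is negligible. This contradicts the assumption that \(\mathcal A\) wins with non-negligible probability \(\varepsilon\).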
Therefore, the PRF-based MAC is sEUF-CMA secure.
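The bound in Case 2 can be checked empirically for a toy parameter. The sketch below models \(H\) as a lazily sampled random function and estimates how often a guess hits \(H(m^*)\) on a fresh input. The names `RandomFunction` and `guessing_success_rate`, the seed, and the tiny value \(\lambda = 4\) are assumptions made for illustration; Python's `random` module is used only to make the simulation reproducible and is not suitable for real keys.

```python
import random


class RandomFunction:
    """Lazily sampled random function with lambda_bits-bit outputs."""

    def __init__(self, lambda_bits: int, rng: random.Random) -> None:
        self.lambda_bits = lambda_bits
        self.rng = rng
        self.table = {}

    def __call__(self, message: bytes) -> int:
        # Sample the value on first use, then answer consistently.
        if message not in self.table:
            self.table[message] = self.rng.getrandbits(self.lambda_bits)
        return self.table[message]


def guessing_success_rate(lambda_bits: int, trials: int, seed: int) -> float:
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        H = RandomFunction(lambda_bits, rng)
        seen_tag = H(b"queried message")   # one MAC query
        guess = seen_tag                   # any strategy; old tags do not help
        if guess == H(b"fresh message"):   # value is sampled only now
            wins += 1
    return wins / trials


rate = guessing_success_rate(lambda_bits=4, trials=20000, seed=0)
```

For \(\lambda = 4\) the estimate should be close to \(2^{-4} = 1/16\); for a realistic \(\lambda\) such as 128 the same experiment would essentially never succeed.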
2.18. Preview: MACs for long messages
At the end of the lecture, the professor says that the next topic is extending MACs from fixed-length messages to arbitrary-length messages.
The idea will be to use hash functions.
The construction shown in the slides, although only previewed at the end of the transcript, is:
Let
\[ H_s : \{0,1\}^* \to \{0,1\}^{\lambda} \]
be a collision-resistant hash function, and let
\[ \mathsf{MAC} = (\mathsf{Gen},\mathsf{Mac},\mathsf{Verify}) \]
be a MAC for messages of length \(\lambda\).
Define a new MAC for long messages:
\[ K = (s,K') \]
where
\[ K' \leftarrow \mathsf{Gen}(1^\lambda), \qquad s \leftarrow_{\$} \{0,1\}^{\lambda}. \]
Then
\[ \mathsf{Mac}'(K,m) = \mathsf{Mac}(K',H_s(m)). \]
Verification recomputes the hash and verifies the MAC tag on the digest.
The corresponding theorem is:
If \(H\) is collision-resistant and the underlying MAC is sEUF-CMA secure, then the composed MAC is also sEUF-CMA secure.
The professor says this will be discussed next time.
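As a preview, the hash-then-MAC composition can be sketched as follows. The concrete choices are assumptions and not taken from the slides: \(H_s\) is modelled as SHA-256 applied to \(s\) followed by the message, and the fixed-length MAC is modelled by HMAC-SHA256 on the \(\lambda\)-bit digest.

```python
import hmac
import hashlib
import secrets

LAMBDA_BYTES = 32  # assumed: lambda = 256 bits


def hash_s(s: bytes, message: bytes) -> bytes:
    # Assumption: H_s(m) is modelled as SHA-256(s || m).
    return hashlib.sha256(s + message).digest()


def mac_fixed(k_prime: bytes, digest: bytes) -> bytes:
    # Fixed-length MAC on lambda-bit inputs (HMAC-SHA256 as a stand-in).
    if len(digest) != LAMBDA_BYTES:
        raise ValueError("fixed-length MAC expects a lambda-bit input")
    return hmac.new(k_prime, digest, hashlib.sha256).digest()


def gen():
    # K = (s, K'): hash key s and MAC key K', sampled independently.
    s = secrets.token_bytes(LAMBDA_BYTES)
    k_prime = secrets.token_bytes(LAMBDA_BYTES)
    return s, k_prime


def mac_long(key, message: bytes) -> bytes:
    # Mac'(K, m) = Mac(K', H_s(m))
    s, k_prime = key
    return mac_fixed(k_prime, hash_s(s, message))


def verify_long(key, message: bytes, tag: bytes) -> int:
    # Recompute the hash, then check the tag on the digest.
    return 1 if hmac.compare_digest(mac_long(key, message), tag) else 0


key = gen()
long_message = b"A" * 1_000_000      # far longer than lambda bits
tag = mac_long(key, long_message)
```

The tag has the same length regardless of the message size, because only the short digest is passed to the underlying MAC.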
2.19. Lecture 11 takeaways
The lecture ends with these key messages:
- Practical hash functions such as SHA-3 use highly optimized constructions, such as the sponge construction.
- The sponge construction has absorbing and squeezing phases.
- Keccak is the NIST-standardized SHA-3 construction.
- Message secrecy and message integrity are fundamentally different.
- Encryption does not automatically guarantee integrity.
- MACs protect message integrity.
- EUF-CMA security says an adversary cannot forge a valid tag for a new message.
- sEUF-CMA security also prevents producing a new valid tag for an old message.
- MACs do not prevent replay attacks.
- Replay protection requires extra protocol mechanisms, such as timestamps.
- PRFs give a clean construction of fixed-length MACs.
- The proof of security uses a hybrid argument: replace the PRF by a random function, then show that forging requires guessing a fresh random value.