Basics of Private Key Encryption 1

1. Context of this lecture

This lecture continues from the previous discussion of perfect secrecy.

The lecture has two main parts:

Finish the proof that the three definitions of perfect secrecy are equivalent.
Explain why perfect secrecy is too strong for modern cryptography, and introduce the move toward computational and asymptotic security.

The lecture starts by revisiting the last missing implication in the equivalence theorem:

\[ (1) \Rightarrow (3). \]

Then it proves Shannon’s theorem, which shows that perfect secrecy requires a key space at least as large as the message space.

After that, the lecture begins the next topic: modern private-key encryption, where security is only required against computationally bounded adversaries.

2. Recap: three definitions of perfect secrecy

We have an encryption scheme

\[ (\mathsf{KeyGen}, \mathsf{Enc}, \mathsf{Dec}) \]

with key space $\mathcal K$, message space $\mathcal M$, and ciphertext space $\mathcal C$.

2.1. Definition 1: message-ciphertext independence

An encryption scheme is perfectly secret if, for every random variable $M$ supported on $\mathcal M$, the message $M$ and the ciphertext

\[ C = \mathsf{Enc}(K,M) \]

are independent, where

\[ K \leftarrow \mathsf{KeyGen}(). \]

Equivalently, for all $m \in \mathcal M$ and all $c \in \mathcal C$ with

\[ \Pr[\mathsf{Enc}(K,M)=c] > 0, \]

we have

\[ \Pr[M=m \mid \mathsf{Enc}(K,M)=c] = \Pr[M=m]. \]

Intuition:

seeing the ciphertext does not change the posterior distribution of the message;
the adversary learns nothing about the plaintext from the ciphertext.

2.2. Definition 2: identical ciphertext distributions

An encryption scheme is perfectly secret if, for all messages

\[ m,m' \in \mathcal M \]

and all ciphertexts

\[ c \in \mathcal C, \]

we have

\[ \Pr[\mathsf{Enc}(K,m)=c] = \Pr[\mathsf{Enc}(K,m')=c], \]

where

\[ K \leftarrow \mathsf{KeyGen}(). \]

Intuition:

every message induces exactly the same ciphertext distribution;
no ciphertext can statistically favor one plaintext over another.

2.3. Definition 3: ciphertext indistinguishability experiment

Definition 3 makes the adversary explicit.

The experiment is:

The adversary $\mathcal A$ outputs two messages

\[ m_0,m_1 \in \mathcal M. \]
The challenger samples

\[ K \leftarrow \mathsf{KeyGen}() \]

and

\[ b \leftarrow_{\$} \{0,1\}. \]
The challenger computes

\[ c^* \leftarrow \mathsf{Enc}(K,m_b) \]

and sends $c^*$ to $\mathcal A$.
The adversary outputs a guess

\[ b' \in \{0,1\}. \]
The experiment outputs $1$ if $b'=b$, and $0$ otherwise.

Let

\[ \mathsf{IND}_{\mathcal A} \]

be the random variable describing the output of this experiment.

The scheme is perfectly secret if, for every adversary $\mathcal A$,

\[ \Pr[\mathsf{IND}_{\mathcal A}=1] = \frac12. \]

Intuition:

even if the adversary chooses two candidate messages,
and receives an encryption of one of them,
it cannot guess which one was encrypted with probability better than random guessing.
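The experiment above can be sketched as a small simulation. This is a minimal sketch under our own assumptions: keys, messages and ciphertexts are plain integers, the adversary is split into two hypothetical callables `choose` and `guess`, and the winning probability is only estimated by repeated sampling, not computed exactly.

```python
import secrets

# Hypothetical interface for this sketch: an adversary is modelled as two callables,
# choose() -> (m0, m1) and guess(c_star) -> b'. Keys, messages, ciphertexts are ints.


def ind_experiment(keygen, enc, choose, guess) -> int:
    """Run the IND experiment once; return 1 iff the adversary's guess is correct."""
    m0, m1 = choose()                  # A outputs two messages
    k = keygen()                       # challenger: K <- KeyGen()
    b = secrets.randbelow(2)           # challenger: b <-$ {0,1}
    c_star = enc(k, (m0, m1)[b])       # challenger: c* <- Enc(K, m_b), sent to A
    b_prime = guess(c_star)            # A outputs its guess b'
    return 1 if b_prime == b else 0


def estimate_win_probability(keygen, enc, choose, guess, trials: int = 100_000) -> float:
    """Monte Carlo estimate of Pr[IND_A = 1] (an estimate, not an exact value)."""
    wins = sum(ind_experiment(keygen, enc, choose, guess) for _ in range(trials))
    return wins / trials


# Usage: one-time pad on 2-bit messages.
def otp_keygen() -> int:
    return secrets.randbelow(4)        # uniform 2-bit key


def otp_enc(k: int, m: int) -> int:
    return k ^ m                       # bitwise XOR


# Adversary asks for 00 vs 11 and guesses b' = 1 iff c* = 11.
otp_estimate = estimate_win_probability(
    otp_keygen, otp_enc, choose=lambda: (0b00, 0b11), guess=lambda c: int(c == 0b11)
)
print(f"estimated Pr[IND_A = 1] = {otp_estimate:.3f}")
```

For the one-time pad the estimate should hover around $\frac12$, as Definition 3 requires; any other fixed adversary can be plugged in the same way.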

3. Theorem: the three definitions are equivalent

The theorem states:

\[ \text{Definitions 1, 2, and 3 of perfect secrecy are equivalent.} \]

The previous lecture already proved:

\[ (2) \Rightarrow (1) \]

and

\[ (3) \Rightarrow (2). \]

This lecture finishes the remaining implication:

\[ (1) \Rightarrow (3). \]

4. Proof of $(1) \Rightarrow (3)$

4.1. Goal

Assume Definition 1 holds.

We want to prove Definition 3.

That is, for every adversary $\mathcal A$, we need to show

\[ \Pr[\mathsf{IND}_{\mathcal A}=1] = \frac12. \]

The lecture first proves this for deterministic adversaries.

For randomized adversaries, the same idea generalizes by averaging over the adversary’s random coins, using linearity of expectation.

4.2. Setup

Let $\mathcal A$ be any deterministic adversary.

The adversary outputs two messages:

\[ m_0,m_1 \in \mathcal M. \]

The challenger samples

\[ b \leftarrow_{\$} \{0,1\} \]

and encrypts

\[ m_b. \]

Define a random variable $M$ by

\begin{equation*} M = \begin{cases} m_0 & \text{with probability } \frac12,\\ m_1 & \text{with probability } \frac12. \end{cases} \end{equation*}

Equivalently,

\[ M = m_b. \]

Thus the IND experiment’s challenge ciphertext can be written as

\[ C^* = \mathsf{Enc}(K,M). \]

After receiving $C^*$, the adversary outputs

\[ b' = \mathcal A(C^*). \]

Define a helper function $m(\cdot)$ on ciphertexts as follows:

\[ m(c) := m_{\mathcal A(c)}. \]

In particular,

\[ m(C^*) = m_{b'}. \]

So $m(C^*)$ is the message corresponding to the adversary’s guess.

4.3. Winning condition

The adversary wins iff

\[ b'=b. \]

If $m_0 \neq m_1$, this is equivalent to

\[ m_{b'} = m_b. \]

Since

\[ M = m_b \]

and

\[ m(C^*) = m_{b'}, \]

we get

\[ \mathsf{IND}_{\mathcal A}=1 \iff M = m(C^*). \]

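If $m_0=m_1$, then the challenge ciphertext $\mathsf{Enc}(K,m_b)$ has the same distribution for both values of $b$, so it is independent of $b$. The guess $b'=\mathcal A(C^*)$ is computed from the ciphertext alone, so it is also independent of the uniform bit $b$, and $\Pr[b'=b]=\frac12$. (The reformulation above does not apply in this case: with $m_0=m_1$ the random variable $M$ is constant, so $\Pr[M=m(C^*)]=1$.) Therefore the interesting case is $m_0 \neq m_1$.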

4.4. Expand by conditioning on ciphertexts

We compute:

\[ \Pr[\mathsf{IND}_{\mathcal A}=1] = \Pr[M=m(C^*)]. \]

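Using the law of total probability over all possible ciphertexts $c\in\mathcal C$ (terms with $\Pr[C^*=c]=0$ contribute nothing and are omitted, so every conditional probability below is well defined),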

\[ \Pr[M=m(C^*)] = \sum_{c\in\mathcal C} \Pr[M=m(c)\mid C^*=c]\cdot \Pr[C^*=c]. \]

Since

\[ C^* = \mathsf{Enc}(K,M), \]

this becomes

\[ \Pr[M=m(C^*)] = \sum_{c\in\mathcal C} \Pr[M=m(c)\mid \mathsf{Enc}(K,M)=c]\cdot \Pr[\mathsf{Enc}(K,M)=c]. \]

4.5. Apply Definition 1

By Definition 1, $M$ and $\mathsf{Enc}(K,M)$ are independent.

Therefore, for every ciphertext $c$,

\[ \Pr[M=m(c)\mid \mathsf{Enc}(K,M)=c] = \Pr[M=m(c)]. \]

Substituting this into the sum gives

\[ \Pr[M=m(C^*)] = \sum_{c\in\mathcal C} \Pr[M=m(c)]\cdot \Pr[\mathsf{Enc}(K,M)=c]. \]

Now $m(c)$ is a fixed value once $c$ is fixed, and it lies in $\{m_0,m_1\}$.

Because $M$ is uniform over $\{m_0,m_1\}$ and $m_0\neq m_1$, we have

\[ \Pr[M=m(c)] = \frac12. \]

Hence

\[ \Pr[M=m(C^*)] = \sum_{c\in\mathcal C} \frac12 \cdot \Pr[\mathsf{Enc}(K,M)=c]. \]

Pull out the constant:

\[ \Pr[M=m(C^*)] = \frac12 \sum_{c\in\mathcal C} \Pr[\mathsf{Enc}(K,M)=c]. \]

The ciphertext must take some value in $\mathcal C$, so

\begin{equation*} \sum_{c\in\mathcal C} \Pr[\mathsf{Enc}(K,M)=c] = 1. \end{equation*}

Therefore

\[ \Pr[M=m(C^*)] = \frac12. \]

Thus

\[ \Pr[\mathsf{IND}_{\mathcal A}=1] = \frac12. \]

This proves Definition 3.
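For a small sanity check of $(1)\Rightarrow(3)$, the following sketch enumerates every deterministic adversary against the one-time pad on $2$-bit messages (a deterministic adversary is just a pair $m_0,m_1$ plus a table mapping each ciphertext to a guess) and computes $\Pr[\mathsf{IND}_{\mathcal A}=1]$ exactly with rational arithmetic. The choice of $2$-bit messages and the helper names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

N_BITS = 2
KEYS = range(2 ** N_BITS)          # one-time pad over 2-bit strings, encoded as ints
CIPHERTEXTS = range(2 ** N_BITS)


def enc(k: int, m: int) -> int:
    return k ^ m                   # one-time pad: bitwise XOR


def win_probability(m0: int, m1: int, guess: dict) -> Fraction:
    """Exact Pr[IND_A = 1] for a deterministic adversary with guess table c -> b'."""
    total = Fraction(0)
    for b, k in product((0, 1), KEYS):
        weight = Fraction(1, 2) * Fraction(1, len(KEYS))   # Pr[b] * Pr[K = k]
        if guess[enc(k, (m0, m1)[b])] == b:
            total += weight
    return total


# Enumerate every message pair and every deterministic guessing table.
all_probs = set()
for m0, m1 in product(range(2 ** N_BITS), repeat=2):
    for bits in product((0, 1), repeat=len(CIPHERTEXTS)):
        guess = dict(zip(CIPHERTEXTS, bits))
        all_probs.add(win_probability(m0, m1, guess))

print(all_probs)  # every adversary lands on the same value
```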

4.6. Conclusion

We have shown:

\[ (1) \Rightarrow (3). \]

Together with the previous implications,

\[ (2) \Rightarrow (1) \]

and

\[ (3) \Rightarrow (2), \]

we get the full equivalence:

\[ (1) \Longleftrightarrow (2) \Longleftrightarrow (3). \]

5. Shannon’s theorem

5.1. Motivation

Perfect secrecy gives an extremely strong security guarantee.

The one-time pad is perfectly secret, but it has two major drawbacks:

the key must be as long as the message;
the key can only be used once.

The natural question is:

Can we construct a better perfectly secret encryption scheme with shorter reusable keys?

Shannon’s theorem says no.

5.2. Statement

Let

\[ (\mathsf{KeyGen},\mathsf{Enc},\mathsf{Dec}) \]

be a perfectly secret encryption scheme with key space $\mathcal K$, message space $\mathcal M$, and ciphertext space $\mathcal C$.

Then

\begin{equation*} |\mathcal K| \ge |\mathcal M|. \end{equation*}

This means that the key space must be at least as large as the message space.

For example, if

\[ \mathcal K = \{0,1\}^{\ell} \]

and

\[ \mathcal M = \{0,1\}^{n}, \]

then

\begin{equation*} |\mathcal K| = 2^\ell \end{equation*}

and

\begin{equation*} |\mathcal M| = 2^n. \end{equation*}

Therefore

\[ 2^\ell \ge 2^n, \]

that is,

\[ \ell \ge n. \]

Thus the key length must be at least the message length.

5.3. Proof idea

We prove the theorem by contraposition.

Instead of proving

\[ \text{perfect secrecy} \Rightarrow |\mathcal K| \ge |\mathcal M|, \]

we prove

\begin{equation*} |\mathcal K| < |\mathcal M| \Rightarrow \text{not perfectly secret}. \end{equation*}

We will show that if

\begin{equation*} |\mathcal K| < |\mathcal M|, \end{equation*}

then Definition 2 of perfect secrecy is violated.

Definition 2 requires that for all $m,m'\in\mathcal M$ and all $c\in\mathcal C$,

\[ \Pr[\mathsf{Enc}(K,m)=c] = \Pr[\mathsf{Enc}(K,m')=c]. \]

So it is enough to find messages $m,m'$ and a ciphertext $\hat c$ such that

\[ \Pr[\mathsf{Enc}(K,m)=\hat c] > 0 \]

but

\[ \Pr[\mathsf{Enc}(K,m')=\hat c] = 0. \]

That would directly contradict Definition 2.

5.4. Constructing the contradiction

Assume

\begin{equation*} |\mathcal K| < |\mathcal M|. \end{equation*}

Pick an arbitrary message

\[ m \in \mathcal M. \]

Pick an arbitrary key

\[ k_0 \in \mathcal K. \]

This key does not need to be sampled randomly; it can be any key.

Define

\[ \hat c := \mathsf{Enc}(k_0,m). \]

Because $k_0$ is a possible key (we take $\mathcal K$ to be the set of keys that $\mathsf{KeyGen}$ outputs with positive probability), the probability that encrypting $m$ yields $\hat c$ is positive:

\[ \Pr[\mathsf{Enc}(K,m)=\hat c] > 0. \]

Now define the set

\[ S := \{\, m' \in \mathcal M \mid \exists k \in \mathcal K: \mathsf{Dec}(k,\hat c)=m' \,\}. \]

In words:

fix the ciphertext $\hat c$;
decrypt $\hat c$ under every possible key;
collect all messages that can appear as a decryption result.

So $S$ is the set of all messages reachable from $\hat c$ by decrypting with some key.

5.5. First observation: $m\in S$

Since

\[ \hat c = \mathsf{Enc}(k_0,m), \]

correctness gives

\[ \mathsf{Dec}(k_0,\hat c) = \mathsf{Dec}(k_0,\mathsf{Enc}(k_0,m)) = m. \]

Therefore

\[ m \in S. \]

5.6. Second observation: $|S|\le |\mathcal K|$

For every fixed key $k$, because decryption is deterministic,

\[ \mathsf{Dec}(k,\hat c) \]

has at most one output message.

So each key contributes at most one message to $S$.

Define the function

\[ f:\mathcal K \to S \]

\[ f(k) := \mathsf{Dec}(k,\hat c). \]

By definition of $S$, the function $f$ is surjective onto $S$.

Therefore,

\begin{equation*} |S| \le |\mathcal K|. \end{equation*}

Since we assumed

\begin{equation*} |\mathcal K| < |\mathcal M|, \end{equation*}

we get

\begin{equation*} |S| < |\mathcal M|. \end{equation*}

So $S$ cannot contain all messages.

Thus

\[ \mathcal M \setminus S \neq \varnothing. \]

Therefore there exists some message

\[ m' \in \mathcal M \setminus S. \]

By definition of $S$, this means that for every key $k\in\mathcal K$,

\[ \mathsf{Dec}(k,\hat c) \neq m'. \]

5.7. Why $m'$ cannot encrypt to $\hat c$

Suppose, for contradiction, that there exists a key $k'\in\mathcal K$ such that

\[ \mathsf{Enc}(k',m') = \hat c. \]

Then by correctness,

\[ \mathsf{Dec}(k',\hat c) = \mathsf{Dec}(k',\mathsf{Enc}(k',m')) = m'. \]

But then

\[ m' \in S, \]

contradicting the choice

\[ m' \in \mathcal M \setminus S. \]

Therefore no key can encrypt $m'$ to $\hat c$.

Hence

\[ \Pr[\mathsf{Enc}(K,m')=\hat c] = 0. \]

But we already had

\[ \Pr[\mathsf{Enc}(K,m)=\hat c] > 0. \]

Hence

\[ \Pr[\mathsf{Enc}(K,m)=\hat c] \neq \Pr[\mathsf{Enc}(K,m')=\hat c]. \]

This violates Definition 2 of perfect secrecy.

Therefore, if

\begin{equation*} |\mathcal K| < |\mathcal M|, \end{equation*}

the scheme cannot be perfectly secret.

By contraposition, every perfectly secret scheme satisfies

\begin{equation*} |\mathcal K| \ge |\mathcal M|. \end{equation*}
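The proof can be traced on a toy example. The sketch below assumes a hypothetical scheme, not one from the lecture: a shift cipher modulo $3$ with only two keys, so $|\mathcal K|=2<3=|\mathcal M|$. It builds $\hat c$ and the set $S$ exactly as in Sections 5.4 to 5.7 and exhibits a message $m'$ that never encrypts to $\hat c$.

```python
from fractions import Fraction

# Hypothetical toy scheme: shift cipher mod 3 with only two keys.
KEYS = [0, 1]            # |K| = 2, each key sampled with probability 1/2
MESSAGES = [0, 1, 2]     # |M| = 3 > |K|


def enc(k: int, m: int) -> int:
    return (m + k) % 3


def dec(k: int, c: int) -> int:
    return (c - k) % 3


def pr_enc(m: int, c: int) -> Fraction:
    """Pr[Enc(K, m) = c] for K uniform over KEYS."""
    return Fraction(sum(enc(k, m) == c for k in KEYS), len(KEYS))


# Correctness: Dec(k, Enc(k, m)) = m for every key and message.
assert all(dec(k, enc(k, m)) == m for k in KEYS for m in MESSAGES)

m, k0 = 0, 0                                        # arbitrary message and key
c_hat = enc(k0, m)                                  # c_hat := Enc(k0, m)
S = {dec(k, c_hat) for k in KEYS}                   # messages reachable from c_hat
m_prime = next(x for x in MESSAGES if x not in S)   # exists since |S| <= |K| < |M|

print("S =", S, " m' =", m_prime)
print("Pr[Enc(K,m)=c_hat] =", pr_enc(m, c_hat), " Pr[Enc(K,m')=c_hat] =", pr_enc(m_prime, c_hat))
```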

5.8. Consequence

Shannon’s theorem is a fundamental limitation on perfect secrecy.

Perfect secrecy is possible, but only at the cost of huge key material.

For bitstrings, if messages have length $n$, then keys need length at least $n$.

So perfect secrecy does not allow short reusable keys.

6. Summary of the perfect secrecy part

The lecture summarizes the previous two lectures as follows:

We defined the syntax of encryption schemes.
We defined correctness.
We defined perfect secrecy as ciphertext-message independence.
We proved that the one-time pad is perfectly secret.
We introduced two alternative definitions of perfect secrecy.
We proved that all three definitions are equivalent.
We proved Shannon’s theorem:
\begin{equation*} |\mathcal K| \ge |\mathcal M|. \end{equation*}
Therefore perfectly secret encryption requires keys as large as the messages they encrypt.
Such keys can only be used once.

The third definition, based on the indistinguishability experiment, will be the most useful one for modern cryptography.

7. From perfect secrecy to modern private-key encryption

The lecture then moves to modern private-key encryption.

We remain in the symmetric-key part of cryptography.

The course-topic map is roughly:

symmetric-key cryptography;
public-key cryptography;
privacy;
authenticity;
advanced cryptographic tasks.

The current focus is still symmetric-key encryption.

7.1. Perfect secrecy is overkill

Perfect secrecy holds against every conceivable adversary.

That includes adversaries with unlimited computational power.

But in the real world, adversaries are computationally bounded.

For example, an adversary that needs

\[ 10^{100} \]

times the age of the universe to break a scheme is not practically relevant.

So we want to move from:

\[ \text{all adversaries} \]

to:

\[ \text{realistic efficient adversaries}. \]

The core idea is:

\[ \text{security should hold against computationally bounded adversaries.} \]

7.2. Perfect secrecy may also be underkill

Perfect secrecy only protects secrecy.

But encryption might still be vulnerable to attacks that do not reveal the plaintext.

So secrecy alone may not capture every security goal we intuitively want.

The lecture gives an attack on the one-time pad that does not break secrecy, but still causes damage.

8. Side note: an attack that does not break secrecy

8.1. Scenario

There are two armies trying to coordinate an attack on a castle.

One commander wants to send one of two possible messages to the other commander:

\[ m_0 = \mathsf{Binary}(\text{"Attack"}) \]

and

\[ m_1 = \mathsf{Binary}(\text{"Wait!"}). \]

The message is encrypted using a one-time pad:

\[ c = K \oplus m_b, \]

where

\[ b \in \{0,1\}. \]

The adversary intercepts the ciphertext $c$.

The adversary does not need to learn whether the original message was “Attack” or “Wait”.

Instead, the adversary only wants to disrupt coordination.

8.2. Attack

The adversary computes

\[ c' = c \oplus m_0 \oplus m_1. \]

Since

\[ c = K \oplus m_b, \]

we get

\[ c' = K \oplus m_b \oplus m_0 \oplus m_1. \]

If $b=0$, then

\[ m_b = m_0, \]

so

\[ c' = K \oplus m_0 \oplus m_0 \oplus m_1 = K \oplus m_1. \]

If $b=1$, then

\[ m_b = m_1, \]

so

\[ c' = K \oplus m_1 \oplus m_0 \oplus m_1 = K \oplus m_0. \]

Thus in both cases,

\[ c' = K \oplus m_{1-b}. \]

So $c'$ decrypts to the opposite message.
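The attack is easy to reproduce. In this sketch the plaintexts are ASCII byte strings; because the one-time pad needs messages of equal length, we pad `Wait!` with one trailing space, which is our own assumption rather than part of the lecture. The function `flip` (a hypothetical name) uses only $c$, $m_0$, and $m_1$, never the key.

```python
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    assert len(a) == len(b), "one-time pad needs equal lengths"
    return bytes(x ^ y for x, y in zip(a, b))


# Assumption: both messages padded to 6 bytes so they can share one pad length.
m0 = b"Attack"
m1 = b"Wait! "


def flip(c: bytes) -> bytes:
    """Adversary: c' = c XOR m0 XOR m1, computed without knowing K or b."""
    return xor(xor(c, m0), m1)


key = secrets.token_bytes(len(m0))     # fresh one-time pad key K
for b, m in enumerate((m0, m1)):
    c = xor(key, m)                    # c = K XOR m_b
    c_prime = flip(c)                  # tampered ciphertext
    print(b, xor(key, c_prime))        # receiver decrypts c' and gets m_{1-b}
```

The receiver's decryption succeeds without any error, which is exactly why the tampering goes unnoticed.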

8.3. Effect

The adversary flips the meaning of the message:

\[ \text{Attack} \leftrightarrow \text{Wait}. \]

The adversary does not learn the message.

But the adversary can still make one army attack while the other waits.

Therefore:

the one-time pad remains perfectly secret;
the adversary still achieves a harmful goal;
secrecy alone does not imply message integrity or authenticity.

This motivates later notions such as message authentication.

9. Computational security: first idea

Now the lecture returns to the problem of secrecy with short keys.

The main idea is:

\[ \text{limit the runtime of adversaries}. \]

Instead of considering all adversaries, consider only adversaries whose runtime is bounded by some realistic upper bound $T$.

For example, textbooks used to consider

\[ T = 2^{80} \]

as an extremely large bound.

Today, one often considers something closer to

\[ T = 2^{128} \]

to have more security margin.

The lecture notes that:

\[ 2^{10} \approx 10^3, \]

\[ 2^{20} \approx 10^6, \]

\[ 2^{30} \approx 10^9, \]

\[ 2^{40} \approx 10^{12}. \]

Consequently,

\[ 2^{80} = (2^{10})^8 \approx 10^{24} \]

is enormous.
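The approximations above can be checked directly. This short sketch compares $2^{10k}$ with $10^{3k}$; since $2^{10}=1024$ is about $2.4\%$ larger than $10^3$, the ratio drifts upward as $k$ grows, which is why these rules of thumb are only approximate.

```python
# Compare 2^(10k) with 10^(3k); each factor 2^10 = 1024 overshoots 10^3 by 2.4%.
ratios = {}
for k in (1, 2, 3, 4, 8):
    power_of_two, power_of_ten = 2 ** (10 * k), 10 ** (3 * k)
    ratios[10 * k] = power_of_two / power_of_ten
    print(f"2^{10 * k} = {power_of_two:.4e}  vs  10^{3 * k}  (ratio {ratios[10 * k]:.4f})")
```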

9.1. Attempt 1: $T$-IND security

Take Definition 3 of perfect secrecy and restrict the adversaries to $T$-bounded adversaries.

A first attempted definition:

An encryption scheme is $T$-IND secure if, for every $T$-bounded adversary $\mathcal A$,

\[ \Pr[\mathsf{IND}_{\mathcal A}=1] = \frac12. \]

The experiment is the same indistinguishability experiment:

$\mathcal A$ sends $m_0,m_1$.
Challenger samples $K\leftarrow\mathsf{KeyGen}()$.
Challenger samples $b\leftarrow_{\$}\{0,1\}$.
Challenger sends

\[ c^* \leftarrow \mathsf{Enc}(K,m_b). \]
$\mathcal A$ outputs $b'$.
The experiment outputs $1$ iff $b'=b$.

9.2. What does $T$ measure?

A student asks whether $T$ means “operations”.

The lecturer says that, informally, $T$ can be thought of as operations.

But if we want to include parallel computation, then a better theoretical model is circuit size.

Circuit size captures the total number of operations, both sequential and parallel.

In either reading, the exact meaning of $T$ depends on the chosen model of computation, so $T$ is not a technology-independent measure.

10. Why Attempt 1 fails

The attempted definition is too strong.

It still essentially forces perfect secrecy.

The reason comes from the earlier proof of

\[ (3) \Rightarrow (2). \]

Suppose Definition 2 fails.

Then there exist messages $m_0,m_1$ and a ciphertext $c$ such that

\[ \Pr[\mathsf{Enc}(K,m_0)=c] > \Pr[\mathsf{Enc}(K,m_1)=c]. \]

Then we can construct a very simple adversary $\mathcal A$:

Output $m_0,m_1$.
Receive challenge ciphertext $c^*$.
If

\[ c^*=c, \]

output

\[ b'=0. \]
Otherwise output a random bit:

\[ b'\leftarrow_{\$}\{0,1\}. \]

This adversary has constant complexity:

it only compares $c^*$ with $c$;
if they match, it outputs $0$;
otherwise it guesses randomly.

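Yet this adversary wins with probability strictly larger than $\frac12$. Writing $p_0=\Pr[\mathsf{Enc}(K,m_0)=c]$ and $p_1=\Pr[\mathsf{Enc}(K,m_1)=c]$, it wins with probability $\frac12+\frac{p_0}{2}$ when $b=0$ and with probability $\frac12-\frac{p_1}{2}$ when $b=1$, so

\[ \Pr[\mathsf{IND}_{\mathcal A}=1] = \frac12 + \frac{p_0-p_1}{4} > \frac12. \]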

Therefore, even a constant-time adversary can exploit any deviation from perfect secrecy.
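The advantage of this adversary can be computed exactly on a toy example. The sketch assumes a hypothetical scheme that is not perfectly secret (a shift cipher modulo $3$ with two keys) and uses $m_0=0$, $m_1=1$, $c=0$. With $p_i=\Pr[\mathsf{Enc}(K,m_i)=c]$, the winning probability of the adversary works out to $\frac12+\frac{p_0-p_1}{4}$, and the code checks this against a direct enumeration over $b$ and $K$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy scheme that is NOT perfectly secret: shift cipher mod 3, two keys.
KEYS = [0, 1]


def enc(k: int, m: int) -> int:
    return (m + k) % 3


def pr_enc(m: int, c: int) -> Fraction:
    """Pr[Enc(K, m) = c] for K uniform over KEYS."""
    return Fraction(sum(enc(k, m) == c for k in KEYS), len(KEYS))


m0, m1, c = 0, 1, 0          # Pr[Enc(K,0)=0] = 1/2 > Pr[Enc(K,1)=0] = 0


def win_probability() -> Fraction:
    """Exact Pr[IND_A = 1] for: output 0 if c* == c, else a uniform random bit."""
    total = Fraction(0)
    for b, k in product((0, 1), KEYS):
        weight = Fraction(1, 2) * Fraction(1, len(KEYS))   # Pr[b] * Pr[K = k]
        c_star = enc(k, (m0, m1)[b])
        if c_star == c:
            total += weight if b == 0 else 0               # deterministic guess b' = 0
        else:
            total += weight * Fraction(1, 2)               # random guess wins w.p. 1/2
    return total


p0, p1 = pr_enc(m0, c), pr_enc(m1, c)
print(win_probability(), Fraction(1, 2) + (p0 - p1) / 4)
```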

So requiring

\[ \Pr[\mathsf{IND}_{\mathcal A}=1] = \frac12 \]

for all efficient adversaries still forces something like perfect secrecy.

Conclusion:

It is not enough to relax only the adversary runtime.

We must also relax the winning condition.

11. Concrete indistinguishability security

The proper concrete definition relaxes two things:

only $T$-bounded adversaries are considered;
adversaries may have a small advantage $\epsilon$.

11.1. Advantage

The advantage of an adversary $\mathcal A$ is

\begin{equation*} \mathsf{Adv}_{\mathcal A} = \Pr[\mathsf{IND}_{\mathcal A}=1] - \frac12. \end{equation*}

The term $\frac12$ is the success probability of the trivial random-guessing adversary.

So the advantage measures how much better $\mathcal A$ does compared to random guessing.

11.2. Definition: $(T,\epsilon)$-IND security

An encryption scheme is $(T,\epsilon)$-IND secure if, for every $T$-bounded adversary $\mathcal A$,

\[ \Pr[\mathsf{IND}_{\mathcal A}=1] < \frac12 + \epsilon. \]

Equivalently,

\[ \mathsf{Adv}_{\mathcal A} < \epsilon. \]

This rules out every $T$-bounded adversary whose advantage is $\epsilon$ or more, while tolerating tiny advantages below $\epsilon$.

11.3. Choosing $T$ and $\epsilon$

In practice, one chooses parameters so that the best known attack with time complexity $T$ has success probability less than

\[ \frac12+\epsilon. \]

Typical historical parameters might be:

\[ T = 2^{80} \]

and

\[ \epsilon = 2^{-60}. \]

But this raises an important question:

What is the unit of $T$?

Possible units include:

milliseconds;
elementary CPU operations;
CPU cycles;
circuit gates;
hardware operations;
ASIC cost;
FPGA cost;
quantum operations.

Therefore concrete security is inherently technology-dependent.

A concrete security statement only makes sense relative to a specific computational model or technology.

12. From concrete security to asymptotic security

To avoid dependence on specific hardware, cryptography uses ideas from computational complexity theory.

12.1. Complexity theory perspective

In complexity theory, problems consist of infinitely many instances.

The runtime of an algorithm is measured as a function of the input size.

For example, merge sort on $n$ elements runs in time

\[ O(n\log n). \]

A function $f(n)$ is in

\[ O(n\log n) \]

if there exists a constant $c>0$ such that, for all sufficiently large $n$,

\[ f(n) \le c n\log n. \]

The lecturer informally writes something like:

\[ f(n)\in O(n\log n) \iff \exists c>0: \forall n\in\mathbb N: f(n)\le c n\log n. \]

More standardly, the inequality is required for all sufficiently large $n$.
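As a concrete instance of an $O(n\log n)$ bound, the sketch below counts the comparisons performed by a top-down merge sort and compares them with $n\lceil\log_2 n\rceil$, i.e. the constant $c=1$ with the logarithm rounded up. Counting comparisons instead of running time is our own simplifying assumption, and the random inputs are only a spot check, not a proof.

```python
import math
import random


def merge_sort(xs, counter):
    """Top-down merge sort; counter[0] accumulates the number of element comparisons."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left = merge_sort(xs[:mid], counter)
    right = merge_sort(xs[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


results = {}
for n in (10, 1000, 100_000):
    counter = [0]
    data = [random.random() for _ in range(n)]
    assert merge_sort(data, counter) == sorted(data)
    bound = n * math.ceil(math.log2(n))      # c * n log n with c = 1, log rounded up
    results[n] = (counter[0], bound)
    print(f"n={n}: {counter[0]} comparisons, bound n*ceil(log2 n) = {bound}")
```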

An algorithm is efficient if it runs in polynomial time:

\[ f(n)\in \mathsf{poly}(n). \]

That means there exists a constant $d>0$ such that

\[ f(n)\le n^d \]

for all sufficiently large $n$.

12.2. Why this helps

Different reasonable computational models can simulate each other with polynomial overhead.

For example:

modern CPUs can simulate Turing machines with polynomial overhead;
Turing machines can simulate many other reasonable models with polynomial overhead.

Thus polynomial-time computation is a robust way to talk about efficient computation without tying the definition to one specific machine.

13. Security parameter

In cryptography, the input size is not always a good measure of security.

For example, we may want to encrypt a single bit securely.

If security were measured only as a function of the message length, there would be no way to make the encryption of a one-bit message more secure, because its input size is fixed.

Therefore cryptography introduces a separate size parameter:

\[ \lambda. \]

This is called the security parameter.

The security parameter is intended to characterize the hardness of breaking the scheme.

We study what happens asymptotically as

\[ \lambda \to \infty. \]

In practice, people instantiate schemes with concrete values such as

\[ \lambda = 128. \]

But asymptotic security itself is about the behavior as $\lambda$ grows without bound.

13.1. Key generation now takes $1^\lambda$

In the modern syntax, key generation takes the security parameter as input:

\[ \mathsf{KeyGen}(1^\lambda). \]

Here

\[ 1^\lambda \]

means the unary string consisting of $\lambda$ many ones:

\[ 1^\lambda = \underbrace{11\cdots 1}_{\lambda\text{ times}}. \]

The reason for unary encoding is a complexity-theoretic convention.

Algorithms are allowed to run polynomial time in the length of their input.

If we encoded $\lambda$ in binary, then the input length would be only

\[ \log \lambda. \]

Then polynomial time in the input length would mean polynomial in $\log\lambda$, which is not what we want.

Using $1^\lambda$ gives an input of length $\lambda$, so polynomial time means

\[ \mathsf{poly}(\lambda). \]
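The difference between the two encodings is easy to see numerically. The sketch below compares the length of $1^\lambda$ with the length of $\lambda$ written in binary, and shows what a hypothetical quadratic time budget (input length squared) would allow in each case.

```python
def unary(lam: int) -> str:
    """The string 1^lambda: lambda ones, so its length is lambda."""
    return "1" * lam


def binary(lam: int) -> str:
    """Binary encoding of lambda: only about log2(lambda) characters."""
    return format(lam, "b")


for lam in (128, 256, 1024):
    u, b = len(unary(lam)), len(binary(lam))
    # Hypothetical quadratic time budget: (input length)^2 steps.
    print(f"lambda={lam}: unary length {u} -> budget {u ** 2}; "
          f"binary length {b} -> budget {b ** 2}")
```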

14. PPT adversaries

The computational model used in modern cryptography is PPT.

PPT means:

\[ \text{probabilistic polynomial time}. \]

A PPT algorithm is:

randomized;
runs in time polynomial in the security parameter $\lambda$.

Thus efficient adversaries are modeled as PPT machines.

The lecture also mentions Boolean circuits.

Boolean circuits are related to PPT computation, but not exactly the same model.

The reason is that Boolean circuits are a non-uniform model of computation.

A PPT machine is a randomized polynomial-time algorithm.

The important point is that a PPT machine has one fixed program that works for inputs of all lengths.

For example, one sorting algorithm can sort:

\[ 10 \text{ elements},\quad 1000 \text{ elements},\quad 100000 \text{ elements}. \]

The same algorithm is used for all input sizes.

This is called a uniform model of computation.

Informally:

\[ \text{one algorithm handles all input lengths.} \]

14.1. Boolean circuits: one circuit for one input length

A Boolean circuit consists of gates such as:

\[ \mathsf{AND},\quad \mathsf{OR},\quad \mathsf{NOT},\quad \mathsf{XOR}. \]

A fixed Boolean circuit usually only handles inputs of one fixed length.

For example, one circuit may take $128$-bit inputs, while another circuit may take $256$-bit inputs.

So, to solve a problem for all input lengths, we need a whole family of circuits:

\[ \{C_n\}_{n\in\mathbb N}, \]

where $C_n$ is the circuit for inputs of length $n$.
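A circuit family can be represented concretely. In this sketch (our own encoding, not a standard format) a circuit is a list of two-input gates over numbered wires, and `parity_circuit(n)` builds $C_n$, a chain of $n-1$ XOR gates computing the parity of $n$ bits. Because one program generates every $C_n$, this particular family is uniform; a non-uniform family need not have any such generator.

```python
from itertools import product
from typing import List, Tuple

# A gate is (op, in1, in2). Wires 0..n-1 carry the input bits; gate j writes wire n + j.
Gate = Tuple[str, int, int]
OPS = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b, "XOR": lambda a, b: a ^ b}


def parity_circuit(n: int) -> List[Gate]:
    """Build C_n: a chain of n-1 XOR gates computing the parity of n input bits."""
    gates: List[Gate] = []
    acc = 0                                  # wire holding the running XOR
    for i in range(1, n):
        gates.append(("XOR", acc, i))
        acc = n + len(gates) - 1             # output wire of the gate just added
    return gates


def evaluate(gates: List[Gate], bits: List[int]) -> int:
    """Evaluate a circuit on an input of the length it was built for."""
    wires = list(bits)
    for op, a, b in gates:
        wires.append(OPS[op](wires[a], wires[b]))
    return wires[-1]                         # last wire is the circuit's output


# Usage: build C_1, ..., C_6 with the same generator and check them exhaustively.
for n in range(1, 7):
    circuit = parity_circuit(n)
    assert all(evaluate(circuit, list(x)) == sum(x) % 2 for x in product((0, 1), repeat=n))
    print(f"C_{n}: {len(circuit)} gates, correct on all {2 ** n} inputs")
```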

14.2. Non-uniformity

The key difference is this:

For a PPT machine, we specify one algorithm:

\[ A. \]

For Boolean circuits, we specify an infinite sequence of circuits:

\[ C_1,C_2,C_3,\dots \]

If we do not require a single algorithm to generate these circuits, then each circuit $C_n$ can be designed separately.

This is called non-uniform computation.

In other words:

\[ \text{different input lengths may use completely different circuits.} \]

There is no requirement that all these circuits come from one common program.

14.3. Why non-uniform circuits are more powerful

A non-uniform circuit family can contain information that is hard-wired for each input length.

This is similar to giving an algorithm extra advice depending only on the input length.

Informally:

\[ \text{non-uniform computation} \approx \text{uniform computation} + \text{advice depending on } n. \]

The advice may depend on the input length $n$, but not on the concrete input $x$.

So for all inputs of length $128$, the circuit $C_{128}$ may contain some information that is useful specifically for length $128$.

A PPT machine does not have this freedom, because it has only one fixed program for all input lengths.

In a randomized Turing-machine model running in polynomial time:

one finite machine description works for all input lengths.

In a circuit model:

one usually has a different circuit for each input length;
therefore an algorithm is an infinite sequence of circuits.

This creates a conceptual difference between uniform and non-uniform computation.

For this course, the lecture says we will think of efficient adversaries as PPT machines.

15. What should replace $\epsilon$ asymptotically?

In concrete security, we had a fixed advantage bound

\[ \epsilon. \]

In asymptotic security, the advantage becomes a function of $\lambda$:

\[ \epsilon(\lambda). \]

The question is:

How small should $\epsilon(\lambda)$ be?

Options:

exponentially small, such as

\[ 2^{-\lambda}; \]
smaller than every inverse polynomial.

The lecture chooses the second condition.

We want the adversary’s advantage to be smaller than

\[ \frac1{p(\lambda)} \]

for every polynomial $p$, once $\lambda$ is sufficiently large.

This leads to negligible functions.

16. Negligible functions

16.1. Definition

A function

\[ f:\mathbb N\to \mathbb R_{\ge 0} \]

is called negligible if, for every positive polynomial $p$, there exists an integer $N\in\mathbb N$ such that for all $n>N$,

\[ f(n) < \frac1{p(n)}. \]

Equivalently, it is enough to check polynomials of the form

\[ p(n)=n^c \]

for constants $c>0$.

So $f$ is negligible iff for every constant $c>0$, there exists $N$ such that for all $n>N$,

\[ f(n) < n^{-c}. \]

Notation:

\[ \mathsf{negl}(n) \]

denotes an unspecified negligible function.

16.2. Intuition

A negligible function goes to zero faster than every inverse polynomial.

We do not care what the function does for small $n$.

We only care that once $n$ is sufficiently large, the function becomes smaller than every inverse polynomial.

So negligible means:

\[ f(n) = o(n^{-c}) \]

for every constant $c>0$.

17. Examples of negligible and non-negligible functions

17.1. Example 1: $f_1(n)=2^{-n}$

\[ f_1(n)=2^{-n} \]

is negligible.

To show this, we need:

\[ 2^{-n} < n^{-c} \]

for every constant $c>0$ and sufficiently large $n$.

This is equivalent to

\[ 2^n > n^c. \]

Exponential functions eventually dominate every polynomial.

Therefore

\[ 2^{-n} \]

is negligible.
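The threshold "sufficiently large $n$" depends on $c$. The sketch below searches, with exact integer arithmetic, for the last $n$ (up to a cutoff) at which $2^{-n} \ge n^{-c}$. A finite search only illustrates the definition; the proof is the domination argument above.

```python
def crossover(c: int, n_max: int = 5_000) -> int:
    """Return the largest n in [1, n_max] with 2**n <= n**c, or 0 if there is none.

    For every n above the returned value (up to n_max) we have 2^-n < n^-c.
    Exact integer arithmetic avoids floating-point underflow.
    """
    last_bad = 0
    for n in range(1, n_max + 1):
        if 2 ** n <= n ** c:
            last_bad = n
    return last_bad


for c in (1, 2, 5, 10, 20):
    print(f"c = {c:2d}: 2^-n < n^-c for all checked n > {crossover(c)}")
```

Larger exponents $c$ push the threshold further out, but for each fixed $c$ some threshold exists, which is all the definition asks for.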

17.2. Example 2: $f_2(n)=n^{-100}$

\[ f_2(n)=n^{-100} \]

is not negligible.

Reason:

To be negligible, it would need to be smaller than every inverse polynomial.

But take

\[ n^{-101}. \]

We have

\[ n^{-100} > n^{-101} \]

for all $n>1$.

Thus $n^{-100}$ is not smaller than every inverse polynomial.

Hence

\[ n^{-100} \]

is not negligible.

17.3. Example 3: $f_3(n)=2^{-\log n}$

Assuming $\log$ means base $2$, we have

\[ 2^{-\log n} = \frac1{2^{\log n}} = \frac1n. \]

Thus

\[ f_3(n)=\frac1n. \]

This is an inverse polynomial.

Therefore it is not negligible.

17.4. Example 4: $f_4(n)=2^{-(\log n)^2}$

We rewrite:

\[ 2^{-(\log n)^2} = \left(2^{\log n}\right)^{-\log n} = n^{-\log n}. \]

Thus

\[ f_4(n)=n^{-\log n}. \]

For any constant $c>0$, eventually

\[ \log n > c. \]

Therefore, for sufficiently large $n$,

\[ n^{-\log n} < n^{-c}. \]

Hence

\[ 2^{-(\log n)^2} \]

is negligible.

17.5. Example 5: $f_5(n)=n^{-\log\log n}$

\[ f_5(n)=n^{-\log\log n} \]

is negligible.

Reason:

Although

\[ \log\log n \]

grows very slowly, it is still unbounded.

For every constant $c>0$, eventually

\[ \log\log n > c. \]

Therefore, for sufficiently large $n$,

\[ n^{-\log\log n} < n^{-c}. \]

Hence

\[ n^{-\log\log n} \]

is negligible.

17.6. Simple rule

The lecture gives the following rule:

A function $f$ is negligible iff

\[ -\log f(n) = \omega(\log n). \]

Meaning of $\omega(\log n)$

The notation $\omega(\log n)$ describes functions that grow strictly faster than $\log n$.

More formally, if $g(n)=\omega(\log n)$, then

\[ \frac{g(n)}{\log n}\to \infty \quad \text{as } n\to\infty. \]

So $g(n)$ is not just larger than $\log n$ by a constant factor; instead, the ratio $g(n)/\log n$ must grow without bound.

Equivalently, $f$ is negligible exactly when $-\log f(n)$ grows asymptotically faster than $\log n$.

For example:

\[ f(n)=2^{-n} \]

has

\[ -\log f(n)=n, \]

and

\[ n=\omega(\log n), \]

so it is negligible.

For

\[ f(n)=2^{-\log n}, \]

we have

\[ -\log f(n)=\log n, \]

which is not $\omega(\log n)$, so it is not negligible.

For

\[ f(n)=2^{-(\log n)^2}, \]

we have

\[ -\log f(n)=(\log n)^2, \]

and

\[ (\log n)^2=\omega(\log n), \]

so it is negligible.

For

\[ f(n)=n^{-\log\log n}, \]

we compute:

\[ -\log f(n) = -\log\left(n^{-\log\log n}\right). \]

Using

\[ \log(x^y)=y\log x, \]

we get

\[ -\log f(n) = \log\log n \cdot \log n. \]

Since

\[ \log\log n \to \infty, \]

we have

\[ \log\log n \cdot \log n = \omega(\log n). \]

Thus $f(n)$ is negligible.
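The rule can also be illustrated numerically. Writing $k=\log_2 n$, the sketch tabulates the ratio $-\log_2 f(n)/\log_2 n$ for the example functions at a few values of $k$; for the negligible ones the ratio keeps growing, while for $n^{-100}$ and $2^{-\log n}$ it stays constant. Logarithms are taken base $2$, as in Example 3, and sampling a few points suggests the trend without proving it.

```python
import math

# -log2 f(n) for each example, expressed through k = log2 n, so that huge n = 2^k
# never has to be materialised as a float.
MINUS_LOG2_F = {
    "2^-n": lambda k: 2 ** k,                     # -log2 f = n
    "n^-100": lambda k: 100 * k,                  # -log2 f = 100 log2 n
    "2^-log n": lambda k: k,                      # -log2 f = log2 n
    "2^-(log n)^2": lambda k: k ** 2,             # -log2 f = (log2 n)^2
    "n^-log log n": lambda k: k * math.log2(k),   # -log2 f = log2 n * log2 log2 n
}

for name, g in MINUS_LOG2_F.items():
    ratios = [g(k) / k for k in (16, 64, 256)]    # ratio -log2 f(n) / log2 n
    print(f"{name:>14}: " + ", ".join(f"{r:.4g}" for r in ratios))
```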

18. Properties of negligible functions

18.1. Lemma

If $\nu_1$ and $\nu_2$ are negligible functions, and $p$ is a polynomial, then:

$\nu_1+\nu_2$ is negligible.
$p\cdot \nu_1$ is negligible.

18.2. Proof of property 1: sum of two negligible functions

We want to show:

\[ \nu_1(\lambda)+\nu_2(\lambda) \]

is negligible.

Let $c>0$ be arbitrary.

Since $\nu_1$ is negligible, for sufficiently large $\lambda$,

\[ \nu_1(\lambda) < \lambda^{-(c+1)}. \]

Since $\nu_2$ is negligible, for sufficiently large $\lambda$,

\[ \nu_2(\lambda) < \lambda^{-(c+1)}. \]

Let $\lambda'$ be large enough so that both inequalities hold for all

\[ \lambda > \lambda'. \]

Then for all $\lambda>\lambda'$,

\[ \nu_1(\lambda)+\nu_2(\lambda) < \lambda^{-(c+1)}+\lambda^{-(c+1)}. \]

Thus

\[ \nu_1(\lambda)+\nu_2(\lambda) < 2\lambda^{-(c+1)}. \]

For $\lambda>2$,

\[ 2\lambda^{-(c+1)} < \lambda^{-c}. \]

Therefore, for sufficiently large $\lambda$,

\[ \nu_1(\lambda)+\nu_2(\lambda) < \lambda^{-c}. \]

Since $c>0$ was arbitrary,

\[ \nu_1+\nu_2 \]

is negligible.

18.3. Proof of property 2: polynomial times negligible is negligible

Let

\[ p(\lambda) \]

be a polynomial.

Every polynomial is eventually bounded by a monomial: there is a constant $d>0$ such that

\[ p(\lambda) \le \lambda^d \]

for all sufficiently large $\lambda$ (for instance, $d=\deg p+1$ works).

Fix $c>0$. Since $\nu_1$ is negligible, applying the definition with the exponent $c+d$ shows that, for all sufficiently large $\lambda$,

\[ \nu_1(\lambda) < \lambda^{-(c+d)}. \]

Then, for all sufficiently large $\lambda$,

\[ p(\lambda)\nu_1(\lambda) \le \lambda^d\,\nu_1(\lambda) < \lambda^d \cdot \lambda^{-(c+d)} = \lambda^{-c}. \]

Since this holds for every $c>0$ and sufficiently large $\lambda$,

\[ p\cdot \nu_1 \]

is negligible.

The lecture also expresses this using asymptotic notation. A function $\nu_1$ is negligible iff

\[ \nu_1(\lambda)=\lambda^{-\omega(1)}. \]

If

\[ p(\lambda)=\lambda^c, \]

then

\[ p(\lambda)\nu_1(\lambda) = \lambda^c\cdot \lambda^{-\omega(1)} = \lambda^{c-\omega(1)} = \lambda^{-\omega(1)}. \]

So multiplying a negligible function by a polynomial still gives a negligible function.

19. Where the lecture stops

The lecture ends just after proving the basic closure properties of negligible functions.

The next lecture will return to the full asymptotic definition of encryption security.

The upcoming steps are:

update the syntax of encryption schemes to include the security parameter;
define encryption schemes as PPT algorithms;
define IND-security against PPT adversaries with negligible advantage.

The slides indicate that the next full definition will take roughly the following form: for every PPT adversary $\mathcal A$ there is a negligible function $\nu$ such that

\[ \Pr[\mathsf{IND}_{\mathcal A}(\lambda)=1] < \frac12+\nu(\lambda). \]
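The quantity $\Pr[\mathsf{IND}_{\mathcal A}(\lambda)=1]$ can be made concrete with a minimal Monte Carlo sketch. It assumes the standard shape of the indistinguishability experiment. The adversary picks $m_0,m_1$. The challenger samples a key and a uniform bit $b$ and encrypts $m_b$. The adversary then outputs a guess for $b$. The sketch instantiates this with the one-time pad on $\lambda$-bit messages. The names `ind_experiment`, `FirstBitAdversary`, and `estimate_success` are hypothetical. Python's `random` module is used only for reproducible simulation, not as a cryptographic source of randomness. An empirical estimate can suggest, but never prove, that an advantage is negligible.

```python
import random


class FirstBitAdversary:
    """Hypothetical adversary: guesses b from the top bit of the ciphertext."""

    def choose(self, lam):
        self.lam = lam
        # Two lam-bit messages that differ in every bit: all zeros vs all ones.
        return 0, (1 << lam) - 1

    def guess(self, c, m0, m1):
        top = self.lam - 1
        # Guess 0 when the top ciphertext bit equals the top bit of m0.
        return 0 if ((c >> top) & 1) == ((m0 >> top) & 1) else 1


def ind_experiment(lam, adversary, rng):
    """One run of the IND experiment for the one-time pad on lam-bit messages.
    Returns 1 if the adversary's guess equals the challenge bit b, else 0."""
    m0, m1 = adversary.choose(lam)
    k = rng.getrandbits(lam)       # KeyGen: uniform lam-bit key
    b = rng.getrandbits(1)         # challenger's uniform secret bit
    c = (m1 if b else m0) ^ k      # Enc(k, m_b) = m_b XOR k
    return int(adversary.guess(c, m0, m1) == b)


def estimate_success(lam, trials, seed=0):
    """Empirical estimate of Pr[IND_A(lam) = 1] over `trials` independent runs."""
    rng = random.Random(seed)
    adversary = FirstBitAdversary()
    wins = sum(ind_experiment(lam, adversary, rng) for _ in range(trials))
    return wins / trials


p_hat = estimate_success(lam=64, trials=20000)
print(f"estimated success probability: {p_hat:.3f}")
```

For the one-time pad, every adversary succeeds with probability exactly $\tfrac12$, so the estimate hovers around $\tfrac12$. A computationally secure scheme would only guarantee success below $\tfrac12+\nu(\lambda)$, and only against PPT adversaries.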

20. Big-picture summary

This lecture makes the transition from information-theoretic secrecy toward computational secrecy.

The key points are:

The three definitions of perfect secrecy are equivalent.
The last implication $(1)\Rightarrow(3)$ is proved by defining a two-point message distribution induced by the adversary’s chosen messages.
Shannon’s theorem shows that perfect secrecy requires

\[ |\mathcal K| \ge |\mathcal M|. \]

Therefore perfectly secret encryption cannot have keys shorter than messages.
Perfect secrecy is too strong because it protects against all adversaries, even computationally unbounded ones.
Perfect secrecy is also incomplete as a practical security goal because it does not prevent attacks that modify ciphertexts.
The one-time pad is malleable:

\[ c' = c\oplus m_0\oplus m_1 \]

flips the encrypted message between $m_0$ and $m_1$.
Modern cryptography restricts attention to efficient adversaries.
Merely restricting runtime is not enough; the success probability must also be relaxed.
Concrete security uses $(T,\epsilon)$-security: for every adversary $\mathcal A$ running in time at most $T$,

\[ \Pr[\mathsf{IND}_{\mathcal A}=1] < \frac12+\epsilon. \]

Concrete security depends on the technology or computational model used to interpret $T$.
Asymptotic security avoids this by using:
- a security parameter $\lambda$;
- PPT adversaries;
- negligible advantage.
Negligible functions are functions that vanish faster than every inverse polynomial.
Negligible functions are stable under:
- addition;
- multiplication by polynomials.
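The malleability item in the summary can be checked on a toy example. This minimal sketch treats the one-time pad as bytewise XOR on byte strings and uses two hypothetical equal-length messages. The key comes from Python's `secrets.token_bytes`. The attacker never sees the key. It only needs $m_0$ and $m_1$ to turn an encryption of $m_0$ into an encryption of $m_1$.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("one-time pad operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


m0 = b"ATTACK AT DAWN"
m1 = b"RETREAT AT SIX"          # same length as m0 (14 bytes)

key = secrets.token_bytes(len(m0))   # uniform one-time key
c = xor_bytes(key, m0)               # Enc(key, m0)

# Attacker: c' = c XOR m0 XOR m1, computed without the key.
c_prime = xor_bytes(xor_bytes(c, m0), m1)

recovered = xor_bytes(key, c_prime)  # receiver decrypts the tampered ciphertext
print(recovered)                     # prints b'RETREAT AT SIX'
```

Applying the same transformation to an encryption of $m_1$ yields an encryption of $m_0$, which is why the attack "flips" the plaintext. Perfect secrecy is untouched, since the attacker learns nothing about which message was sent.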
