Secure multi-party computation (MPC) solves a seemingly-impossible problem:
Many people, each with their own private data that they do not wish to share, can still perform data analysis collaboratively and only reveal the result!
MPC provides a data confidentiality guarantee.
As a consequence, it operates differently from blockchains and broadcast protocols, which achieve decentralization based on the principle of replication.
Instead, MPC operates by federating data across $n$ servers and giving each server a randomized secret share of the data, such that no coalition of up to $t$ parties can learn anything about the data.
(If the parties are non-faulty and only trying to break confidentiality, then we call these parties semi-honest.)
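One standard way to realize this kind of threshold secret sharing is Shamir's scheme. Here is a minimal Python sketch under illustrative parameter choices (the field prime and the specific secret/threshold values below are not from this lecture, and this is not a production implementation):

```python
import random

# Toy field prime; any prime larger than the secret and n works.
P = 2**61 - 1

def share(secret, n, t):
    """Split `secret` into n shares: any t+1 shares reconstruct it,
    while any t or fewer look uniformly random and reveal nothing."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t)]
    f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    total = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

shares = share(secret=42, n=5, t=2)
assert reconstruct(shares[:3]) == 42  # any t+1 = 3 shares suffice
```

Each share alone is a uniformly random field element, which is exactly the confidentiality guarantee against coalitions of up to $t$ semi-honest parties.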
We can also add integrity and availability using verifiable secret sharing (or VSS). In this case, we can withstand so-called malicious parties that are trying to break both confidentiality and correctness.
VSS is similar to the Byzantine generals problem: you are a leader, and you're providing information to many other parties. The main differences here are that:
But the other trust questions from the Byzantine generals problem remain. You don't necessarily trust the other parties entirely, and they don't entirely trust you either.
Source: Wikipedia
We showed three types of MPC constructions:
More generally, it is possible to construct MPC for a wide range of choices for the total number of parties $n$ and confidentiality threshold $t$.
Can you convince someone that a statement is true, without showing them the evidence or reason why?
Example 1. Two interleaved locks
Example 2. Sudoku puzzle
Constraints to check a Sudoku puzzle
Example 3. Opening a door inside a cave
Source: Wikipedia
Source: Wikipedia
Source: Wikipedia
Example 4. Proof of knowledge of a discrete logarithm. (We saw this in Lecture 6.)
The verifier accepts this proof only if $g$ raised to the power of the final response equals the prover's first message times $h$ raised to the challenge: writing $a$ for the first message, $c$ for the challenge, and $z$ for the response, the check is $g^z = a \cdot h^c$.
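As a concrete sketch, here is one round of the Schnorr protocol for proving knowledge of $s$ with $h = g^s$. The tiny group parameters below are illustrative only (a real deployment uses a large prime-order group):

```python
import random

# Toy group: p = 2q + 1 with q prime; g = 4 generates the order-q
# subgroup of squares mod p. Illustrative numbers, not secure sizes.
p, q, g = 1019, 509, 4

s = random.randrange(1, q)  # Peggy's secret witness
h = pow(g, s, p)            # the public statement: h = g^s

# Round 1: Peggy sends a commitment to a fresh random nonce.
r = random.randrange(q)
a = pow(g, r, p)

# Round 2: Victor sends a uniformly random challenge.
c = random.randrange(q)

# Round 3: Peggy sends the response z = r + c*s mod q.
z = (r + c * s) % q

# Victor accepts iff g^z = a * h^c (mod p), since
# g^(r + c*s) = g^r * (g^s)^c.
assert pow(g, z, p) == (a * pow(h, c, p)) % p
```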
There exists a zero knowledge proof for any statement!
(Well at least: for any statement that can be efficiently checked, if someone gave you a potential solution.)
Let's try to learn some lessons from the three proofs we've seen so far.
Question. What characteristics do all of these proofs have in common?
A zero knowledge proof is a protocol involving two parties: a prover and a verifier.
(Rather than calling the parties Alice and Bob, oftentimes the parties are called Peggy and Victor.)
| prover Peggy($x, w$) | | verifier Victor($x$) |
|---|---|---|
| | $\pi$: $\longleftarrow$ $\longrightarrow$ $\cdots$ $\longrightarrow$ | |
Before the protocol starts:
Once the protocol ends: the verifier outputs its judgment about whether to Accept or Reject the claim that $x$ is true.
(Note: the prover has no output. It already knows everything, and isn't trying to learn new information through the interaction. Instead it is just trying to convince the verifier that $x$ is true.)
What makes this problem interesting is: neither party fully trusts the other one.
The parties can have as many or as few rounds of interaction as the verifier wants. You can think of the interactive dialogue as Victor issuing challenges and Peggy sending responses to these challenges.
By the end of the interactive conversation, the verifier should be convinced that:
To understand the difference between a proof and a proof of knowledge: consider the following two statements given a group element $h = g^s$, where $g$ and $h$ are publicly known.
Victor's concern about the integrity of the proof is, at a high level, similar to many of the integrity concerns we've seen throughout this course (e.g., consistency among Byzantine generals, verifiable MPC, etc).
But Peggy's concern about the confidentiality of the witness is new and strange!
Let's consider the opposite concept for a moment.
Question. If Peggy wanted to be certain that Victor did learn why the statement is true, how could she accomplish this goal?
One option is that Peggy could send Victor the witness $w$ and allow Victor to check the proof for himself.
If Peggy is concerned that the message might get lost in transit, Peggy could broadcast $w$ to many servers or post it to a public blockchain, and let Victor download it.
In summary: it's easy to prove that Victor has access to some particular data.
But how do you prove a negative? How do you prove that Victor does not have some data?
It is not sufficient to claim that Peggy never sent $w$ to Victor in the interactive protocol. Why is this?
Instead, we must find a way to show that Victor has learned nothing beyond what he already knew before talking with Peggy.
That is, in the entire conversation, Victor has only learned 1 bit of new information -- that the statement $x$ is true. Beyond this, everything else that Peggy sent is random noise.
| prover Peggy($x, w$) | | verifier Victor($x$) |
|---|---|---|
| | $\pi$: $\longleftarrow$ $\longrightarrow$ $\cdots$ $\longrightarrow$ | |
Formally, a zero knowledge proof is an interactive protocol $\pi$ that satisfies three properties.
Completeness: every valid proof should be accepted by the verifier.
In more detail: if the statement $x$ is true and both Peggy and Victor act honestly, then at the end of the protocol Victor will output True.
Soundness: nearly all invalid proofs should be rejected by the verifier.
In more detail: if the statement $x$ is false and Victor is honest, then it is infeasible for any prover Peggy (even a malicious/faulty one) to deceive Victor. That is, Victor will output False, except maybe with some probability $\varepsilon$ of making an error and reaching a decision of True instead.
An extension of this property, called knowledge soundness, states that:
if Peggy can convince Victor to accept the proof, then she must have a witness $w$ that explains why the statement is true.
I'm not going to provide a rigorous definition of this property in this class.
One of the formative achievements in modern cryptography is to devise a way to formalize this claim -- that is, to prove a negative.
Here's how it works. The main insight is to create a new computer program called the simulator $S$.
The simulator is a figment of Victor's imagination -- it does not really exist.
The simulator's job is to take on the role of the prover and to forge a proof.
Wait hang on, that doesn't sound right! If a proof can be forged, then it wouldn't be sound!
To resolve this conundrum, we only insist that the simulator forge a proof back to Victor himself.
That is: the simulator will take on the role of the prover, and it will interact with Victor. Every time that Victor issues a challenge, the simulator will respond to it too, just like the real prover does.
So we can imagine two scenarios: either Victor is talking to a real prover Peggy (who has a witness) or to this imaginary simulator (who doesn't know the witness).
| prover Peggy($x, w$) | | verifier Victor($x$) |
|---|---|---|
| | $\pi_\text{Real}$: $\longleftarrow$ $\longrightarrow$ $\cdots$ $\longrightarrow$ | |
| simulator S($x$) | | verifier Victor($x$) |
|---|---|---|
| | $\pi_\text{Fake}$: $\longleftarrow$ $\longrightarrow$ $\cdots$ $\longrightarrow$ | |
A proof satisfies the zero knowledge property if:
For any verifier $V$ and statement $x$, there exists a simulator $S$ such that it is infeasible to tell the difference between the random variables $\pi_\text{Real}$ and $\pi_\text{Fake}$.
It is easiest to think through the concept of simulation using an example.
Let's go back to our Sudoku puzzle from the start of class.
Every time the verifier Victor issues a challenge to open one row, column, or grid, it already "knows" what it is expecting to see:
the numbers 1 through 9, in a random order, except for the numbers that were already part of the puzzle.
Hence, it isn't really the cards themselves that convince Victor that the proof is true. The cards essentially provide "no data."
Instead what convinces Victor are the facts that:
The zero knowledge proof of a Sudoku puzzle works well in our classroom. But it required taking physical actions, such as:
Suppose that Peggy wants to convince Victor that she knows the solution to a Sudoku puzzle, but now they are communicating via a text messaging program over the Internet.
Peggy could perform the card-based proof and take pictures after each step, but this wouldn't convince Victor anymore.
Question. Why not?
Source: Moni Naor
Instead, we need a digital equivalent to a playing card, where we can have it be "facedown" (i.e., hidden) and yet impossible to change (i.e., binding).
This is a commitment scheme!
Recall that a commitment scheme has two methods.
| Committer | | Receiver |
|---|---|---|
| | $\overset{c}{\longrightarrow}$ | |
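As a rough illustration, here is a simple hash-based commitment sketch (it assumes that hashing fresh randomness together with the value gives a hiding and binding commitment, which is the standard heuristic; the exact scheme from earlier lectures may differ):

```python
import hashlib, os

def commit(value: bytes):
    """Commit: returns the commitment c plus the opening information."""
    rand = os.urandom(32)  # fresh one-time randomness (for hiding)
    c = hashlib.sha256(rand + value).digest()
    return c, (value, rand)

def open_check(c: bytes, opening) -> bool:
    """Open: the receiver recomputes the hash to verify the opening."""
    value, rand = opening
    return hashlib.sha256(rand + value).digest() == c

c, opening = commit(b"7")  # e.g., commit to the digit in one Sudoku cell
assert open_check(c, opening)                 # a correct opening is accepted
assert not open_check(c, (b"8", opening[1]))  # binding: cannot reopen as an 8
```

The commitment plays the role of the facedown card: hiding until opened, and impossible to swap for a different value afterward.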
Additionally I'm going to use one more property of Sudoku puzzles: if I permute all of the numbers in the range 1-9 (both the public and secret numbers), then the result is still a Sudoku puzzle.
Original | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
Permuted | 3 | 2 | 7 | 9 | 4 | 5 | 6 | 1 | 8 |
So: if Peggy knows the solution to one Sudoku puzzle, then she really knows the answer to $9! = 362,880$ Sudoku puzzles.
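This relabeling property is easy to check in code. The sketch below uses a solved grid generated by a standard shift pattern (an illustrative stand-in for a real puzzle's solution) and verifies that a random relabeling of the digits is still a valid solution:

```python
import random

def valid(grid):
    """Check every row, column, and 3x3 grid contains 1..9 exactly once."""
    units = [[grid[r][c] for c in range(9)] for r in range(9)]         # rows
    units += [[grid[r][c] for r in range(9)] for c in range(9)]        # columns
    units += [[grid[br + i][bc + j] for i in range(3) for j in range(3)]
              for br in (0, 3, 6) for bc in (0, 3, 6)]                 # grids
    return all(sorted(u) == list(range(1, 10)) for u in units)

def permute(grid):
    """Relabel the digits 1..9 by a uniformly random permutation."""
    digits = list(range(1, 10))
    random.shuffle(digits)
    relabel = dict(zip(range(1, 10), digits))
    return [[relabel[v] for v in row] for row in grid]

# A valid solved grid generated by a standard shift pattern.
solution = [[(r * 3 + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
assert valid(solution)
assert valid(permute(solution))  # relabeling preserves validity
```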
Using commitments and permutations, we can make a zero knowledge proof for Sudoku over the Internet.
Here is the setup. Both Peggy and Victor know a Sudoku puzzle, and Peggy also knows its solution.
Source: Andreas Pogiatzis
Step 1. Peggy creates a fresh random puzzle and its solution by applying a random permutation of the numbers 1 through 9 to her own puzzle and solution.
Source: Andreas Pogiatzis
Step 2. Peggy commits to each cell of the puzzle. (Remember that each commitment needs its own one-time randomness!) She sends all 81 commitments to Victor.
Source: Andreas Pogiatzis
Step 3. Victor requests that Peggy do one of the following.
Note that there are a total of 28 possible actions that Victor can request here.
Step 4. Peggy performs the requested action by opening the commitments to the relevant cells, but no others.
Victor then checks that the commitments satisfy the desired property.
If so, he outputs Accept. Otherwise he outputs Reject.
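Putting the pieces together, here is a sketch of one round of steps 1-4 in Python. It is simplified: only the 27 row/column/grid challenges are modeled, and the 28th challenge that ties the permuted puzzle back to the public clues of the original statement is omitted; the solved grid is an illustrative stand-in for Peggy's secret solution.

```python
import hashlib, os, random

# An illustrative solved grid (shift pattern), standing in for Peggy's w.
solution = [[(r * 3 + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

def commit_cell(v):
    """Hash-based commitment to a single digit, with fresh randomness."""
    rand = os.urandom(16)
    return hashlib.sha256(rand + bytes([v])).digest(), (v, rand)

# Step 1: Peggy relabels the digits by a random permutation.
digits = list(range(1, 10))
random.shuffle(digits)
relabel = dict(zip(range(1, 10), digits))
permuted = [[relabel[v] for v in row] for row in solution]

# Step 2: Peggy commits to each of the 81 cells and sends the commitments.
comms, openings = {}, {}
for r in range(9):
    for c in range(9):
        comms[(r, c)], openings[(r, c)] = commit_cell(permuted[r][c])

# Step 3: Victor picks one of the 27 row/column/grid challenges at random.
units = [[(r, c) for c in range(9)] for r in range(9)]               # rows
units += [[(r, c) for r in range(9)] for c in range(9)]              # columns
units += [[(br + i, bc + j) for i in range(3) for j in range(3)]
          for br in (0, 3, 6) for bc in (0, 3, 6)]                   # grids
challenge = random.choice(units)

# Step 4: Peggy opens only the 9 challenged commitments; Victor verifies
# each opening and checks that the digits 1 through 9 appear exactly once.
opened = []
for cell in challenge:
    v, rand = openings[cell]
    assert hashlib.sha256(rand + bytes([v])).digest() == comms[cell]
    opened.append(v)
assert sorted(opened) == list(range(1, 10))  # Victor outputs Accept
```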
Does this proof satisfy the three criteria of a zero knowledge proof? Let's check.
Completeness is the easiest property to think about. If Peggy is telling the truth, then her permuted and committed Sudoku puzzle will be correct. Hence, it will satisfy any of the 28 actions that Victor could request.
Soundness is a bit more complicated to reason about. Remember that soundness is about catching a faulty prover.
If the statement $x$ is false and Victor is honest, then it is infeasible for any prover Peggy (even a malicious/faulty one) to deceive Victor. That is, Victor will output False, except maybe with some probability $\varepsilon$ of making an error and reaching a decision of True instead.
If Peggy is lying and the Sudoku puzzle actually has no solution, then her permuted & committed puzzle has to be wrong somewhere. (After all: if it satisfies the rules for all rows, columns, and grids and connects to the real statement $x$, then it is a valid Sudoku solution!)
We don't know which of the 28 conditions Peggy has violated, but it has to be at least one of them. Hence, if the statement $x$ is false, then Victor will catch Peggy with probability at least $\frac{1}{28}$; or in other words, Peggy will get away with cheating with probability at most $\varepsilon = \frac{27}{28} \approx 0.964$.
That's... actually not very satisfying.
Question. How can we reduce the soundness error?
We can amplify the soundness via repetition. Just run the proof many times, either in sequence or in parallel. Each time, Peggy chooses a new permutation $M$ and new commitments for each cell. And each time, Victor has a chance to catch Peggy in a lie.
If we repeat the protocol $k$ times, then Peggy gets away with cheating with probability at most $(27/28)^k$. This probability vanishes to 0 exponentially fast in $k$. For instance if we choose $k = 2440$, then the probability of cheating on all repetitions is less than $1/2^{128}$, which is our usual crypto standard for "infeasible."
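The arithmetic behind the choice $k = 2440$ can be checked directly:

```python
import math

# Peggy cheats successfully on one repetition with probability at most 27/28.
# Find the smallest k with (27/28)^k < 2^-128.
k = math.ceil(128 * math.log(2) / math.log(28 / 27))
print(k)  # 2440

assert (27 / 28) ** k < 2 ** -128
assert (27 / 28) ** (k - 1) >= 2 ** -128  # so k = 2440 is the smallest choice
```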
(Admittedly this requires that Peggy and Victor construct, exchange, and open a lot of Sudoku puzzles. Oh well; at least computing power and network communication are cheap nowadays?)
Zero knowledge is also a subtle point to reason about. Remember that our goal is to construct a simulator whose actions look identical to the real prover Peggy's work in the real world.
| prover Peggy($x, w$) | | verifier Victor($x$) |
|---|---|---|
| | $\pi_\text{Real}$: $\longleftarrow$ $\longrightarrow$ $\cdots$ $\longrightarrow$ | |
| simulator S($x$) | | verifier Victor($x$) |
|---|---|---|
| | $\pi_\text{Fake}$: $\longleftarrow$ $\longrightarrow$ $\cdots$ $\longrightarrow$ | |
But also: the simulator isn't real; it's just a figment of Victor's mind. And as a result, the simulator already "knows" the decisions that Victor will take -- specifically, which of the 28 checks Victor will choose to test Peggy on.
As a result, the simulator (who doesn't know the Sudoku solution $w$) can interact with Victor by using a fake puzzle that is only correct on the opened commitments, but nowhere else.
To make this claim rigorous, let's change the protocol slightly by adding a new round of interaction at the start of the zero knowledge proof.
Also when we say that the simulator and Victor are the same party, what we mean formally is that they are both just computer programs that are controlled by the same entity. We'll see in a second how to use this property.
Here's how the simulator works.
Since the simulator doesn't know the real solution $w$, it makes up a random Sudoku puzzle that has no connection to reality.
Next, the simulator interacts with Victor to perform steps 1, 2, and 3 of the protocol. So far, the simulator has only sent commitments to the fake puzzle, but commitments are hiding so Victor cannot tell that the simulator is using a fake puzzle. (Victor would find out if we continued on to step 4, though.)
From step 3 of the proof, the simulator has now learned which action Victor will use as his test. Now the simulator rewinds Victor's state back to the end of step 1, creates a new puzzle that is correct on the desired row/column/grid/public values, and completes the rest of the proof in the same way as the honest prover would.
That's it! This "rewinding" trick is what makes the analysis work. This is how we formalize the intuitive claim from before that:
it isn't really the cards themselves that convince Victor that the proof is true, [but instead that] Peggy already committed to the cards beforehand (i.e., placed them facedown and cannot change them later).
Great, now we have a digital way to prove that a Sudoku puzzle can be solved. What can we do with this?
Amazingly, you can do quite a lot! If you take a complexity theory course, you will learn that Sudoku puzzles (if you extend them to be larger than $9 \times 9$) have the property of being NP-complete.
In simple terms, what this means is that you can take any problem that has an efficiently-computable check program $C(x, w)$ and convert it into a Sudoku puzzle, so that the original problem is true if and only if the corresponding Sudoku puzzle is true.
So here are some problems that you can prove in zero knowledge.
...And many more.
At this point, we are in a position with ZK proofs analogous to where we stood with MPC at the end of Lecture 13. The punchline is that:
In particular, the approach we have designed today is inefficient in at least three metrics.
Computation: doing this reduction from a general problem to a Sudoku puzzle is computationally intensive, and it would be better to find new ways to prove directly the statements that are of interest to us. In the next few lectures, we will design proofs that can be calculated in seconds or even milliseconds.
Interaction: the proof we constructed today only convinces a single verifier Victor; if you want to convince someone else then you'll have to start all over again. In future lectures, we will design a single proof that you can post to a blockchain and that will convince everyone!
Communication: in order to reduce the soundness error, we had to create $k = 2440$ permuted/committed Sudoku puzzles and send them all over the Internet. On a blockchain, remember that you pay fees that are directly proportional to the amount of data that you post onto the blockchain, so posting the proof that we constructed today would be cost-prohibitive. Instead, we will design proofs that take up just a few hundred bytes of data -- essentially the same as a signature!