Secure multi-party computation (MPC) solves a seemingly-impossible problem:
Many people, each with their own private data that they do not wish to share, can still perform data analysis collaboratively and only reveal the result!
MPC provides a data confidentiality guarantee.
As a consequence, it operates differently from blockchains and broadcast protocols, which achieve decentralization based on the principle of replication.
Instead, MPC operates by federating data across $n$ servers and giving each server a randomized secret share of the data, such that no coalition of up to $t$ parties can learn anything about the data.
(If the parties are non-faulty and only trying to break confidentiality, then we call these parties semi-honest.)
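One standard way to realize this kind of threshold secret sharing is Shamir's scheme. Here is a minimal Python sketch under illustrative parameter choices (the field prime and the specific secret/threshold values below are not from this lecture, and this is not a production implementation):

```python
import random

# Toy field prime; any prime larger than the secret and n works.
P = 2**61 - 1

def share(secret, n, t):
    """Split `secret` into n shares: any t+1 shares reconstruct it,
    while any t or fewer look uniformly random and reveal nothing."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t)]
    f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    total = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

shares = share(secret=42, n=5, t=2)
assert reconstruct(shares[:3]) == 42  # any t+1 = 3 shares suffice
```

Each share alone is a uniformly random field element, which is exactly the confidentiality guarantee against coalitions of up to $t$ semi-honest parties.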
We can also add integrity and availability using verifiable secret sharing (or VSS). In this case, we can withstand so-called malicious parties that are trying to break both confidentiality and correctness.
VSS is similar to the Byzantine generals problem: you are a leader, and you're providing information to many other parties. The main differences here are that:
But the other trust questions from the Byzantine generals problem remain. You don't necessarily trust the other parties entirely, and they don't entirely trust you either.
Source: Wikipedia
We showed three types of MPC constructions:
More generally, it is possible to construct MPC for a wide range of choices for the total number of parties $n$ and confidentiality threshold $t$.
Can you convince someone that a statement is true, without showing them the evidence or reason why?
Example 1. Two interleaved locks
Example 2. Sudoku puzzle
Constraints to check a Sudoku puzzle
Example 3. Opening a door inside a cave
Source: Wikipedia
Source: Wikipedia
Source: Wikipedia
Example 4. Proof of knowledge of a discrete logarithm. (We saw this in Lecture 6.)
The verifier accepts this proof only if $g$ raised to the power of the final response equals the prover's first message times $h$ raised to the challenge: writing $a$ for the first message, $c$ for the challenge, and $z$ for the response, the check is $g^z = a \cdot h^c$.
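As a concrete sketch, here is one round of the Schnorr protocol for proving knowledge of $s$ with $h = g^s$. The tiny group parameters below are illustrative only (a real deployment uses a large prime-order group):

```python
import random

# Toy group: p = 2q + 1 with q prime; g = 4 generates the order-q
# subgroup of squares mod p. Illustrative numbers, not secure sizes.
p, q, g = 1019, 509, 4

s = random.randrange(1, q)  # Peggy's secret witness
h = pow(g, s, p)            # the public statement: h = g^s

# Round 1: Peggy sends a commitment to a fresh random nonce.
r = random.randrange(q)
a = pow(g, r, p)

# Round 2: Victor sends a uniformly random challenge.
c = random.randrange(q)

# Round 3: Peggy sends the response z = r + c*s mod q.
z = (r + c * s) % q

# Victor accepts iff g^z = a * h^c (mod p), since
# g^(r + c*s) = g^r * (g^s)^c.
assert pow(g, z, p) == (a * pow(h, c, p)) % p
```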
There exists a zero knowledge proof for any statement!
(Well at least: for any statement that can be efficiently checked, if someone gave you a potential solution.)
Let's try to learn some lessons from the three proofs we've seen so far.
Question. What characteristics do all of these proofs have in common?
A zero knowledge proof is a protocol involving two parties: a prover and a verifier.
(Rather than calling the parties Alice and Bob, oftentimes the parties are called Peggy and Victor.)
| prover Peggy($x, w$) | | verifier Victor($x$) |
|---|---|---|
| | $\pi$: $\longleftarrow$ $\longrightarrow$ $\cdots$ $\longrightarrow$ | |
Before the protocol starts:
Once the protocol ends: the verifier outputs its judgment about whether to Accept or Reject the claim that $x$ is true.
(Note: the prover has no output. It already knows everything, and isn't trying to learn new information through the interaction. Instead it is just trying to convince the verifier that $x$ is true.)
What makes this problem interesting is: neither party fully trusts the other one.
The parties can have as many or as few rounds of interaction as the verifier wants. You can think of the interactive dialogue as Victor issuing challenges and Peggy sending responses to these challenges.
By the end of the interactive conversation, the verifier should be convinced that:
To understand the difference between a proof and a proof of knowledge: consider the following two statements given a group element $h = g^s$, where $g$ and $h$ are publicly known.
Victor's concern about the integrity of the proof is, at a high level, similar to many of the integrity concerns we've seen throughout this course (e.g., consistency among Byzantine generals, verifiable MPC, etc).
But Peggy's concern about the confidentiality of the witness is new and strange!
Let's consider the opposite concept for a moment.
Question. If Peggy wanted to be certain that Victor did learn why the statement is true, how could she accomplish this goal?
One option is that Peggy could send Victor the witness $w$ and allow Victor to check the proof for himself.
If Peggy is concerned that the message might get lost in transit, Peggy could broadcast $w$ to many servers or post it to a public blockchain, and let Victor download it.
In summary: it's easy to prove that Victor has access to some particular data.
But how do you prove a negative? How do you prove that Victor does not have some data?
It is not sufficient to claim that Peggy never sent $w$ to Victor in the interactive protocol. Why is this?
Instead, we must find a way to show that Victor has learned nothing beyond what he already knew before talking with Peggy.
That is, in the entire conversation, Victor has only learned 1 bit of new information -- that the statement $x$ is true. Beyond this, everything else that Peggy sent is random noise.
| prover Peggy($x, w$) | | verifier Victor($x$) |
|---|---|---|
| | $\pi$: $\longleftarrow$ $\longrightarrow$ $\cdots$ $\longrightarrow$ | |
Formally, a zero knowledge proof is an interactive protocol $\pi$ that satisfies three properties.
Completeness: every valid proof should be accepted by the verifier.
In more detail: if the statement $x$ is true and both Peggy and Victor act honestly, then at the end of the protocol Victor will output True.
Soundness: nearly all invalid proofs should be rejected by the verifier.
In more detail: if the statement $x$ is false and Victor is honest, then it is infeasible for any prover Peggy (even a malicious/faulty one) to deceive Victor. That is, Victor will output False, except maybe with some probability $\varepsilon$ of making an error and reaching a decision of True instead.
An extension of this property, called knowledge soundness, states that:
if Peggy can convince Victor to accept the proof, then she must have a witness $w$ that explains why the statement is true.
I'm not going to provide a rigorous definition of this property in this class.
One of the formative achievements in modern cryptography is to devise a way to formalize this claim -- that is, to prove a negative.
Here's how it works. The main insight is to create a new computer program called the simulator $S$.
The simulator is a figment of Victor's imagination -- it does not really exist.
The simulator's job is to take on the role of the prover and to forge a proof.
Wait hang on, that doesn't sound right! If a proof can be forged, then it wouldn't be sound!
To resolve this conundrum, we only insist that the simulator forge a proof back to Victor himself.
That is: the simulator will take on the role of the prover, and it will interact with Victor. Every time that Victor issues a challenge, the simulator will respond to it too, just like the real prover does.
So we can imagine two scenarios: either Victor is talking to a real prover Peggy (who has a witness) or to this imaginary simulator (who doesn't know the witness).
| prover Peggy($x, w$) | | verifier Victor($x$) |
|---|---|---|
| | $\pi_\text{Real}$: $\longleftarrow$ $\longrightarrow$ $\cdots$ $\longrightarrow$ | |
| simulator S($x$) | | verifier Victor($x$) |
|---|---|---|
| | $\pi_\text{Fake}$: $\longleftarrow$ $\longrightarrow$ $\cdots$ $\longrightarrow$ | |
A proof satisfies the zero knowledge property if:
For any verifier $V$ and statement $x$, there exists a simulator $S$ such that it is infeasible to tell the difference between the random variables $\pi_\text{Real}$ and $\pi_\text{Fake}$.
It is easiest to think through the concept of simulation using an example.
Let's go back to our Sudoku puzzle from the start of class.
Every time the verifier Victor issues a challenge to open one row, column, or grid, it already "knows" what it is expecting to see:
the numbers 1 through 9, in a random order, except for the numbers that were already part of the puzzle.
Hence, it isn't really the cards themselves that convince Victor that the proof is true. The cards essentially provide "no data."
Instead what convinces Victor are the facts that:
The zero knowledge proof of a Sudoku puzzle works well in our classroom. But it required taking physical actions, such as:
Suppose that Peggy wants to convince Victor that she knows the solution to a Sudoku puzzle, but now they are communicating via a text messaging program over the Internet.
Peggy could perform the card-based proof and take pictures after each step, but this wouldn't convince Victor anymore.
Question. Why not?
Source: Moni Naor
Instead, we need a digital equivalent to a playing card, where we can have it be "facedown" (i.e., hidden) and yet impossible to change (i.e., binding).
This is a commitment scheme!
Recall that a commitment scheme has two methods.
| Committer | | Receiver |
|---|---|---|
| | $\overset{c}{\longrightarrow}$ | |
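As a rough illustration, here is a simple hash-based commitment sketch (it assumes that hashing fresh randomness together with the value gives a hiding and binding commitment, which is the standard heuristic; the exact scheme from earlier lectures may differ):

```python
import hashlib, os

def commit(value: bytes):
    """Commit: returns the commitment c plus the opening information."""
    rand = os.urandom(32)  # fresh one-time randomness (for hiding)
    c = hashlib.sha256(rand + value).digest()
    return c, (value, rand)

def open_check(c: bytes, opening) -> bool:
    """Open: the receiver recomputes the hash to verify the opening."""
    value, rand = opening
    return hashlib.sha256(rand + value).digest() == c

c, opening = commit(b"7")  # e.g., commit to the digit in one Sudoku cell
assert open_check(c, opening)                 # a correct opening is accepted
assert not open_check(c, (b"8", opening[1]))  # binding: cannot reopen as an 8
```

The commitment plays the role of the facedown card: hiding until opened, and impossible to swap for a different value afterward.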
Additionally I'm going to use one more property of Sudoku puzzles: if I permute all of the numbers in the range 1-9 (both the public and secret numbers), then the result is still a Sudoku puzzle.
Original | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
Permuted | 3 | 2 | 7 | 9 | 4 | 5 | 6 | 1 | 8 |
So: if Peggy knows the solution to one Sudoku puzzle, then she really knows the answer to $9! = 362,880$ Sudoku puzzles.
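This relabeling property is easy to check in code. The sketch below uses a solved grid generated by a standard shift pattern (an illustrative stand-in for a real puzzle's solution) and verifies that a random relabeling of the digits is still a valid solution:

```python
import random

def valid(grid):
    """Check every row, column, and 3x3 grid contains 1..9 exactly once."""
    units = [[grid[r][c] for c in range(9)] for r in range(9)]         # rows
    units += [[grid[r][c] for r in range(9)] for c in range(9)]        # columns
    units += [[grid[br + i][bc + j] for i in range(3) for j in range(3)]
              for br in (0, 3, 6) for bc in (0, 3, 6)]                 # grids
    return all(sorted(u) == list(range(1, 10)) for u in units)

def permute(grid):
    """Relabel the digits 1..9 by a uniformly random permutation."""
    digits = list(range(1, 10))
    random.shuffle(digits)
    relabel = dict(zip(range(1, 10), digits))
    return [[relabel[v] for v in row] for row in grid]

# A valid solved grid generated by a standard shift pattern.
solution = [[(r * 3 + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
assert valid(solution)
assert valid(permute(solution))  # relabeling preserves validity
```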
Using commitments and permutations, we can make a zero knowledge proof for Sudoku over the Internet.
Here is the setup. Both Peggy and Victor know a Sudoku puzzle, and Peggy also knows its solution.
Source: Andreas Pogiatzis
Step 1. Peggy creates a fresh random puzzle and its solution by applying a random permutation of the numbers 1 through 9 to her own puzzle and solution.
Source: Andreas Pogiatzis
Step 2. Peggy commits to each cell of the puzzle. (Remember that each commitment needs its own one-time randomness!) She sends all 81 commitments to Victor.
Source: Andreas Pogiatzis
Step 3. Victor requests that Peggy do one of the following.
Note that there are a total of 28 possible actions that Victor can request here.
Step 4. Peggy performs the requested action by opening the commitments to the relevant cells, but no others.
Victor then checks that the commitments satisfy the desired property.
If so, he outputs Accept. Otherwise he outputs Reject.
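Putting the pieces together, here is a sketch of one round of steps 1-4 in Python. It is simplified: only the 27 row/column/grid challenges are modeled, and the 28th challenge that ties the permuted puzzle back to the public clues of the original statement is omitted; the solved grid is an illustrative stand-in for Peggy's secret solution.

```python
import hashlib, os, random

# An illustrative solved grid (shift pattern), standing in for Peggy's w.
solution = [[(r * 3 + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

def commit_cell(v):
    """Hash-based commitment to a single digit, with fresh randomness."""
    rand = os.urandom(16)
    return hashlib.sha256(rand + bytes([v])).digest(), (v, rand)

# Step 1: Peggy relabels the digits by a random permutation.
digits = list(range(1, 10))
random.shuffle(digits)
relabel = dict(zip(range(1, 10), digits))
permuted = [[relabel[v] for v in row] for row in solution]

# Step 2: Peggy commits to each of the 81 cells and sends the commitments.
comms, openings = {}, {}
for r in range(9):
    for c in range(9):
        comms[(r, c)], openings[(r, c)] = commit_cell(permuted[r][c])

# Step 3: Victor picks one of the 27 row/column/grid challenges at random.
units = [[(r, c) for c in range(9)] for r in range(9)]               # rows
units += [[(r, c) for r in range(9)] for c in range(9)]              # columns
units += [[(br + i, bc + j) for i in range(3) for j in range(3)]
          for br in (0, 3, 6) for bc in (0, 3, 6)]                   # grids
challenge = random.choice(units)

# Step 4: Peggy opens only the 9 challenged commitments; Victor verifies
# each opening and checks that the digits 1 through 9 appear exactly once.
opened = []
for cell in challenge:
    v, rand = openings[cell]
    assert hashlib.sha256(rand + bytes([v])).digest() == comms[cell]
    opened.append(v)
assert sorted(opened) == list(range(1, 10))  # Victor outputs Accept
```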
Does this proof satisfy the three criteria of a zero knowledge proof? Let's check.
Completeness is the easiest property to think about. If Peggy is telling the truth, then her permuted and committed Sudoku puzzle will be correct. Hence, it will satisfy any of the 28 actions that Victor could request.
Soundness is a bit more complicated to reason about. Remember that soundness is about catching a faulty prover.
If the statement $x$ is false and Victor is honest, then it is infeasible for any prover Peggy (even a malicious/faulty one) to deceive Victor. That is, Victor will output False, except maybe with some probability $\varepsilon$ of making an error and reaching a decision of True instead.
If Peggy is lying and the Sudoku puzzle actually has no solution, then her permuted & committed puzzle has to be wrong somewhere. (After all: if it satisfies the rules for all rows, columns, and grids and connects to the real statement $x$, then it is a valid Sudoku solution!)
We don't know which of the 28 conditions Peggy has violated, but it has to be at least one of them. Hence, if the statement $x$ is false, then Victor will catch Peggy with probability at least $\frac{1}{28}$; or in other words, Peggy will get away with cheating with probability at most $\varepsilon = \frac{27}{28} \approx 0.964$.
That's... actually not very satisfying.
Question. How can we reduce the soundness error?
We can amplify the soundness via repetition. Just run the proof many times, either in sequence or in parallel. Each time, Peggy chooses a new permutation $M$ and new commitments for each cell. And each time, Victor has a chance to catch Peggy in a lie.
If we repeat the protocol $k$ times, then Peggy gets away with cheating with probability at most $(27/28)^k$. This probability vanishes to 0 exponentially fast in $k$. For instance if we choose $k = 2440$, then the probability of cheating on all repetitions is less than $1/2^{128}$, which is our usual crypto standard for "infeasible."
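The arithmetic behind the choice $k = 2440$ can be checked directly:

```python
import math

# Peggy cheats successfully on one repetition with probability at most 27/28.
# Find the smallest k with (27/28)^k < 2^-128.
k = math.ceil(128 * math.log(2) / math.log(28 / 27))
print(k)  # 2440

assert (27 / 28) ** k < 2 ** -128
assert (27 / 28) ** (k - 1) >= 2 ** -128  # so k = 2440 is the smallest choice
```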
(Admittedly this requires that Peggy and Victor construct, exchange, and open a lot of Sudoku puzzles. Oh well; at least computing power and network communication are cheap nowadays?)
Zero knowledge is also a subtle point to reason about. Remember that our goal is to construct a simulator whose actions look identical to the real prover Peggy's work in the real world.
| prover Peggy($x, w$) | | verifier Victor($x$) |
|---|---|---|
| | $\pi_\text{Real}$: $\longleftarrow$ $\longrightarrow$ $\cdots$ $\longrightarrow$ | |
| simulator S($x$) | | verifier Victor($x$) |
|---|---|---|
| | $\pi_\text{Fake}$: $\longleftarrow$ $\longrightarrow$ $\cdots$ $\longrightarrow$ | |
But also: the simulator isn't real; it's just a figment of Victor's mind. And as a result, the simulator already "knows" the decisions that Victor will take -- specifically, which of the 28 checks Victor will choose to test Peggy on.
As a result, the simulator (who doesn't know the Sudoku solution $w$) can interact with Victor by using a fake puzzle that is only correct on the opened commitments, but nowhere else.
To make this claim rigorous, let's change the protocol slightly by adding a new round of interaction at the start of the zero knowledge proof.
Also when we say that the simulator and Victor are the same party, what we mean formally is that they are both just computer programs that are controlled by the same entity. We'll see in a second how to use this property.
Here's how the simulator works.
Since the simulator doesn't know the real solution $w$, it makes up a random Sudoku puzzle that has no connection to reality.
Next, the simulator interacts with Victor to perform steps 1, 2, and 3 of the protocol. So far, the simulator has only sent commitments to the fake puzzle, but commitments are hiding so Victor cannot tell that the simulator is using a fake puzzle. (Victor would find out if we continued on to step 4, though.)
From step 3 of the proof, the simulator has now learned which action Victor will use as his test. Now the simulator rewinds Victor's state back to the end of step 1, creates a new puzzle that is correct on the desired row/column/grid/public values, and completes the rest of the proof in the same way as the honest prover would.
That's it! This "rewinding" trick is what makes the analysis work. This is how we formalize the intuitive claim from before that:
it isn't really the cards themselves that convince Victor that the proof is true, [but instead that] Peggy already committed to the cards beforehand (i.e., placed them facedown and cannot change them later).
Great, now we have a digital way to prove that a Sudoku puzzle can be solved. What can we do with this?
Amazingly, you can do quite a lot! If you take a complexity theory course, you will learn that Sudoku puzzles (if you extend them to be larger than $9 \times 9$) have the property of being NP-complete.
In simple terms, what this means is that you can take any problem that has an efficiently-computable check program $C(x, w)$ and convert it into a Sudoku puzzle, so that the original problem is true if and only if the corresponding Sudoku puzzle is true.
So here are some problems that you can prove in zero knowledge.
...And many more.
At this point, we are in a position with ZK proofs analogous to where we stood with MPC at the end of Lecture 13. The punchline is that:
In particular, the approach we have designed today is inefficient in at least three metrics.
Computation: doing this reduction from a general problem to a Sudoku puzzle is computationally intensive, and it would be better to find new ways to prove directly the statements that are of interest to us. In the next few lectures, we will design proofs that can be calculated in seconds or even milliseconds.
Interaction: the proof we constructed today only convinces a single verifier Victor; if you want to convince someone else then you'll have to start all over again. In future lectures, we will design a single proof that you can post to a blockchain and that will convince everyone!
Communication: in order to reduce the soundness error, we had to create $k = 2440$ permuted/committed Sudoku puzzles and send them all over the Internet. On a blockchain, remember that you pay fees that are directly proportional to the amount of data that you post onto the blockchain, so posting the proof that we constructed today would be cost-prohibitive. Instead, we will design proofs that take up just a few hundred bytes of data -- essentially the same as a signature!