Current assignments
Office hours
I cannot make my usual office hours on Thursday. So for this week only, I will do virtual office hours on Wednesday at 3-4pm on Zoom. If you cannot make this time, send any questions over Piazza or ping me and we can try to find an alternative meeting time.
Re-submission of prior homework
You have one more chance to re-submit each homework assignment, until Friday 4/14. In particular, if you have any homework with a grade of R or N, make sure to (re-)submit because even one of these scores has a big impact on your base grade!
Base grade | Scores on the 8 assignments |
---|---|
A- | All M or higher, at least 3 scores of E |
B | At least 1 E, at most 1 R, rest are M |
C+ | At most 3 scores of R or 1 score of N |
C- | Anything else |
(Reminder: for graduate students registered in DS 653, a project score of E is required in order to receive a base grade of A-.)
Boston University has a substantial interest in improving the sustainability of the campus, as described in the Universityβs Zero Waste Plan and evidenced by all of the sustainability features of the new CCDS building. The building also includes many sensors to track data like energy consumption, temperature settings, and even the weight of trash thrown into the waste bins.
For instance, here is an image showing all of the data gathered from the waste bin sensors on Floor of the building, over the course of March 2023.
With increased awareness in Environmental Social and Governance (ESG) reporting, Boston University would like to explore whether it is possible to track sustainability metrics on the blockchain. Providing public access to this data must be done in a way that protects:
In this project, your task is to take on the role of one of three entities: a sensor manufacturer, blockchain consulting firm, or data dashboard developer. You must design and prototype a system to enable data science with a focus on data confidentiality, integrity, and availability. You can use and extend any tools from this course to safeguard the data.
You may work in groups of 1, 2, or 3 students. Each team must send a response to the Piazza thread that announces this project indicating:
At most three teams can take on any one role, and it is first-come-first-served: if three prior posts on this thread have already claimed a role, then you have to choose a different role.
You are welcome to talk with members of the other teams; indeed it is a good thing if you design a system where the outputs of one company are used as inputs for another company, so that it is simple for Boston University to buy all of your products and have an end-to-end system for data collection, storage, analysis, and visualization. On the other hand, if there is another company that is performing the same task as you, then they are your competitors! Your grade may be based in part on which competing vendor has performed their task the best, so it is not in your interest to collaborate with them.
Additionally, all of the usual collaboration rules from the syllabus are in effect here. As a reminder: you can use any external resource such as computers/calculators, Piazza, lecture notes, textbooks, other websites, and your fellow classmates. That said, please always obey the following rules:
There are three types of deliverables that you must provide to the instructors in order to receive credit for this project.
Your grade in the project will be based on the extent that you leverage the concepts that we have covered in class in order to design and prototype a system that allows for data analysis without data sharing.
M
, you must make use of the concepts that you have learned in class to design and prototype a solution for your company.E
, you must extend and build upon the concepts that you have learned in class, in order to design new algorithms or prototype new tools that solve your companyβs requirements in a clever and innovative fashion.Grades of R
and N
are also possible.
(Reminder for students registered for DS 653: you must receive a score of E on the project in order to be eligible for the highest base grade.)
When grading the project, you will be judged on your individual contribution as well as that of the overall team. Larger teams are expected to design algorithms and build prototypes that cover more of the challenges available to your chosen company type.
For group projects of 2 or 3 people: in order to determine the individual component of the project grade, your written report must include a joint statement (i.e., written by everyone in the group) describing each personβs individual contribution to the overall project results.
Here is the timeline for the project, which will run from April 4 to May 2.
Zero knowledge solves a seemingly-impossible problem:
Can a prover convince a verifier that a statement is true, without revealing the evidence or reason why?
prover Peggy($x, w$) | verifier Victor($x$) | |
---|---|---|
$\underset{\longleftarrow}{\pi} \\ \longrightarrow \\ \vdots \\ \longrightarrow $ |
There are a few items that are assumed to be common knowledge before a ZK protocol begins:
The prover Peggy also has a witness $w$ (i.e., the solution to the Sukodu puzzle).
By the end of the interactive conversation, the verifier should be convinced that:
Once the protocol ends: the verifier outputs its judgment about whether to Accept or Reject the claim that $x$ is true.
Formally, a zero knowledge proof satisfies three properties.
The performance of a ZK proof depends on:
Last time, we studied proofs based on the strange idea that the prover could pretend to run an entire MPC calculation by herself. These proofs were great in terms of computing but not so much in terms of communication or rounds of interaction.
Source: Trail of Bits blog
Source: Trail of Bits blog
We have two goals for today.
Make the ZK proofs from last week non-interactive, meaning that only one round of interaction is required from the prover to the verifier.
Design a new kind of ZK proof that has a small amount of communication between the prover and verifier (and will also be non-interactive based on technique #1). Also the proof will be fast to verify, but there is a larger computation cost for the prover to make the proof. This is also the kind of proof you will use in Homework 7.
Our first action today is to make a slight modification to the proof from last week, in order to transform it into a 1-round protocol -- that is, a single message sent from Peggy to Victor.
This is powerful, because if we can make the proof non-interactive then Peggy can post the proof for anyone to verify! That is: the proof no longer is targeted toward a designated verifier, but instead becomes publicly verifiable.
The key insight here comes from inspecting Victor's message in round 2 of the protocol in more detail.
Source: Trail of Bits blog
Note that Victor's message has the following properties:
As a result, Victor's algorithm is so simple that we would like to ask Peggy to perform this action on Victor's behalf. This would eliminate Victor's need to interact in the ZK proof at all; he only needs to perform the final verification step at the end.
Question. What could go wrong if we ask Peggy to create the challenge (of which parties to open) rather than Victor?
The concern is that Peggy could choose them dishonestly, rather than randomly.
Specifically, Peggy might have executed some of the MPC parties in a faulty manner. If we allow Peggy to control which parties are opened, then she might choose always to avoid the faulty parties so that she is never caught in a lie.
So: we want Peggy to select the challenge uniformly at random, but in a way that is outside her control to tamper with.
Fiat and Shamir came up with a clever way to do this: use a hash function! Namely, remember that one strong assumption we can make about a hash function $H$ is that it acts like a random oracle, i.e., as if it is a randomly-generated truth table.
If an adversary [Mallory] has not explicitly queried the oracle on some point $x$, then the value of $H(x)$ is completely random... at least as far as [Mallory] is concerned.
β Jon Katz and Yehuda Lindell, Introduction to Modern Cryptography
The Fiat-Shamir transform converts an interactive proof into a non-interactive one, by telling the prover Peggy to compute the challenge as a hash function applied to the statement $x$ and her first message of the protocol.
In this way, Peggy's own commitments end up selecting the challenge message, but in a way that is difficult for her to predict or control.
In detail, Peggy performs the following:
This construction can be proved to satisfy all of the ZK properties, although I won't show that here.
Remember that one advantage of a non-interactive zero knowledge proof is that we can post it to a blockchain, in order to convince the whole world that (for instance) a financial transaction is sound. We'll discuss ZK in the context of blockchains next time.
When they proposed this transformation, Amos Fiat and Adi Shamir's original goal wasn't to build non-interactive ZK. Actually, they wanted to build digital signatures!
If you think about it, a digital signature is just a special case of a zero knowledge proof.
Recall the definition of a digital signature.
Definition. A digital signature contains three algorithms:
true
or false
All three algorithms should be efficiently computable. Moreover, the digital signature must satisfy two guarantees: correctness and unforgeability.
Here's how we can use a ZK proof to build a signature.
Using this approach, can build a digital signature scheme purely using a hash function $H$ and zero knowledge proofs... which, using the MPC-in-the-head approach, also only need a hash function to construct the commitment scheme.
So, this is a new type of proof that doesn't rely on a discrete log or factoring-type assumption.
Source: Trail of Bits blog
That's helpful because discrete logs and factoring are known to be breakable if someone ever builds a sufficiently-large quantum computer.
To plan ahead for this potential problem, the U.S. National Institute of Standards and Technology (NIST) have been running a post-quantum cryptography competition for the past decade to study and eventually standardize replacements for the current systems that we use for public-key cryptography. The idea that I described above is just one of many proposals they have considered.
Switching gears, for the rest of today's lecture I will show you a new way to construct a zero knowledge proof. This new system prioritizes smaller proof sizes at the expense of longer time to generate a proof.
This new proof system is called a zero knowledge succinct non-interactive argument of knowledge, or zk-SNARK.
That is, the proof that we construct will be:
A few remarks before we continue:
The most important thing to take away from today's lecture is that all of this is even possible -- that I can send you a really short message (think: a few hundred bytes of data) that will convince you that I performed a data science analysis correctly. And the message is short even if:
For instance, here are some potential applications of zk-SNARKs:
In most of today's lecture, I'm not going to worry about the zero knowledge property of hiding the actual data. It turns out that this is the easy part. The hardest part of building a zk-SNARK is the succinctness property.
To see why this is hard, let's think about what you might do to verify a computation done by someone else, if ZK and succinctness were not needed.
Here is one approach: when the untrusted cloud performs the computation, you can also ask them to write down their work at each step. After they're done, you can review all of their work and check that they've done it correctly.
For example, let's pretend that you outsource to the cloud the act of calculating $2^{20}$. For now, let's pretend that the only way to calculate this number is to multiply a register by 2 a total of $20$ times.
Suppose the cloud messes up in one step of the calculation, and here is the work that they show you.
Rather than calculating $8192 \times 2 = 16,384$ the cloud instead produces the cell $15,625$.
As a consequence, the cloud tells you that $2^{20} = 1$ million.
Source: Vitalik Buterin's blog
If you can see the entire working state of the cloud computer, then you can catch this bug!
But what if the cloud can only send you a small amount of data, of your choice. What should you request?
One natural answer is to ask the cloud to send over just a portion of its working state, so that you can spot-check a subset of its work.
But this approach is likely to miss the bug. Even worse: a malicious/faulty cloud provider could be careful only on the part of the calculation that you ask for, and spend its time making sure that this part is correct even if the computer is incredibly buggy elsewhere!
What you need is some way to bind together all of the numbers $1, 2, 4, 8, \ldots$ into some object that is:
Source: Vitalik Buterin's blog
Question. Can you think of a mathematical object with these properties?
One common answer that many zk-SNARKs adopt is to use polynomials. They have the property that if you change a polynomial at even one point, then you end up changing it at nearly every point.
Or in other words:
Two different polynomials $P$ and $Q$, each of degree $d$, intersect in at most $d$ points.
Source: Why and How zk-SNARK Works
Unfortunately, polynomials themselves aren't succinct. If you want to describe a polynomial of a large degree $d$, you need to write $d+1$ coefficients.
The good news is that polynomial commitments are succinct. You saw these in yesterday's Lab in the context of an AVSS protocol, and it turns out that they are helpful in constructing zk-SNARKs as well.
A polynomial commitment is a special case of a commitment scheme. It allows the prover to create a short message that binds the prover to a specific polynomial, without needing to reveal it (since that would be both privacy-destroying and space-consuming).
Furthermore, given the commitment to polynomials $\operatorname{com}(P)$ and $\operatorname{com}(Q)$, we can do a few operations over these commitments.
(In yesterday's Lab, you saw a commitment scheme that only had the first two properties. It turns out that it is possible to build a more sophisticated commitment scheme that supports multiplication too, although it is beyond the scope of this class to show the math of how that's done.)
The punchline here is: if we can transform the action of "verifying an arbitrary program outsourced to the cloud" into "verifying some polynomial arithmetic," we would be done!
So let's spend the rest of today's lecture doing exactly this.
[This part of the lecture largely follows the contents of Vitalik Buterin's blog post Quadratic Arithmetic Programs: from Zero to Hero.]
Amazingly, there is a generic way to compile an arbitrary computer program (say, written in Python or Circom) into a polynomial. The details are complicated, so I'll show you an overview today by walking through an example.
In our example, the prover wants to convince verifier of the following statement: it knows some value $x$ such that $x^3 + x + 5 = 35$.
(Admittedly, this is not a very impressive claim. It's not hard to solve this cubic equation for yourself. Hint: the answer is $x = 3$. But let's pretend that the verifier does not know this, and wants to be convinced that the solution is correct.)
Or to cast the question in another way: both the prover and verifier know the code of the following Python function. The prover wants to prove that she knows an input that will cause this function output to equal $35$.
def qeval(x):
y = x**3
return x + y + 5
The first step is to re-write the computer program in a new way, so that each individual line only contains a single addition and/or a single multiplication.
(Thanks to Turing-completeness, this is always possible, but may not always be easy to do.)
def qeval(x):
one = 1
sym_1 = x * x
y = sym_1 * x
sym_2 = y + x
out = sym_2 + 5*one
return out
The prover knows that an input of $x = 3$ will result in a result of $\operatorname{out} = 35$.
Actually we can make a stronger statement: the prover knows the value of all variables throughout the calculation.
Let's think of the entire working state of the prover as a vector $\vec{s} = [1, 3, 35, 9, 27, 30]$.
This working state is analogous to the prover's running calculation of $2^{20}$ from earlier. If the prover showed the entirety of $\vec{s}$ to the verifier, then they could check each step and become convinced that the prover's work is correct. But that would be neither succinct nor confidential.
Instead, we need some way to encode all of these variables into a polynomial, since that is both succinct and brittle (if the prover is cheating then we want the lie to be easy to detect).
Next, we convert each step of the program into a series of vector equations.
Specifically, we want to construct three (public) vectors $\vec{a}, \vec{b}$, and $\vec{c}$ corresponding to our computation, such that any solution vector $\vec{s}$ known by the prover satisfies the vector equation
$$ (\vec{s} \cdot \vec{a}) * (\vec{s} \cdot \vec{b}) - \vec{s} \cdot \vec{c} = 0,$$where $\cdot$ denotes a dot product and $*$ denotes taking the component-wise product of two vectors. (Reminder: the multiplication of two vectors is typically not defined... just add that to the stack of ways that this construction is unorthodox.)
Once again it is easiest to think about this using an example. Here is the R1CS equation corresponding to the final step of our function qeval
, which checks that out = sym_2 + 5*one
.
def qeval(x):
one = 1
sym_1 = x * x
y = sym_1 * x
sym_2 = y + x
out = sym_2 + 5*one
return out
Question. How would you construct vector equations corresponding to sym_1 = x * x
or y = sym_1 * x
?
Answer. We can check the calculation sym_1 = x * x
using:
We can check the calculation y = sym_1 * x
using:
Don't worry too much about the math details here. The important part to remember is that:
We can convert each line of the program
qeval
into a vector equation $(\vec{s} \cdot \vec{a}) * (\vec{s} \cdot \vec{b}) = \vec{s} \cdot \vec{c},$ where $\vec{a}, \vec{b}, \vec{c}$ are publicly known and the prover knows the secret $\vec{s}$.
As a result, in order to check the entire program, we must verify a system of vector equations.
The next step is to convert a system of vector equations into a single equation involving polynomials.
This is actually straightforward to do using polynomial interpolation. We take the constants from each of the equations and turn them into vectors of polynomials:
Once again, don't worry too much about the math involved here. The important thing is that we can represent any computer program as a series of public constants, with the property that any solution $\vec{s}$ has the property that:
In other words, multiplying the scalars within $\vec{s}$ times each of the polynomials in turn will lead to a polynomial that is zero everywhere. In this way, we can convert a system of vector equations into a single (admittedly more complicated) equation.
(Nitpicky detail: sometimes the result of this calculation won't be zero everywhere, but just on a large fraction of the space. Let's not worry about this.)
Why did we do all of this? The major benefit of this approach is that:
"Instead of checking the constraints in the R1CS individually, we can now check all of the constraints at the same time by doing the dot product check on the polynomials."
-- Vitalik Buterin, Quadratic Arithmetic Programs: from Zero to Hero
Concretely, we have converted the prover's claim into a new format.
x
that causes qeval(x) = 35
"Here's the punchline: checking a polynomial equation is easy, because it suffices to check it at a single point!
Suppose I have a polynomial $P$ in my head, and I claim that $P$ is the zero polynomial. You can check whether I'm telling the truth just by challenging me to reveal the value $P(r)$ on a random value of your choice.
So once we have done this (strange) transformation of a computer program into a polynomial equation, there's actually a simple three-step zero knowledge proof system.
Also, this proof can be made non-interactive using the Fiat-Shamir transformation from earlier in the lecture.
This is how the succinct proof systems in Homework 7 work, behind the scenes.
Fortunately for you, the strange "moon math" of R1CS and polynomials is all done behind-the-scenes. In the homework, you just have to write your computer program in a language that is capable of being compiled down into this strange representation, and there is open-source code to do all the rest for you.