AnnouncementsΒΆ

Current assignments

  • Homework 7 has been posted on Piazza. It is due on Friday 4/7 on Gradescope.
  • Final project announced today in class.
  • All previous homework (re-)submissions have been graded.

Office hours

I cannot make my usual office hours on Thursday. So for this week only, I will do virtual office hours on Wednesday at 3-4pm on Zoom. If you cannot make this time, send any questions over Piazza or ping me and we can try to find an alternative meeting time.

Re-submission of prior homework

You have one more chance to re-submit each homework assignment, until Friday 4/14. In particular, if you have any homework with a grade of R or N, make sure to (re-)submit because even one of these scores has a big impact on your base grade!

Base grade Scores on the 8 assignments
A- All M or higher, at least 3 scores of E
B At least 1 E, at most 1 R, rest are M
C+ At most 3 scores of R or 1 score of N
C- Anything else

(Reminder: for graduate students registered in DS 653, a project score of E is required in order to receive a base grade of A-.)

DS 453/653 Final Project: ESG on the BlockchainΒΆ

BackgroundΒΆ

Boston University has a substantial interest in improving the sustainability of the campus, as described in the University’s Zero Waste Plan and evidenced by all of the sustainability features of the new CCDS building. The building also includes many sensors to track data like energy consumption, temperature settings, and even the weight of trash thrown into the waste bins.

For instance, here is an image showing all of the data gathered from the waste bin sensors on Floor of the building, over the course of March 2023.

Waste sensor data, CCDS building 13th floor

With increased awareness in Environmental Social and Governance (ESG) reporting, Boston University would like to explore whether it is possible to track sustainability metrics on the blockchain. Providing public access to this data must be done in a way that protects:

  • Availability of aggregate data for all to use in conducting data science for climate and sustainability.
  • Integrity of this aggregate data so that people trust the sustainability metrics and push the university and the community toward improving these metrics over time.
  • Confidentiality of all raw data from the individual sensors, since learning fine-grained from an individual sensor level or from a short period of time might allow someone to determine whether a specific individual is in the building at any given time or when a professor holds office hours (note the larger food waste on the 13th floor on Thursdays!).

Your objectiveΒΆ

In this project, your task is to take on the role of one of three entities: a sensor manufacturer, blockchain consulting firm, or data dashboard developer. You must design and prototype a system to enable data science with a focus on data confidentiality, integrity, and availability. You can use and extend any tools from this course to safeguard the data.

  1. Sensor manufacturer: Your company has been hired by Boston University to build the sensors that capture the energy consumption, inside temperature, food waste, and more throughout the CCDS building. Your job is to provide all of this data to Boston University in a form where (a) they know that the raw data is accurate, and so that (b) they can demonstrate to the rest of the world that the data is accurate. You may assume that each sensor is connected to the Internet and records data every minute; that said, the sensors should not directly record all of these measurements to the blockchain because that would be too expensive in terms of data stored on-chain. Instead, you should think about systems that allow for data aggregation by a collection of servers run by your company (with data encrypted in transit), with only the daily totals posted to the blockchain (and this may also be potentially encrypted). Each sensor can also have its own computing power and data storage capacity, although you should try to minimize the use of both since they will directly add to the cost of the devices that your company sells. You may also assume that each individual sensor has a safe place to store a secret key, and that there is a PKI so everyone can check whether a message has been (say) digitally signed by a particular sensor. Remember that your goal is to protect the confidentiality, integrity, and availability of data at all times… potentially even against yourself, since the BU community (rather than your company) are the real owners of this data.
  1. Blockchain consulting firm: Your company has been hired by Boston University to develop a blockchain solution. Your goal is to take the results from the sensor vendor’s platform and make it available on a blockchain, again in a manner that protects confidentiality, integrity, and availability. The first step is to design and develop a blockchain platform that can store and manage waste-related data. The blockchain should be permissioned and secured to ensure the privacy and security of the data. The blockchain can (but doesn’t have to) sync up with public blockchains like Bitcoin, Ethereum, Algorand, ZCash, Mina, or any other blockchain as long as you justify your choice. Additionally, to promote ESG practices, you should design a system that incentivizes waste reduction efforts by offering rewards or discounts to members of the BU community who reduce their energy consumption, waste generation, etc. You can assume that all BU faculty and students who spend significant time on one floor of the building can β€œregister” as members of that floor, and that there is an administrator within BU IS&T who manually manages the list and associated PKI.
  1. Data dashboard developer: Your company has been hired by Boston University to design an easy-to-use dashboard that allows the BU community (and more) to understand, measure, and improve their sustainability. Your starting point is that sustainability data is available on the blockchain in some format. From there, you should develop a user-friendly dashboard to allow individuals and organizations to access data from all of the sensors, as well as provide insights and recommendations on waste reduction strategies. Remember that your intended targets are essentially β€œlight clients” – they are faculty and students looking through the web or on their phones, but do not have the storage or always-on connectivity to stay connected to the blockchain. Nevertheless, they want to be convinced that the aggregate data results that are shown to them are indeed based on the data from the blockchain. Additionally, while the sensor vendor has posted data to the blockchain on a daily basis, your dashboard should display aggregate totals and trends over time at a coarser-granularity of weeks, months, or quarters (depending on the metrics that you choose). As with the other teams, you should think about how to accomplish these goals with a focus on data confidentiality, integrity, and availability.

Signing up for a teamΒΆ

You may work in groups of 1, 2, or 3 students. Each team must send a response to the Piazza thread that announces this project indicating:

  • The names of team members, and
  • The company you want to represent.

At most three teams can take on any one role, and it is first-come-first-served: if three prior posts on this thread have already claimed a role, then you have to choose a different role.

Collaboration policyΒΆ

You are welcome to talk with members of the other teams; indeed it is a good thing if you design a system where the outputs of one company are used as inputs for another company, so that it is simple for Boston University to buy all of your products and have an end-to-end system for data collection, storage, analysis, and visualization. On the other hand, if there is another company that is performing the same task as you, then they are your competitors! Your grade may be based in part on which competing vendor has performed their task the best, so it is not in your interest to collaborate with them.

Additionally, all of the usual collaboration rules from the syllabus are in effect here. As a reminder: you can use any external resource such as computers/calculators, Piazza, lecture notes, textbooks, other websites, and your fellow classmates. That said, please always obey the following rules:

  • In your submission, you must list (a) names of classmates you worked with, (b) any websites you used besides the ones listed in the lecture notes or textbooks, and (c) any code that you used from other sources. Taking ideas without attribution will be considered plagiarism. You are graded on your original work.
  • You cannot copy solutions from anyone else, or give your solutions to a classmate to copy.

DeliverablesΒΆ

There are three types of deliverables that you must provide to the instructors in order to receive credit for this project.

  1. A report describing your company’s design, in as much detail as required for us to understand all the concepts behind your design, why you chose this approach, and any novelty of your work relative to the ideas we have discussed in class. There is no specific minimum or maximum length of this report.
  2. A prototype construction that implements at least some of your design. (Given the limited timeframe of the project, it is not required that you implement everything.) Your prototype can use any open-source code (with attribution) and any code generated in the homework assignments; that said, you will be evaluated on the novel way that you combine the existing ideas together and/or add on top of them.
  3. A short in-class presentation during the final class of the semester.

GradingΒΆ

Your grade in the project will be based on the extent that you leverage the concepts that we have covered in class in order to design and prototype a system that allows for data analysis without data sharing.

  • To earn a score of M, you must make use of the concepts that you have learned in class to design and prototype a solution for your company.
  • To earn a score of E, you must extend and build upon the concepts that you have learned in class, in order to design new algorithms or prototype new tools that solve your company’s requirements in a clever and innovative fashion.

Grades of R and N are also possible.

(Reminder for students registered for DS 653: you must receive a score of E on the project in order to be eligible for the highest base grade.)

When grading the project, you will be judged on your individual contribution as well as that of the overall team. Larger teams are expected to design algorithms and build prototypes that cover more of the challenges available to your chosen company type.

For group projects of 2 or 3 people: in order to determine the individual component of the project grade, your written report must include a joint statement (i.e., written by everyone in the group) describing each person’s individual contribution to the overall project results.

TimelineΒΆ

Here is the timeline for the project, which will run from April 4 to May 2.

  • Project description is released on Tuesday, April 4.
  • By Friday, April 7 (but ideally sooner): form a team and register for a company role.
  • By Friday, April 21: submit a first draft of your project. The instructors will grade it over that weekend.
  • By Monday, May 1: submit the final draft of your project written report and software.
  • During the final class on Tuesday, May 2: give a short presentation about your project, and submit your presentation materials to the instructors by the end of the day.

Lecture 18: Succinct Zero Knowledge ProofsΒΆ

RecapΒΆ

Zero knowledge solves a seemingly-impossible problem:

Can a prover convince a verifier that a statement is true, without revealing the evidence or reason why?

prover Peggy($x, w$) verifier Victor($x$)
Peggy $\underset{\longleftarrow}{\pi} \\ \longrightarrow \\ \vdots \\ \longrightarrow $ Victor output: Accept or Reject

There are a few items that are assumed to be common knowledge before a ZK protocol begins:

  • A claimed statement $x$ (i.e., the Sudoku puzzle)
  • An efficiently-computable check program $C(x, w)$ that can can test whether a claimed solution $w$ actually solves the statement. It has a boolean output: $\text{True}$ or $\text{False}$.

The prover Peggy also has a witness $w$ (i.e., the solution to the Sukodu puzzle).

By the end of the interactive conversation, the verifier should be convinced that:

  • The statement is true.
  • Optionally, Peggy knows why the statement is true. This is called a proof of knowledge.

Once the protocol ends: the verifier outputs its judgment about whether to Accept or Reject the claim that $x$ is true.

Formally, a zero knowledge proof satisfies three properties.

  1. Completeness: if both Peggy and Victor are honest, then every valid proof should be accepted.
  2. Soundness: even if Peggy is dishonest, then every proof of an invalid statement (i.e., a Sudoku puzzle that has no solution) should be rejected.
  3. Zero knowledge: even if Victor is dishonest, it is infeasible to learn anything from the proof beyond the fact that the statement $x$ is true.

The performance of a ZK proof depends on:

  • How much time it takes for the prover to compute the proof, and for the verifier to check it.
  • The communication between the parties -- that is, how large the proof is in size.
  • The number of rounds of interaction between the prover and verifier.

Last time, we studied proofs based on the strange idea that the prover could pretend to run an entire MPC calculation by herself. These proofs were great in terms of computing but not so much in terms of communication or rounds of interaction.

We have two goals for today.

  1. Make the ZK proofs from last week non-interactive, meaning that only one round of interaction is required from the prover to the verifier.

  2. Design a new kind of ZK proof that has a small amount of communication between the prover and verifier (and will also be non-interactive based on technique #1). Also the proof will be fast to verify, but there is a larger computation cost for the prover to make the proof. This is also the kind of proof you will use in Homework 7.

18.1 Non-interactive proofs from the Fiat-Shamir transformΒΆ

Our first action today is to make a slight modification to the proof from last week, in order to transform it into a 1-round protocol -- that is, a single message sent from Peggy to Victor.

This is powerful, because if we can make the proof non-interactive then Peggy can post the proof for anyone to verify! That is: the proof no longer is targeted toward a designated verifier, but instead becomes publicly verifiable.

The key insight here comes from inspecting Victor's message in round 2 of the protocol in more detail.

Note that Victor's message has the following properties:

  • All it contains is a uniformly random selection of which party/parties to open. Whereas Peggy's messages have lots of structure (i.e., commitments, or views of parties), Victor's messages have essentially no structure.
  • There is nothing that Victor keeps secret from Peggy after his message is revealed. In other words, Victor is stateless.

As a result, Victor's algorithm is so simple that we would like to ask Peggy to perform this action on Victor's behalf. This would eliminate Victor's need to interact in the ZK proof at all; he only needs to perform the final verification step at the end.

Question. What could go wrong if we ask Peggy to create the challenge (of which parties to open) rather than Victor?

The concern is that Peggy could choose them dishonestly, rather than randomly.

Specifically, Peggy might have executed some of the MPC parties in a faulty manner. If we allow Peggy to control which parties are opened, then she might choose always to avoid the faulty parties so that she is never caught in a lie.

So: we want Peggy to select the challenge uniformly at random, but in a way that is outside her control to tamper with.

Fiat and Shamir came up with a clever way to do this: use a hash function! Namely, remember that one strong assumption we can make about a hash function $H$ is that it acts like a random oracle, i.e., as if it is a randomly-generated truth table.

If an adversary [Mallory] has not explicitly queried the oracle on some point $x$, then the value of $H(x)$ is completely random... at least as far as [Mallory] is concerned.

– Jon Katz and Yehuda Lindell, Introduction to Modern Cryptography

The Fiat-Shamir transform converts an interactive proof into a non-interactive one, by telling the prover Peggy to compute the challenge as a hash function applied to the statement $x$ and her first message of the protocol.

In this way, Peggy's own commitments end up selecting the challenge message, but in a way that is difficult for her to predict or control.

In detail, Peggy performs the following:

  • Run the MPC parties and form commitments $\vec{c}$ to all of the views.
  • Compute $r = H(x, \vec{c})$ and treat this random value as Victor's challenge.
  • Open the corresponding views in response to Victor's challenge.
  • Send everything to Victor in one round of communication.

This construction can be proved to satisfy all of the ZK properties, although I won't show that here.

Remember that one advantage of a non-interactive zero knowledge proof is that we can post it to a blockchain, in order to convince the whole world that (for instance) a financial transaction is sound. We'll discuss ZK in the context of blockchains next time.

An aside: From ZK to digital signaturesΒΆ

When they proposed this transformation, Amos Fiat and Adi Shamir's original goal wasn't to build non-interactive ZK. Actually, they wanted to build digital signatures!

If you think about it, a digital signature is just a special case of a zero knowledge proof.

Recall the definition of a digital signature.

Definition. A digital signature contains three algorithms:

  • KeyGen: takes no input, produces a secret key $sk$ and a public key $pk$
  • Sign(sk, M): produces a signature $\sigma$
  • Verify(pk, M, $\sigma$): returns true or false

All three algorithms should be efficiently computable. Moreover, the digital signature must satisfy two guarantees: correctness and unforgeability.

Here's how we can use a ZK proof to build a signature.

  • KeyGen: choose a secret key $sk$, and a public key $pk = H(sk)$ using any preimage-resistant function $H$.
  • Sign(sk, M): create a ZK proof $\sigma$ for the statement "I know a value $sk$ that is the preimage to $pk$, and I approve the message $M$"
  • Verify(pk, M, $\sigma$): check the validity of the proof

Using this approach, can build a digital signature scheme purely using a hash function $H$ and zero knowledge proofs... which, using the MPC-in-the-head approach, also only need a hash function to construct the commitment scheme.

So, this is a new type of proof that doesn't rely on a discrete log or factoring-type assumption.

That's helpful because discrete logs and factoring are known to be breakable if someone ever builds a sufficiently-large quantum computer.

To plan ahead for this potential problem, the U.S. National Institute of Standards and Technology (NIST) have been running a post-quantum cryptography competition for the past decade to study and eventually standardize replacements for the current systems that we use for public-key cryptography. The idea that I described above is just one of many proposals they have considered.

18.2 zk-SNARKsΒΆ

Switching gears, for the rest of today's lecture I will show you a new way to construct a zero knowledge proof. This new system prioritizes smaller proof sizes at the expense of longer time to generate a proof.

This new proof system is called a zero knowledge succinct non-interactive argument of knowledge, or zk-SNARK.

That is, the proof that we construct will be:

  • zero knowledge
  • succinct: in other words, it will be short!
  • non-interactive: using the Fiat-Shamir transform, the proof contains just a single message that the prover sends to the verifier, rather than an interactive challenge-response protocol
  • argument of knowledge: this means that the verifier will be convinced that the statement has a solution and that the prover knows one solution

A few remarks before we continue:

  • So far, we have called the final property a "proof of knowledge." There is a technical difference between the terms "proof" and "argument" within cryptography, but this distinction won't matter to us. For the purposes of this course, you should think of the words "proof" and "argument" as synonyms.
  • The word "snark" comes from a poem by Lewis Carroll, in which several people are hunting a fictional animal that he calls a snark.
  • Fair warning upfront: the construction itself is the most math-heavy thing that I will show in this course. So I will only give an overview of the ideas, but won't show all the details.

The most important thing to take away from today's lecture is that all of this is even possible -- that I can send you a really short message (think: a few hundred bytes of data) that will convince you that I performed a data science analysis correctly. And the message is short even if:

  • the analysis involves a large volume of data (say, gigabytes or terabytes), and/or
  • the analysis itself is computationally intensive (say, a high-performance machine learning algorithm).

For instance, here are some potential applications of zk-SNARKs:

  • A bank can prove that from all of their transactions over the past year, they did not transact with some entity $Y$... but without revealing their full transaction log.
  • A person or company can publicly show that you have paid the correct amount of taxes, without revealing their finances.
  • You outsource a large computation to a cloud computing server, and then can validate that the result is correct without redoing the execution.

Why this is hardΒΆ

In most of today's lecture, I'm not going to worry about the zero knowledge property of hiding the actual data. It turns out that this is the easy part. The hardest part of building a zk-SNARK is the succinctness property.

To see why this is hard, let's think about what you might do to verify a computation done by someone else, if ZK and succinctness were not needed.

Here is one approach: when the untrusted cloud performs the computation, you can also ask them to write down their work at each step. After they're done, you can review all of their work and check that they've done it correctly.

For example, let's pretend that you outsource to the cloud the act of calculating $2^{20}$. For now, let's pretend that the only way to calculate this number is to multiply a register by 2 a total of $20$ times.

Suppose the cloud messes up in one step of the calculation, and here is the work that they show you.

Rather than calculating $8192 \times 2 = 16,384$ the cloud instead produces the cell $15,625$.

As a consequence, the cloud tells you that $2^{20} = 1$ million.

If you can see the entire working state of the cloud computer, then you can catch this bug!

But what if the cloud can only send you a small amount of data, of your choice. What should you request?

One natural answer is to ask the cloud to send over just a portion of its working state, so that you can spot-check a subset of its work.

But this approach is likely to miss the bug. Even worse: a malicious/faulty cloud provider could be careful only on the part of the calculation that you ask for, and spend its time making sure that this part is correct even if the computer is incredibly buggy elsewhere!

What you need is some way to bind together all of the numbers $1, 2, 4, 8, \ldots$ into some object that is:

  • Succinct, but also
  • Brittle, in the sense that if you tamper with even a single number then the object will be wildly different most of the time.

Question. Can you think of a mathematical object with these properties?

One common answer that many zk-SNARKs adopt is to use polynomials. They have the property that if you change a polynomial at even one point, then you end up changing it at nearly every point.

Or in other words:

Two different polynomials $P$ and $Q$, each of degree $d$, intersect in at most $d$ points.

Unfortunately, polynomials themselves aren't succinct. If you want to describe a polynomial of a large degree $d$, you need to write $d+1$ coefficients.

The good news is that polynomial commitments are succinct. You saw these in yesterday's Lab in the context of an AVSS protocol, and it turns out that they are helpful in constructing zk-SNARKs as well.

A polynomial commitment is a special case of a commitment scheme. It allows the prover to create a short message that binds the prover to a specific polynomial, without needing to reveal it (since that would be both privacy-destroying and space-consuming).

Furthermore, given the commitment to polynomials $\operatorname{com}(P)$ and $\operatorname{com}(Q)$, we can do a few operations over these commitments.

  • Evaluation at a point: the prover can open $P(w)$ at some point $w$, and convince the verifier that this opening is correct.
  • Addition of commitments: the verifier can compute on their own $\operatorname{com}(R)$ for $R = P + Q$.
  • Multiplication of commitments: the verifier can compute on their own $\operatorname{com}(S)$ for $S = P Q$.

(In yesterday's Lab, you saw a commitment scheme that only had the first two properties. It turns out that it is possible to build a more sophisticated commitment scheme that supports multiplication too, although it is beyond the scope of this class to show the math of how that's done.)

The punchline here is: if we can transform the action of "verifying an arbitrary program outsourced to the cloud" into "verifying some polynomial arithmetic," we would be done!

So let's spend the rest of today's lecture doing exactly this.

18.3 Converting programs into polynomialsΒΆ

[This part of the lecture largely follows the contents of Vitalik Buterin's blog post Quadratic Arithmetic Programs: from Zero to Hero.]

Amazingly, there is a generic way to compile an arbitrary computer program (say, written in Python or Circom) into a polynomial. The details are complicated, so I'll show you an overview today by walking through an example.

In our example, the prover wants to convince verifier of the following statement: it knows some value $x$ such that $x^3 + x + 5 = 35$.

(Admittedly, this is not a very impressive claim. It's not hard to solve this cubic equation for yourself. Hint: the answer is $x = 3$. But let's pretend that the verifier does not know this, and wants to be convinced that the solution is correct.)

Or to cast the question in another way: both the prover and verifier know the code of the following Python function. The prover wants to prove that she knows an input that will cause this function output to equal $35$.

InΒ [Β ]:
def qeval(x):
    y = x**3
    return x + y + 5

Step 1: FlatteningΒΆ

The first step is to re-write the computer program in a new way, so that each individual line only contains a single addition and/or a single multiplication.

(Thanks to Turing-completeness, this is always possible, but may not always be easy to do.)

InΒ [Β ]:
def qeval(x):
    one = 1
    sym_1 = x * x
    y = sym_1 * x
    sym_2 = y + x
    out = sym_2 + 5*one
    return out

The prover knows that an input of $x = 3$ will result in a result of $\operatorname{out} = 35$.

Actually we can make a stronger statement: the prover knows the value of all variables throughout the calculation.

  • $\operatorname{one} = 1$
  • $x = 3$
  • $\operatorname{out} = 35$
  • $\operatorname{sym}_1 = 9$
  • $y = 27$
  • $\operatorname{sym}_2 = 30$

Let's think of the entire working state of the prover as a vector $\vec{s} = [1, 3, 35, 9, 27, 30]$.

This working state is analogous to the prover's running calculation of $2^{20}$ from earlier. If the prover showed the entirety of $\vec{s}$ to the verifier, then they could check each step and become convinced that the prover's work is correct. But that would be neither succinct nor confidential.

Instead, we need some way to encode all of these variables into a polynomial, since that is both succinct and brittle (if the prover is cheating then we want the lie to be easy to detect).

Step 2: A Rank-1 Constraint System (R1CS)ΒΆ

Next, we convert each step of the program into a series of vector equations.

Specifically, we want to construct three (public) vectors $\vec{a}, \vec{b}$, and $\vec{c}$ corresponding to our computation, such that any solution vector $\vec{s}$ known by the prover satisfies the vector equation

$$ (\vec{s} \cdot \vec{a}) * (\vec{s} \cdot \vec{b}) - \vec{s} \cdot \vec{c} = 0,$$

where $\cdot$ denotes a dot product and $*$ denotes taking the component-wise product of two vectors. (Reminder: the multiplication of two vectors is typically not defined... just add that to the stack of ways that this construction is unorthodox.)

Once again it is easiest to think about this using an example. Here is the R1CS equation corresponding to the final step of our function qeval, which checks that out = sym_2 + 5*one.

InΒ [Β ]:
def qeval(x):
    one = 1
    sym_1 = x * x
    y = sym_1 * x
    sym_2 = y + x
    out = sym_2 + 5*one
    return out

Question. How would you construct vector equations corresponding to sym_1 = x * x or y = sym_1 * x?

Answer. We can check the calculation sym_1 = x * x using:

$$\begin{align} \vec{a} &= [0, 1, 0, 0, 0, 0] \\ \vec{b} &= [0, 1, 0, 0, 0, 0] \\ \vec{c} &= [0, 0, 0, 1, 0, 0]. \end{align}$$

We can check the calculation y = sym_1 * x using:

$$\begin{align} \vec{a} &= [0, 0, 0, 1, 0, 0] \\ \vec{b} &= [0, 1, 0, 0, 0, 0] \\ \vec{c} &= [0, 0, 0, 0, 1, 0]. \end{align}$$

Don't worry too much about the math details here. The important part to remember is that:

We can convert each line of the program qeval into a vector equation $(\vec{s} \cdot \vec{a}) * (\vec{s} \cdot \vec{b}) = \vec{s} \cdot \vec{c},$ where $\vec{a}, \vec{b}, \vec{c}$ are publicly known and the prover knows the secret $\vec{s}$.

As a result, in order to check the entire program, we must verify a system of vector equations.

Step 3: Quadratic equation on polynomialsΒΆ

The next step is to convert a system of vector equations into a single equation involving polynomials.

This is actually straightforward to do using polynomial interpolation. We take the constants from each of the equations and turn them into vectors of polynomials:

  • $\vec{A} = A_1, A_2, \ldots, A_6$
  • $\vec{B} = B_1, B_2, \ldots, B_6$, and
  • $\vec{C} = C_1, C_2, \ldots, C_6$.

Once again, don't worry too much about the math involved here. The important thing is that we can represent any computer program as a series of public constants, with the property that any solution $\vec{s}$ has the property that:

In other words, multiplying the scalars within $\vec{s}$ times each of the polynomials in turn will lead to a polynomial that is zero everywhere. In this way, we can convert a system of vector equations into a single (admittedly more complicated) equation.

(Nitpicky detail: sometimes the result of this calculation won't be zero everywhere, but just on a large fraction of the space. Let's not worry about this.)

Why did we do all of this? The major benefit of this approach is that:

"Instead of checking the constraints in the R1CS individually, we can now check all of the constraints at the same time by doing the dot product check on the polynomials."

-- Vitalik Buterin, Quadratic Arithmetic Programs: from Zero to Hero

Concretely, we have converted the prover's claim into a new format.

  • We started with "I know an input x that causes qeval(x) = 35"
  • Our new claim is a polynomial equation of the form "I know a vector $\vec{s}$ such that $(\vec{s} \cdot \vec{A}) * (\vec{s} \cdot \vec{B}) - \vec{s} \vec{C} = 0$" (where all of the polynomials in $\vec{A}, \vec{B}, \vec{C}$ are publicly known).

Here's the punchline: checking a polynomial equation is easy, because it suffices to check it at a single point!

Suppose I have a polynomial $P$ in my head, and I claim that $P$ is the zero polynomial. You can check whether I'm telling the truth just by challenging me to reveal the value $P(r)$ on a random value of your choice.

  • If $P$ really was the zero polynomial, then clearly $P(r) = 0$.
  • Otherwise, $P$ has at most $d$ roots (if it is of degree $d$), and there is a very small chance that you happened to pick one of the roots.

So once we have done this (strange) transformation of a computer program into a polynomial equation, there's actually a simple three-step zero knowledge proof system.

  1. The prover commits to its secret vector $\vec{s}$ containing the state of the program at all times.
  2. The verifier chooses a challenge point $r$.
  3. The prover uses the magic of polynomial commitments to reveal the value of the polynomial equation $(\vec{s} \cdot \vec{A}) * (\vec{s} \cdot \vec{B}) - \vec{s} \vec{C}$ when all polynomials are evaluated at $r$. The verifier accepts if and only if the answer is 0.

Also, this proof can be made non-interactive using the Fiat-Shamir transformation from earlier in the lecture.

This is how the succinct proof systems in Homework 7 work, behind the scenes.

Fortunately for you, the strange "moon math" of R1CS and polynomials is all done behind-the-scenes. In the homework, you just have to write your computer program in a language that is capable of being compiled down into this strange representation, and there is open-source code to do all the rest for you.