Announcements¶

Remaining graded material

Project: project draft due 4/21, and final version due 5/1
Final Exam on Monday, May 8 at 9-11am -- cumulative test that covers all lectures (through April 25), recitation labs, homework assignments, and assigned reading

Remaining lectures

Lecture by Nicolas on Tuesday 4/25 -- this is the last lecture that counts toward the final exam
Next week: guest lecture by Prof. Andy Sellars of the Law school on Thursday 4/27 -- no video recording
Project presentations on Tuesday 5/2 -- no video recording

Please submit your course evaluation at https://bu.campuslabs.com/courseeval/

Lecture 23: Proactive Secret Sharing¶

Recap¶

Throughout this course, we have studied four fundamental topics.

Cryptocurrencies and the blockchains within them, which enable financial transactions in a decentralized way, without the need to give your money to trusted intermediaries (aka, banks).
Broadcast protocols in the (a)synchronous setting, which enable consensus over the Internet without needing to rely on trusted intermediaries (aka, leaders).
Secure multi-party computation, which enables data analysis without the need to give your data to trusted intermediaries (aka, clouds).
Zero knowledge proofs, which enable the world to hold an entity (such as a bank, leader, cloud, or any one of us) accountable, without the need for trusted intermediaries (aka, auditors).

Our goal for today is to examine how we can keep -- and compute over -- secret data for very long periods of time.

To do so, we will use MPC to protect our secrets, and blockchains to store the corresponding shares.

Cryptographic secret sharing enables a group of parties to share a secret together. The secret is kept private from any individual in the group unless a threshold of parties in the group decide to reveal the secret.

It is the digital equivalent to a lockbox with multiple keys. If enough keys were inserted in the right places, the box would open and the secret would be revealed. Otherwise, the lockbox would remain closed and nobody would have access to it.

A secret sharing scheme is defined by two algorithms:

$\operatorname{split}(\text{secret } x, \text{threshold } t, \text{nodes } n) \to \text{shares } s_1, ... s_n$. This method spits a secret into $n$ shares such that any subset of shares of size $t$ can reconstruct the original secret.
$\operatorname{reconstruct}(s_{i_1}, ... s_{i_{t}})$ where $i_1, \ldots i_{t} \in \{1, ... n\}$ takes any $t$ of the secret shares and outputs the original secret $s$.

Secret sharing satisfies two properties.

Secrecy: The adversary learns nothing about the secret, if they have less than $t$ shares.
Correctness: Any $t$ shares can reconstruct the same secret.

(In Lecture 14 we also discussed a verification property. While we won't discuss that directly today, I'll remark that everything we will cover is compatible with verifiable secret sharing.)

D-20 dice

We have studied three types of secret sharing (there are more, but these are the most commonly-used techniques).

Additive secret sharing: choose $n-1$ shares $s_1, s_2, \ldots, s_{n-1}$ uniformly at random, and then set the final share $s_n$ such that the secret $x = s_1 + s_2 + \cdots + s_n \bmod{p}$ for some public modulus $p$.

Advantage: simple to understand and to construct.
Disadvantage: only supports a reconstruction threshold of $t = n$. So data availability may be a concern: if even one party losing its share, then the data is lost.

Replicated secret sharing: like additive secret sharing, except give each party $n - t + 1$ shares. For example, we have built an MPC protocol with $N = 3$ via replicated secret sharing, where each party stored 2 shares, so secrecy was provided against any 1 party.

Advantage: can support a lower threshold $t$, so provides some redundancy in case some shares are accidentally deleted or some parties maliciously choose not to participate in reconstruction.
Disadvanage: doesn't scale well; can have substantial storage cost if $n$ is large.

Shamir secret sharing: choose a random polynomial $P$ of degree $t$ whose y-intercept is the secret (i.e., $P(0) = x$). Then construct shares $s_i = P(i)$.

Advantage: can support arbitrary $t$ and $n$!
Disadvantage: slightly more complicated to describe.

23.1 Long-term storage of secrets¶

Secret sharing is an effective way to disperse a secret among a decentralized, federated group of servers, so that you don't have to trust any small coalition of them. You can disperse shares of your secret, and then retrieve them later.

So far, we have used secret sharing as the lynchpin of an MPC protocol: you distribute shares, the computing parties do a few seconds or minutes of work to calculate a function of the data, and then either you or somebody else retrieves the result.

Our goal for today is to examine how we can maintain secret data for very long periods of time -- think: many years.

At that time scale, you might have a few concerns about a decentralized approach.

Any individual computer might leave the network. Over a long enough time period, there might be many accidental failures.
Any individual computer might get compromised by the adversary. Over a long enough time period, there might be many intentional failures.

At the moment, secret sharing rests on the assumption that fewer than $t$ of the $n$ servers are corrupted over the entire duration in between the initial sharing stage and the final reconstruction stage.

If that duration lasts for many years, then it may be possible that $t$ or more of them fail eventually.

Our goal for today is to construct new secret sharing schemes that provide correctness and secrecy as long as the adversary never corrupts $t$ or more servers simultaneously.

This idea is called proactive secret sharing.

23.2 Refreshing shares¶

Let's start by handling issue of server compromise, but not servers adding or leaving the network. So, here is the scenario:

There are $n$ servers, all of whom are long-lived.
We will split time into a collection of short epochs (think: an hour or a day).
The adversary has the ability to corrupt servers, but also the servers have the ability to run anti-virus software and recover from corruption.

Our assumption is that the adversary only has control of less than $t$ servers in each epoch. This is often called a mobile adversary.

(Note that we have no cryptographic means to enforce this assumption, and if it is violated then the adversary can recover the secret.)

In particular, the adversary might corrupt every computer at some point in time!

Our goal is to design a secret sharing scheme that protects the data forever, despite this strong adversary.

To accomplish this goal, we will design a way to rerandomize or refresh the shares $s_1, s_2, \ldots, s_n$ corresponding to a secret $x$.

Here is a picture that shows our goal. Suppose there are $N = 9$ servers, and the adversary can corrupt fewer than $T = 4$ of them in any time epoch. We want to refresh the servers' states so that they never hold on to one share for too long.

Refreshing secret shares

[Source: Shai Halevi]

In more detail, refreshing must do the following:

The servers should move to a new set of shares $s_1', s_2', \ldots, s_n'$ that also correspond to the secret $x$
Other than the fact that they reconstruct to the same secret, the old and new shares should be independent. In other words, the new shares should introduce fresh randomness.
The servers must do this without reconstructing the secret.
After refreshing, the servers must delete the old shares so that they are no longer available for an adversary to steal.

Question. How can the servers do this?

Using correlated randomness¶

Since all of our secret sharing schemes support MPC, one way to accomplish this goal is to add a fresh sharing of the number 0.

That is, suppose that the servers magically have shares $z_1, z_2, \ldots z_n$ that are random subject to the fact that they reconstruct to 0. (These numbers have nothing to do with the shares of the original secret.) We call this correlated randomness.

With correlated randomness, each party can refresh to a new share $s_i' = s_i + z_i$, and then delete the old share $s_i$.

Concretely, let's consider what happens with additive secret sharing, in which the original secret $x$ was split into shares $ x = \sum s_i \bmod{p}$.

Suppose we magically have independently random shares $z_i$ such that $z_1 + z_2 + \cdots z_n = 0 \bmod{p}$.

If each party adds its share of $x$ to its share of $0$ to create $s_i' = s_i + z_i$, the result happens to be a sharing of the secret:

$$ \sum (s_i + z_i) = \sum s_i + \sum z_i = x + 0 = x.$$

Moreover:

The servers have refreshed their shares without ever learning $x$.
Even if an adversary compromises $n - 1$ of the original shares $s_1, \ldots s_n$ and also $n - 1$ of the new shares $s_1', \ldots, s_n'$, the adversary still learns nothing about $x$!

(To formalize the last part: the adversary has a system of 2 linear equations $x = \sum s_i$ and $x = \sum s_i'$ with 3 unknown variables -- the missing old share, the missing new share, and the secret $x$. For any possible choice of the secret $x$, there exist possible values that the missing shares could be in order to make the equations consistent. So the secret could be anything.)

The same idea works with Shamir secret sharing. If we add the evaluations of:

A polynomial $P$ with whose y-intercept is the secret $x$ and
A polynomial $Q$ with whose y-intercept is 0,

then the polynomial $P + Q$ has a y-intercept of $x$. So it is a polynomial with random coefficients, subject to the fact that the secret remains the same.

Constructing correlated randomness¶

So far, we have assumed that the servers magically have at their disposal a sharing $z_1, \ldots, z_n$ of the number 0.

Question. How do the parties construct this?

For now, let's consider additive secret sharing with semi-honest servers. How can the servers generate a collection of random numbers such that (a) each party only knows one of them, yet (b) they are all convinced that the sum is 0?

Here is an approach to generate correlated randomness.

Have each server generate $n$ random numbers, and send one random number to each server. That is: server $i$ generates random numbers $r_{i, 1}, r_{i, 2}, \ldots, r_{i,n}$ and sends $r_{i, j}$ to server $j$.
Each party takes the sum of all shares sent minus all shares received. Call this value $z_i$. That is:

$$ z_i = \sum_j r_{i, j} - \sum_j r_{j, i}.$$

Because every random number was sent by one server and received by one server: if we take the sum of all $z_i$ shares then all terms cancel. As a result, the $z_i$ shares must add up to 0.

Furthermore, each $z_i$ is the sum of a collection of random numbers, so it is itself random.

This approach is simple and elegant, yet also very powerful. It can be:

Amended to work with replicated secret sharing and Shamir secret sharing as well.
Extended to handle malicious adversaries using 'standard' MPC techniques like we learned about in Lecture 13.

23.3 Changing servers¶

Remember that over a long period of time, we had two concerns about secret sharing.

Any individual computer might leave the network. Over a long enough time period, there might be many accidental failures.
Any individual computer might get compromised by the adversary. Over a long enough time period, there might be many intentional failures.

So far, we have only handled the second issue. Assuming that the set of servers are the same forever, we showed how to refresh shares in order to withstand less than $t$ server compromises in each time epoch.

Question. What if we want to change (partially or completely) the set of servers who participate in the protocol over time?

That is: at each time epoch $i$, there is a publicly-known committee $C_i$ of servers. Our assumption remains the same -- that fewer than $t$ servers are corrupted in each epoch. How can the servers store data and then migrate fresh shares to the next one?

Refreshing secret shares, possibly to a different committee

[Source: Shai Halevi]

Unfortunately, there are two ideas that won't work:

Have the original servers send their shares $s_i$ to the new servers, and then tell the new servers to refresh them.
Have the original servers refresh the shares to create new $s_i'$, then send them to the new servers.

The problem here is that the adversary can corrupt $t-1$ old servers and $t-1$ new servers, and can learn enough of the old (or new) shares between them to reconstruct the secret.

The solution is to perform the acts of 'refreshing' and 'transmitting to the new committee' together!

Let's discuss the new approach concretely in the case of additive secret sharing. (It extends naturally to support Shamir secret sharing as well.)

Each server in the old committee has one share $s_i$ such that the secret $x = \sum s_i \bmod{p}$.

We can ask server $i$ to treat $s_i$ as though it is an actual secret, and use the secret-sharing algorithm to build sub-shares $s_{i, 1}, s_{i, 2}, \ldots, s_{i, n}$ that reconstruct to $s_i$.

Then server $i$ can distribute one sub-share to each of the servers in the new committee.

Pictorially, what's happening here is that we are building a matrix of random numbers that add up to $x$.

$$\begin{bmatrix} s_{1, 1} & s_{1, 2} & \cdots & s_{1, n} \\ s_{2, 1} & s_{2, 2} & \cdots & s_{2, n} \\ \vdots & & \ddots & \\ s_{n, 1} & s_{n, 2} & \cdots & s_{n, n} \\ \end{bmatrix}$$

The original servers used $s_i$ to build one row of this matrix. The new committee receives one column of this matrix. New server $j$ can add the sub-shares in column $j$ to create a new share $s_j'$.

Clearly, calculating the sum of all entries in the matrix row-wise (i.e., the sum of the old shares) is the same as adding the entries column-wise (i.e., the sum of the new shares).

This approach suffices to migrate the data to a new committee. And we can repeat this process over and over for as many time epochs as we want.

If we can find a committee of size $n$ such that we believe that fewer than $t$ of them will be corrupted at any moment in time, this protects the data forever!

Refreshing secret shares, possibly to a different committee

[Source: Shai Halevi]

And while we have only talked about semi-honest adversaries so far, this idea can be extended to withstand malicious adversaries too. Typically this is done using a generic zero-knowledge proof to show that each server acts honestly, although there are other possible approaches.

23.4 Scaling via the blockchain¶

[This section is based on the paper Can a Public Blockchain Keep a Secret?, by Fabrice Benhamouda, Craig Gentry, Sergey Gorbunov, Shai Halevi, Hugo Krawczyk, Chengyu Lin, Tal Rabin, and Leonid Reyzin.]

There is just one downside to the current approach: sending the entire matrix of subshares requires $O(n^2)$ communication in each epoch. This limits the scalability to large $n$.

For instance, if $n = 1$ million people want to participate in the protocol, then we will need to send 1 trillion messages in each epoch. That's going to be prohibitively costly to send over the Internet in each epoch.

(For what it's worth, commercial cloud providers charge around \$100 per terabyte of data sent over the Internet.)

As a philosophical matter: can a protocol really be called "decentralized" if we need to entrust our data to a small number of servers? That's just traditional cloud computing.

Here is our new (and final) problem scenario:

There is a collection of $n$ servers, which can join or leave the network at any time (think: $n = 1$ million)
As before, we split time into a collection of short epochs (think: an hour, aka a few Bitcoin blocks)
The adversary can compromise up to $f < n/4$ servers in each epoch (think: $f = 249,999$)
To improve efficiency, we only want a small committee of $k$ servers to hold shares of the secret at any time (think: $k = 100$)

Question. Can we accomplish this?

Okay, to make the problem slightly easier: you are welcome to use a blockchain. Now can we accomplish this?

At first glance, it may appear as though the answer to this question is no, since the committee size is small enough that the adversary can corrupt all of them.

But amazingly: the answer is yes. There are two key insights.

First insight¶

First, we should use a cryptographic sortition, as with Algorand. Remember that a sortition uses a verifiable random function (VRF) for a committee to form organically by itself, without any communication required (yet).

A cryptographic sortition provides the guarantee of player replaceability: a committee member sends a single message, revealing its identity only after completing its job.

Or put more simply: you only speak once.

Now the adversary is stuck. Before the committee members send a message to out themselves, the adversary doesn't know who to corrupt. After a committee member performs its work and sends its one message, it has already deleted the relevant data so there is nothing useful for the adversary to steal.

Since the adversary has no idea who to corrupt, the best it can do is to corrupt servers at random. So even though we allowed the adversary to corrupt a quarter of a million servers, still out of the 100-person committee the adversary likely corrupts fewer than 25 of them.

Cryptographic sortition solves the problem of committee creation, but doesn't give us a way to refresh and migrate the shares from one committee to the next.

That is: if committee $C_2$ is completely hidden by the sortition until it outs itself by sending data to $C_3$, then how does committee $C_1$ even know who to send data to?

Refreshing secret shares, possibly to a different committee

[Source: Shai Halevi]

Second insight¶

To solve this problem, we construct two committees per epoch. These are called the nominating and holding committees.

The holding committee will actually hold onto the secret shares of the secret. (More about them later.)

Refreshing secret shares, to a sub-committee that is nominated by a different committee

[Source: Can a Public Blockchain Keep a Secret?]

The nominating committee has one job: to create the holding committee. Each member of the nominating committee selects one member of the holding committee.

Honest servers should nominate another server at random. (But malicious servers would presumably nominate their colluding friends.)

The nominator must tell all members of the old holding committee $C_i$ how to communicate with the server $S_j$ that they have nominated to join the new holding committee $C_{i+1}$... but without outing the identity of the new holding committee member $S_j$ since they haven't spoken yet.

To do this, we use a public-key infrastructure and public key encryption.

We haven't spoken much about public key encryption in this class. For us, it suffices to think about it as a digital version of a lockbox, such that anyone can lock data inside the box but only the secret-key holder can open the box.

Informal description of a public key encryption scheme

Image source: idea-instructions.com

In full detail, the nominator does the following:

Selects a server $S_j$ to nominate for the holding committee.
Uses the PKI to find that server's public key $pj_j$.
Selects a one-time public/secret key pair $(sk', pk')$.
Posts $pk'$ and $Enc_{pk_j}(sk')$ to the blockchain.

Afterward, all $n$ servers in the network read the blockchain.

As a result, everyone knows how to communicate to the newly-nominated member of the holding committee: just encrypt data using this one-time key $pk'$.

At this point, the outgoing holding committee sends their share-refreshing messages to the new committee members. This uses up their one chance to speak.

Additionally, everyone tries to decrypt the encrypted blob. But only one server, namely $S_j$, will be able to do so. So only that member learns $sk'$ and therefore all of the messages sent to it.

And the holding committee member has done this without speaking! So they have conserved their right to speak once until the future time that they have to send their shares to the next holding committee.

Summary and extensions¶

In this way, you can store secret data on a public blockchain! Let's review what we have accomplished today.

You can protect data for a long period of time in a decentralized fashion.

Secret shares of your data bounce around the network, with different servers holding shares of your data at different points in time.

Each server only outs themselves as a share-holder at the precise moment that they send the share on to the next committee, by which point they've already deleted their share.

Refreshing secret shares, to a sub-committee that is nominated by a different committee

[Source: Can a Public Blockchain Keep a Secret?]

When you want to reconstruct the data, you'll need an honest majority so that you can distinguish the good shares from the bad ones.

But remember that the adversary has two ways to get onto the holding committee: either it gets onto the nominating committee (and then nominates itself for the holding committee) or it gets randomly selected by an honest nominator into the holding committee.

This is why we need $f < n/4$ malicious servers, because they each have two chances to get onto the holding committee and we need an honest majority. (Note: you can actually achieve a better bound, but I won't show the details today.)

Finally, we can go beyond storing secrets and actually compute over secrets using MPC.

Suppose that two or more people distribute their data in this way. If their data get sent to the same holding committee, then the committee can add their shares together to produce a share of the sum.

Multiplication is more complicated because it requires network communication. Since each holding committee member can only speak once, it can only contribute toward one multiplication. Nevertheless, over enough time epochs, you can calculate arbitrarily-complicated functions over secret data on the blockchain!

This topic is still an active ara of cutting-edge research. It goes by two (correlated but slightly different) names: YOSO MPC and Fluid MPC.