(Plain English) State of Distributed Consensus

a synopsis of PoW vs. PoS research

Happy Monday!

Before we dive in for the day, won’t you please help us out and take this quick three-question survey? We want to know which features you’d like to see added to the site.

***

Curating Curators

Sometimes, you gotta curate the curators, especially on a short holiday week like this one. Fortunately, Dan Zuller (one of our top early community analysts, now a partner at crypto fund of funds Vision Hill) came through with an epic, plain-English synopsis of the state of distributed consensus in PoW and PoS systems.

Dan is criminally under-followed on crypto twitter. You’ll see why below. I’ve reformatted and lightly edited (for clarity) his epic three-part mega-thread on crypto consensus systems…

(TBI Note: We’ll have a subscribers-only post tomorrow, then our long holiday weekend reads on Wednesday. We’ll be off Thurs-Friday before kicking off the week of July 8 with some exciting new features and announcements.)

State of Distributed Consensus - Dan Zuller

What follows isn't for everyone.

Many people seek to broaden their knowledge of crypto, but often get frustrated when things get too technical. The result is a plateau in learning and in some cases, lost interest. Learn both sides of crypto, finance and technology, even if just for your own good. One can never know enough.

If you haven't already read Preethi’s piece on distributed consensus, I highly recommend you give it a read. It offers terrific coverage of the basics of distributed systems and their history, and is a fantastic starting point for this conversation.

The high-level key takeaways: distributed systems are about tradeoffs.

Every consensus algorithm can generally be broken down into a three-step process: 1) elect; 2) vote; 3) decide. There are various ways to execute this process.
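
The three steps can be sketched in a few lines of Python. This is a toy illustration rather than any real protocol: the round-robin leader rule, the two-thirds acceptance threshold, and the names `elect`, `vote`, and `decide` are all assumptions made for the example.

```python
from collections import Counter

def elect(nodes, round_number):
    # Step 1 (elect): pick a leader; round-robin is the simplest possible rule.
    return nodes[round_number % len(nodes)]

def vote(nodes, proposal, is_valid):
    # Step 2 (vote): each node independently accepts (True) or rejects (False).
    return Counter(is_valid(node, proposal) for node in nodes)

def decide(votes, total_nodes):
    # Step 3 (decide): commit only if more than two thirds of all nodes accepted.
    # The two-thirds threshold is an assumption for this sketch.
    return votes[True] * 3 > total_nodes * 2

nodes = ["a", "b", "c", "d"]
leader = elect(nodes, round_number=5)
# Hypothetical validity check: node "d" rejects everything, the rest accept.
votes = vote(nodes, f"block-from-{leader}", lambda node, proposal: node != "d")
committed = decide(votes, len(nodes))
```

Real protocols differ mainly in how each of these three functions is implemented, which is where the tradeoffs come in.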

Historically, in distributed systems engineering, we have not been able to reach consensus in an asynchronous and byzantine fault tolerant ("BFT") manner. Before we explain why that’s a big deal, those are some intimidating terms, so let's break them down.

"Asynchronous"

Synchrony = simultaneous action/occurrence. Asynchrony = opposite of synchrony; absence or lack of concurrence in time. Synchronous systems assume a perfect network where nodes are organized & deliver messages within a defined time bound.

This doesn’t match reality because distributed systems lack a global clock. Instead, they need a way of determining the order/sequence of events happening across all computers in the network. There are ways to resolve this, most notably: 1) partial synchrony; 2) asynchrony.

Partial synchrony lies somewhere between synchrony and asynchrony. It allows certain time-bound assumptions to be made, but limits how much the protocol depends on them: consensus can be reached regardless of whether the time bounds are known in advance.

With asynchrony, consensus is not reached in a fixed time (no fixed upper time bounds exist). It is assumed that a network may delay messages infinitely, duplicate them, or deliver them out of order. This is often closer to reality for real-world systems.

“Byzantine fault tolerance (BFT)”

For those unfamiliar with the Byzantine Generals' Problem, I recommend some light research on the topic in computer science. Byzantine failure inherently implies no restrictions, and makes no assumptions about the kind of behavior a node in a distributed network may pursue.

In a byzantine system, nodes have different incentives and can lie, coordinate, or act arbitrarily, and you cannot assume a simple majority is enough to reach consensus. @ercwl touches on this nicely in this thread.
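
To see why a simple majority falls short, here is a small arithmetic sketch based on the classic bound from the BFT literature: tolerating f byzantine nodes takes at least 3f + 1 nodes in total, with quorums of 2f + 1 votes. The function names are hypothetical and exist only for this example.

```python
def min_nodes(f):
    # Classic BFT bound: tolerating f byzantine nodes needs at least 3f + 1 nodes.
    return 3 * f + 1

def quorum_size(f):
    # With n = 3f + 1 nodes, a quorum is 2f + 1 votes.
    return 2 * f + 1

def honest_overlap(n, quorum, f):
    # Any two quorums share at least 2 * quorum - n nodes. Subtracting the f
    # nodes that might be lying gives the number of honest nodes guaranteed to
    # sit in both quorums. A result below 1 means two conflicting decisions
    # could both gather enough votes.
    return 2 * quorum - n - f

f = 3
n = min_nodes(f)                                      # 10 nodes
bft_overlap = honest_overlap(n, quorum_size(f), f)    # quorum of 7
majority_overlap = honest_overlap(n, n // 2 + 1, f)   # simple majority of 6
```

With 10 nodes and 3 possible liars, quorums of 7 always share at least one honest node, while simple majorities of 6 do not carry that guarantee.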

The premise of a Byzantine fault tolerant system design is that the nodes in the network should not be dependent on the actions of a single node in any relevant way whatsoever.

Why is it so challenging to design asynchronous BFT networks?

Building a consensus algorithm that is both BFT and asynchronous has been incredibly challenging historically. Two primary concerns for reaching BFT + asynchronous consensus emerged: safety vs. liveness.

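At a very basic level, “safety” means nothing bad happens: honest nodes never settle on conflicting states. “Liveness” means something good eventually happens: the network keeps making progress. A network that favors liveness keeps moving even without full agreement, so a lack of consensus can cause a network partition (fork). As such, a fork-choice rule is usually implemented in these systems (more on that below). A network that favors safety, on the other hand, halts until consensus is reached.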

Enter bitcoin.

The brilliance of bitcoin is not really the underlying technology but the game theory that cracked the asynchronous + BFT challenge (favoring liveness).

In proof-of-work systems, all nodes don't have to communicate to achieve termination (e.g., agree on a final state) and transition to the next state. Rather, they agree on the *probability of the state being correct* as determined by the node that can solve the mathematical PoW puzzle the fastest.
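
The "probability of the state being correct" can be made concrete with the gambler's-ruin result from the bitcoin whitepaper: an attacker with a minority share q of the hashpower who is z blocks behind ever catches up with probability (q/p)^z, where p = 1 - q. A minimal sketch (the function name is hypothetical):

```python
def catch_up_probability(attacker_share, blocks_behind):
    # q = attacker's share of total hashpower, p = honest share.
    q = attacker_share
    p = 1.0 - q
    if q >= p:
        # With half or more of the hashpower the attacker eventually catches up.
        return 1.0
    # Otherwise the chance shrinks exponentially with every block of lead.
    return (q / p) ** blocks_behind

# An attacker with 10% of the hashpower, six blocks behind.
six_deep = catch_up_probability(0.10, 6)
```

For a 10% attacker six blocks behind, this works out to roughly 2 in a million, which is why a transaction buried under more blocks is treated as more final even though it is never absolutely final.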

Since bitcoin’s birth, developers have been pushing the boundaries of layer-1 innovation to understand how different tradeoffs might impact what can ultimately be achieved not just at the base layer, but also with any other use cases and apps built atop them.

PoW is arguably the most secure mechanism design known to date. For a use case to potentially capture trillions of dollars of value, high security is vital. But at what point can we draw the line between “extremely secure” & “secure enough”? What is “sufficiently secure”? And could proof-of-stake systems be sufficiently secure?

To understand the state of PoS innovation today, it's important to understand the limitations of PoW, whose security is a function of two scarce contributions: energy + time. PoW proponents argue the act of hashing in PoW provides unforgeable costliness, something that is not available in PoS systems.

But PoS proponents argue this view is limited, as there’s another scarce contribution captured in PoS: the opportunity cost of capital. That is, locked-up (staked) capital is subject to the opportunity cost of not being invested elsewhere.

If we want to better understand whether PoS is “secure enough”, then we should learn about the attack vectors & some solutions modern PoS networks can use to combat them and how that may impact their ability to reach distributed consensus.

First up: Sybil attacks.

This term is tossed around pretty loosely on crypto twitter. It is a type of attack seen in p2p networks in which a single operator runs many identities at once and uses them to undermine a voting or reputation system by flooding the network with duplicate votes.

At quick glance it's easy to think Sybil attacks aren't an issue with PoS systems. After all, staking is inherently something that takes 'stake' (e.g., skin in the game), which is what should prevent a Sybil possibility in the first place.

Assuming proper design, this may be the case, but in the absence of a true cost of staking or reputation system, Sybil attacks can and do occur. There is not necessarily always an external cost to staking (e.g., borrowing, forks, delegation).

Many view staking as resting on an inherently limited resource: as long as weight is proportional to that resource, it doesn't matter if you split it across a bunch of Sybils or hold it in one entity.
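
A quick arithmetic check of that argument, as a sketch with made-up numbers: when weight is strictly proportional to stake, one identity holding 300 of 1,000 tokens has exactly the same total weight as three Sybil identities holding 100 each.

```python
from fractions import Fraction

def election_weight(stakes, total_stake):
    # Total weight of a set of identities when weight is proportional to stake.
    return sum(Fraction(stake, total_stake) for stake in stakes)

total_stake = 1000                                        # hypothetical supply
single_identity = election_weight([300], total_stake)
three_sybils = election_weight([100, 100, 100], total_stake)
```

Both come out to 3/10. The equivalence only holds when weight really is tied to a costly resource.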

This isn't always the case. EOS doesn't work this way: block producers are elected by token holders, so Sybil attacks are combatted via a reputation system, where producers can be voted out if they are seen to be malicious in any way.

Sybil attacks can be combatted via (external) cost to stake or a reputation system. Some popular projects implementing the cost methods are Eth2.0, @meetCasperLabs, @SolanaLabs, @tezos, @dfinity, @NEARProtocol, @OasisLabs, @ThunderProtocol, @polkadotnetwork.

Next up we have the distributed denial of service (DDoS) attacks.

These are cyber attacks in which a perpetrator seeks to make a network resource unavailable to its intended users by temporarily or indefinitely disrupting services.

As I understand it, several methods can combat this starting with the introduction of unbiasable randomness (entropy) to the consensus election process. This takes the form of a verifiable random function (VRF).

(Bear with me... I know this is getting dense!)

In simplest terms, randomness serves as a defense mechanism against front-running the election process in distributed consensus. There are 3 common approaches to constructing VRFs: 1) a verifiable delay function (VDF); 2) a commit-reveal scheme; 3) BLS signatures.

A verifiable delay function (VDF) is a function that imposes a time delay on the output of some pseudo-random generator. VDFs are extremely computationally expensive functions that cannot be parallelized (an attractive security feature).

The delay prevents malicious actors from influencing the output of the pseudo-random generator, since all inputs will be finalized before anyone can finish computing the VDF. Ethereum 2.0, @meetCasperLabs, @SolanaLabs, & @tezos are some popular PoS projects using VRFs + VDFs.
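
The "cannot be parallelized" property can be illustrated with iterated hashing, where each step needs the previous step's output. This is only a stand-in under a loud assumption: a real VDF also comes with a way for others to verify the result quickly, which this sketch lacks (checking it means redoing all the work). The function name is hypothetical.

```python
import hashlib

def sequential_delay(seed: bytes, steps: int) -> bytes:
    # Each hash depends on the previous one, so the work has to be done in
    # order no matter how many machines are available.
    output = seed
    for _ in range(steps):
        output = hashlib.sha256(output).digest()
    return output

# Inputs are fixed first; the random output only appears after the delay.
beacon_input = b"all validator contributions for this round"
beacon_output = sequential_delay(beacon_input, steps=100_000)
```

Because nobody can know `beacon_output` before the inputs are locked in, nobody can pick their input to steer it.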

In a commit-reveal scheme, a validator commits to a chosen value while keeping it hidden from others (typically by publishing only a hash of it), and then reveals the value later so that anyone can check it against the commitment. These schemes are designed so committed values cannot be changed & randomness cannot be manipulated. @Algorand & Ouroboros Genesis consensus (@CodaProtocol, @Cardano) are PoS networks using VRF + commit-reveal.
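
A minimal sketch of commit-reveal using a salted SHA-256 hash as the commitment. The helper names and the XOR combination step are assumptions for illustration, not how any of the networks named above implement it.

```python
import hashlib
import secrets

def commit(value: bytes, salt: bytes) -> bytes:
    # Publish only the hash; the value itself stays hidden until the reveal.
    return hashlib.sha256(salt + value).digest()

def verify_reveal(commitment: bytes, value: bytes, salt: bytes) -> bool:
    # Anyone can re-hash the revealed value and compare it to the commitment.
    return commit(value, salt) == commitment

def combine(values):
    # XOR every revealed 32-byte value into one shared random output.
    output = bytes(32)
    for value in values:
        output = bytes(a ^ b for a, b in zip(output, value))
    return output

# Phase 1: three validators each commit to a secret 32-byte value.
secrets_held = [(secrets.token_bytes(32), secrets.token_bytes(16)) for _ in range(3)]
commitments = [commit(value, salt) for value, salt in secrets_held]

# Phase 2: values are revealed and checked before being combined.
all_valid = all(
    verify_reveal(c, value, salt) for c, (value, salt) in zip(commitments, secrets_held)
)
shared_randomness = combine(value for value, _ in secrets_held)
```

A validator who tries to reveal a different value than the one committed fails the hash check, which is what keeps the committed values fixed.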

BLS signatures are pairing-based signatures that can serve as “threshold” signatures: a defined M of N participants must sign (similar in spirit to a multisig), and a bilinear pairing is used for verification. Long post here.

That was a lot. Let's digest.

Randomness helps combat DDoS vulnerabilities because it challenges the ability for a perpetrator to front-run the election process in distributed consensus. We discussed a verifiable random function & three common approaches to construct it.

Another way to potentially combat DDoS vulnerabilities is to have an identifiable, fixed committee of validators that are trusted the old-fashioned way: each one posts a security deposit as collateral, which can be slashed (lost) in the event of malicious behavior.

@cosmos, @tendermint_team, @EOS_io, @NEARProtocol, @ThunderProtocol, @polkadotnetwork & @terra_money are some examples of popular PoS projects that implement fixed committees with security deposit-based collateral.

The last two common attacks I'll cover here are stake grinding & range-bound nothing-at-stake attacks.

Stake-grinding is an attack in which an attacker manipulates the blockchain in a way to maximize the probability of being elected as the leader for the ensuing block(s).

An obvious example of a stake grinding attack would be when an attacker has a small amount of stake and goes through a blockchain's history to find places where his/her stake wins a block. In order to consecutively win, the attacker modifies the next block header until he/she wins again.
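
Here is a toy version of that grinding loop. It assumes a naive design in which the next leader is derived purely from the hash of the previous block header, and the block producer controls a free nonce field in that header. The election rule, names, and stake numbers are all hypothetical.

```python
import hashlib

def leader_for(header: bytes, stakes: dict) -> str:
    # Naive election: hash the header into a "ticket" and see whose stake
    # range it falls into. Weight is proportional to stake.
    total = sum(stakes.values())
    ticket = int.from_bytes(hashlib.sha256(header).digest(), "big") % total
    for name, stake in stakes.items():
        if ticket < stake:
            return name
        ticket -= stake

def grind(prev_hash: bytes, stakes: dict, attacker: str, max_tries: int = 100_000):
    # The attacker keeps changing the nonce until the header elects them again.
    for nonce in range(max_tries):
        header = prev_hash + nonce.to_bytes(8, "big")
        if leader_for(header, stakes) == attacker:
            return nonce
    return None

stakes = {"honest": 95, "attacker": 5}
winning_nonce = grind(b"previous-block-hash", stakes, "attacker")
```

With 5% of the stake the attacker should win about 1 block in 20, but grinding lets them win whenever they get to choose the header. Trying a nonce costs almost nothing here, which is exactly what the defenses below are meant to fix.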

Stake grinding can generally be combatted *somewhat* similarly to how DDoS is combatted, with unbiased randomness implemented in the election process or fixed committees that have injected security deposit-based collateral.

Nothing-at-stake attacks are ones in which a validator in a committee votes (without penalty) for two blocks at the same block height. Simply put, the nothing-at-stake problem allows someone to misbehave - and get away with it.

There are generally two ways to combat nothing-at-stake attacks: 1) require bond deposits (e.g., security deposit-based collateral that can be slashed in the event of malicious action), or 2) implement a fork-choice rule. This goes back to liveness vs. safety.

Slashing is pretty straightforward – if one behaves badly, he/she is financially penalized. This is common with networks that favor safety over liveness (e.g., @cosmos, @tendermint_team, @EOS_io, @nearprotocol, @oasislabs, @polkadotnetwork, @tezos & @terra_money).
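
A sketch of the slashing idea applied to the double-vote case described above: scan the votes, and any validator caught voting for two different blocks at the same height loses part of their deposit. The 50% penalty and the data layout are assumptions for illustration; real networks define their own slashing conditions and amounts.

```python
def slash_double_votes(votes, deposits, penalty=0.5):
    # votes: iterable of (validator, height, block_id) tuples.
    # Returns the offenders and the deposits left after slashing.
    first_vote = {}
    remaining = dict(deposits)
    offenders = set()
    for validator, height, block_id in votes:
        # Remember the first block each validator voted for at each height.
        previous = first_vote.setdefault((validator, height), block_id)
        if previous != block_id and validator not in offenders:
            # Second, conflicting vote at the same height: slash once.
            offenders.add(validator)
            remaining[validator] = deposits[validator] * (1 - penalty)
    return offenders, remaining

votes = [
    ("alice", 10, "block-A"),
    ("bob", 10, "block-A"),
    ("bob", 10, "block-B"),    # bob votes for two blocks at height 10
    ("alice", 11, "block-C"),
]
offenders, remaining = slash_double_votes(votes, {"alice": 100, "bob": 100})
```

Once voting on both forks has a price, the "nothing" in nothing-at-stake no longer applies.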

Fork-choice rules are generally more complex and are inherent in networks favoring liveness over safety. Some popular PoS projects favoring liveness are: ETH2.0, @meetCasperlabs, @SolanaLabs, @Algorand, @dfinity, @thunderprotocol & Ouroboros Genesis (@CodaProtocol, @Cardano).

To understand the premise behind the fork-choice rule, recall that in PoW networks, the probability of miners winning the next block is proportional to the percentage of total hashpower contributed.
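
That proportionality is easy to check with a small simulation. The miner names and hashpower figures below are made up, and each block is modeled as an independent weighted draw.

```python
import random

def simulate_block_winners(hashpower, blocks, seed=0):
    # Each block is won by one miner, chosen with probability proportional
    # to that miner's share of total hashpower.
    rng = random.Random(seed)
    miners = list(hashpower)
    weights = [hashpower[miner] for miner in miners]
    wins = dict.fromkeys(miners, 0)
    for winner in rng.choices(miners, weights=weights, k=blocks):
        wins[winner] += 1
    return wins

hashpower = {"big_pool": 60, "mid_pool": 30, "small_miner": 10}
wins = simulate_block_winners(hashpower, blocks=100_000)
shares = {miner: count / 100_000 for miner, count in wins.items()}
```

Over many blocks, each miner's share of wins converges toward its share of hashpower.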

Bitcoin combines a fork-choice rule with the requirement that miners produce proofs of work (solving computationally expensive puzzles). These proofs are costly to generate because each miner has limited (expensive to acquire) hashpower, and anything spent on the wrong chain is wasted.
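
Both halves of that combination can be sketched with a toy hash puzzle and a "most accumulated work" fork-choice rule. This is a simplification, not bitcoin's actual block format or difficulty encoding: the single SHA-256 pass, the 8-byte nonce, and the helper names are assumptions for the example.

```python
import hashlib

def block_hash(block_data: bytes, nonce: int) -> int:
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def mine(block_data: bytes, difficulty_bits: int) -> int:
    # Brute force: try nonces until the hash falls below the target.
    # Each extra difficulty bit doubles the expected number of attempts.
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while block_hash(block_data, nonce) >= target:
        nonce += 1
    return nonce

def is_valid(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    # Verification is a single hash, however long the mining took.
    return block_hash(block_data, nonce) < (1 << (256 - difficulty_bits))

def choose_fork(chains):
    # Each chain is a list of per-block difficulty bits. Follow the chain
    # with the most total expected work, not simply the most blocks.
    return max(chains, key=lambda chain: sum(2 ** bits for bits in chain))

nonce = mine(b"toy block", difficulty_bits=12)
best = choose_fork([[10, 10, 10], [12]])
```

In this example the single 12-bit block outweighs three 10-bit blocks, because it represents more expected hashing. Hashpower spent extending the lighter fork is simply lost.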

This is why, in the event of network partitions (forks), the market assigns little value to minority chains (the market makes a choice), as we have seen (bitcoin vs. bitcoin cash is a famous example).

We’ve now covered ways to combat Sybil attacks, DDoS attacks, stake grinding attacks, nothing-at-stake attacks & network partitions, five commonly known attacks in blockchain land. You should notice, with the exception of bitcoin, all examples used were PoS. Why was that?

It’s simple. This is an effort to explore if a line can be drawn between “extremely secure” & “secure enough”. Can PoS networks be *secure enough* (vs. PoW) to still capture billions, if not trillions, in value depending on the use case?

Others like @spencernoon & @ariannasimpson are already questioning if security premiums can develop irrespective of monetary premiums. I’d love to see more work done here, though time will likely play a major factor in the ability to have an answer.

In this last part, I will share my key takeaways from this exercise & examine them in the context of a risk/reward framework.

First, we can often surprise ourselves with what we can accomplish if we set our minds to it. Prior to crypto, I had virtually no understanding of technology, computer science, or distributed systems (I was a PE/VC guy).

Everything I covered in this 3-part thread, I learned for free, thanks to the (quite special) open source ethos of the crypto community.

The tradeoff was time – a sacrifice I deemed worthwhile, because I had a strong desire to learn more engineering and share those learnings.

After this exercise, I believe I now have a basic foundational understanding of the technical side of this industry – something arguably everyone should possess if in crypto full time, especially if you’re managing investor money.

It boils down to risk management. These are technological assets that are in their extremely early stages and are highly experimental in nature. Without understanding the technical picture, it’s not hard to chase high reward potential & greatly misprice the risks while at it.

This is the 1st key takeaway – the need for technical diligence. Understand design vulnerabilities to anticipate cracks that could potentially form in a network in the future. Understand what solutions can be implemented & whether such solutions are backward compatible.

The 2nd key takeaway is understanding *why* tradeoff decisions are made. Developers ultimately want to build atop a protocol they can trust. Self-sovereign or not, it’s important to know what these builders are attracted to. They’re shaping the direction of our future after all.

The same can be said for investors. Identify the patterns/trends that are forming as a result of alignment between investors and engineers. The stronger and more widespread the alignment, the greater the systemic de-risking of a given network.

Cryptonetworks are a social coordination game. The greater the social endorsement, the larger the “stickiness”, and the higher the justified network value.

A 3rd takeaway: security is a spectrum. Just as too much working capital can be inefficient for a company (because it means the amount of money available within the company is much more than what's needed for operations), too much security can be inefficient as well.

I question whether a security premium can develop irrespective of a monetary premium for base layer networks. A counter to this is that all base layer protocols are competing to be money. But not all data or information is monetary or financial (eg, identity, med records).

I believe one of the most significant experiments that will take place over the next decade will be whether decentralized information networks can accrue substantial value without necessarily representing financial assets.

Time will tell (though my guess is yes 😉).

-Dan

P.S. Share. Subscribe. Spread the love. Tweet at me or Messari for requests, feedback, comments, or questions.


Best of the Boards

This week we are highlighting some of the best user-generated boards from the past week. Build your own, tweet it, and tag us for a chance to be featured next week!

CryptoTwitters Brightest - @janpaul_f

Twitter is a great resource for learning about and keeping track of crypto. Cut through the noise with this board that tracks some of the most prominent and thoughtful “Crypto Twitter” accounts.

Championing Crypto - @mwill_crypto

The secret to a good board is keeping it constantly updated. @mwill_crypto does just that with Championing Crypto - a collection of stories that bolster mindshare around Bitcoin and the crypto space as a whole.

TA Profiles - @itsexactlythis

The speculative nature of cryptoassets makes the space a hotbed for technical analysis. Keep track of TA focused twitter personalities with this board.

Did I miss something?

Send me the link, your twitter handle and your best imitation compression algorithm write up. If I like it, I’ll include your bit next issue (with attribution).

Should your colleagues read daily? We now offer discounts for corporate access. Email us, and we’ll onboard your whole team.