Let’s talk about data availability
Originally published on the Polygon DAO blog
First, what’s data availability?
According to ethereum.org, data availability (DA) is “the guarantee that the block proposer published all transaction data for a block and that the transaction data is available to other network participants.”
But how exactly do you guarantee data is available?
For most Layer 1s (L1s), it’s pretty straightforward. L1 nodes know transaction data is available by downloading and executing it themselves. This is how nodes verify blocks and is at the core of how blockchains work.
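To make “downloading it themselves” concrete, here’s a minimal sketch of a full node’s availability check: the node downloads every transaction, recomputes a commitment over them and compares it to the one in the block header. The Merkle construction and the function names (`tx_root`, `full_node_accepts`) are illustrative assumptions, not any particular chain’s format, and real nodes also execute the transactions.

```python
import hashlib


def tx_root(transactions: list[bytes]) -> bytes:
    """Toy commitment: a binary Merkle root over SHA-256 hashes of the transactions."""
    if not transactions:
        return hashlib.sha256(b"").digest()
    level = [hashlib.sha256(tx).digest() for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def full_node_accepts(header_root: bytes, downloaded: list[bytes]) -> bool:
    """The data is trivially 'available' to this node if what it downloaded matches the header."""
    return tx_root(downloaded) == header_root
```

If even one transaction is missing, the recomputed root won’t match and the node rejects the block. The catch is that every node has to download everything, which is exactly what limits scale.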
Layer 2s (L2s) change the paradigm. L2s (specifically rollups) use fancy cryptographic proofs to guarantee blocks are valid without nodes having to execute every transaction. This unlocks massive benefits and new L1 designs!
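But not so fast. Rollups still need data to be available, just for different reasons: without the transaction data, users can’t reconstruct the rollup’s state, so they can’t check their balances or withdraw funds if the operator disappears.
So how do we scale? It seems like we’re back where we started.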
DA layers
Introducing DA layers!
DA layers specialize in, as you might expect, assuring nodes that data is available. This can take different forms, including:
DA blockchains
DA committees
DA middleware
Data sharding
We’re only going to discuss the first two, but here are a couple of resources if you want to learn about DA middleware and data sharding.
DA blockchains vs. DA committees
Because it’s still very expensive to post data on Ethereum, most rollup teams are posting their data off-chain. This design technically classifies them as validiums.
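To get a feel for why posting data on Ethereum is expensive, here’s a rough sketch of calldata pricing. It assumes the per-byte costs from EIP-2028 (4 gas per zero byte, 16 gas per non-zero byte) and ignores the base transaction cost, execution gas and any later pricing changes. The gas price is a made-up input, not a quote.

```python
ZERO_BYTE_GAS = 4      # per EIP-2028
NONZERO_BYTE_GAS = 16  # per EIP-2028


def calldata_gas(data: bytes) -> int:
    """Gas charged for the calldata bytes alone."""
    zero_bytes = data.count(0)
    return ZERO_BYTE_GAS * zero_bytes + NONZERO_BYTE_GAS * (len(data) - zero_bytes)


def calldata_cost_gwei(data: bytes, gas_price_gwei: int) -> int:
    """Cost in gwei at an assumed gas price (1 ETH = 1_000_000_000 gwei)."""
    return calldata_gas(data) * gas_price_gwei


# Hypothetical batch: 100 kB of non-zero bytes at an assumed 50 gwei gas price.
batch = b"\x01" * 100_000
print(calldata_cost_gwei(batch, gas_price_gwei=50) / 1_000_000_000, "ETH")
```

Under these assumptions, that single batch costs 1.6 million gas for the data alone, which is why teams look for somewhere cheaper to put it.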
Ethereum’s data sharding roadmap is designed to solve this problem by enabling cheap rollup data, but to be safe, let’s assume we’re a year away from the first major upgrade. In the meantime, rollup teams have two major options: DA committees and DA blockchains.
DA committees are selected entities that hold off-chain copies of the transaction data and promise to make it available in case of emergency. These committees often have 7-10 members and are a slight improvement over fully relying on the rollup operator.
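Here’s a minimal sketch of how a committee’s promise might be checked: the data is treated as available once enough members have attested to its hash. Everything here is an assumption for illustration: the 5-of-7 threshold, the member names, and especially the use of HMAC tags as a stand-in for signatures. Real committees would use public-key signatures, so the verifier wouldn’t hold members’ secrets.

```python
import hashlib
import hmac


def attest(member_key: bytes, data_hash: bytes) -> bytes:
    """Stand-in for a member's signature: an HMAC tag over the data hash."""
    return hmac.new(member_key, data_hash, hashlib.sha256).digest()


def committee_attests(
    data_hash: bytes,
    attestations: dict[str, bytes],
    member_keys: dict[str, bytes],
    threshold: int,
) -> bool:
    """True if at least `threshold` known members produced a valid attestation."""
    valid = 0
    for name, tag in attestations.items():
        key = member_keys.get(name)
        if key is not None and hmac.compare_digest(tag, attest(key, data_hash)):
            valid += 1
    return valid >= threshold


# Hypothetical 7-member committee where 5 attestations are required.
member_keys = {f"member-{i}": f"secret-{i}".encode() for i in range(7)}
data_hash = hashlib.sha256(b"rollup batch #1").digest()
attestations = {
    name: attest(key, data_hash) for name, key in list(member_keys.items())[:5]
}
print(committee_attests(data_hash, attestations, member_keys, threshold=5))
```

Note what the check does not prove: an attestation says a member claims to hold the data, not that they will actually hand it over when asked. That’s the trust assumption committees carry.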
DA blockchains take the idea a few steps further by replacing small, permissioned committees with large, permissionless committees that have strong economic incentives to behave.
DA layers vs. data storage layers
A common mistake is to equate data availability with data storage. They are not the same thing.
An easy way to think about the difference is in terms of time.
DA layers make sure nodes can access data on a short time horizon. Their main goal is to smoothly progress blockchain state and they typically do not make assurances about longer time horizons. As ethereum.org puts it, “data availability is relevant when a block is yet to pass consensus”.
In fact, DA layers might even discard the data after a few weeks. In Ethereum’s next major upgrade, this data will be pruned after ~2 weeks.
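The time dimension can be sketched as a simple retention window: data is served while it’s recent and dropped once it ages out. The 14-day constant just mirrors the “~2 weeks” figure above, and the `prune` helper and blob IDs are hypothetical, not any client’s actual logic.

```python
RETENTION_SECONDS = 14 * 24 * 60 * 60  # assumed window, mirroring "~2 weeks"


def prune(
    blob_timestamps: dict[str, int],
    now: int,
    retention: int = RETENTION_SECONDS,
) -> dict[str, int]:
    """Keep only the blobs published within the retention window."""
    return {
        blob_id: published_at
        for blob_id, published_at in blob_timestamps.items()
        if now - published_at <= retention
    }


DAY = 24 * 60 * 60
now = 100 * DAY
blobs = {"blob-old": now - 15 * DAY, "blob-recent": now - 1 * DAY}
print(sorted(prune(blobs, now)))
```

Anything that needs the older blob after the window closes has to get it from somewhere else, which is where storage layers come in.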
Data storage layers make sure data is available on a longer time horizon and are closer to the cloud storage solutions most web2 developers are familiar with. Of course, it’s not hard to imagine web3 developers opting for decentralized versions like Arweave.
DA layer use cases
There are many things that can be built on top of DA layers. Let’s touch on three:
As we mentioned earlier, validiums are common today. Even after Ethereum implements its own sharded DA layer, it’s likely that rollup teams will still use off-chain data to reduce costs. Developers have always pushed the boundaries of what’s possible.
Sovereign rollups not only use DA layers for data availability, but also for consensus. Applications are likely good candidates to become sovereign rollups (rather than smart contract rollups or validiums) if they need full control over state transitions yet don’t want to worry about a validator set.
In his recent talk, Balaji Srinivasan envisions a future where “fiat information” competes with “crypto information”. He describes “reliable data feeds” using crypto oracles like Chainlink, where IRL metadata is posted on chain. That data could be posted onto DA layers.
DA layer endgames
It’s early days for DA layers. Polygon Avail, EigenDA and Celestia are all still in testnet and Ethereum data sharding is 1-3 years away, depending on the upgrade in question.
However, there’s plenty to look forward to. Let’s highlight what seems to be a common endgame across the board. Most teams envision something like this:
Progressively increasing block sizes and sharding them across the network.
Relieving nodes of downloading full blocks using KZG commitments.
Maintaining low verification costs with data availability sampling.
Eventually we get to a place where DA layers enable high throughput applications while trust-minimized light clients verify on mobile devices.
That’s right - performance and decentralization!
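Why can a phone get away with a handful of samples? Here’s a back-of-the-envelope sketch under a simplified model: assume blocks are erasure coded so that any 50% of the chunks can rebuild the whole block. To actually hide data, a proposer must then withhold more than half the chunks, so each uniformly random sample has at least a 50% chance of hitting a missing one. The 50% figure and the function names are assumptions of this toy model, and real designs differ in the exact threshold.

```python
import math


def miss_probability(samples: int, withheld_fraction: float = 0.5) -> float:
    """Chance that every sample succeeds even though the block is unrecoverable."""
    return (1 - withheld_fraction) ** samples


def samples_needed(confidence: float, withheld_fraction: float = 0.5) -> int:
    """Smallest number of samples whose miss probability is at most 1 - confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - withheld_fraction))


for k in (5, 10, 20, 30):
    print(k, "samples -> fooled with probability", miss_probability(k))
```

In this model the chance of being fooled halves with every extra sample, regardless of how big the block is. That’s what lets block sizes grow while light clients keep their verification costs low.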
Wrapping up
Hopefully this article helped you gain more familiarity with data availability. The goal was to offer a broad overview and address common misperceptions around the topic.
There are many deep dives on how it works, so if you want to jump down the rabbit hole, here are some resources: