Back in 2017, Ethereum opinion leader Vitalik introduced the concept of Stateless clients at The Ethereum developer conference(Devcon), with the intention of solving the problem of growing on-chain data faced by public chains by reducing on-chain state storage. The concept was conceived as an ideal solution for blockchains to get rid of state bloat, and over the past few years, through the continuous in-depth research of Alexey Akhunov and Igor Mandrigin and others, the Stateless concept has become more fruitful. The purpose of this article is to provide a preliminary interpretation of the latest results of the research on the concepts of “stateless”, “State Expiry” and “state cleanup” in the public chain domain.

[PlatON Tech-Column]A detailed explanation of the Stateless concept (II)

In the previous section we looked at the Stateless concept and a range of solutions to the problem of ‘state bloat’ such as ‘state cleanup’ and ‘state expiration’, and in this section, we will continue to explore how these concepts can be applied in engineering practice.

For most blockchains, there are common user accounts (external accounts) and contract accounts. For external accounts, the state expiration scheme mentioned in the previous section is well understood and easy to implement, but for those data stored in the storage of smart contracts, how to design an expiration mechanism? As the number of storage slots in a contract is unlimited, users can store unlimited data in storage, which requires consideration of.

Choosing to restrict users to unlimited storage — complex economic schemes need to be designed to pass on state rents to users.

Choosing to modify the existing storage model — putting the contract’s storage into a new contract created by, for example, the create2 command to manage state, which can lead to ‘resurrection conflicts’ as state may not be on the chain (see below).

When the state expires or is cleaned up, do we mark the expired keys as “expired” on the original trie or do we move all expired keys from the original tree to a separate tree?

The advantage of marking expiry is that it works pretty much the same way as it does now, and it is relatively simple to mark ‘expiry’, but the problem is that ‘marking’ requires the node to store additional ‘intermediate info’, which doesn’t work well with the Verkle trie, and in addition to the Merkle proof, the expired node has to be proved by an additional Witness.

The two-tree solution does not require additional tokens for each node, does not require additional evidence for the Merkle proof, and does not have ‘resurrection conflicts’. However, it has the problem of changing the current protocol too much and requires an additional procedure to make the data expire.

If A creates some contracts with create2, but the state expires, and then A creates contracts with create2 later, so that the new contracts conflict with the old ones, a so-called ‘resurrection conflict’, what should be done?

Account merge — the ETH from the old contract is added to the new contract, the storage and code are merged with the new contract, and even the code of the old contract can specify the logic of the conflict merge.
Suppress resurrection — adjust the create2 opcode implementation so that it creates different account addresses for different periods (e.g. through the mechanism of ‘year’ regensis, different ‘years’ will create different addresses).
Add an identification bit to the account — only one contract can be created in the same location, duplicate creation will be rejected.
Require new create2 to be self-attesting when initiated — similar to adding an identifier bit, except that the identifier is added to an account-independent location.

The user sending the transaction or the miner proposing the block faces the problem of getting the state data needed to generate the Witness, the simple idea is that the node sending the transaction or issuing the block keeps the state data for a period of time (e.g. one year). This behaviour is voluntary and dependent on the node’s own configuration, so if we want the miner who sends the block to be reliable, we have to make it provide proof of custody.

But a consensus layer that does not know which data is valid and which is ‘invalid’ means that the Gas overhead of accessing new data cannot be distinguished from that of accessing old data, which means that

Pricing rules cannot be per instruction, but need to combine instructions and data status, and the Gas cost of accessing valid data is more than the invalid ones
If the transactions in the block are all transactions accessing lapsed data, the witness data can be very large

Therefore, to avoid the above drawbacks, the state of each account must be tracked at the consensus level, including account space that is not used. This further illustrates that achieving ‘stateless vs. stateful expiration’ is a complex trade-off process.

Witnessing the data

Whichever solution to the ‘state bloat’ problem eventually proves successful, one problem that must be solved is to ensure that making witness data Witness available is both easy to generate and does not consume unlimited bandwidth.

Alexey Akhunov, who has been working on Stateless, has made an analysis based on Ether[2] that without any changes to the Ether protocol, the size of witness data would be at the MB level. The current Ethernet is calculated at a gas limit of 12.5M, and the block data is only about 45KB, which is much larger than the block data. Therefore, the research direction of Stateless in recent years has been to reduce the witness data as the mainstream.

EVMCode optimization

According to Alexey Akhunov and the turbo-geth team, assuming a Stateless client based on the current ethereum data model, the contract code would be the second largest overhead of the network (or witness data), and would become the largest after possible optimisation of the storage results (MPT->Binary). Thus, it seems imperative to optimise EVMCode.

Sina Mahmoodi’s solution is to split the entire contract code into several basic blocks, with each transaction providing only the small blocks and witness data used by the exchange.

Meaning of basic blocks.

It can be understood as a straight line or sequential code with no branches in except for the entry and no branches out except for the exit
Each block either starts at index 0 or at jumpdest, and Stateless clients can safely perform jumpdest parsing
For each block, execution either reaches the end or aborts due to lack of Gas

Eventually, all basic blocks are added to a trie and a root is calculated.

It can be shown that the code executed for a transaction is on the trie where root is located and that root is the root of the executed contract, reducing the size of the EVMCode’s Witness data in Stateless client transactions in this way.

This optimization of EVMCode is also applicable to EVM2 (eWASM)

According to Sina Mahmoodi’s research [3], this approach can reduce the size of the EVMCode by 40–60%.

Data storage structure optimisation

A very important feature of Merkle trees is that it is possible to verify the legitimacy of a Root without all the data in the tree by means of the hash values of all the dependencies on a certain path, but when we do go to each hash on a path in this way, we find that there are too many paths (associated nodes) associated with proving this Root, which makes providing witness data ( Witness) is very difficult and does not compress the witness data well.

Therefore, the idea of replacing Merkle with a binary trie was proposed, for binary trees where at most one branch is replaced by a hash, to cut the Merkle tree more adequately, which works well for Witness reduction.

Assuming that only one account Acc1 has transactions in a block and the path is 3B (0011 1011), then.

The binomial tree given by Witness requires 7 branches and 1 account node for a total of 8*2=16 elements

A hexadecimal tree calculating Witness requires 1 branch and 1 account node, for a total of 2*16=32 elements

According to a practical test by Mandrigin [4], Witness saves 49% in size using a binomial tree compared to a hexadecimal tree, an improvement that is not as significant as in the theoretical best case. Probably due to the real data distribution in the main network block, the proposal has a lot of room for optimization.

The binomial tree reduces witness data, but does not control the size of Witness, and more seriously repricing instructions based on the binomial tree is very difficult.

A new KZG polynomial commitment scheme was then proposed, KZG is a trio of Kate , Zaverucha and Goldberg’s solution to the generic zero-knowledge proof scheme PLONK (Permutations over Lagrange-bases for Oecumenical Noninteractive arguments of Knowledge) implementation, whose main principle is the ability to commit f(i) = vi to a polynomial f(x) via a set of elliptic curve elements C = [f(s)]1 and to open this commitment at any point z, giving the verifier the value y = f(z) and a set element π=[(f(s)-y)/(s-z)]1 , and the correctness of this proof value y can be checked by a pairing equation (pairing equation).

The so-called Verkle Trie is a combination of vector and Merkle, using a set of vectors (vectors) v0,v1,… ,vd-1 as input and produces a commitment C, and C can be verified with any of these values. For example for a Merkle tree can be thought of as a vector commitment, where the value of each leaf i can be proven by log 16 hashes.

Dan believes that Verkle Trie will be the best solution to the state inflation problem in the next 5 years.

PlatON Stateless mechanism in practice

PlatON has been working on the problem of ‘state inflation’ for a long time and has been the first to explore a new path in the ‘stateless’ concept by dividing nodes into ‘archive nodes’ and ‘normal nodes’.

Archived nodes” are nodes that keep all state data, while “normal nodes” clean up the “historical state” of data and transaction logs in stages. During the operation of the network, nodes participating in the network consensus do not necessarily need to be archived nodes, which means that nodes that ensure the stable operation of the network can have no historical state, thus free from the influence of ‘state inflation’.

In addition, all economically relevant state data (e.g. staking, delegating, etc.) on PlatON is stored in a snapshot database called snapshotdb, and every time a new block is created, the snapshot database will calculate a hash value corresponding to the current block by combining all historical data with the latest data, and finally place the hash in the on-chain database (statedb), which can be seen as a practice of the above Re-genesis scheme.