@DarianShawn
Collaborator

Description

Mostly derived from geth PR 20152:

This PR creates a secondary data structure for storing the Dogechain state, called a snapshot. This snapshot is special as it dynamically follows the chain:

  • At the very bottom, the snapshot consists of a disk layer, which is essentially a semi-recent full flat dump of the account and storage contents. This is stored in LevelDB as a <hash> -> <account> mapping for the account trie and <account-hash><slot-hash> -> <slot-value> mapping for the storage tries. The layout permits fast iteration over the accounts and storage, which will be used for a new sync algorithm (not done yet).
  • Above the disk layer there is a tree of in-memory diff layers, each representing one block's worth of state mutations. Every time a new block is processed, it is linked on top of the existing diff tree, and the bottom layers are flattened together to keep the maximum tree depth reasonable. At the very bottom, the first diff layer acts as an accumulator, which only gets flattened into the disk layer when it outgrows its memory allowance. This is done mostly to avoid thrashing LevelDB.
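
A minimal Go sketch of the layering described above, for illustration only. The names `layer`, `diskLayer`, `diffLayer` and `flatten` are hypothetical and are not taken from this PR's code. A plain map stands in for the LevelDB-backed disk layer, and only accounts are modelled (no storage slots, no memory accounting).

```go
package main

import "fmt"

// layer is a hypothetical interface for this sketch, not the PR's API.
type layer interface {
	account(hash string) ([]byte, bool)
}

// diskLayer stands in for the flat on-disk dump: account hash -> account data.
type diskLayer struct {
	accounts map[string][]byte
}

func (d *diskLayer) account(hash string) ([]byte, bool) {
	v, ok := d.accounts[hash]
	return v, ok
}

// diffLayer holds one block's worth of account mutations on top of a parent.
// A nil value marks an account that was deleted in this block.
type diffLayer struct {
	parent   layer
	accounts map[string][]byte
}

// account checks this layer first and falls through to the parent on a miss.
func (d *diffLayer) account(hash string) ([]byte, bool) {
	if v, ok := d.accounts[hash]; ok {
		return v, v != nil
	}
	return d.parent.account(hash)
}

// flatten merges a diff layer into its parent diff layer, so the tree gets
// one level shallower. If the parent is not a diff layer, nothing is merged.
func flatten(child *diffLayer) *diffLayer {
	parent, ok := child.parent.(*diffLayer)
	if !ok {
		return child
	}
	merged := make(map[string][]byte, len(parent.accounts)+len(child.accounts))
	for k, v := range parent.accounts {
		merged[k] = v
	}
	for k, v := range child.accounts {
		merged[k] = v // newer mutations (including deletions) win
	}
	return &diffLayer{parent: parent.parent, accounts: merged}
}

func main() {
	disk := &diskLayer{accounts: map[string][]byte{"a": []byte("1"), "b": []byte("2")}}
	block1 := &diffLayer{parent: disk, accounts: map[string][]byte{"a": []byte("10")}}
	block2 := &diffLayer{parent: block1, accounts: map[string][]byte{"b": nil, "c": []byte("3")}}

	head := flatten(block2)
	for _, h := range []string{"a", "b", "c"} {
		v, ok := head.account(h)
		fmt.Printf("%s: %q found=%v\n", h, v, ok)
	}
}
```

Reads resolve the same way before and after flattening; flattening only reduces how many layers a lookup has to walk.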

The snapshot can be built fully online, during the live operation of a Dogechain node. This is harder than it seems, because rebuilding the snapshot for mainnet takes days, and by then in-memory garbage collection has long since deleted the state needed for a single capture. So we'll have to provide the first canonical initialized snapshot, to make the later steps simpler.

  • The PR achieves this by gradually iterating the state tries and maintaining a marker for the account/storage slot position up to which the snapshot has already been generated. Every time a new block is executed, state mutations prior to the marker get applied directly (the ones after it get discarded) and the snapshot builder switches to iterating the new root hash.
  • There shouldn't be any reorgs, but validators still need to accept a new block at the same block height. To achieve this, the builder operates on HEAD-128 and is capable of suspending/resuming if a state is missing (a restart only writes out some tries, not everything cached in memory).
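
The marker rule above can be sketched as follows. This is an illustrative Go sketch, not the PR's implementation: `generator`, `covered` and `applyBlock` are hypothetical names, a map stands in for the on-disk snapshot, and treating a key equal to the marker as already generated is an assumption about the boundary.

```go
package main

import (
	"bytes"
	"fmt"
)

// generator is a hypothetical stand-in for the snapshot builder.
type generator struct {
	marker []byte            // last key already written to the snapshot
	flat   map[string][]byte // the part of the snapshot generated so far
}

// covered reports whether key lies in the already generated range.
// Assumption: a key equal to the marker counts as generated.
func (g *generator) covered(key []byte) bool {
	return g.marker != nil && bytes.Compare(key, g.marker) <= 0
}

// applyBlock applies a block's mutations that fall at or before the marker
// and discards the rest; the iterator will pick those up later from the
// new state root. A nil value means deletion. It returns the applied count.
func (g *generator) applyBlock(mutations map[string][]byte) int {
	applied := 0
	for k, v := range mutations {
		if !g.covered([]byte(k)) {
			continue // not generated yet, nothing to patch
		}
		if v == nil {
			delete(g.flat, k)
		} else {
			g.flat[k] = v
		}
		applied++
	}
	return applied
}

func main() {
	g := &generator{
		marker: []byte("m"),
		flat:   map[string][]byte{"a": []byte("1"), "k": []byte("2")},
	}
	n := g.applyBlock(map[string][]byte{
		"a": []byte("9"), // before the marker: updated in place
		"k": nil,         // before the marker: deleted
		"z": []byte("5"), // after the marker: discarded for now
	})
	fmt.Println("applied:", n, "entries:", len(g.flat))
}
```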

The benefit of the snapshot is that it acts as an acceleration structure for state accesses:

  • Instead of doing O(log N) disk reads (plus LevelDB overhead) to access an account / storage slot, the snapshot can provide direct, O(1) access time. This should be a small improvement in block processing and a huge improvement in eth_call evaluations.
  • The snapshot supports account and storage iteration at O(1) complexity per entry plus sequential disk access, which should enable remote nodes to retrieve state data significantly more cheaply than before (the sort order is the state trie leaf order, so responses can be assembled directly into tries too).
  • The presence of the snapshot can also enable more exotic use cases, such as deleting and rebuilding the entire state trie (guerrilla pruning) or building alternative state tries (e.g. binary vs. hexary), which might be needed in the future.

The downside of the snapshot is that the raw account and storage data is essentially duplicated. In the case of mainnet, this means an extra 8-12GB of SSD space used (an estimate; not yet measured).

Changes include

  • Bugfix (non-breaking change that solves an issue)
  • New feature (non-breaking change that adds functionality)

Testing

  • I have tested this code with the official test suite
  • I have tested this code manually

Manual tests

Backward compatibility

  • Start up a 4-validator network with 1 new-version node and 3 target-version nodes.
  • Send several transactions, including multicall contract transactions.

It works as expected, and block execution of the newer version is a little faster than the target version.

Snapshot generation

  • Upgrade a devnet full node to the current version.
  • Try these methods:
    • Enable the snapshot at startup with an already up-to-date database.
    • Start up from a block recovery file.
    • Begin syncing from the genesis block at startup.

All methods work as expected, and the generated snapshots are almost the same size. The fastest method is "using a block recovery file".

Snapshot regeneration

  • Upgrade a devnet validator to be the one with an initialized snapshot.
  • Disable the snapshot feature (it is not enabled by default, and synchronize mode is the default) for 10-30 minutes.
  • Restart the node with the snapshot feature enabled.
  • Stop and restart the node several times during snapshot regeneration.

The regeneration works fine and resumes after a restart. It takes only minutes, compared with the first initialization.

Documentation update

Will update the CLI documentation once the version is bumped.

@DarianShawn added the labels feature (New update to Dogechain), bug fix (Functionality that fixes a bug), and help-wanted (I need technical help) on Mar 20, 2023
@DarianShawn added this to the Release 1.3.0 milestone on Mar 20, 2023
@DarianShawn self-assigned this on Mar 20, 2023
@DarianShawn requested a review from 0xcb9ff9 as a code owner on March 20, 2023 12:25
@DarianShawn
Collaborator Author

The snapshot is only half done, since it lacks a syncing protocol.
Syncing tests on Mainnet also showed even worse performance.
So I'll just close this PR, since we're adopting another repo for the next version.

@DarianShawn closed this on Apr 7, 2023
@github-actions bot locked and limited conversation to collaborators on Apr 7, 2023