fix(l2): use checkpoints to persist previous batch state #5037
Conversation
Pull Request Overview
This PR fixes a critical issue where non-Validium L2s with batches containing more than 131 blocks stopped working after the path-based feature was introduced. The solution implements a checkpoint system to maintain state availability during batch preparation and witness generation, as the path-based feature only keeps the last 128 blocks' state in memory.
Key Changes
- Added checkpoint creation functionality to storage engines for state preservation
- Implemented checkpoint-based batch preparation to ensure state availability for re-execution
- Added state regeneration logic to rebuild state from checkpoints after node restarts
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| crates/storage/store_db/rocksdb.rs | Added RocksDB checkpoint creation implementation |
| crates/storage/store_db/in_memory.rs | Added no-op checkpoint support for in-memory store |
| crates/storage/store.rs | Refactored account updates to support checkpoint operations |
| crates/storage/api.rs | Added checkpoint creation method to StoreEngine trait |
| crates/l2/sequencer/mod.rs | Updated start_l2 to accept checkpoint parameters |
| crates/l2/sequencer/l1_committer.rs | Implemented checkpoint-based batch preparation and witness generation |
| crates/l2/sequencer/errors.rs | Enhanced error messages and added checkpoint-related errors |
| crates/l2/Cargo.toml | Added rocksdb feature flag |
| crates/blockchain/blockchain.rs | Modified witness generation to store intermediate state and accumulate witnesses |
| cmd/ethrex/l2/initializers.rs | Added checkpoint initialization logic |
| cmd/ethrex/Cargo.toml | Propagated rocksdb feature to l2 crate |
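At the storage layer, the change boils down to giving every backend a way to materialize a point-in-time copy of the database at a caller-provided path. The sketch below is only an illustration of that shape; the method name, signature, and error type here are assumptions, not the PR's actual API (the description refers to the new function as `create_snapshot`).

```rust
use std::path::Path;

// Placeholder error type standing in for the crate's storage error.
#[derive(Debug)]
pub struct StoreError(pub String);

/// Illustrative trait extension: each storage backend exposes checkpoint
/// creation at a caller-provided path.
pub trait StoreEngine {
    fn create_checkpoint(&self, path: &Path) -> Result<(), StoreError>;
}

/// The in-memory backend has nothing durable to copy, so its implementation
/// can be a no-op, mirroring the "no-op checkpoint support" noted in the
/// file summary above.
pub struct InMemoryStore;

impl StoreEngine for InMemoryStore {
    fn create_checkpoint(&self, _path: &Path) -> Result<(), StoreError> {
        Ok(())
    }
}
```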
crates/blockchain/blockchain.rs (outdated)
```rust
.get_block_header(block.header.number)?
.is_none()
{
    apply_fork_choice(&self.storage, block.hash(), block.hash(), block.hash())
```
Can't we do the calls by hash instead? I think this could be dangerous. This function is called by the debug_executionWitness endpoint. Let's imagine we receive a request, fetch some blocks from the db and call generate_witness_for_blocks. If a reorg occurs while we are generating the witness, we might retrieve a different header or maybe apply a forkchoice to invalid blocks.
I agree, I'll test using the hashes since they're available.
I think it'd be better to tackle this in another PR.
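For reference, the reorg-safe variant the thread is describing would look roughly like the following. The stub types and the `get_block_header_by_hash` accessor are assumptions for illustration, not the crate's confirmed API.

```rust
// Minimal stand-ins so the sketch compiles on its own; the real crate's
// Store, Block, and BlockHeader types are richer than this.
type BlockHash = [u8; 32];
pub struct BlockHeader;
pub struct Block;
impl Block {
    fn hash(&self) -> BlockHash {
        [0; 32]
    }
}
pub struct Store;
impl Store {
    // Assumed accessor: keyed by the block hash rather than the block number.
    fn get_block_header_by_hash(&self, _hash: BlockHash) -> Option<BlockHeader> {
        None
    }
}

// Looking the header up by the block's own hash pins the exact block, so a
// concurrent reorg cannot swap in a different header at the same height.
fn header_is_missing(storage: &Store, block: &Block) -> bool {
    storage.get_block_header_by_hash(block.hash()).is_none()
}
```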
crates/l2/sequencer/l1_committer.rs (outdated)
```rust
// There's no specific reason to prefer a blockchain instance
// here, we just need one to create the EVM. We just use the
// current checkpoint blockchain just in case.
// The important thing here is to use the correct vm_db, which
// must be created from the checkpoint store.
let mut vm = one_time_checkpoint_blockchain.new_evm(vm_db).await?;
```
We need to fetch the feeConfig used for that block from the rollup_store. So this should instead be:
```diff
-// There's no specific reason to prefer a blockchain instance
-// here, we just need one to create the EVM. We just use the
-// current checkpoint blockchain just in case.
-// The important thing here is to use the correct vm_db, which
-// must be created from the checkpoint store.
-let mut vm = one_time_checkpoint_blockchain.new_evm(vm_db).await?;
+let fee_config = self
+    .rollup_store
+    .get_fee_config_by_block(block_to_commit_number)
+    .await?
+    .ok_or(CommitterError::FailedToGetInformationFromStorage(
+        "Failed to get fee config for re-execution".to_owned(),
+    ))?;
+let mut vm = Evm::new_for_l2(vm_db, fee_config)?;
```
```rust
/// 2. Initializes a new store and blockchain for the checkpoint.
/// 3. Regenerates the head state in the checkpoint store.
/// 4. Validates that the checkpoint store's head block number and latest block match those of the original store.
async fn create_checkpoint(
```
Can this reuse the one in initializers.rs?
I'd do it the other way around (reuse this one in initializers.rs), but I kept it this way in case we need to update some logic specific to the committer.
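For orientation, the flow described by the doc comments above could be sketched roughly as follows. Everything here (the stub Store/Blockchain types, regenerate_head_state, the error type, the exact validation) is a placeholder for illustration, not the committer's actual code.

```rust
use std::path::{Path, PathBuf};

// Placeholder types and helpers; the real committer works with the crate's
// Store, Blockchain, and error types instead of these stubs.
pub struct Store;
pub struct Blockchain;
#[derive(Debug)]
pub struct CheckpointError(pub String);

impl Store {
    fn create_snapshot(&self, _path: &Path) -> Result<(), CheckpointError> {
        Ok(())
    }
    fn open(_path: &Path) -> Result<Store, CheckpointError> {
        Ok(Store)
    }
    fn latest_block_number(&self) -> Result<u64, CheckpointError> {
        Ok(0)
    }
}

impl Blockchain {
    fn new(_store: &Store) -> Blockchain {
        Blockchain
    }
    // Re-executes whatever blocks are needed to rebuild the head state inside
    // the checkpoint store (step 3 of the doc comment).
    fn regenerate_head_state(&self) -> Result<(), CheckpointError> {
        Ok(())
    }
}

fn create_checkpoint(original: &Store, path: PathBuf) -> Result<(Store, Blockchain), CheckpointError> {
    // 1. Snapshot the DB at the provided path.
    original.create_snapshot(&path)?;
    // 2. Initialize a new store and blockchain for the checkpoint.
    let checkpoint_store = Store::open(&path)?;
    let checkpoint_blockchain = Blockchain::new(&checkpoint_store);
    // 3. Regenerate the head state in the checkpoint store.
    checkpoint_blockchain.regenerate_head_state()?;
    // 4. Validate that the checkpoint's head matches the original store's.
    if checkpoint_store.latest_block_number()? != original.latest_block_number()? {
        return Err(CheckpointError(
            "checkpoint head does not match the original store".to_owned(),
        ));
    }
    Ok((checkpoint_store, checkpoint_blockchain))
}
```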
Motivation
L2s with batches containing more than 131 blocks (with at least one transaction in the batch) stopped working correctly after the path-based feature was introduced.
Description
Extends the `ethrex-storage` API with a `create_snapshot` function, which creates a checkpoint of the DB at a provided path. The committer uses these checkpoints to keep the previous batch's state available for batch preparation and witness generation, and to regenerate that state after a node restart.
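For context on the mechanism: the rust-rocksdb bindings expose a checkpoint facility that a function like this can build on. The sketch below shows that API in isolation, with hypothetical paths and without the ethrex wiring around it.

```rust
use rocksdb::checkpoint::Checkpoint;
use rocksdb::{Options, DB};
use std::path::Path;

// Creates an on-disk checkpoint of `db` at `checkpoint_path`. RocksDB
// checkpoints hard-link the immutable SST files where possible, so this is
// much cheaper than copying the whole database.
fn snapshot_db(db: &DB, checkpoint_path: &Path) -> Result<(), rocksdb::Error> {
    let checkpoint = Checkpoint::new(db)?;
    checkpoint.create_checkpoint(checkpoint_path)
}

fn main() -> Result<(), rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    let db = DB::open(&opts, "example-db")?;
    // The checkpoint path must not exist yet; RocksDB creates it.
    snapshot_db(&db, Path::new("example-db-checkpoint"))
}
```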
Future work
Remove checkpoints after the batch they served has been verified on L1.