fix(l2): checkpoint creation #5321

Conversation
```rust
/// Blockchain instance using the current checkpoint store.
///
/// It is used for witness generation.
current_checkpoint_blockchain: Arc<Blockchain>,
```
This was unused
```rust
// We need to guarantee that the checkpoint path is new
// to avoid causing a lock error under rocksdb feature.
let new_checkpoint_path = self
    .checkpoints_dir
    .join(batch_checkpoint_name(batch_number));
```
Creating the batch checkpoint at the beginning of the execution prevents us from recovering from errors, as we can't create a checkpoint on an already used path.
Pull Request Overview
This PR addresses critical atomicity and checkpoint creation issues in the L2 sequencer, ensuring that batch sealing, prover input storage, and checkpoint creation happen atomically to prevent invalid committer states during node restarts.
Key Changes
- Introduced `seal_batch_with_prover_input()` method to atomically store batches and prover inputs in a single database transaction
- Added `get_store_directory()` method to enable checkpoint path validation
- Implemented `check_current_checkpoint()` to verify and regenerate missing checkpoints by re-executing batch blocks
Reviewed Changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `crates/storage/store_db/rocksdb.rs` | Added `get_store_directory()` to return the RocksDB path |
| `crates/storage/store_db/in_memory.rs` | Added `get_store_directory()` returning a placeholder path for the in-memory store |
| `crates/storage/store.rs` | Exposed `get_store_directory()` through the Store API |
| `crates/storage/api.rs` | Added `get_store_directory()` to the StoreEngine trait |
| `crates/l2/storage/src/store_db/sql.rs` | Implemented atomic `seal_batch_with_prover_input()` using database transactions and extracted a `seal_batch_in_tx()` helper |
| `crates/l2/storage/src/store_db/in_memory.rs` | Implemented `seal_batch_with_prover_input()` with sequential operations (acceptable for in-memory) |
| `crates/l2/storage/src/store.rs` | Exposed `seal_batch_with_prover_input()` with documentation |
| `crates/l2/storage/src/api.rs` | Added `seal_batch_with_prover_input()` to the StoreEngineRollup trait |
| `crates/l2/sequencer/utils.rs` | Refactored `fetch_blocks_with_respective_fee_configs()` to accept a `Batch` reference instead of a batch number, improving error messages |
| `crates/l2/sequencer/l1_committer.rs` | Refactored the batch production workflow: checkpoint creation moved after atomic batch/prover input storage, added checkpoint validation logic, extracted helper methods for one-time checkpoint management |
| `Cargo.lock` | Updated dependencies to use standard crates.io versions instead of patched versions |
```rust
let batch_prover_input = self.generate_batch_prover_input(&batch).await?;

self.rollup_store
    .seal_batch_with_prover_input(batch.clone(), &self.git_commit_hash, batch_prover_input)
    .await?;

// Create the next checkpoint from the one-time checkpoint used
let new_checkpoint_path = self
    .checkpoints_dir
    .join(batch_checkpoint_name(batch_number));
let (new_checkpoint_store, _) = self
    .create_checkpoint(
        &one_time_checkpoint_store,
        &new_checkpoint_path,
        &self.rollup_store,
    )
    .await?;
```
Copilot AI, Nov 14, 2025
Resource leak: If generate_batch_prover_input, seal_batch_with_prover_input, or create_checkpoint fail, the one-time checkpoint at one_time_checkpoint_path is not cleaned up. Consider wrapping this section with error handling similar to lines 545-547 to ensure cleanup on failure.
Suggested change:

```rust
// Wrap the fallible section in error handling to ensure cleanup on failure
let result = async {
    let batch_prover_input = self.generate_batch_prover_input(&batch).await?;
    self.rollup_store
        .seal_batch_with_prover_input(batch.clone(), &self.git_commit_hash, batch_prover_input)
        .await?;
    // Create the next checkpoint from the one-time checkpoint used
    let new_checkpoint_path = self
        .checkpoints_dir
        .join(batch_checkpoint_name(batch_number));
    let (new_checkpoint_store, _) = self
        .create_checkpoint(
            &one_time_checkpoint_store,
            &new_checkpoint_path,
            &self.rollup_store,
        )
        .await?;
    Ok(new_checkpoint_store)
}
.await;
let new_checkpoint_store = match result {
    Ok(store) => store,
    Err(e) => {
        // Cleanup the one-time checkpoint on error
        let _ = self.remove_one_time_checkpoint(&one_time_checkpoint_path);
        return Err(e.into());
    }
};
```
Agree with doing something like this
```rust
        &new_checkpoint_path,
        &self.rollup_store,
    )
    .await?;
```
Copilot AI, Nov 14, 2025
Resource leak: If create_checkpoint fails, the one-time checkpoint at one_time_checkpoint_path is not cleaned up. Consider adding error handling to ensure cleanup on failure, similar to the pattern used at line 419.
Suggested change:

```rust
    .await
    .inspect_err(|_| {
        let _ = self.remove_one_time_checkpoint(&one_time_checkpoint_path);
    })?;
```
Agree with this, but I think it's better to do it once, since two lines below we do it anyway.
```rust
    .inspect_err(|_| {
        let _ = self.remove_one_time_checkpoint(&one_time_checkpoint_path);
    })?;
```
We could log the error here instead of discarding it.
```rust
        &new_checkpoint_path,
        &self.rollup_store,
    )
    .await?;
```
Agree with this, but I think it's better to do it once, since two lines below we do it anyway.
```rust
let batch_prover_input = self.generate_batch_prover_input(&batch).await?;

self.rollup_store
    .seal_batch_with_prover_input(batch.clone(), &self.git_commit_hash, batch_prover_input)
    .await?;

// Create the next checkpoint from the one-time checkpoint used
let new_checkpoint_path = self
    .checkpoints_dir
    .join(batch_checkpoint_name(batch_number));
let (new_checkpoint_store, _) = self
    .create_checkpoint(
        &one_time_checkpoint_store,
        &new_checkpoint_path,
        &self.rollup_store,
    )
    .await?;
```
Agree with doing something like this
**Motivation**

This PR addresses two main issues:

- Sealing a batch, storing the `prover inputs`, and creating the checkpoint for that batch are currently performed non-atomically. If the node restarts during any of these three operations, the committer will end up in an invalid state.
- The batch checkpoint is created at the beginning of the building process. If an error occurs while building the batch, the `l1_committer` gets stuck with the following error:

```
2025-11-12T15:47:26.129936Z ERROR L1 Committer Error: Committer failed retrieve block from storage: Failed to create RocksDB checkpoint at "dev_ethrex_l2/checkpoint_batch_1": Invalid argument: Directory exists
```

**Description**

- Stores the batch and `prover inputs` atomically in the rollup storage.
- When the `l1_committer` encounters a batch generated in a previous iteration, it now checks whether the corresponding checkpoint exists. If it doesn't, the committer creates it by re-executing the batch.

Closes None