Contributor

@avilagaston9 avilagaston9 commented Nov 13, 2025

Motivation

This PR addresses two main issues:

  • Sealing a batch, storing the prover inputs, and creating the checkpoint for that batch are currently performed non-atomically. If the node restarts during any of these three operations, the committer will end up in an invalid state.
  • The batch checkpoint is created at the beginning of the building process. If an error occurs while building the batch, the l1_committer gets stuck with the following error:
2025-11-12T15:47:26.129936Z ERROR L1 Committer Error: Committer failed retrieve block from storage: Failed to create RocksDB checkpoint at "dev_ethrex_l2/checkpoint_batch_1": Invalid argument: Directory exists

Description

  • Stores the batch and prover inputs atomically in the rollup storage.
  • When the l1_committer encounters a batch generated in a previous iteration, it now checks whether the corresponding checkpoint exists. If it doesn't, the committer creates it by re-executing the batch.
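The recovery step in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `ensure_checkpoint` and the `rebuild` callback (standing in for "re-execute the batch and write the checkpoint") are hypothetical, and the `checkpoint_batch_{n}` naming is only inferred from the path in the error log above.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Assumed naming scheme, inferred from "checkpoint_batch_1" in the error log.
/// The real `batch_checkpoint_name` may differ.
fn batch_checkpoint_name(batch_number: u64) -> String {
    format!("checkpoint_batch_{batch_number}")
}

/// Hypothetical helper: returns the checkpoint path for an already-sealed
/// batch, calling `rebuild` only when the checkpoint directory is missing.
fn ensure_checkpoint<F>(
    checkpoints_dir: &Path,
    batch_number: u64,
    rebuild: F,
) -> io::Result<PathBuf>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    let path = checkpoints_dir.join(batch_checkpoint_name(batch_number));
    if !path.exists() {
        // The checkpoint was lost (for example, a restart between sealing
        // and checkpointing), so regenerate it.
        rebuild(&path)?;
    }
    Ok(path)
}
```

Calling `ensure_checkpoint` twice for the same batch should invoke `rebuild` at most once, which is what lets the committer resume after a restart without hitting the "Directory exists" error.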

Closes None

Comment on lines -128 to -131
/// Blockchain instance using the current checkpoint store.
///
/// It is used for witness generation.
current_checkpoint_blockchain: Arc<Blockchain>,
Contributor Author

This field was unused.

github-actions bot commented Nov 13, 2025

Lines of code report

Total lines added: 193
Total lines removed: 5
Total lines changed: 198

Detailed view
+----------------------------------------------------+-------+------+
| File                                               | Lines | Diff |
+----------------------------------------------------+-------+------+
| ethrex/crates/l2/sequencer/l1_committer.rs         | 1202  | +125 |
+----------------------------------------------------+-------+------+
| ethrex/crates/l2/sequencer/utils.rs                | 165   | -5   |
+----------------------------------------------------+-------+------+
| ethrex/crates/l2/storage/src/api.rs                | 134   | +6   |
+----------------------------------------------------+-------+------+
| ethrex/crates/l2/storage/src/store.rs              | 346   | +10  |
+----------------------------------------------------+-------+------+
| ethrex/crates/l2/storage/src/store_db/in_memory.rs | 360   | +15  |
+----------------------------------------------------+-------+------+
| ethrex/crates/l2/storage/src/store_db/sql.rs       | 914   | +26  |
+----------------------------------------------------+-------+------+
| ethrex/crates/storage/api.rs                       | 247   | +1   |
+----------------------------------------------------+-------+------+
| ethrex/crates/storage/store.rs                     | 1497  | +3   |
+----------------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/in_memory.rs        | 633   | +3   |
+----------------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/rocksdb.rs          | 1652  | +4   |
+----------------------------------------------------+-------+------+

Comment on lines -340 to -344
// We need to guarantee that the checkpoint path is new
// to avoid causing a lock error under rocksdb feature.
let new_checkpoint_path = self
.checkpoints_dir
.join(batch_checkpoint_name(batch_number));
Contributor Author

Creating the batch checkpoint at the beginning of the execution prevents us from recovering from errors, as we can't create a checkpoint on an already used path.
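The failure mode described here can be reproduced with plain directories. The sketch below is only an analogy: `fs::create_dir` stands in for the RocksDB checkpoint call (both refuse an existing directory), and `build_batch_checkpoint_first` is a hypothetical name for the old ordering.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in for checkpoint creation: like the RocksDB error quoted in the
/// PR description, `fs::create_dir` fails if the directory already exists.
fn create_checkpoint_dir(path: &Path) -> io::Result<()> {
    fs::create_dir(path)
}

/// Hypothetical "old" ordering: the checkpoint is created first, then the
/// batch is built. If `build` fails, the directory is left behind.
fn build_batch_checkpoint_first<F>(path: &Path, build: F) -> io::Result<()>
where
    F: FnOnce() -> io::Result<()>,
{
    create_checkpoint_dir(path)?;
    build()
}
```

With this ordering, a failed build followed by a retry on the same path returns an `AlreadyExists` error even though the retry's own build step would have succeeded.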

@github-actions github-actions bot added the L2 Rollup client label Nov 13, 2025
@avilagaston9 avilagaston9 self-assigned this Nov 13, 2025
@avilagaston9 avilagaston9 marked this pull request as ready for review November 14, 2025 03:53
Copilot AI review requested due to automatic review settings November 14, 2025 03:53
@avilagaston9 avilagaston9 requested a review from a team as a code owner November 14, 2025 03:53
@avilagaston9 avilagaston9 moved this to In Review in ethrex_l2 Nov 14, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR addresses critical atomicity and checkpoint creation issues in the L2 sequencer, ensuring that batch sealing, prover input storage, and checkpoint creation happen atomically to prevent invalid committer states during node restarts.

Key Changes

  • Introduced seal_batch_with_prover_input() method to atomically store batches and prover inputs in a single database transaction
  • Added get_store_directory() method to enable checkpoint path validation
  • Implemented check_current_checkpoint() to verify and regenerate missing checkpoints by re-executing batch blocks
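As a rough illustration of the "both or neither" guarantee behind seal_batch_with_prover_input(), here is a toy in-memory store. Everything in it is an assumption made for the sketch: the `SketchStore` type, the byte-vector payloads, and the empty-input check used to force a failure. The PR's SQL backend uses a database transaction instead.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Toy tables; the real rollup store keeps far richer data.
#[derive(Default)]
struct Tables {
    batches: HashMap<u64, Vec<u8>>,
    prover_inputs: HashMap<u64, Vec<u8>>,
}

/// Hypothetical store used only for this sketch.
#[derive(Default)]
struct SketchStore {
    tables: Mutex<Tables>,
}

impl SketchStore {
    /// Writes the batch and its prover input under a single lock.
    /// All validation happens before the first write, so a failure
    /// leaves both tables untouched.
    fn seal_batch_with_prover_input(
        &self,
        batch_number: u64,
        batch: Vec<u8>,
        prover_input: Vec<u8>,
    ) -> Result<(), String> {
        let mut tables = self.tables.lock().map_err(|e| e.to_string())?;
        if prover_input.is_empty() {
            // Artificial failure condition, for illustration only.
            return Err(format!("empty prover input for batch {batch_number}"));
        }
        tables.batches.insert(batch_number, batch);
        tables.prover_inputs.insert(batch_number, prover_input);
        Ok(())
    }

    /// Reports whether (batch, prover input) are stored for `batch_number`.
    fn stored(&self, batch_number: u64) -> (bool, bool) {
        let tables = self.tables.lock().unwrap();
        (
            tables.batches.contains_key(&batch_number),
            tables.prover_inputs.contains_key(&batch_number),
        )
    }
}
```

After a failed call, `stored(n)` reports neither entry; after a successful call it reports both, so a reader never observes a sealed batch without its prover input.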

Reviewed Changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| crates/storage/store_db/rocksdb.rs | Added get_store_directory() to return the RocksDB path |
| crates/storage/store_db/in_memory.rs | Added get_store_directory() returning a placeholder path for in-memory store |
| crates/storage/store.rs | Exposed get_store_directory() through the Store API |
| crates/storage/api.rs | Added get_store_directory() to the StoreEngine trait |
| crates/l2/storage/src/store_db/sql.rs | Implemented atomic seal_batch_with_prover_input() using database transactions and extracted seal_batch_in_tx() helper |
| crates/l2/storage/src/store_db/in_memory.rs | Implemented seal_batch_with_prover_input() with sequential operations (acceptable for in-memory) |
| crates/l2/storage/src/store.rs | Exposed seal_batch_with_prover_input() with documentation |
| crates/l2/storage/src/api.rs | Added seal_batch_with_prover_input() to StoreEngineRollup trait |
| crates/l2/sequencer/utils.rs | Refactored fetch_blocks_with_respective_fee_configs() to accept Batch reference instead of batch number, improving error messages |
| crates/l2/sequencer/l1_committer.rs | Refactored batch production workflow: checkpoint creation moved after atomic batch/prover input storage, added checkpoint validation logic, extracted helper methods for one-time checkpoint management |
| Cargo.lock | Updated dependencies to use standard crates.io versions instead of patched versions |

Comment on lines +580 to 597
let batch_prover_input = self.generate_batch_prover_input(&batch).await?;

self.rollup_store
.seal_batch_with_prover_input(batch.clone(), &self.git_commit_hash, batch_prover_input)
.await?;

// Create the next checkpoint from the one-time checkpoint used
let new_checkpoint_path = self
.checkpoints_dir
.join(batch_checkpoint_name(batch_number));
let (new_checkpoint_store, _) = self
.create_checkpoint(
&one_time_checkpoint_store,
&new_checkpoint_path,
&self.rollup_store,
)
.await?;


Copilot AI Nov 14, 2025

Resource leak: If generate_batch_prover_input, seal_batch_with_prover_input, or create_checkpoint fails, the one-time checkpoint at one_time_checkpoint_path is not cleaned up. Consider wrapping this section with error handling similar to lines 545-547 to ensure cleanup on failure.

Suggested change
- let batch_prover_input = self.generate_batch_prover_input(&batch).await?;
- self.rollup_store
-     .seal_batch_with_prover_input(batch.clone(), &self.git_commit_hash, batch_prover_input)
-     .await?;
- // Create the next checkpoint from the one-time checkpoint used
- let new_checkpoint_path = self
-     .checkpoints_dir
-     .join(batch_checkpoint_name(batch_number));
- let (new_checkpoint_store, _) = self
-     .create_checkpoint(
-         &one_time_checkpoint_store,
-         &new_checkpoint_path,
-         &self.rollup_store,
-     )
-     .await?;
+ // Wrap the fallible section in error handling to ensure cleanup on failure
+ let result = async {
+     let batch_prover_input = self.generate_batch_prover_input(&batch).await?;
+     self.rollup_store
+         .seal_batch_with_prover_input(batch.clone(), &self.git_commit_hash, batch_prover_input)
+         .await?;
+     // Create the next checkpoint from the one-time checkpoint used
+     let new_checkpoint_path = self
+         .checkpoints_dir
+         .join(batch_checkpoint_name(batch_number));
+     let (new_checkpoint_store, _) = self
+         .create_checkpoint(
+             &one_time_checkpoint_store,
+             &new_checkpoint_path,
+             &self.rollup_store,
+         )
+         .await?;
+     Ok(new_checkpoint_store)
+ }.await;
+ let new_checkpoint_store = match result {
+     Ok(store) => store,
+     Err(e) => {
+         // Cleanup the one-time checkpoint on error
+         let _ = self.remove_one_time_checkpoint(&one_time_checkpoint_path);
+         return Err(e.into());
+     }
+ };

Contributor

Agree with doing something like this
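One alternative way to get "cleanup on every exit path" is a drop guard. The sketch below is not what the PR or the suggestion does: `OneTimeCheckpoint` and `build_with_guard` are hypothetical names, a plain directory stands in for the one-time checkpoint, and the cleanup is synchronous because `Drop` cannot await.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical guard owning a one-time checkpoint directory.
struct OneTimeCheckpoint {
    path: PathBuf,
}

impl OneTimeCheckpoint {
    fn create(path: &Path) -> io::Result<Self> {
        fs::create_dir_all(path)?;
        Ok(Self { path: path.to_path_buf() })
    }
}

impl Drop for OneTimeCheckpoint {
    fn drop(&mut self) {
        // Best-effort removal; runs on success, on early return via `?`,
        // and during unwinding.
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Runs `step` against a fresh one-time checkpoint. The guard removes the
/// directory whether `step` succeeds or fails.
fn build_with_guard<F>(path: &Path, step: F) -> Result<(), String>
where
    F: FnOnce(&Path) -> Result<(), String>,
{
    let guard = OneTimeCheckpoint::create(path).map_err(|e| e.to_string())?;
    step(&guard.path)?;
    Ok(())
}
```

The trade-off is that errors raised during cleanup are silently dropped, so an explicit cleanup call like the one suggested above may still be preferable when the removal failure should be reported.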

&new_checkpoint_path,
&self.rollup_store,
)
.await?;

Copilot AI Nov 14, 2025

Resource leak: If create_checkpoint fails, the one-time checkpoint at one_time_checkpoint_path is not cleaned up. Consider adding error handling to ensure cleanup on failure, similar to the pattern used at line 419.

Suggested change
- .await?;
+ .await
+ .inspect_err(|_| {
+     let _ = self.remove_one_time_checkpoint(&one_time_checkpoint_path);
+ })?;

Contributor

@tomip01 tomip01 Nov 14, 2025

Agree with this, but I think it's better to do it once, since we do it anyway two lines below.

Comment on lines +419 to +421
.inspect_err(|_| {
let _ = self.remove_one_time_checkpoint(&one_time_checkpoint_path);
})?;
Contributor

We could log the error here.
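A minimal sketch of what logging inside the `inspect_err` closure might look like. `cleanup_and_log` is a hypothetical helper, the `cleanup` callback stands in for `remove_one_time_checkpoint`, and `eprintln!` is used only to keep the example dependency-free; the project's own logging macros would take its place.

```rust
use std::fmt::Display;

/// Hypothetical helper: on error, logs the original failure, runs `cleanup`,
/// and also logs a cleanup failure instead of discarding it with `let _ =`.
/// The original result is returned unchanged.
fn cleanup_and_log<T, E: Display>(
    result: Result<T, E>,
    cleanup: impl FnOnce() -> Result<(), String>,
) -> Result<T, E> {
    result.inspect_err(|err| {
        eprintln!("batch step failed: {err}");
        if let Err(cleanup_err) = cleanup() {
            eprintln!("failed to remove one-time checkpoint: {cleanup_err}");
        }
    })
}
```

On the success path the closure never runs, so `cleanup` is not called and nothing is logged.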

@tomip01 tomip01 added this pull request to the merge queue Nov 14, 2025
Merged via the queue into main with commit 8825ef3 Nov 14, 2025
54 checks passed
@tomip01 tomip01 deleted the fix/l2/checkpoint_creation branch November 14, 2025 21:19
@github-project-automation github-project-automation bot moved this from In Review to Done in ethrex_l2 Nov 14, 2025
lakshya-sky pushed a commit to lakshya-sky/ethrex that referenced this pull request Nov 17, 2025
Labels

L2 Rollup client

Projects

Status: Done

4 participants