
Conversation

@frisitano
Collaborator

No description provided.

@frisitano frisitano marked this pull request as draft October 7, 2025 07:10
@frisitano frisitano marked this pull request as ready for review October 8, 2025 17:15
@codspeed-hq

codspeed-hq bot commented Oct 8, 2025

CodSpeed Performance Report

Merging #351 will degrade performance by 97.16%

Comparing refactor/rollup-node-refactor (6ff7d55) with main (6852bf6)

Summary

⚡ 1 improvement
❌ 1 regression

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark                      | BASE        | HEAD     | Change  |
|--------------------------------|-------------|----------|---------|
| pipeline_derive_in_file_blobs  | 27.4 ms     | 965.2 ms | -97.16% |
| pipeline_derive_s3_blobs       | 16,914.8 ms | 79.3 ms  | ×210    |

Contributor

@jonastheis jonastheis left a comment

This PR is great! It simplifies the readability and the concepts in the flow of the code so much; IMO it's now much easier to reason about the state of the node than before.

A few things:

  • We should add an in-depth description of the changes, new features, and simplifications. This will also allow us to systematically evaluate whether everything is tested or whether we need to add some tests later, and it will help with reviewing.
  • I left a bunch of comments inline.
  • I'm a bit concerned about performance in some cases, but we need to evaluate that with benchmarks.
  • I think this PR addresses a few issues at once; we should link them in the description above and then close those issues accordingly:

impl<
    N: FullNetwork<Primitives = ScrollNetworkPrimitives>,
    CS: ScrollHardforks + EthChainSpec + Send + Sync + 'static,
> Stream for ScrollNetworkManager<N, CS>
Contributor

Why change this from a Stream to a Future?

Collaborator Author

Previously the rollup node manager would drive the ScrollNetworkManager future, which would yield NetworkManagerEvent items. Now we spawn the ScrollNetworkManager as a separate task and use channels to send events to the ChainOrchestrator. It's a slightly different architecture but achieves a similar goal. As such, we don't need a Stream implementation on the ScrollNetworkManager, as it no longer yields events.
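To make the new architecture concrete, here is a minimal sketch of the channel hand-off, assuming tokio; the NetworkEvent type and task names are illustrative stand-ins, not the crate's actual types:

```rust
use tokio::sync::mpsc;

#[derive(Debug)]
enum NetworkEvent {
    NewBlock(u64),
}

struct NetworkManagerTask {
    events_tx: mpsc::UnboundedSender<NetworkEvent>,
}

impl NetworkManagerTask {
    async fn run(self) {
        // In the real implementation this loop would drive the p2p stack; here we
        // just emit a placeholder event to show the channel hand-off.
        let _ = self.events_tx.send(NetworkEvent::NewBlock(42));
    }
}

#[tokio::main]
async fn main() {
    let (events_tx, mut events_rx) = mpsc::unbounded_channel();

    // The network manager runs as its own task instead of being polled as a Stream.
    tokio::spawn(NetworkManagerTask { events_tx }.run());

    // The chain-orchestrator side consumes events from the channel.
    while let Some(event) = events_rx.recv().await {
        println!("orchestrator received {event:?}");
    }
}
```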

ChainOrchestrator, ChainOrchestratorConfig, ChainOrchestratorHandle, Consensus, NoopConsensus,
SystemContractConsensus,
};
// use rollup_node_manager::{
Contributor

remove

number: 0,
});
}
// if let Some(block_info) = startup_safe_block {
Contributor

why is this commented out?

Collaborator Author

I've removed it. We no longer need it, as we now include the block number associated with derived attributes, which allows us to do our reconciliation. Previously we relied on the safe_block_number to make the association, which was messy and error prone.
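A compile-only illustration of the association described here (the type and field names are hypothetical, not the actual crate types): derived attributes carry the L2 block number they target, so reconciliation becomes a direct comparison rather than an inference from the safe block number.

```rust
/// Illustrative stand-in: derived payload attributes tagged with the L2 block
/// number they were derived for.
struct DerivedAttributes {
    /// L2 block number these attributes correspond to.
    block_number: u64,
    // ... payload attribute fields elided
}

/// Reconciliation is a direct block-number comparison.
fn targets_block(derived: &DerivedAttributes, local_block_number: u64) -> bool {
    derived.block_number == local_block_number
}

fn main() {
    let derived = DerivedAttributes { block_number: 1_042 };
    assert!(targets_block(&derived, 1_042));
}
```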

self.sequencer_args.allow_empty_blocks,
);
let engine = Engine::new(Arc::new(engine_api), fcs);
// let engine = EngineDriver::new(
Contributor

why commented?

Collaborator Author

removed

.stream(self.get_connection())
.await?
.map(|res| Ok(res.map(Into::into)?)))
Some(L1MessageKey::BlockNumber(block_number)) => {
Contributor

There is a lot happening in this function; it would be great to add some comments about what, at a high level, is happening in each branch and why.

Collaborator Author

done


return Err(ChainOrchestratorError::ChainInconsistency);
// /// Wraps a pending chain orchestrator future, metering the completion of it.
// pub fn handle_metered(
Contributor

why is this commented out?

soft_limit: usize,
}
// If the block number is greater than the current head we attempt to extend the chain.
let mut new_headers = if received_block_number > self.engine.fcs().head_block_info().number
Contributor

Suggested change:
- let mut new_headers = if received_block_number > self.engine.fcs().head_block_info().number
+ let mut new_headers = if received_block_number > current_head_number

.ok_or(ChainOrchestratorError::L2BlockNotFoundInL2Client(received_block_number))?;

if current_chain_block.header.hash_slow() == received_block_hash {
tracing::debug!(target: "scroll::chain_orchestrator", ?received_block_hash, ?received_block_number, "Received block from peer that is already in the chain");
Contributor

sure we only want to log this in debug?

Collaborator Author

updated

// Assert that we are not reorging below the safe head.
let current_safe_info = self.engine.fcs().safe_block_info();
if received_block_number <= current_safe_info.number {
tracing::debug!(target: "scroll::chain_orchestrator", ?received_block_hash, ?received_block_number, current_safe_info = ?self.engine.fcs().safe_block_info(), "Received block from peer that would reorg below the safe head - ignoring");
Contributor

sure we only want to log this in debug?

Collaborator Author

updated

let mut bytes = [0u8; 1024];
rand::rng().fill(bytes.as_mut_slice());
let mut u = Unstructured::new(&bytes);
// Check if the parent hash of the received block is in the chain.
Contributor

Isn't this just a reorg of depth 1? Shouldn't this case also be handled by the reorg logic below? I think the code flow here could be a bit clearer about which conditions are met and which path is taken, especially in the reorg case and with the fork-choice condition if block_with_peer.block.header.timestamp <= current_head.header.timestamp.

Collaborator Author

It's not a reorg of depth one; it's a reorg of arbitrary depth, i.e. the new chain has length one, but the fork point can be arbitrarily deep. As a consequence of this comment, I added a check to ensure that the depth would not result in a safe block reorg. You are correct that we could delegate this to the reorg logic below, but that seems inefficient and wasteful, as we already have all the information we need to reconcile the reorg. With some refactoring, I agree we could combine this condition and the reorg logic below in a more readable and efficient manner. For now, I think it's pragmatic to keep it as is.
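To illustrate the "length one, arbitrary depth" point with a hedged sketch (names are illustrative, not the orchestrator's code): the peer contributes a single new block, but the reorg depth is the distance from the current head back to that block's parent, which can be arbitrarily large.

```rust
/// Illustrative only: the reorg depth is measured from the current head down to
/// the received block's parent, independent of how many new blocks the peer sent.
fn reorg_depth(current_head_number: u64, received_parent_number: u64) -> u64 {
    current_head_number.saturating_sub(received_parent_number)
}

fn main() {
    // The new chain has length one (just the received block), but it forks off
    // 25 blocks below the current head.
    assert_eq!(reorg_depth(100, 75), 25);
}
```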

@frisitano frisitano requested a review from greged93 October 9, 2025 05:36
// If the received block number has a block number greater than the current head by more
// than the optimistic sync threshold, we optimistically sync the chain.
if received_block_number > current_head_number + self.config.optimistic_sync_threshold() {
tracing::trace!(target: "scroll::chain_orchestrator", ?received_block_number, ?current_head_number, "Received new block from peer with block number greater than current head by more than the optimistic sync threshold");
Contributor

here we start optimistic sync but also do the other consolidation. is that intended?

Collaborator Author

good catch, fixed.

// Safe head should be the highest block from batch index <= 100
assert_eq!(safe_head, Some(block_1.block_info));
// Persist the mapping of L1 messages to L2 blocks such that we can react to L1 reorgs.
let blocks = chain.iter().map(|block| block.into()).collect::<Vec<_>>();
Contributor

Is this a valid operation in optimistic sync mode? what if the L1 messages contained in the chain are garbage?

Collaborator Author

I've updated the logic so that we now only persist and gossip blocks once they have been validated and we have fully synced L1/L2 and consolidated the chain.
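A hedged sketch of that gating (illustrative names, not the crate's actual sync-state types): L1-message to L2-block mappings are only persisted, and blocks only gossiped, once the block has been validated and both L1 and L2 are fully synced and consolidated.

```rust
#[derive(Clone, Copy, PartialEq)]
enum SyncState {
    Syncing,
    Synced,
}

struct NodeState {
    l1: SyncState,
    l2: SyncState,
}

impl NodeState {
    fn is_fully_synced(&self) -> bool {
        self.l1 == SyncState::Synced && self.l2 == SyncState::Synced
    }
}

fn should_persist_and_gossip(state: &NodeState, block_is_validated: bool) -> bool {
    block_is_validated && state.is_fully_synced()
}

fn main() {
    // During optimistic sync the chain is not consolidated yet, so nothing is
    // persisted or gossiped even for blocks that pass validation.
    let optimistic = NodeState { l1: SyncState::Synced, l2: SyncState::Syncing };
    assert!(!should_persist_and_gossip(&optimistic, true));

    let synced = NodeState { l1: SyncState::Synced, l2: SyncState::Synced };
    assert!(should_persist_and_gossip(&synced, true));
}
```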


// If we were previously in L2 syncing mode and the FCS update resulted in a valid state, we
// transition the L2 sync state to synced and consolidate the chain.
if result.is_valid() && self.sync_state.l2().is_syncing() {
Contributor

do we need to check if the result is valid? above we already check whether it is invalid and return

Collaborator Author

Yes, because the result could also be Syncing; in that case we want to defer until a later point at which we've fully synced.
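A hedged sketch of the three-way outcome being described (the enum is an illustrative stand-in for the engine API's payload status): Invalid returns early, Syncing defers consolidation, and only Valid combined with an L2-syncing state triggers the transition to synced.

```rust
enum PayloadStatus {
    Valid,
    Invalid,
    Syncing,
}

fn should_transition_to_synced(status: &PayloadStatus, l2_is_syncing: bool) -> bool {
    // Invalid was already handled with an early return before this point; Syncing
    // means consolidation is deferred until the node has finished syncing.
    matches!(status, PayloadStatus::Valid) && l2_is_syncing
}

fn main() {
    assert!(should_transition_to_synced(&PayloadStatus::Valid, true));
    // Syncing is neither valid nor invalid: defer consolidation.
    assert!(!should_transition_to_synced(&PayloadStatus::Syncing, true));
}
```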

// Persist the signature for the block and notify the network manager of a successful
// import.
let tx = self.database.tx_mut().await?;
tx.insert_signature(chain_head_hash, block_with_peer.signature).await?;
Contributor

Don't we already persist the signature in handle_block_from_peer?

Collaborator Author

Good catch, I've removed persisting the signature here


// If the received and expected L1 messages do not match return an error.
if message_hash != expected_hash {
self.notify(ChainOrchestratorEvent::L1MessageMismatch {
Contributor

How do we currently react to this event?

Collaborator Author

The event itself is exclusively used for testing.

Contributor

@greged93 greged93 left a comment

Great refactor, this is soooo much easier to read and nicer to go through than the previous state of the orchestrator, and even the node in general!

Left some inline comments and a small nit.

if block_matches_attributes(
&attributes.attributes,
&current_block,
current_block.parent_hash,
Contributor

I think this can go. This check was previously used to verify that the block we received from L2 was the child block of the safe head in the Engine Driver. Here all we are doing is checking block.parent_hash == block.parent_hash.

Comment on lines 416 to 423
BlockConsolidationOutcome::Consolidated(block_info) => {
self.insert_block(block_info, outcome.batch_info).await?;
}
BlockConsolidationOutcome::Skipped(block_info) => {
// No action needed, the block has already been previously consolidated however
// we will insert it again defensively
self.insert_block(block_info, outcome.batch_info).await?;
}
Contributor

nit: this can be collapsed into one arm
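A minimal sketch of the suggested collapse using an or-pattern; the enum here is a simplified stand-in for the real BlockConsolidationOutcome (payloads reduced to a block number):

```rust
enum BlockConsolidationOutcome {
    Consolidated(u64),
    Skipped(u64),
}

fn block_to_insert(outcome: &BlockConsolidationOutcome) -> u64 {
    match outcome {
        // Consolidated and skipped blocks are persisted the same way; a skipped
        // block has already been consolidated but is inserted again defensively.
        BlockConsolidationOutcome::Consolidated(block_info) |
        BlockConsolidationOutcome::Skipped(block_info) => *block_info,
    }
}

fn main() {
    assert_eq!(block_to_insert(&BlockConsolidationOutcome::Skipped(7)), 7);
}
```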

Comment on lines +139 to +140
let result =
self.client.fork_choice_updated_v1(fcs.get_alloy_fcs(), Some(attributes)).await?;
Contributor

small note here: I think this works in the case of Reth because payloads built from attributes are automatically inserted here.

One concern we might have, which isn't handled here but is mentioned in the OP Stack docs, is the case where the batch contains invalid transaction data and the execution node fails to build a payload. I believe that in this case the result we get here would be valid, but trying to call get_payload(id) would return an error.

Collaborator Author

This is an important nuance. This will have implications in the Reorg branch of the consolidation logic:

BlockConsolidationAction::Reorg(attributes) => {
    tracing::info!(target: "scroll::chain_orchestrator", block_number = ?attributes.block_number, "Reorging chain to derived block");
    // We reorg the head to the safe block and then build the payload for the
    // attributes.
    let head = *self.engine.fcs().safe_block_info();
    if head.number != attributes.block_number - 1 {
        return Err(ChainOrchestratorError::InvalidBatchReorg {
            batch_info,
            safe_block_number: head.number,
            derived_block_number: attributes.block_number,
        });
    }
    let fcu = self.engine.build_payload(Some(head), attributes.attributes).await?;
    let payload = self
        .engine
        .get_payload(fcu.payload_id.expect("payload_id can not be None"))
        .await?;
    let block: ScrollBlock = try_into_block(
        ExecutionData { payload: payload.into(), sidecar: Default::default() },
        self.config.chain_spec().clone(),
    )
    .expect("block must be valid");
    let result = self.engine.new_payload(&block).await?;
    if result.is_invalid() {
        return Err(ChainOrchestratorError::InvalidBatch(
            (&block).into(),
            batch_info,
        ));
    }
    // Update the forkchoice state to the new head.
    let block_info: L2BlockInfoWithL1Messages = (&block).into();
    self.engine
        .update_fcs(
            Some(block_info.block_info),
            Some(block_info.block_info),
            Some(block_info.block_info),
        )
        .await?;
    reorg_results.push(block_info.clone());
    BlockConsolidationOutcome::Reorged(block_info)
}
};

Whilst this is an important nuance, I consider accounting for corrupt transaction data to be out of scope of this PR, because batch submission is permissioned (in the happy case). I propose that we create an issue to track this and address it in a future PR, possibly in the context of an infallible derivation pipeline (which we currently don't have).

What do you think?

Contributor

I propose that we create an issue to track this and address it in a future PR, possibly in the context of an infallible derivation pipeline (which we currently don't have).

Agreed, let's track it and leave as is for now.
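For the issue being deferred, a hedged sketch (illustrative names only, not the engine API) of the defensive pattern that could be adopted later: even when the forkchoice update reports Valid, a failed payload build surfaces as an error from the subsequent get_payload call, which could be classified as an invalid batch rather than treated as a generic failure.

```rust
#[derive(Debug)]
enum ConsolidationError {
    InvalidBatchPayload(String),
}

/// The forkchoice update already reported Valid, but the payload build may still
/// have failed (e.g. the batch contained invalid transaction data), surfacing as
/// an error from get_payload. Classify that as an invalid batch.
fn fetch_built_payload(
    get_payload_result: Result<Vec<u8>, String>,
) -> Result<Vec<u8>, ConsolidationError> {
    get_payload_result.map_err(ConsolidationError::InvalidBatchPayload)
}

fn main() {
    let err = fetch_built_payload(Err("payload build failed".to_string()));
    assert!(matches!(err, Err(ConsolidationError::InvalidBatchPayload(_))));
}
```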

Comment on lines 253 to 261
// If there is an inflight payload building job, poll it.
if let Some(payload_building_job) = this.payload_building_job.as_mut() {
    match payload_building_job.future.as_mut().poll(cx) {
        Poll::Ready(payload_id) => {
            this.payload_building_job = None;
            return Poll::Ready(Some(SequencerEvent::PayloadReady(payload_id)));
        }
        Poll::Pending => {}
    }
Contributor

Should the payload_building_job have higher priority in the polling order? If the payload is ready and the trigger as well, the current order means we decide to skip the next slot. If we invert them, we would return the payload to the chain orchestrator, and would catch the trigger on the next polling (might be a little late, but at least we won't completely miss it).

Collaborator Author

good catch
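A compile-only sketch of the reordered polling discussed above (the types are simplified stand-ins for the sequencer's real fields): the in-flight payload-building future is polled before the slot trigger, so a ready payload is returned to the orchestrator first and a simultaneous trigger is picked up on the next poll rather than being skipped.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

enum SequencerEvent {
    PayloadReady(u64),
    NewSlot,
}

struct Sequencer {
    payload_building_job: Option<Pin<Box<dyn Future<Output = u64> + Send>>>,
    slot_trigger_fired: bool,
}

impl Sequencer {
    fn poll_next_event(&mut self, cx: &mut Context<'_>) -> Poll<Option<SequencerEvent>> {
        // 1. Give priority to an in-flight payload building job.
        if let Some(job) = self.payload_building_job.as_mut() {
            if let Poll::Ready(payload_id) = job.as_mut().poll(cx) {
                self.payload_building_job = None;
                return Poll::Ready(Some(SequencerEvent::PayloadReady(payload_id)));
            }
        }
        // 2. Only afterwards check the slot trigger; if both were ready, the
        //    trigger is observed on the next poll rather than dropped.
        if std::mem::take(&mut self.slot_trigger_fired) {
            return Poll::Ready(Some(SequencerEvent::NewSlot));
        }
        Poll::Pending
    }
}
```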

Contributor

@greged93 greged93 left a comment

One additional comment

Comment on lines 1077 to 1081
// Persist the signature for the block and notify the network manager of a successful
// import.
let tx = self.database.tx_mut().await?;
tx.insert_signature(chain_head_hash, block_with_peer.signature).await?;
tx.commit().await?;
Contributor

I think the signature is already persisted in handle_block_from_peer, which is the only place where this method is called.


@frisitano frisitano requested a review from greged93 October 13, 2025 12:12
Contributor

@greged93 greged93 left a comment

no further comment, lgtm!

@frisitano frisitano merged commit 5472bfd into main Oct 13, 2025
14 of 15 checks passed
@frisitano frisitano deleted the refactor/rollup-node-refactor branch October 13, 2025 13:47