Refactor/rollup node refactor #351
Conversation
CodSpeed Performance Report: merging #351 will degrade performance by 97.16%.
Benchmarks breakdown
jonastheis left a comment:
This PR is great! Simplifies the readability and concepts in the flow of the code so much! imo it's much easier to reason about the state of the node than before.
A few things:
- we should add an in-depth description of the changes, new features, simplifications -> this will also allow us to systematically evaluate whether we have everything tested or need to add some tests later. + it will help with reviewing
- I left a bunch of comments inline.
- I'm a bit concerned about performance in some cases but we need to evaluate with benchmarks
- I think this PR addresses a few issues at once, we should link that to the description above and then close these issues accordingly:
impl<
    N: FullNetwork<Primitives = ScrollNetworkPrimitives>,
    CS: ScrollHardforks + EthChainSpec + Send + Sync + 'static,
> Stream for ScrollNetworkManager<N, CS>
Why change this from a Stream to a Future?
Previously the rollup node manager would drive the ScrollNetworkManager future, which would yield NetworkManagerEvents. Now we spawn the ScrollNetworkManager as a separate task and use channels to send events to the ChainOrchestrator. It's a slightly different architecture but achieves a similar goal. As such we don't need a Stream impl on the ScrollNetworkManager as it doesn't yield events anymore.
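For illustration, here is a minimal self-contained sketch of the task-plus-channel pattern described above; the names (NetworkTask, NetworkEvent) are placeholders, not the crate's actual types:

```rust
use tokio::sync::mpsc;

// Placeholder for the events the network task would emit.
#[derive(Debug)]
enum NetworkEvent {
    NewBlock(u64),
}

// Stand-in for the network manager: it owns the sending half of the channel
// and pushes events to whoever holds the receiver.
struct NetworkTask {
    events: mpsc::Sender<NetworkEvent>,
}

impl NetworkTask {
    async fn run(self) {
        // In the real node this loop would poll the p2p stack; here we just
        // emit a single event and exit.
        let _ = self.events.send(NetworkEvent::NewBlock(42)).await;
    }
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel(64);

    // Spawn the network manager as its own task instead of driving it as a Stream.
    tokio::spawn(NetworkTask { events: tx }.run());

    // The orchestrator side simply awaits events from the channel.
    while let Some(event) = rx.recv().await {
        println!("orchestrator received {event:?}");
    }
}
```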
crates/node/src/args.rs
Outdated
    ChainOrchestrator, ChainOrchestratorConfig, ChainOrchestratorHandle, Consensus, NoopConsensus,
    SystemContractConsensus,
};
// use rollup_node_manager::{
remove
crates/node/src/args.rs
Outdated
        number: 0,
    });
}
// if let Some(block_info) = startup_safe_block {
why is this commented out?
I've removed it, we no longer need it as we now include the block number associated with derived attributes, which allows us to do our reconciliation. Previously we were relying on the safe_block_number to do the association, which was messy and error prone.
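As a rough, self-contained illustration of what carrying the block number with derived attributes buys us; the struct and field names below are hypothetical, not the crate's actual types:

```rust
// Hypothetical shape of a derived payload: the attributes carry the L2 block
// number they were derived for, so reconciliation can match on it directly
// instead of inferring the association from the current safe block number.
struct DerivedAttributes {
    block_number: u64,
    // ...payload attribute fields (timestamp, transactions, etc.)...
}

fn reconcile(derived: &DerivedAttributes, local_block_number: u64) -> bool {
    // Match a derived block against the locally known block at the same height.
    derived.block_number == local_block_number
}

fn main() {
    let derived = DerivedAttributes { block_number: 100 };
    assert!(reconcile(&derived, 100));
}
```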
crates/node/src/args.rs
Outdated
    self.sequencer_args.allow_empty_blocks,
);
let engine = Engine::new(Arc::new(engine_api), fcs);
// let engine = EngineDriver::new(
why commented?
removed
    .stream(self.get_connection())
    .await?
    .map(|res| Ok(res.map(Into::into)?)))
Some(L1MessageKey::BlockNumber(block_number)) => {
there is a lot of stuff happening in this function and it would be great to add some comments as to what, on a high level, is happening in each branch and why.
done
return Err(ChainOrchestratorError::ChainInconsistency);
// /// Wraps a pending chain orchestrator future, metering the completion of it.
// pub fn handle_metered(
why is this commented out?
crates/chain-orchestrator/src/lib.rs
Outdated
    soft_limit: usize,
}
// If the block number is greater than the current head we attempt to extend the chain.
let mut new_headers = if received_block_number > self.engine.fcs().head_block_info().number
Suggested change:
- let mut new_headers = if received_block_number > self.engine.fcs().head_block_info().number
+ let mut new_headers = if received_block_number > current_head_number
crates/chain-orchestrator/src/lib.rs
Outdated
    .ok_or(ChainOrchestratorError::L2BlockNotFoundInL2Client(received_block_number))?;

if current_chain_block.header.hash_slow() == received_block_hash {
    tracing::debug!(target: "scroll::chain_orchestrator", ?received_block_hash, ?received_block_number, "Received block from peer that is already in the chain");
sure we only want to log this in debug?
updated
crates/chain-orchestrator/src/lib.rs
Outdated
// Assert that we are not reorging below the safe head.
let current_safe_info = self.engine.fcs().safe_block_info();
if received_block_number <= current_safe_info.number {
    tracing::debug!(target: "scroll::chain_orchestrator", ?received_block_hash, ?received_block_number, current_safe_info = ?self.engine.fcs().safe_block_info(), "Received block from peer that would reorg below the safe head - ignoring");
sure we only want to log this in debug?
updated
let mut bytes = [0u8; 1024];
rand::rng().fill(bytes.as_mut_slice());
let mut u = Unstructured::new(&bytes);
// Check if the parent hash of the received block is in the chain.
Isn't this just a reorg of depth 1? Shouldn't this case also be handled by the reorg logic below? I think the code flow here could be a bit clearer about which conditions are met and which path is taken, especially in the reorg case and with the fork-choice condition `if block_with_peer.block.header.timestamp <= current_head.header.timestamp {`.
It's not a reorg of depth one, it's a reorg of arbitrary depth, i.e. the new chain has length one, but the depth is arbitrary. As a consequence of this comment, I added a check to ensure that the depth would not result in a safe block reorg. You are correct that we could delegate this to the reorg logic below, but it seems inefficient and a waste, as we already have all the information we need to reconcile the reorg. With some refactoring, I agree we could combine this condition and the reorg logic below in a more readable and efficient manner. For now, I think it's pragmatic to keep it as is.
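For readers following along, a minimal standalone sketch of the safe-depth check described above (hypothetical types, not the crate's code):

```rust
// Hypothetical, simplified block info used only for this illustration.
#[derive(Clone, Copy)]
struct BlockInfo {
    number: u64,
}

// A single received block whose parent is already in the chain can still imply
// a reorg of arbitrary depth: the new chain has length one, but the fork point
// may be far below the current head. Reject it if it would land at or below the
// safe block.
fn reorg_allowed(received_block_number: u64, safe: BlockInfo) -> bool {
    received_block_number > safe.number
}

fn main() {
    let safe = BlockInfo { number: 90 };
    assert!(reorg_allowed(95, safe));  // fork point above the safe head: ok
    assert!(!reorg_allowed(88, safe)); // would reorg below the safe head: reject
}
```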
crates/chain-orchestrator/src/lib.rs
Outdated
// If the received block number has a block number greater than the current head by more
// than the optimistic sync threshold, we optimistically sync the chain.
if received_block_number > current_head_number + self.config.optimistic_sync_threshold() {
    tracing::trace!(target: "scroll::chain_orchestrator", ?received_block_number, ?current_head_number, "Received new block from peer with block number greater than current head by more than the optimistic sync threshold");
here we start optimistic sync but also do the other consolidation. Is that intended?
good catch, fixed.
crates/chain-orchestrator/src/lib.rs
Outdated
// Safe head should be the highest block from batch index <= 100
assert_eq!(safe_head, Some(block_1.block_info));
// Persist the mapping of L1 messages to L2 blocks such that we can react to L1 reorgs.
let blocks = chain.iter().map(|block| block.into()).collect::<Vec<_>>();
Is this a valid operation in optimistic sync mode? what if the L1 messages contained in the chain are garbage?
I've updated the logic such that now we only persist and gossip blocks if they have been validated and we have fully synced L1 / L2 and consolidated the chain.
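Roughly, the gating described here looks like the following standalone sketch (made-up names, not the crate's actual types):

```rust
// Simplified stand-ins for the node's sync state and a received block.
#[derive(PartialEq)]
enum SyncState {
    Syncing,
    Synced,
}

struct Block {
    validated: bool,
}

// Only persist the L1-message mapping and gossip the block once the block has
// been validated and both L1 and L2 are fully synced and consolidated.
fn should_persist_and_gossip(block: &Block, l1: &SyncState, l2: &SyncState) -> bool {
    block.validated && *l1 == SyncState::Synced && *l2 == SyncState::Synced
}

fn main() {
    let block = Block { validated: true };
    assert!(should_persist_and_gossip(&block, &SyncState::Synced, &SyncState::Synced));
    assert!(!should_persist_and_gossip(&block, &SyncState::Syncing, &SyncState::Synced));
}
```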
// If we were previously in L2 syncing mode and the FCS update resulted in a valid state, we
// transition the L2 sync state to synced and consolidate the chain.
if result.is_valid() && self.sync_state.l2().is_syncing() {
do we need to check if the result is valid? Above we already check whether it is invalid and return.
Yes, because there is also the case that it could be Syncing, so in that case, we will want to defer until a later point at which we've fully synced.
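In other words, the result is not binary. A simplified standalone sketch of the three-way handling (the Status enum here is invented for illustration, not the actual engine API type):

```rust
// Simplified stand-in for the payload/forkchoice status returned by the engine.
enum Status {
    Valid,
    Invalid,
    Syncing,
}

fn handle(status: Status, l2_is_syncing: bool) -> &'static str {
    match status {
        // Already handled earlier: invalid results return an error.
        Status::Invalid => "return error",
        // Valid while we were syncing: transition to synced and consolidate.
        Status::Valid if l2_is_syncing => "mark L2 synced and consolidate",
        Status::Valid => "continue as normal",
        // Syncing: defer consolidation until the execution layer has caught up.
        Status::Syncing => "defer until fully synced",
    }
}

fn main() {
    assert_eq!(handle(Status::Syncing, true), "defer until fully synced");
    assert_eq!(handle(Status::Valid, true), "mark L2 synced and consolidate");
}
```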
crates/chain-orchestrator/src/lib.rs
Outdated
// Persist the signature for the block and notify the network manager of a successful
// import.
let tx = self.database.tx_mut().await?;
tx.insert_signature(chain_head_hash, block_with_peer.signature).await?;
don't we already persist the signature in handle_block_from_peer?
Good catch, I've removed persisting the signature here
// If the received and expected L1 messages do not match return an error.
if message_hash != expected_hash {
    self.notify(ChainOrchestratorEvent::L1MessageMismatch {
How do we currently react to this event?
The event itself is exclusively used for testing.
greged93 left a comment:
Great refactor, this is soooo much easier to read and nicer to go through than the previous state of the orchestrator and even the node in general!
Left some inline comments and a small nit.
if block_matches_attributes(
    &attributes.attributes,
    &current_block,
    current_block.parent_hash,
I think this can go; this check was used before to verify that the block we received from the L2 was the child block of the safe head in the Engine Driver. Here all we are doing is checking block.parent_hash == block.parent_hash.
crates/database/db/src/operations.rs
Outdated
BlockConsolidationOutcome::Consolidated(block_info) => {
    self.insert_block(block_info, outcome.batch_info).await?;
}
BlockConsolidationOutcome::Skipped(block_info) => {
    // No action needed, the block has already been previously consolidated however
    // we will insert it again defensively
    self.insert_block(block_info, outcome.batch_info).await?;
}
nit: this can be collapsed into one arm
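For reference, the nit is about Rust's or-patterns; a tiny standalone example (not the crate's types):

```rust
// Self-contained illustration of collapsing two match arms with an or-pattern.
enum Outcome {
    Consolidated(u64),
    Skipped(u64),
}

fn insert_block(n: u64) {
    println!("inserting block {n}");
}

fn handle(outcome: Outcome) {
    match outcome {
        // Both variants bind the same name, so they can share one arm; the
        // Skipped case is re-inserted defensively just like Consolidated.
        Outcome::Consolidated(block_number) | Outcome::Skipped(block_number) => {
            insert_block(block_number);
        }
    }
}

fn main() {
    handle(Outcome::Consolidated(1));
    handle(Outcome::Skipped(2));
}
```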
let result =
    self.client.fork_choice_updated_v1(fcs.get_alloy_fcs(), Some(attributes)).await?;
small note here: I think this works in the case of Reth because payloads built from attributes are automatically inserted here.
One concern we might have which isn't handled here but mentioned in the Op stack docs, is the case where the data from the batch contains invalid transaction data and the execution node fails to build a payload. I believe in this case, the result we get here would be valid, but trying to call get_payload(id) would return an error.
This is an important nuance. This will have implications in the Reorg branch of the consolidation logic:
rollup-node/crates/chain-orchestrator/src/lib.rs, lines 425 to 469 in 44424d6:
BlockConsolidationAction::Reorg(attributes) => {
    tracing::info!(target: "scroll::chain_orchestrator", block_number = ?attributes.block_number, "Reorging chain to derived block");
    // We reorg the head to the safe block and then build the payload for the
    // attributes.
    let head = *self.engine.fcs().safe_block_info();
    if head.number != attributes.block_number - 1 {
        return Err(ChainOrchestratorError::InvalidBatchReorg {
            batch_info,
            safe_block_number: head.number,
            derived_block_number: attributes.block_number,
        });
    }
    let fcu = self.engine.build_payload(Some(head), attributes.attributes).await?;
    let payload = self
        .engine
        .get_payload(fcu.payload_id.expect("payload_id can not be None"))
        .await?;
    let block: ScrollBlock = try_into_block(
        ExecutionData { payload: payload.into(), sidecar: Default::default() },
        self.config.chain_spec().clone(),
    )
    .expect("block must be valid");
    let result = self.engine.new_payload(&block).await?;
    if result.is_invalid() {
        return Err(ChainOrchestratorError::InvalidBatch(
            (&block).into(),
            batch_info,
        ));
    }
    // Update the forkchoice state to the new head.
    let block_info: L2BlockInfoWithL1Messages = (&block).into();
    self.engine
        .update_fcs(
            Some(block_info.block_info),
            Some(block_info.block_info),
            Some(block_info.block_info),
        )
        .await?;
    reorg_results.push(block_info.clone());
    BlockConsolidationOutcome::Reorged(block_info)
}
};
Whilst this is an important nuance, I consider accounting for corrupt transaction data to be out of scope of this PR, due to the fact that batch submission is permissioned (in the happy case). I propose that we create an issue to track this and address it in a future PR, possibly in the context of an infallible derivation pipeline (which we currently don't have).
What do you think?
> I propose that we create an issue to track this and address it in a future PR, possibly in the context of an infallible derivation pipeline (which we currently don't have).
Agreed, let's track it and leave as is for now.
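For context on the failure mode being tracked, a standalone sketch of the interaction (the Engine trait and types below are invented for illustration; they are not the crate's or Reth's actual API):

```rust
// Invented, minimal engine interface for illustration only.
struct PayloadId(u64);
struct Payload;

trait Engine {
    // May report a valid forkchoice update and hand back a payload id even if
    // payload building later fails.
    fn fork_choice_updated_with_attrs(&self) -> PayloadId;
    // May fail if the attributes contained invalid transaction data and the
    // execution node could not build a payload.
    fn get_payload(&self, id: PayloadId) -> Result<Payload, String>;
}

fn derive_block<E: Engine>(engine: &E) -> Result<Payload, String> {
    let id = engine.fork_choice_updated_with_attrs();
    // The concern raised above: the FCU result alone does not guarantee that
    // this call succeeds, so the error path needs explicit handling.
    engine.get_payload(id)
}

struct MockEngine {
    corrupt_batch: bool,
}

impl Engine for MockEngine {
    fn fork_choice_updated_with_attrs(&self) -> PayloadId {
        PayloadId(1) // reported as valid either way
    }
    fn get_payload(&self, _id: PayloadId) -> Result<Payload, String> {
        if self.corrupt_batch {
            Err("payload building failed: invalid transaction data".into())
        } else {
            Ok(Payload)
        }
    }
}

fn main() {
    let engine = MockEngine { corrupt_batch: true };
    // The FCU looked fine, but fetching the payload surfaces the error.
    assert!(derive_block(&engine).is_err());
}
```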
crates/sequencer/src/lib.rs
Outdated
// If there is an inflight payload building job, poll it.
if let Some(payload_building_job) = this.payload_building_job.as_mut() {
    match payload_building_job.future.as_mut().poll(cx) {
        Poll::Ready(payload_id) => {
            this.payload_building_job = None;
            return Poll::Ready(Some(SequencerEvent::PayloadReady(payload_id)));
        }
        Poll::Pending => {}
    }
Should the payload_building_job have higher priority in the polling order? If the payload is ready and the trigger fires as well, the current order means we skip the next slot. If we invert them, we would return the payload to the chain orchestrator and catch the trigger on the next poll (might be a little late, but at least we won't completely miss it).
good catch
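A standalone sketch of the inverted polling priority, for anyone following along (a simplified model, not the sequencer's actual poll implementation):

```rust
// Simplified model of one poll iteration: two possible sources of work.
#[derive(Debug, PartialEq)]
enum Event {
    PayloadReady(u64),
    SlotTrigger,
}

// Poll the inflight payload-building job before the slot trigger, so a ready
// payload is returned first and the trigger is simply picked up on the next poll.
fn poll_once(payload_ready: Option<u64>, trigger_fired: bool) -> Option<Event> {
    if let Some(id) = payload_ready {
        return Some(Event::PayloadReady(id));
    }
    if trigger_fired {
        return Some(Event::SlotTrigger);
    }
    None
}

fn main() {
    // Both ready in the same poll: the payload wins; the trigger is not lost,
    // it is observed on the following poll.
    assert_eq!(poll_once(Some(7), true), Some(Event::PayloadReady(7)));
    assert_eq!(poll_once(None, true), Some(Event::SlotTrigger));
}
```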
greged93 left a comment:
One additional comment
crates/chain-orchestrator/src/lib.rs
Outdated
// Persist the signature for the block and notify the network manager of a successful
// import.
let tx = self.database.tx_mut().await?;
tx.insert_signature(chain_head_hash, block_with_peer.signature).await?;
tx.commit().await?;
I think the signature is already persisted in handle_block_from_peer, which is the only place where this method is called.
greged93 left a comment:
no further comment, lgtm!
No description provided.