Improve "latest" block resolution on pruned nodes #1856
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)

📝 Walkthrough

Adds state-pruning awareness and bounded backfill/reconciliation to KV mapping-sync: sync APIs accept a client reference and an optional `state_pruning_blocks` window.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Node as Node/service
    participant Worker as MappingSyncWorker
    participant Client as HeaderBackend (client)
    participant KV as KV datastore
    participant Reconciler as CanonicalReconciler
    Node->>Worker: spawn_frontier_tasks(state_pruning_blocks)
    Worker->>Worker: start sync_blocks(..., state_pruning_blocks)
    loop per block/tip
        Worker->>Client: resolve skip target / fetch header/hash
        alt skip-target resolvable (in live window)
            Worker->>Worker: adjust current_syncing_tips, persist tips
            Worker->>Client: fetch header (may be pruned)
            alt state available
                Worker->>KV: write number→hash mapping (full)
            else state unavailable
                Worker->>KV: write minimal MappingCommitment (no txs)
                Worker->>Reconciler: reconcile_recent_window(window)
                Reconciler->>KV: repair/advance mappings (including digest fallback)
            end
            Worker->>Worker: backfill_number_mappings (bounded)
        else skip-target missing/error
            Worker->>Worker: persist tips, log warning, backoff/return
        end
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/mapping-sync/src/kv/mod.rs`:
- Around line 299-304: The code computes the pruning window using best_number
(client.info().best_number) which is wrong for Constrained pruning; change the
basis to the finalized block height by replacing use of
client.info().best_number when computing best_number_u64 (and consequently
live_window_start_u64 and skip_to_u64) with client.info().finalized_number (or
finalized_number.unique_saturated_into()), so live_window_start_u64 =
finalized_number_u64.saturating_sub(state_pruning_blocks) and skip_to_u64 uses
that finalized-based window; keep the rest of the logic (sync_from_u64,
skip_to_u64, current_number_u64 from operating_header) identical.
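As a plain-arithmetic sketch of the change this comment asks for (hypothetical helper names, `u64` in place of the runtime's block-number type and its `unique_saturated_into()` conversions):

```rust
/// Start of the live state window when the basis is the *finalized* height,
/// as the comment above suggests (hypothetical helper, not the crate's API).
fn live_window_start(finalized_number: u64, state_pruning_blocks: u64) -> u64 {
    finalized_number.saturating_sub(state_pruning_blocks)
}

/// Block the worker should jump to when the current tip has fallen behind
/// the live window; `None` means the tip is still inside the window.
fn skip_target(current_number: u64, finalized_number: u64, state_pruning_blocks: u64) -> Option<u64> {
    let start = live_window_start(finalized_number, state_pruning_blocks);
    (current_number < start).then_some(start)
}
```

With `finalized_number = 1_000` and `state_pruning_blocks = 256`, a tip at #100 would jump to #744, while a tip at #800 stays put.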
- Around line 62-67: The code currently swallows errors from
HeaderBackend::hash() via .ok().flatten() (creating canonical_hash_at_number)
and a similar pattern at the other call, treating hash failures as "no canonical
block" and causing sync_block()/pruning logic to commit partial/unsafe state;
instead propagate the Err to force a retry: change the places using
client.hash(...) that assign canonical_hash_at_number and the second check at
the other call to return Err(...) (or map to the function's error type) when
client.hash() yields Err, and only treat Ok(None) as the non-canonical case;
update the handling around NumberMappingWrite and sync_block() so they only
proceed on successful Ok(Some(hash)) comparisons and propagate errors from
client.hash() up the call stack rather than converting them to Skip/None.
📒 Files selected for processing (5): client/mapping-sync/src/kv/mod.rs, client/mapping-sync/src/kv/worker.rs, client/mapping-sync/src/sql/mod.rs, template/node/src/eth.rs, template/node/src/service.rs
librelois
left a comment
client/mapping-sync/src/kv/mod.rs (line 248) and client/mapping-sync/src/kv/mod.rs (line 325) conflict. sync_one_block maintains a queue of pending current_syncing_tips, but the new pruning skip path replaces that queue with only skip_hash. If the node restarts while multiple leaves exist, any other queued leaf is discarded even if it is still inside the live window and should be indexed. That creates a correctness hole around reorg/fork catch-up.
♻️ Duplicate comments (1)
client/mapping-sync/src/kv/mod.rs (1)
309-336: ⚠️ Potential issue | 🟠 Major

Don’t silently fall through when skip-target lookups fail.

Line 312 still treats `client.hash(skip_to_number)` failure as “no skip”, so the function drops into `sync_block()` for a block already known to be outside the live window. Lines 325-335 also turn `header()` errors into `false`, which permanently drops queued tips from `current_syncing_tips`. Both cases can reintroduce stuck or partially indexed branches on a transient backend failure.

🛠️ Proposed fix
```diff
 if current_number_u64 < skip_to_u64 {
     let skip_to_number = skip_to_u64.saturated_into::<<Block::Header as HeaderT>::Number>();
-    if let Ok(Some(skip_hash)) = client.hash(skip_to_number) {
-        log::warn!(
-            target: "mapping-sync",
-            "Pruned node: skipping blocks #{}..#{} (outside live state window), \
-            jumping tip to #{}",
-            current_number_u64,
-            skip_to_u64.saturating_sub(1),
-            skip_to_u64,
-        );
-        // Retain any tips still within the live window rather than
-        // discarding them — they may be unsynced fork branches that
-        // need indexing. Replace only the out-of-window tip with
-        // the skip target.
-        current_syncing_tips.retain(|tip| {
-            substrate_backend
-                .blockchain()
-                .header(*tip)
-                .ok()
-                .flatten()
-                .map(|h| {
-                    let n: u64 = (*h.number()).unique_saturated_into();
-                    n >= skip_to_u64
-                })
-                .unwrap_or(false)
-        });
-        current_syncing_tips.push(skip_hash);
-        frontier_backend
-            .meta()
-            .write_current_syncing_tips(current_syncing_tips)?;
-        return Ok(true);
-    }
+    let skip_hash = client
+        .hash(skip_to_number)
+        .map_err(|e| format!("failed to resolve skip target #{skip_to_u64}: {e:?}"))?
+        .ok_or_else(|| format!("canonical hash for skip target #{skip_to_u64} not found"))?;
+    log::warn!(
+        target: "mapping-sync",
+        "Pruned node: skipping blocks #{}..#{} (outside live state window), \
+        jumping tip to #{}",
+        current_number_u64,
+        skip_to_u64.saturating_sub(1),
+        skip_to_u64,
+    );
+    let mut retained_tips = Vec::with_capacity(current_syncing_tips.len() + 1);
+    for tip in current_syncing_tips {
+        match substrate_backend
+            .blockchain()
+            .header(tip)
+            .map_err(|e| format!("failed to load syncing tip {tip:?}: {e:?}"))?
+        {
+            Some(h) => {
+                let n: u64 = (*h.number()).unique_saturated_into();
+                if n >= skip_to_u64 {
+                    retained_tips.push(tip);
+                }
+            }
+            None => {}
+        }
+    }
+    retained_tips.push(skip_hash);
+    frontier_backend
+        .meta()
+        .write_current_syncing_tips(retained_tips)?;
+    return Ok(true);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/mapping-sync/src/kv/mod.rs` around lines 309 - 336, The code currently treats Err from client.hash(skip_to_number) as Ok(None) and treats header() errors as false, which can drop tips or proceed to sync_block() wrongly; update the client.hash(skip_to_number) handling in the block using skip_to_number so that if client.hash(...) returns Err you log the error and return/propagate that error (or otherwise bail out of the current sync attempt) instead of falling through to sync_block(), and change the retain closure around current_syncing_tips (the substrate_backend.blockchain().header(...) chain that ends with .unwrap_or(false)) to conservatively keep tips on errors (use .unwrap_or(true) or otherwise treat Err as “keep this tip”) so transient backend failures don’t permanently drop queued tips.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@client/mapping-sync/src/kv/mod.rs`:
- Around line 309-336: The code currently treats Err from
client.hash(skip_to_number) as Ok(None) and treats header() errors as false,
which can drop tips or proceed to sync_block() wrongly; update the
client.hash(skip_to_number) handling in the block using skip_to_number so that
if client.hash(...) returns Err you log the error and return/propagate that
error (or otherwise bail out of the current sync attempt) instead of falling
through to sync_block(), and change the retain closure around
current_syncing_tips (the substrate_backend.blockchain().header(...) chain that
ends with .unwrap_or(false)) to conservatively keep tips on errors (use
.unwrap_or(true) or otherwise treat Err as “keep this tip”) so transient backend
failures don’t permanently drop queued tips.
📒 Files selected for processing (1): client/mapping-sync/src/kv/mod.rs
sync_block now treats inability to resolve the canonical hash at a block number as a hard sync error.
Location: frontier/client/mapping-sync/src/kv/mod.rs (line 50).
Why it matters: this call path now does client.hash(*header.number())? and returns an error if it fails. That aborts the block sync attempt (Err(...)) instead of continuing with a write_none-style best-effort behavior used for pruned/partial state.
Risk: on nodes where hash-at-height lookup can fail intermittently (or by design in edge conditions), the worker can repeatedly retry and stall catch-up progress for that tip.
Also, Pruned-range skip logic has a silent fallback when client.hash(skip_to_number) returns None/error
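The three-way handling the comments above converge on can be sketched as a pure decision function (simplified stand-in types; the real code uses `H256` and the crate's error type, so names and signatures here are illustrative):

```rust
#[derive(Debug, PartialEq)]
enum NumberMappingWrite {
    Write,
    Skip,
}

/// Sketch of the proposed split: only `Ok(None)` or a non-matching hash maps
/// to `Skip`; a backend `Err` is propagated so the block is retried instead
/// of being committed without a number mapping.
fn decide_number_mapping(
    canonical: Result<Option<[u8; 32]>, String>,
    substrate_block_hash: [u8; 32],
) -> Result<NumberMappingWrite, String> {
    match canonical {
        Ok(Some(h)) if h == substrate_block_hash => Ok(NumberMappingWrite::Write),
        // Non-canonical or unknown height: safe to skip the number mapping.
        Ok(_) => Ok(NumberMappingWrite::Skip),
        // Transient backend failure: bail out so the worker retries later.
        Err(e) => Err(format!("failed to resolve canonical hash: {e}")),
    }
}
```

The design point is that `Skip` stays a definitive "not canonical" verdict, while an error keeps the block out of the synced set entirely.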
Actionable comments posted: 1
♻️ Duplicate comments (1)
client/mapping-sync/src/kv/mod.rs (1)
59-69: ⚠️ Potential issue | 🟠 Major

Don't turn `client.hash()` failures into `Skip`.

`Err` and `Ok(None)` both fall through to `NumberMappingWrite::Skip`, which can still commit a block as synced without a number mapping after a transient backend failure. That recreates the same partial-state outcome this PR is trying to eliminate; `Skip` should be reserved for a definite non-canonical hash.

```bash
#!/bin/bash
# Inspect the canonical-hash decision in sync_block and compare it with the
# explicit Err/Ok(None) handling in the pruning skip path.
sed -n '57,69p' client/mapping-sync/src/kv/mod.rs
echo
sed -n '309,364p' client/mapping-sync/src/kv/mod.rs
echo
# Show all current client.hash() handling in mapping-sync for consistency review.
rg -n -C2 'client\.hash\(' client/mapping-sync/src -g '*.rs'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/mapping-sync/src/kv/mod.rs` around lines 59 - 69, The current logic collapses client.hash(...) errors and Ok(None) into NumberMappingWrite::Skip; instead, treat an Err from client.hash as a real backend failure and propagate/return an error (do not set Skip), while only mapping Ok(None) to Skip (since that represents a non-canonical/missing hash). Update the code around the canonical_hash_at_number computation in sync_block (the client.hash call and number_mapping_write assignment) to match: match client.hash(*header.number()) { Ok(Some(h)) => compare to substrate_block_hash to decide Write vs Skip, Ok(None) => NumberMappingWrite::Skip, Err(e) => return Err(e) or convert to an appropriate Sync error and abort the sync so we don't commit a block without a number mapping. Ensure the change references client.hash, canonical_hash_at_number, substrate_block_hash, and NumberMappingWrite.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@client/mapping-sync/src/kv/mod.rs`:
- Around line 59-69: The current logic collapses client.hash(...) errors and
Ok(None) into NumberMappingWrite::Skip; instead, treat an Err from client.hash
as a real backend failure and propagate/return an error (do not set Skip), while
only mapping Ok(None) to Skip (since that represents a non-canonical/missing
hash). Update the code around the canonical_hash_at_number computation in
sync_block (the client.hash call and number_mapping_write assignment) to match:
match client.hash(*header.number()) { Ok(Some(h)) => compare to
substrate_block_hash to decide Write vs Skip, Ok(None) =>
NumberMappingWrite::Skip, Err(e) => return Err(e) or convert to an appropriate
Sync error and abort the sync so we don't commit a block without a number
mapping. Ensure the change references client.hash, canonical_hash_at_number,
substrate_block_hash, and NumberMappingWrite.
📒 Files selected for processing (1): client/mapping-sync/src/kv/mod.rs
librelois
left a comment
Possible unbounded sync stall during catch-up on heavily pruned nodes
In client/mapping-sync/src/kv/mod.rs:399-411, `backfill_number_mappings(...)` is invoked with `from = skip_to_u64` and `to = best_number_u64`, and it scans the full inclusive range in one loop.
If `best_number_u64 - skip_to_u64` is large, this can do many DB/header lookups in a single worker pass.
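One way to bound the pass, sketched with hypothetical names (`is_mapped` stands in for the real DB lookup; the PR's commit list later adds a `max_blocks` cap to `backfill_number_mappings()`, but this exact shape is an assumption):

```rust
/// Scan at most `max_blocks` heights per worker tick, returning the heights
/// still missing a number mapping plus the cursor for the next tick, so a
/// large window is drained incrementally instead of in one stall-prone loop.
fn backfill_step(
    from: u64,
    to: u64,
    max_blocks: u64,
    mut is_mapped: impl FnMut(u64) -> bool,
) -> (Vec<u64>, Option<u64>) {
    let end = to.min(from.saturating_add(max_blocks).saturating_sub(1));
    let missing: Vec<u64> = (from..=end).filter(|n| !is_mapped(*n)).collect();
    // `Some(next)` means more of the window remains for a later tick.
    let next = if end < to { Some(end + 1) } else { None };
    (missing, next)
}
```

Each tick does at most `max_blocks` lookups; the caller persists the returned cursor and resumes from it.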
librelois
left a comment
Add targeted tests for the new `state_pruning_blocks` flow in KV sync:
- Stub `HeaderBackend::hash`/`header`/`is_synced` behavior to cover `Some`, `None`, and error branches.
- Add an integration-style assertion for skipped-tips retention (ensuring in-window fork tips are preserved) after pruning catch-up.
- Consider a comment or constant-naming audit for `state_pruning_blocks` semantics (prune window derived from finalized number vs best number) to avoid future misconfiguration assumptions.
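A unit-level sketch of the tips-retention behavior under test, with a closure standing in for the stubbed header backend (the keep-on-error semantics follow the review comments, not necessarily the merged code; all names are hypothetical):

```rust
/// Retain tips whose height is inside the live window; conservatively keep
/// tips whose header lookup fails so a transient backend error never drops
/// a queued fork branch. `lookup` stands in for the real header backend.
fn retain_tips(
    tips: Vec<u32>,
    skip_to: u64,
    lookup: impl Fn(u32) -> Result<Option<u64>, String>,
) -> Vec<u32> {
    tips.into_iter()
        .filter(|tip| match lookup(*tip) {
            Ok(Some(number)) => number >= skip_to, // keep only in-window tips
            Ok(None) => false,                     // definitively unknown block
            Err(_) => true,                        // transient failure: retry later
        })
        .collect()
}
```

A stubbed test then asserts that an in-window tip and an erroring tip survive, while an out-of-window or unknown tip is dropped.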
librelois
left a comment
- Extend the backfill path so `BlockHash`-only canonical blocks become fully discoverable by KV lookups, not just present in `BLOCK_NUMBER_MAPPING`.
- Change the tip-retention logic to keep unknown/erroring tips and retry them later instead of dropping them.
- Add a targeted test for the pruned `PostLog::BlockHash` case.
librelois
left a comment
[Medium] Periodic worker no longer advances full-history canonical repair cursor
The KV worker now runs reconcile_recent_window(..., PERIODIC_RECONCILE_WINDOW = 16) when idle instead of the old cursor-driven reconcile_from_cursor_batch(..., CANONICAL_NUMBER_REPAIR_BATCH_SIZE). As a result, stale or missing BLOCK_NUMBER_MAPPING entries outside the most recent 16 blocks may remain unrepaired indefinitely unless another path happens to revisit them.
Suggested fix
Restore a background cursor-based sweep in addition to the recent-window reconcile.
Options:
- Run `reconcile_recent_window(...)` for fast head repair, then also run `repair_canonical_number_mappings_batch(...)` with a bounded batch size each idle tick.
- Or alternate between recent-window and cursor-based batch repair on successive idle ticks.
Recommended approach:
- Keep the recent-window reconcile for head correctness.
- Reintroduce `repair_canonical_number_mappings_batch(...)` with a capped batch size so old ranges are eventually repaired without reintroducing large per-tick stalls.
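The recommended idle-tick shape can be sketched with closures standing in for the two repair calls (names, parameters, and cursor handling are illustrative, not the crate's API):

```rust
/// Each idle tick: reconcile the recent window for head correctness, then
/// advance a bounded cursor-based sweep so old ranges are eventually repaired.
/// Returns the next sweep cursor, wrapping to 0 once it reaches the tip.
/// Assumes `batch >= 1` and `window >= 1`.
fn idle_tick(
    cursor: u64,
    best: u64,
    window: u64,
    batch: u64,
    mut reconcile_window: impl FnMut(u64, u64),
    mut repair_batch: impl FnMut(u64, u64),
) -> u64 {
    // Fast head repair over the most recent `window` blocks.
    reconcile_window(best.saturating_sub(window.saturating_sub(1)), best);
    // Bounded background sweep over [cursor, cursor + batch - 1].
    let end = cursor.saturating_add(batch).saturating_sub(1).min(best);
    repair_batch(cursor, end);
    if end >= best { 0 } else { end + 1 }
}
```

Because the sweep cursor wraps, every historical range is revisited eventually while each tick stays bounded by `batch`.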
[Low] Reconciler tx repair only checks the first transaction
In the state-available reconciler path, transaction repair is triggered by checking metadata only for `ethereum_block.transactions[0]`. If tx0 is present but a later transaction mapping is missing, the reconciler will skip repair and leave `TRANSACTION_MAPPING` incomplete for that block.
Suggested fix
Change the repair predicate to validate all transactions in the block, not just the first one.
Options:
- Treat the block as needing repair if any transaction hash is missing canonical metadata for the current substrate block.
- Or compare the number of mapped transactions for this canonical block against `ethereum_block.transactions.len()` before deciding to skip repair.
Recommended approach:
- Use `any(...)` across all transaction hashes and rewrite the full transaction mapping whenever any expected canonical entry is missing.
- Keep the existing `write_hashes(...)` full rewrite behavior once repair is triggered.
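The `any(...)` predicate from the recommendation, sketched with a closure standing in for the metadata lookup (hypothetical names and simplified hash type):

```rust
/// A block needs transaction-mapping repair if *any* of its transaction
/// hashes lacks canonical metadata, not just the first one.
fn needs_tx_repair(
    tx_hashes: &[[u8; 32]],
    has_metadata: impl Fn(&[u8; 32]) -> bool,
) -> bool {
    tx_hashes.iter().any(|h| !has_metadata(h))
}
```

An empty block never triggers repair, and a block whose first tx is mapped but whose second is missing now does.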
@coderabbitai full review

✅ Actions performed: Full review triggered.
Actionable comments posted: 2
♻️ Duplicate comments (1)
client/mapping-sync/src/kv/mod.rs (1)
70-80: ⚠️ Potential issue | 🟠 Major

Propagate canonical-hash lookup failures instead of silently falling back to `Skip`.

Line 75 converts `client.hash()` failures into `None`, and the `Skip` branch still writes the block as synced without `BLOCK_NUMBER_MAPPING`. That recreates the partial-sync state this PR is trying to eliminate.

Suggested fix
```diff
-let canonical_hash_at_number = client.hash(*header.number()).ok().flatten();
+let canonical_hash_at_number = client
+    .hash(*header.number())
+    .map_err(|e| format!("failed to resolve canonical hash at #{block_number}: {e:?}"))?;
 let number_mapping_write = if canonical_hash_at_number == Some(substrate_block_hash) {
     fc_db::kv::NumberMappingWrite::Write
 } else {
     fc_db::kv::NumberMappingWrite::Skip
 };
```

Run this to confirm that `Skip` still omits the number mapping while `write_hashes` persists the rest:

```bash
#!/bin/bash
set -euo pipefail
echo "sync_block canonical lookup:"
sed -n '67,82p' client/mapping-sync/src/kv/mod.rs
echo
echo "NumberMappingWrite / write_hashes behavior:"
rg -n "NumberMappingWrite|fn write_hashes|set_block_hash_by_number|SYNCED_MAPPING|BLOCK_NUMBER_MAPPING" client/db/src/kv/mod.rs -C 3
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/mapping-sync/src/kv/mod.rs` around lines 70 - 80, The code currently swallows failures from client.hash() by mapping Err to None and then choosing NumberMappingWrite::Skip, which allows the block to be marked synced without BLOCK_NUMBER_MAPPING; instead, change the logic so that client.hash(*header.number()) is not flattened into an Option but its Result is inspected: if Ok(Some(h)) compare to substrate_block_hash and set fc_db::kv::NumberMappingWrite::Write when equal or Skip when Ok(None) (meaning no canonical hash), but if client.hash(...) returns Err propagate that Err upward (return Err from sync_block or map it into the enclosing Result) so the caller does not mark the block as fully synced; update the surrounding code path that constructs number_mapping_write (and callers like write_hashes or the sync_block function) to handle the propagated error rather than silently using NumberMappingWrite::Skip.
🧹 Nitpick comments (2)
client/rpc/src/eth/mod.rs (1)
262-280: LGTM - Fast-path optimization is sound.

The fast-path correctly short-circuits when the latest indexed block is already readable, avoiding unnecessary cache lookups and backward scans. This aligns with the PR objective of improving "latest" block resolution on pruned nodes.
Optional: Improve log clarity.
The log message sets all flags (`cache_hit`, `bounded_hit`, `exhaustive_hit`, `full_miss`) to `false`, which doesn't clearly indicate that the fast-path was taken. Consider adding a `fast_path_hit=true` indicator for easier debugging and monitoring.
🔧 Suggested log improvement
```diff
 log::debug!(
     target: "rpc",
-    "latest readable selection cache_hit=false bounded_hit=false exhaustive_hit=false full_miss=false bounded_scanned_hops=0 exhaustive_scanned_hops=0 limit={}",
+    "latest readable selection fast_path_hit=true cache_hit=false bounded_hit=false exhaustive_hit=false full_miss=false bounded_scanned_hops=0 exhaustive_scanned_hops=0 limit={}",
     self.latest_readable_scan_limit,
 );
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/rpc/src/eth/mod.rs` around lines 262 - 280, The fast-path log should explicitly indicate the fast-path was taken; update the log::debug call inside the branch that checks self.storage_override.current_block(latest_indexed_hash) to include a fast_path_hit=true field (in addition to existing flags) so it's clear when the short-circuit occurs; modify the log message invoked near last_readable_latest.replace(latest_indexed_hash) (and referencing self.latest_readable_scan_limit) to add fast_path_hit=true.

ts-tests/tests/test-reorg-block-consistency.ts (1)
75-93: Potential race condition between chain tip and mapping-sync.

After `waitForBlock` returns for `expectedHead` (Fork B's tip), the canonical chain has switched. However, the mapping-sync worker may not have updated the Ethereum block hash mappings at lower heights yet. The assertions on lines 84 and 90 query `reorgHeightHex` (a lower height), and could intermittently see stale Fork A data if mapping-sync lags.

Consider adding a polling retry loop around the post-reorg assertions to improve test stability:
🔧 Suggested retry pattern for robustness
```diff
 // Wait for fork B to become canonical (longer chain).
 const expectedHead = "0x" + (tipNumber + 4).toString(16);
 await waitForBlock(context.web3, expectedHead, 20_000);
-// After the reorg, the Ethereum block at the reorg height must differ.
-const ethBlockAfterReorg = (await customRequest(context.web3, "eth_getBlockByNumber", [reorgHeightHex, false]))
-    .result;
-
-expect(ethBlockAfterReorg).to.not.be.null;
-expect(ethBlockAfterReorg.hash).to.not.equal(
-    ethHashForkA,
-    "block hash at the reorg height must change to fork B's block"
-);
+// Poll until the mapping-sync reflects Fork B at the reorg height.
+const pollTimeout = 5_000;
+const pollStart = Date.now();
+let ethBlockAfterReorg: any = null;
+while (Date.now() - pollStart < pollTimeout) {
+    ethBlockAfterReorg = (await customRequest(context.web3, "eth_getBlockByNumber", [reorgHeightHex, false]))
+        .result;
+    if (ethBlockAfterReorg && ethBlockAfterReorg.hash !== ethHashForkA) {
+        break;
+    }
+    await new Promise<void>((r) => setTimeout(r, 50));
+}
+expect(ethBlockAfterReorg).to.not.be.null;
+expect(ethBlockAfterReorg.hash).to.not.equal(
+    ethHashForkA,
+    "block hash at the reorg height must change to fork B's block"
+);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ts-tests/tests/test-reorg-block-consistency.ts` around lines 75 - 93, The test can race with the mapping-sync worker: after waitForBlock(expectedHead) the node's canonical tip has changed but mapping-sync may not have updated lower-height Ethereum hashes yet, so wrap the post-reorg check that fetches ethBlockAfterReorg (via customRequest(context.web3, "eth_getBlockByNumber", [reorgHeightHex, false])) and the two expectations (ethBlockAfterReorg.hash !== ethHashForkA and ethBlockAfterReorg.parentHash === anchorEthHash) in a short polling retry loop that refetches the block until both conditions are true or a timeout is reached; use exponential or fixed-interval backoff (e.g., 100–500ms) and a reasonable global timeout (e.g., 5–20s) to avoid flakiness while failing the test if mapping-sync never converges.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/mapping-sync/src/kv/canonical_reconciler.rs`:
- Around line 173-183: The range calculation mistakenly treats window_size == 0
as a full scan because window_size.saturating_sub(1) becomes u64::MAX; update
the start/end logic in the function that computes start and end (using symbols
window_size, start, end) to early-handle window_size == 0 the same way
reconcile_from_cursor_batch() does—i.e., if window_size == 0 return the no-op
result (e.g., Ok(None) or whatever the function returns for no work) before
computing saturating_sub and start, ensuring no full-range scan occurs.
In `@client/mapping-sync/src/kv/mod.rs`:
- Around line 471-490: The code pops operating_header from current_syncing_tips
then persists and returns on the Ok(None) and Err paths, which loses that
branch; modify both the Ok(None) and Err(e) handling to re-insert (requeue)
operating_header back into current_syncing_tips before calling
frontier_backend.meta().write_current_syncing_tips(...) and returning, so
fetch_header() will see this ancestor on the next tick; update the branches that
log the warning and the comment accordingly and ensure the same requeue logic is
applied for both paths.
---
Duplicate comments:
In `@client/mapping-sync/src/kv/mod.rs`:
- Around line 70-80: The code currently swallows failures from client.hash() by
mapping Err to None and then choosing NumberMappingWrite::Skip, which allows the
block to be marked synced without BLOCK_NUMBER_MAPPING; instead, change the
logic so that client.hash(*header.number()) is not flattened into an Option but
its Result is inspected: if Ok(Some(h)) compare to substrate_block_hash and set
fc_db::kv::NumberMappingWrite::Write when equal or Skip when Ok(None) (meaning
no canonical hash), but if client.hash(...) returns Err propagate that Err
upward (return Err from sync_block or map it into the enclosing Result) so the
caller does not mark the block as fully synced; update the surrounding code path
that constructs number_mapping_write (and callers like write_hashes or the
sync_block function) to handle the propagated error rather than silently using
NumberMappingWrite::Skip.
---
Nitpick comments:
In `@client/rpc/src/eth/mod.rs`:
- Around line 262-280: The fast-path log should explicitly indicate the
fast-path was taken; update the log::debug call inside the branch that checks
self.storage_override.current_block(latest_indexed_hash) to include a
fast_path_hit=true field (in addition to existing flags) so it's clear when the
short-circuit occurs; modify the log message invoked near
last_readable_latest.replace(latest_indexed_hash) (and referencing
self.latest_readable_scan_limit) to add fast_path_hit=true.
In `@ts-tests/tests/test-reorg-block-consistency.ts`:
- Around line 75-93: The test can race with the mapping-sync worker: after
waitForBlock(expectedHead) the node's canonical tip has changed but mapping-sync
may not have updated lower-height Ethereum hashes yet, so wrap the post-reorg
check that fetches ethBlockAfterReorg (via customRequest(context.web3,
"eth_getBlockByNumber", [reorgHeightHex, false])) and the two expectations
(ethBlockAfterReorg.hash !== ethHashForkA and ethBlockAfterReorg.parentHash ===
anchorEthHash) in a short polling retry loop that refetches the block until both
conditions are true or a timeout is reached; use exponential or fixed-interval
backoff (e.g., 100–500ms) and a reasonable global timeout (e.g., 5–20s) to avoid
flakiness while failing the test if mapping-sync never converges.
📒 Files selected for processing (12): client/db/Cargo.toml, client/db/src/kv/mod.rs, client/mapping-sync/Cargo.toml, client/mapping-sync/src/kv/canonical_reconciler.rs, client/mapping-sync/src/kv/mod.rs, client/mapping-sync/src/kv/worker.rs, client/mapping-sync/src/sql/mod.rs, client/rpc/src/eth/mod.rs, template/node/src/eth.rs, template/node/src/service.rs, ts-tests/tests/test-pruning-skip.ts, ts-tests/tests/test-reorg-block-consistency.ts
♻️ Duplicate comments (1)
client/mapping-sync/src/kv/mod.rs (1)
70-80: ⚠️ Potential issue | 🟠 Major

Propagate `client.hash()` failures instead of collapsing them into `Skip`.

Line 75 still treats backend failures the same as `Ok(None)`. That lets `sync_block()` write a synced mapping without `BLOCK_NUMBER_MAPPING`; `fetch_header()` will then stop revisiting that block until a later reconcile pass repairs it. Only `Ok(None)` should fall back to `Skip`.

Suggested fix
```diff
-let canonical_hash_at_number = client.hash(*header.number()).ok().flatten();
+let canonical_hash_at_number = client
+    .hash(*header.number())
+    .map_err(|e| format!("failed to resolve canonical hash at #{block_number}: {e:?}"))?;
 let number_mapping_write = if canonical_hash_at_number == Some(substrate_block_hash) {
     fc_db::kv::NumberMappingWrite::Write
 } else {
     fc_db::kv::NumberMappingWrite::Skip
 };
```

You can verify the partial-write path with this read-only check. Expected result: `write_hashes()` marks blocks synced, and `fetch_header()` skips synced blocks, so swallowing `hash()` errors can leave a hole behind.

```bash
#!/bin/bash
set -euo pipefail
sed -n '70,80p' client/mapping-sync/src/kv/mod.rs
echo
sed -n '617,628p' client/mapping-sync/src/kv/mod.rs
echo
rg -n --type rust 'fn write_hashes|is_synced\(|write_none\(' client/db/src/kv/mod.rs client/mapping-sync/src/kv/mod.rs
```
📒 Files selected for processing (2): client/mapping-sync/src/kv/canonical_reconciler.rs, client/mapping-sync/src/kv/mod.rs
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/mapping-sync/src/kv/mod.rs`:
- Around line 255-263: The code in backfill_number_mappings() silently discards
backend errors by using .ok().flatten() on client.hash(...) and
substrate_backend.header(...); change these to propagate backend errors instead
of treating them as missing data: replace the .ok().flatten() pattern for
client.hash(block_number_native) and substrate_backend.header(canonical_hash)
with error-propagating calls (use the Result path so backend errors return Err
from backfill_number_mappings rather than continuing), preserving the existing
behavior for genuine None (missing block) cases and keeping the existing checks
that use canonical_hash, frontier_backend.mapping().is_synced, and header.
- Around line 434-449: The retain currently drops tips when
substrate_backend.blockchain().header(*tip) returns Err or None; change the
predicate so that lookup failures preserve the tip (i.e., return true on
Err/None) to avoid discarding queued/unsynced forks — update the closure used by
current_syncing_tips.retain to treat unwrap_or(false) as unwrap_or(true) (or
otherwise return true when header lookup fails) so only proven out-of-window
tips are removed while transient lookup errors keep the tip.
📒 Files selected for processing (1): client/mapping-sync/src/kv/mod.rs
* Improve "latest" block resolution on pruned nodes
* Use config.state_pruning as state_pruning_blocks in node template
* Coderabbit: better logs + use finalized block
* Retain in-window tips when skipping pruned blocks
* Skip target lookup failure explicitly
* Backfill BLOCK_NUMBER_MAPPING for the full live window
* Add max_blocks to backfill_number_mappings()
* Add pruning_skip_retains_in_window_tips() test
* Use find_post_log() + test-pruning-skip.ts
* Make latest_block_hash() read-only + rpc fast path
* Validate eth hash from digest before writing BLOCK_MAPPING
* Repair missing TRANSACTION_MAPPING when state becomes readable
* Add integration test for block and tx consistency after a reorg
* Check all txns when deciding tx mapping repair
* Restore a background cursor-based sweep
* Remove unnecessary comment/report
* coderabbit review
* coderabbit: propagate client.hash() failures
* coderabbit: no more ok().flatten()
* Remove unnecessary reconcile #0..#0
Problem

`eth_getBlockByNumber("latest")` returns a permanently stuck block number on nodes running with `--pruning N`, while `eth_blockNumber` correctly tracks the chain tip.

Two root causes:

- `NumberMappingWrite::Skip` was hardcoded in `sync_block()`, so `BLOCK_NUMBER_MAPPING` (the `number → eth_hash` index that `latest_block_hash()` depends on) was never written during catch-up.
- `PostLog::BlockHash` gets `None` from state and marks blocks as synced with no mapping, making no forward progress.

`eth_getBlockByNumber(N)` for explicit numbers was never affected — `BLOCK_MAPPING` (keyed by ETH hash) was always written correctly.

Changes

- `sync_block()`: replace hardcoded `NumberMappingWrite::Skip` with a canonical check via `HeaderBackend::hash()` — no state access, pruning-safe. Adds required `client: &C` parameter (breaking change for callers).
- `sync_one_block()`: when the current tip is behind the live state window, jump forward to `best - pruning_window` instead of stalling.
- `PostLog::BlockHash`/`None` arm: add `log::warn!` when state is unavailable instead of silently writing an empty mapping.

Upgrade behaviour

Stuck nodes recover automatically on binary upgrade — the skip fires once, the worker catches up the remaining blocks in the live window, and `"latest"` starts advancing within minutes. No DB wipe or resync required.