Improve "latest" block resolution on pruned nodes#1856

Merged
librelois merged 20 commits into master from artur/stalled-latest
Mar 16, 2026

Conversation

Collaborator

@arturgontijo arturgontijo commented Mar 7, 2026

Problem

eth_getBlockByNumber("latest") returns a permanently stuck block number on nodes running with --pruning N, while eth_blockNumber correctly tracks the chain tip.

Two root causes:

  • NumberMappingWrite::Skip was hardcoded in sync_block(), so BLOCK_NUMBER_MAPPING (the number → eth_hash index that latest_block_hash() depends on) was never written during catch-up.
  • The mapping worker stalls on pruned blocks — when the tip falls behind the pruning window, PostLog::BlockHash gets None from state and marks blocks as synced with no mapping, making no forward progress.

eth_getBlockByNumber(N) for explicit numbers was never affected — BLOCK_MAPPING (keyed by ETH hash) was always written correctly.

Changes

  • sync_block(): replace hardcoded NumberMappingWrite::Skip with a canonical check via HeaderBackend::hash() — no state access, pruning-safe. Adds required client: &C parameter (breaking change for callers).
  • sync_one_block(): when the current tip is behind the live state window, jump forward to best - pruning_window instead of stalling.
  • PostLog::BlockHash / None arm: add log::warn! when state is unavailable instead of silently writing an empty mapping.
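The pruning-window jump in the second bullet reduces to plain integer arithmetic. A minimal sketch, assuming a hypothetical free function (`skip_target` and its parameters are illustrative, not the actual crate API):

```rust
/// Sketch of the pruning-window jump: when the syncing tip has fallen
/// behind the live state window, return the block number to jump to.
/// Hypothetical helper; names are illustrative, not the actual crate API.
fn skip_target(current_tip: u64, best: u64, pruning_window: u64) -> Option<u64> {
    // First block whose state is still retained under `--pruning N`.
    let live_window_start = best.saturating_sub(pruning_window);
    // Jump only when the tip is outside the window; otherwise sync normally.
    (current_tip < live_window_start).then_some(live_window_start)
}
```

The `saturating_sub` keeps the computation safe near genesis, where the window covers the whole chain and no jump is needed.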

Upgrade behaviour

Stuck nodes recover automatically on binary upgrade — the skip fires once, the worker catches up the remaining blocks in the live window, and "latest" starts advancing within minutes. No DB wipe or resync required.

@arturgontijo arturgontijo requested a review from librelois March 7, 2026 23:47
@arturgontijo arturgontijo self-assigned this Mar 7, 2026
@arturgontijo arturgontijo requested a review from sorpaas as a code owner March 7, 2026 23:47

coderabbitai bot commented Mar 7, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 41440d47-1d57-40dc-ba60-1e6e761216b8

📥 Commits

Reviewing files that changed from the base of the PR and between 53a88f5 and ecf10e1.

📒 Files selected for processing (1)
  • client/mapping-sync/src/kv/mod.rs

📝 Walkthrough

Adds state-pruning awareness and bounded backfill/reconciliation to KV mapping-sync: sync APIs accept a client reference and optional state_pruning_blocks; worker and node wiring propagate it; reconciler gains digest-derived fallback and recent-window repair; pruning/backfill logic and many tests added.

Changes

  • KV core sync (client/mapping-sync/src/kv/mod.rs): add client: &C to sync_block; add state_pruning_blocks: Option<u64> to sync_one_block/sync_blocks; implement pruning-aware skip/resolve, tip retention on skips, bounded backfill_number_mappings, periodic reconcile_recent_window, reorg-aware reconciliation, and minimal mapping writes when state is unavailable; add constants (BACKFILL_ON_SKIP_MAX_BLOCKS, PERIODIC_RECONCILE_WINDOW, CURSOR_REPAIR_IDLE_BATCH) and a SaturatedConversion import.
  • Canonical reconciler (client/mapping-sync/src/kv/canonical_reconciler.rs): add an eth_hash_from_digest helper and a new public reconcile_recent_window; extend reconciliation to use digest-derived hashes when state is unavailable; add conditional tx-mapping repair paths.
  • Worker wiring (client/mapping-sync/src/kv/worker.rs): MappingSyncWorker gains state_pruning_blocks: Option<u64>; the constructor is updated and the value forwarded to sync_blocks; switch to recent-window reconcile plus cursor-repair calls; tests updated to pass None.
  • Node/task wiring (template/node/src/eth.rs, template/node/src/service.rs): spawn_frontier_tasks signature extended with state_pruning_blocks: Option<u64>; the service derives an optional pruning bound from PruningMode::Constrained and passes it into the task spawners and mapping-sync worker.
  • DB pointer semantics (client/db/src/kv/mod.rs): make latest-pointer reads read-only: remove writes in recovery paths, ignore found_number in backtracking, and add tests ensuring the persisted pointer never lowers.
  • KV SQL & RPC tweaks (client/mapping-sync/src/sql/mod.rs, client/rpc/src/eth/mod.rs): the SQL test now polls up to 5s for indexer rows; latest_indexed_hash_with_block adds a fast path for a readable latest block and updates the cache on a hit.
  • Integration tests (ts-tests/tests/test-pruning-skip.ts, ts-tests/tests/test-reorg-block-consistency.ts): add a pruning-skip integration test and expanded reorg/block-consistency tests covering block/tx mapping correctness across forks.
  • Manifests / deps (client/mapping-sync/Cargo.toml, client/db/Cargo.toml): move ethereum-types to mapping-sync dependencies; adjust tokio features in client/db and add a tokio dev-dependency.

Sequence Diagram(s)

sequenceDiagram
    participant Node as Node/service
    participant Worker as MappingSyncWorker
    participant Client as HeaderBackend (client)
    participant KV as KV datastore
    participant Reconciler as CanonicalReconciler

    Node->>Worker: spawn_frontier_tasks(state_pruning_blocks)
    Worker->>Worker: start sync_blocks(..., state_pruning_blocks)
    loop per block/tip
        Worker->>Client: resolve skip target / fetch header/hash
        alt skip-target resolvable (in live window)
            Worker->>Worker: adjust current_syncing_tips, persist tips
            Worker->>Client: fetch header (may be pruned)
            alt state available
                Worker->>KV: write number→hash mapping (full)
            else state unavailable
                Worker->>KV: write minimal MappingCommitment (no txs)
                Worker->>Reconciler: reconcile_recent_window(window)
                Reconciler->>KV: repair/advance mappings (including digest fallback)
            end
            Worker->>Worker: backfill_number_mappings (bounded)
        else skip-target missing/error
            Worker->>Worker: persist tips, log warning, backoff/return
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • sorpaas
  • librelois
  • manuelmauro

🚥 Pre-merge checks (3 passed)
  • Title check ✅ Passed: The title accurately summarizes the main objective of the PR: improving block resolution on pruned nodes by fixing the stalled "latest" block issue.
  • Description check ✅ Passed: The description comprehensively explains the problem, root causes, changes made, and upgrade behavior, all directly related to the changeset across multiple files.
  • Docstring coverage ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/mapping-sync/src/kv/mod.rs`:
- Around line 299-304: The code computes the pruning window using best_number
(client.info().best_number) which is wrong for Constrained pruning; change the
basis to the finalized block height by replacing use of
client.info().best_number when computing best_number_u64 (and consequently
live_window_start_u64 and skip_to_u64) with client.info().finalized_number (or
finalized_number.unique_saturated_into()), so live_window_start_u64 =
finalized_number_u64.saturating_sub(state_pruning_blocks) and skip_to_u64 uses
that finalized-based window; keep the rest of the logic (sync_from_u64,
skip_to_u64, current_number_u64 from operating_header) identical.
- Around line 62-67: The code currently swallows errors from
HeaderBackend::hash() via .ok().flatten() (creating canonical_hash_at_number)
and a similar pattern at the other call, treating hash failures as "no canonical
block" and causing sync_block()/pruning logic to commit partial/unsafe state;
instead propagate the Err to force a retry: change the places using
client.hash(...) that assign canonical_hash_at_number and the second check at
the other call to return Err(...) (or map to the function's error type) when
client.hash() yields Err, and only treat Ok(None) as the non-canonical case;
update the handling around NumberMappingWrite and sync_block() so they only
proceed on successful Ok(Some(hash)) comparisons and propagate errors from
client.hash() up the call stack rather than converting them to Skip/None.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 586654c4-323e-4f6e-81b1-e3fe0bcb595c

📥 Commits

Reviewing files that changed from the base of the PR and between 0eeb426 and 490a555.

📒 Files selected for processing (5)
  • client/mapping-sync/src/kv/mod.rs
  • client/mapping-sync/src/kv/worker.rs
  • client/mapping-sync/src/sql/mod.rs
  • template/node/src/eth.rs
  • template/node/src/service.rs

Member

@librelois librelois left a comment


client/mapping-sync/src/kv/mod.rs (line 248) and client/mapping-sync/src/kv/mod.rs (line 325) conflict. sync_one_block maintains a queue of pending current_syncing_tips, but the new pruning skip path replaces that queue with only skip_hash. If the node restarts while multiple leaves exist, any other queued leaf is discarded even if it is still inside the live window and should be indexed. That creates a correctness hole around reorg/fork catch-up.
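The retention behavior being asked for here can be sketched in a few lines: keep every queued leaf still inside the live window and only replace out-of-window tips with the skip target. A minimal sketch with tips simplified to (number, hash) integer pairs; `retain_live_tips` is a hypothetical name, not the crate API:

```rust
/// Keep queued fork tips that are still inside the live window and append
/// the skip target, instead of replacing the whole queue with it.
/// Illustrative sketch; tips are (block_number, hash) pairs, hash as u64.
fn retain_live_tips(tips: &[(u64, u64)], skip_to: u64, skip_hash: u64) -> Vec<u64> {
    let mut kept: Vec<u64> = tips
        .iter()
        .filter(|(number, _)| *number >= skip_to) // still inside the live window
        .map(|(_, hash)| *hash)
        .collect();
    kept.push(skip_hash); // out-of-window tips are replaced by the skip target
    kept
}
```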


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
client/mapping-sync/src/kv/mod.rs (1)

309-336: ⚠️ Potential issue | 🟠 Major

Don’t silently fall through when skip-target lookups fail.

Line 312 still treats client.hash(skip_to_number) failure as “no skip”, so the function drops into sync_block() for a block already known to be outside the live window. Lines 325-335 also turn header() errors into false, which permanently drops queued tips from current_syncing_tips. Both cases can reintroduce stuck or partially indexed branches on a transient backend failure.

🛠️ Proposed fix
 			if current_number_u64 < skip_to_u64 {
 				let skip_to_number =
 					skip_to_u64.saturated_into::<<Block::Header as HeaderT>::Number>();
-				if let Ok(Some(skip_hash)) = client.hash(skip_to_number) {
-					log::warn!(
-						target: "mapping-sync",
-						"Pruned node: skipping blocks #{}..#{} (outside live state window), \
-						jumping tip to #{}",
-						current_number_u64,
-						skip_to_u64.saturating_sub(1),
-						skip_to_u64,
-					);
-					// Retain any tips still within the live window rather than
-					// discarding them — they may be unsynced fork branches that
-					// need indexing. Replace only the out-of-window tip with
-					// the skip target.
-					current_syncing_tips.retain(|tip| {
-						substrate_backend
-							.blockchain()
-							.header(*tip)
-							.ok()
-							.flatten()
-							.map(|h| {
-								let n: u64 = (*h.number()).unique_saturated_into();
-								n >= skip_to_u64
-							})
-							.unwrap_or(false)
-					});
-					current_syncing_tips.push(skip_hash);
-					frontier_backend
-						.meta()
-						.write_current_syncing_tips(current_syncing_tips)?;
-					return Ok(true);
-				}
+				let skip_hash = client
+					.hash(skip_to_number)
+					.map_err(|e| format!("failed to resolve skip target #{skip_to_u64}: {e:?}"))?
+					.ok_or_else(|| format!("canonical hash for skip target #{skip_to_u64} not found"))?;
+				log::warn!(
+					target: "mapping-sync",
+					"Pruned node: skipping blocks #{}..#{} (outside live state window), \
+					jumping tip to #{}",
+					current_number_u64,
+					skip_to_u64.saturating_sub(1),
+					skip_to_u64,
+				);
+				let mut retained_tips = Vec::with_capacity(current_syncing_tips.len() + 1);
+				for tip in current_syncing_tips {
+					match substrate_backend
+						.blockchain()
+						.header(tip)
+						.map_err(|e| format!("failed to load syncing tip {tip:?}: {e:?}"))?
+					{
+						Some(h) => {
+							let n: u64 = (*h.number()).unique_saturated_into();
+							if n >= skip_to_u64 {
+								retained_tips.push(tip);
+							}
+						}
+						None => {}
+					}
+				}
+				retained_tips.push(skip_hash);
+				frontier_backend
+					.meta()
+					.write_current_syncing_tips(retained_tips)?;
+				return Ok(true);
 			}
🤖 Prompt for AI Agents

In `@client/mapping-sync/src/kv/mod.rs` around lines 309 - 336, The code currently
treats Err from client.hash(skip_to_number) as Ok(None) and treats header()
errors as false, which can drop tips or proceed to sync_block() wrongly; update
the client.hash(skip_to_number) handling in the block using skip_to_number so
that if client.hash(...) returns Err you log the error and return/propagate that
error (or otherwise bail out of the current sync attempt) instead of falling
through to sync_block(), and change the retain closure around
current_syncing_tips (the substrate_backend.blockchain().header(...) chain that
ends with .unwrap_or(false)) to conservatively keep tips on errors (use
.unwrap_or(true) or otherwise treat Err as “keep this tip”) so transient backend
failures don’t permanently drop queued tips.
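The error-handling policy above reduces to a small predicate: drop a tip only on a definite Ok(None), and keep it when the backend errors. A sketch with simplified types; `keep_tip` is illustrative, not the crate API:

```rust
/// Decide whether a queued tip survives the pruning skip. A transient
/// backend error must not drop the tip permanently; only a definitive
/// "header unknown" (Ok(None)) does. Illustrative sketch, not crate API.
fn keep_tip(header_number: Result<Option<u64>, String>, skip_to: u64) -> bool {
    match header_number {
        Ok(Some(number)) => number >= skip_to, // keep only in-window tips
        Ok(None) => false,                     // definitively unknown: drop
        Err(_) => true,                        // transient error: keep, retry later
    }
}
```

This is the `.unwrap_or(true)`-style behavior the prompt suggests, written as an explicit match so the three cases are visible.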

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 68a48923-fb28-459c-bacc-0ada47d07789

📥 Commits

Reviewing files that changed from the base of the PR and between 70cafb4 and 1d82875.

📒 Files selected for processing (1)
  • client/mapping-sync/src/kv/mod.rs

Member

@librelois librelois left a comment


sync_block now treats inability to resolve the canonical hash at a block number as a hard sync error.

Location: frontier/client/mapping-sync/src/kv/mod.rs (line 50).
Why it matters: this call path now does client.hash(*header.number())? and returns an error if it fails. That aborts the block sync attempt (Err(...)) instead of continuing with a write_none-style best-effort behavior used for pruned/partial state.
Risk: on nodes where hash-at-height lookup can fail intermittently (or by design in edge conditions), the worker can repeatedly retry and stall catch-up progress for that tip.

Also, the pruned-range skip logic silently falls back when client.hash(skip_to_number) returns None or an error.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
client/mapping-sync/src/kv/mod.rs (1)

59-69: ⚠️ Potential issue | 🟠 Major

Don't turn client.hash() failures into Skip.

Err and Ok(None) both fall through to NumberMappingWrite::Skip, which can still commit a block as synced without a number mapping after a transient backend failure. That recreates the same partial-state outcome this PR is trying to eliminate; Skip should be reserved for a definite non-canonical hash.

#!/bin/bash
# Inspect the canonical-hash decision in sync_block and compare it with the
# explicit Err/Ok(None) handling in the pruning skip path.
sed -n '57,69p' client/mapping-sync/src/kv/mod.rs
echo
sed -n '309,364p' client/mapping-sync/src/kv/mod.rs
echo
# Show all current client.hash() handling in mapping-sync for consistency review.
rg -n -C2 'client\.hash\(' client/mapping-sync/src -g '*.rs'
🤖 Prompt for AI Agents

In `@client/mapping-sync/src/kv/mod.rs` around lines 59 - 69, The current logic
collapses client.hash(...) errors and Ok(None) into NumberMappingWrite::Skip;
instead, treat an Err from client.hash as a real backend failure and
propagate/return an error (do not set Skip), while only mapping Ok(None) to Skip
(since that represents a non-canonical/missing hash). Update the code around the
canonical_hash_at_number computation in sync_block (the client.hash call and
number_mapping_write assignment) to match: match client.hash(*header.number()) {
Ok(Some(h)) => compare to substrate_block_hash to decide Write vs Skip, Ok(None)
=> NumberMappingWrite::Skip, Err(e) => return Err(e) or convert to an
appropriate Sync error and abort the sync so we don't commit a block without a
number mapping. Ensure the change references client.hash,
canonical_hash_at_number, substrate_block_hash, and NumberMappingWrite.
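Under the suggested handling, the decision becomes a three-way match in which only Ok values choose between Write and Skip. A sketch with a stand-in for client.hash() and hashes simplified to u64; the enum mirrors NumberMappingWrite:

```rust
#[derive(Debug, PartialEq)]
enum NumberMappingWrite {
    Write,
    Skip,
}

/// Sketch of the suggested decision: Skip only on a definite mismatch or
/// a missing canonical hash; propagate backend errors instead of skipping.
/// `canonical` stands in for `client.hash(number)`; hashes simplified to u64.
fn number_mapping_decision(
    canonical: Result<Option<u64>, String>,
    substrate_block_hash: u64,
) -> Result<NumberMappingWrite, String> {
    match canonical {
        Ok(Some(hash)) if hash == substrate_block_hash => Ok(NumberMappingWrite::Write),
        Ok(_) => Ok(NumberMappingWrite::Skip), // non-canonical or missing hash
        Err(e) => Err(e), // backend failure: abort, do not mark block as synced
    }
}
```

With this shape, a transient backend failure aborts the sync attempt rather than committing the block without its number mapping.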

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b58692db-8b2b-42c0-88f7-6f464d4ce87c

📥 Commits

Reviewing files that changed from the base of the PR and between 1d82875 and 38b7abd.

📒 Files selected for processing (1)
  • client/mapping-sync/src/kv/mod.rs

Member

@librelois librelois left a comment


Possible unbounded sync stall during catch-up on heavily pruned nodes
In client/mapping-sync/src/kv/mod.rs:399-411, backfill_number_mappings(...) is invoked with from = skip_to_u64 and to = best_number_u64, and it scans the full inclusive range in one loop.
If best_number_u64 - skip_to_u64 is large, this can do many DB/header lookups in a single worker pass, stalling sync.
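One way to bound this is to cap the range scanned per pass and resume from a cursor on the next tick. A sketch; the constant name echoes BACKFILL_ON_SKIP_MAX_BLOCKS from the walkthrough, but its value and the function shape here are assumptions:

```rust
/// Assumed cap on blocks backfilled per pass; the real value may differ.
const BACKFILL_ON_SKIP_MAX_BLOCKS: u64 = 256;

/// Sketch of a bounded backfill step: scan at most the cap, and return
/// the cursor to resume from (None when the range is exhausted).
fn backfill_step(from: u64, to: u64) -> (std::ops::RangeInclusive<u64>, Option<u64>) {
    let end = to.min(from.saturating_add(BACKFILL_ON_SKIP_MAX_BLOCKS - 1));
    let next = if end < to { Some(end + 1) } else { None };
    (from..=end, next)
}
```

The worker would process one such range per tick and persist `next` so catch-up progresses without a single unbounded loop.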

Member

@librelois librelois left a comment


Add targeted tests for the new state_pruning_blocks flow in KV sync:
stubbed HeaderBackend::hash/header/is_synced behavior to cover Some, None, and error branches.

Add an integration-style assertion for skipped tips retention (ensuring in-window fork tips are preserved) after pruning catch-up.

Consider a comment or constant naming audit for state_pruning_blocks semantics (prune window derived from finalized number vs best number) to avoid future misconfiguration assumptions.

Member

@librelois librelois left a comment


  • Extend the backfill path so BlockHash-only canonical blocks become fully discoverable by KV lookups, not just present in BLOCK_NUMBER_MAPPING.
  • Change the tip-retention logic to keep unknown/erroring tips and retry them later instead of dropping them.
  • Add a targeted test for the pruned PostLog::BlockHash case

Member

@librelois librelois left a comment


[Medium] Periodic worker no longer advances full-history canonical repair cursor

The KV worker now runs reconcile_recent_window(..., PERIODIC_RECONCILE_WINDOW = 16) when idle instead of the old cursor-driven reconcile_from_cursor_batch(..., CANONICAL_NUMBER_REPAIR_BATCH_SIZE). As a result, stale or missing BLOCK_NUMBER_MAPPING entries outside the most recent 16 blocks may remain unrepaired indefinitely unless another path happens to revisit them.

Suggested fix

Restore a background cursor-based sweep in addition to the recent-window reconcile.

Options:

  • Run reconcile_recent_window(...) for fast head repair, then also run repair_canonical_number_mappings_batch(...) with a bounded batch size each idle tick.
  • Or alternate between recent-window and cursor-based batch repair on successive idle ticks.

Recommended approach:

  • Keep the recent-window reconcile for head correctness.
  • Reintroduce repair_canonical_number_mappings_batch(...) with a capped batch size so old ranges are eventually repaired without reintroducing large per-tick stalls.
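The "alternate on successive idle ticks" option can be sketched by deriving each tick's repair range from a tick counter and a persistent cursor. Illustrative only; names, constants, and the wrap-around policy are assumptions:

```rust
/// Sketch: even idle ticks repair the recent window near the head; odd
/// ticks advance a bounded full-history cursor sweep. Illustrative only.
fn idle_tick_range(tick: u64, cursor: &mut u64, head: u64, window: u64, batch: u64) -> (u64, u64) {
    if tick % 2 == 0 {
        // Fast head repair, mirroring reconcile_recent_window.
        (head.saturating_sub(window), head)
    } else {
        // Capped cursor sweep so old ranges are eventually repaired.
        let start = *cursor;
        let end = start.saturating_add(batch).min(head);
        *cursor = if end >= head { 0 } else { end }; // wrap at the head
        (start, end)
    }
}
```

This keeps head correctness every other tick while guaranteeing the cursor eventually visits every old range, without large per-tick stalls.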

[Low] Reconciler tx repair only checks the first transaction

In the state-available reconciler path, transaction repair is triggered by checking metadata only for ethereum_block.transactions[0]. If tx0 is present but a later transaction mapping is missing, the reconciler will skip repair and leave TRANSACTION_MAPPING incomplete for that block.

Suggested fix

Change the repair predicate to validate all transactions in the block, not just the first one.

Options:

  • Treat the block as needing repair if any transaction hash is missing canonical metadata for the current substrate block.
  • Or compare the number of mapped transactions for this canonical block against ethereum_block.transactions.len() before deciding to skip repair.

Recommended approach:

  • Use any(...) across all transaction hashes and rewrite the full transaction mapping whenever any expected canonical entry is missing.
  • Keep the existing write_hashes(...) full rewrite behavior once repair is triggered.
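The recommended predicate is a single any(...) over all transaction hashes rather than a check on index 0. A sketch with hashes as u64; `needs_tx_repair` and the `is_mapped` closure are illustrative stand-ins for the metadata lookup:

```rust
/// Repair is needed if any expected transaction hash lacks canonical
/// metadata, not just the first one. Hashes simplified to u64; the
/// `is_mapped` closure stands in for the metadata lookup.
fn needs_tx_repair(tx_hashes: &[u64], is_mapped: impl Fn(u64) -> bool) -> bool {
    tx_hashes.iter().any(|&hash| !is_mapped(hash))
}
```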

Member

librelois commented Mar 12, 2026

@coderabbitai full review


coderabbitai bot commented Mar 12, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
client/mapping-sync/src/kv/mod.rs (1)

70-80: ⚠️ Potential issue | 🟠 Major

Propagate canonical-hash lookup failures instead of silently falling back to Skip.

Line 75 converts client.hash() failures into None, and the Skip branch still writes the block as synced without BLOCK_NUMBER_MAPPING. That recreates the partial-sync state this PR is trying to eliminate.

Suggested fix
-	let canonical_hash_at_number = client.hash(*header.number()).ok().flatten();
+	let canonical_hash_at_number = client
+		.hash(*header.number())
+		.map_err(|e| format!("failed to resolve canonical hash at #{block_number}: {e:?}"))?;
 	let number_mapping_write = if canonical_hash_at_number == Some(substrate_block_hash) {
 		fc_db::kv::NumberMappingWrite::Write
 	} else {
 		fc_db::kv::NumberMappingWrite::Skip
 	};

Run this to confirm that Skip still omits the number mapping while write_hashes persists the rest:

#!/bin/bash
set -euo pipefail

echo "sync_block canonical lookup:"
sed -n '67,82p' client/mapping-sync/src/kv/mod.rs

echo
echo "NumberMappingWrite / write_hashes behavior:"
rg -n "NumberMappingWrite|fn write_hashes|set_block_hash_by_number|SYNCED_MAPPING|BLOCK_NUMBER_MAPPING" client/db/src/kv/mod.rs -C 3
🤖 Prompt for AI Agents

In `@client/mapping-sync/src/kv/mod.rs` around lines 70 - 80, The code currently
swallows failures from client.hash() by mapping Err to None and then choosing
NumberMappingWrite::Skip, which allows the block to be marked synced without
BLOCK_NUMBER_MAPPING; instead, change the logic so that
client.hash(*header.number()) is not flattened into an Option but its Result is
inspected: if Ok(Some(h)) compare to substrate_block_hash and set
fc_db::kv::NumberMappingWrite::Write when equal or Skip when Ok(None) (meaning
no canonical hash), but if client.hash(...) returns Err propagate that Err
upward (return Err from sync_block or map it into the enclosing Result) so the
caller does not mark the block as fully synced; update the surrounding code path
that constructs number_mapping_write (and callers like write_hashes or the
sync_block function) to handle the propagated error rather than silently using
NumberMappingWrite::Skip.
🧹 Nitpick comments (2)
client/rpc/src/eth/mod.rs (1)

262-280: LGTM - Fast-path optimization is sound.

The fast-path correctly short-circuits when the latest indexed block is already readable, avoiding unnecessary cache lookups and backward scans. This aligns with the PR objective of improving "latest" block resolution on pruned nodes.

Optional: Improve log clarity.

The log message sets all flags (cache_hit, bounded_hit, exhaustive_hit, full_miss) to false, which doesn't clearly indicate that the fast-path was taken. Consider adding a fast_path_hit=true indicator for easier debugging and monitoring.


🔧 Suggested log improvement
 			log::debug!(
 				target: "rpc",
-				"latest readable selection cache_hit=false bounded_hit=false exhaustive_hit=false full_miss=false bounded_scanned_hops=0 exhaustive_scanned_hops=0 limit={}",
+				"latest readable selection fast_path_hit=true cache_hit=false bounded_hit=false exhaustive_hit=false full_miss=false bounded_scanned_hops=0 exhaustive_scanned_hops=0 limit={}",
 				self.latest_readable_scan_limit,
 			);
🤖 Prompt for AI Agents

In `@client/rpc/src/eth/mod.rs` around lines 262 - 280, The fast-path log should
explicitly indicate the fast-path was taken; update the log::debug call inside
the branch that checks self.storage_override.current_block(latest_indexed_hash)
to include a fast_path_hit=true field (in addition to existing flags) so it's
clear when the short-circuit occurs; modify the log message invoked near
last_readable_latest.replace(latest_indexed_hash) (and referencing
self.latest_readable_scan_limit) to add fast_path_hit=true.
ts-tests/tests/test-reorg-block-consistency.ts (1)

75-93: Potential race condition between chain tip and mapping-sync.

After waitForBlock returns for expectedHead (Fork B's tip), the canonical chain has switched. However, the mapping-sync worker may not have updated the Ethereum block hash mappings at lower heights yet. The assertions on lines 84 and 90 query reorgHeightHex (a lower height), and could intermittently see stale Fork A data if mapping-sync lags.

Consider adding a polling retry loop around the post-reorg assertions to improve test stability:

🔧 Suggested retry pattern for robustness
 		// Wait for fork B to become canonical (longer chain).
 		const expectedHead = "0x" + (tipNumber + 4).toString(16);
 		await waitForBlock(context.web3, expectedHead, 20_000);

-		// After the reorg, the Ethereum block at the reorg height must differ.
-		const ethBlockAfterReorg = (await customRequest(context.web3, "eth_getBlockByNumber", [reorgHeightHex, false]))
-			.result;
-
-		expect(ethBlockAfterReorg).to.not.be.null;
-		expect(ethBlockAfterReorg.hash).to.not.equal(
-			ethHashForkA,
-			"block hash at the reorg height must change to fork B's block"
-		);
+		// Poll until the mapping-sync reflects Fork B at the reorg height.
+		const pollTimeout = 5_000;
+		const pollStart = Date.now();
+		let ethBlockAfterReorg: any = null;
+		while (Date.now() - pollStart < pollTimeout) {
+			ethBlockAfterReorg = (await customRequest(context.web3, "eth_getBlockByNumber", [reorgHeightHex, false]))
+				.result;
+			if (ethBlockAfterReorg && ethBlockAfterReorg.hash !== ethHashForkA) {
+				break;
+			}
+			await new Promise<void>((r) => setTimeout(r, 50));
+		}
+		expect(ethBlockAfterReorg).to.not.be.null;
+		expect(ethBlockAfterReorg.hash).to.not.equal(
+			ethHashForkA,
+			"block hash at the reorg height must change to fork B's block"
+		);
🤖 Prompt for AI Agents

In `@ts-tests/tests/test-reorg-block-consistency.ts` around lines 75 - 93, The
test can race with the mapping-sync worker: after waitForBlock(expectedHead) the
node's canonical tip has changed but mapping-sync may not have updated
lower-height Ethereum hashes yet, so wrap the post-reorg check that fetches
ethBlockAfterReorg (via customRequest(context.web3, "eth_getBlockByNumber",
[reorgHeightHex, false])) and the two expectations (ethBlockAfterReorg.hash !==
ethHashForkA and ethBlockAfterReorg.parentHash === anchorEthHash) in a short
polling retry loop that refetches the block until both conditions are true or a
timeout is reached; use exponential or fixed-interval backoff (e.g., 100–500ms)
and a reasonable global timeout (e.g., 5–20s) to avoid flakiness while failing
the test if mapping-sync never converges.
🤖 Prompt for all review comments with AI agents

Inline comments:
In `@client/mapping-sync/src/kv/canonical_reconciler.rs`:
- Around line 173-183: The range calculation mistakenly treats window_size == 0
as a full scan because window_size.saturating_sub(1) becomes u64::MAX; update
the start/end logic in the function that computes start and end (using symbols
window_size, start, end) to early-handle window_size == 0 the same way
reconcile_from_cursor_batch() does—i.e., if window_size == 0 return the no-op
result (e.g., Ok(None) or whatever the function returns for no work) before
computing saturating_sub and start, ensuring no full-range scan occurs.
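The early-out described in this comment, sketched with plain integers (the function name and Option return are illustrative, not the crate API):

```rust
/// Sketch of the guard: window_size == 0 is a no-op instead of falling
/// through to the range arithmetic. Illustrative, not the crate API.
fn recent_window_range(head: u64, window_size: u64) -> Option<(u64, u64)> {
    if window_size == 0 {
        return None; // no work, mirroring reconcile_from_cursor_batch()
    }
    let start = head.saturating_sub(window_size - 1);
    Some((start, head))
}
```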

In `@client/mapping-sync/src/kv/mod.rs`:
- Around lines 471-490: the code pops operating_header from current_syncing_tips, then persists and returns on both the Ok(None) and Err paths, losing that branch. Modify both the Ok(None) and Err(e) handling to re-insert (requeue) operating_header into current_syncing_tips before calling frontier_backend.meta().write_current_syncing_tips(...) and returning, so fetch_header() sees this ancestor on the next tick. Update the branches that log the warning (and the comment) accordingly, and apply the same requeue logic on both paths.

---

Duplicate comments:
In `@client/mapping-sync/src/kv/mod.rs`:
- Around lines 70-80: the code currently swallows failures from client.hash() by mapping Err to None and then choosing NumberMappingWrite::Skip, which allows the block to be marked synced without BLOCK_NUMBER_MAPPING. Instead of flattening client.hash(*header.number()) into an Option, inspect its Result: on Ok(Some(h)), compare h to substrate_block_hash and set fc_db::kv::NumberMappingWrite::Write when they match; on Ok(None) (no canonical hash), use Skip; on Err, propagate the error upward (return Err from sync_block, or map it into the enclosing Result) so the caller does not mark the block as fully synced. Update the surrounding code that constructs number_mapping_write (and callers such as write_hashes and sync_block) to handle the propagated error rather than silently using NumberMappingWrite::Skip.

---

Nitpick comments:
In `@client/rpc/src/eth/mod.rs`:
- Around lines 262-280: the fast-path log should explicitly indicate that the fast path was taken. Update the log::debug call inside the branch that checks self.storage_override.current_block(latest_indexed_hash) to include a fast_path_hit=true field (in addition to the existing flags) so it is clear when the short-circuit occurs; modify the log message invoked near last_readable_latest.replace(latest_indexed_hash) (which references self.latest_readable_scan_limit) accordingly.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cc316938-04cc-45a1-b2b9-cda6fbaeea60

📥 Commits

Reviewing files that changed from the base of the PR and between 0eeb426 and bacee58.

📒 Files selected for processing (12)
  • client/db/Cargo.toml
  • client/db/src/kv/mod.rs
  • client/mapping-sync/Cargo.toml
  • client/mapping-sync/src/kv/canonical_reconciler.rs
  • client/mapping-sync/src/kv/mod.rs
  • client/mapping-sync/src/kv/worker.rs
  • client/mapping-sync/src/sql/mod.rs
  • client/rpc/src/eth/mod.rs
  • template/node/src/eth.rs
  • template/node/src/service.rs
  • ts-tests/tests/test-pruning-skip.ts
  • ts-tests/tests/test-reorg-block-consistency.ts


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
client/mapping-sync/src/kv/mod.rs (1)

70-80: ⚠️ Potential issue | 🟠 Major

Propagate client.hash() failures instead of collapsing them into Skip.

Line 75 still treats backend failures the same as Ok(None). That lets sync_block() write a synced mapping without BLOCK_NUMBER_MAPPING; fetch_header() will then stop revisiting that block until a later reconcile pass repairs it. Only Ok(None) should fall back to Skip.

Suggested fix
-	let canonical_hash_at_number = client.hash(*header.number()).ok().flatten();
+	let canonical_hash_at_number = client
+		.hash(*header.number())
+		.map_err(|e| format!("failed to resolve canonical hash at #{block_number}: {e:?}"))?;
 	let number_mapping_write = if canonical_hash_at_number == Some(substrate_block_hash) {
 		fc_db::kv::NumberMappingWrite::Write
 	} else {
 		fc_db::kv::NumberMappingWrite::Skip
 	};

You can verify the partial-write path with this read-only check. Expected result: write_hashes() marks blocks synced, and fetch_header() skips synced blocks, so swallowing hash() errors can leave a hole behind.

#!/bin/bash
set -euo pipefail

sed -n '70,80p' client/mapping-sync/src/kv/mod.rs
echo
sed -n '617,628p' client/mapping-sync/src/kv/mod.rs
echo
rg -n --type rust 'fn write_hashes|is_synced\(|write_none\(' client/db/src/kv/mod.rs client/mapping-sync/src/kv/mod.rs

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5679aeaa-0c7f-48f4-890f-595c6ef569e0

📥 Commits

Reviewing files that changed from the base of the PR and between bacee58 and 17bcb64.

📒 Files selected for processing (2)
  • client/mapping-sync/src/kv/canonical_reconciler.rs
  • client/mapping-sync/src/kv/mod.rs


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/mapping-sync/src/kv/mod.rs`:
- Around lines 255-263: backfill_number_mappings() silently discards backend errors by using .ok().flatten() on client.hash(...) and substrate_backend.header(...). Change these to propagate backend errors instead of treating them as missing data: replace the .ok().flatten() pattern on client.hash(block_number_native) and substrate_backend.header(canonical_hash) with error-propagating calls (via the Result path, so backend errors return Err from backfill_number_mappings rather than continuing), while preserving the existing behavior for genuine None (missing block) cases and keeping the existing checks that use canonical_hash, frontier_backend.mapping().is_synced, and header.
- Around lines 434-449: the retain currently drops tips when substrate_backend.blockchain().header(*tip) returns Err or None. Change the predicate so that lookup failures preserve the tip (i.e. return true on Err/None), avoiding the loss of queued or unsynced forks: update the closure passed to current_syncing_tips.retain to treat unwrap_or(false) as unwrap_or(true) (or otherwise return true when the header lookup fails), so only proven out-of-window tips are removed while transient lookup errors keep the tip.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1790b62b-3599-41ef-beda-58ac797443d0

📥 Commits

Reviewing files that changed from the base of the PR and between 17bcb64 and 53a88f5.

📒 Files selected for processing (1)
  • client/mapping-sync/src/kv/mod.rs

@librelois librelois merged commit 46cf7a4 into master Mar 16, 2026
6 checks passed
@librelois librelois deleted the artur/stalled-latest branch March 16, 2026 12:08
arturgontijo added a commit to moonbeam-foundation/frontier that referenced this pull request Mar 16, 2026
* Improve "latest" block resolution on pruned nodes

* Use config.state_pruning as state_pruning_blocks in node template

* Coderabbit: better logs + use finalized block

* Retain in-window tips when skipping pruned blocks

* Skip target lookup failure explicitly

* Backfill BLOCK_NUMBER_MAPPING for the full live window

* Add max_blocks to backfill_number_mappings()

* Add pruning_skip_retains_in_window_tips() test

* Use find_post_log() + test-pruning-skip.ts

* Make latest_block_hash() read-only + rpc fast path

* Validate eth hash from digest before writing BLOCK_MAPPING

* Repair missing TRANSACTION_MAPPING when state becomes readable

* Add integration test for block and tx consistency after a reorg

* Check all txns when deciding tx mapping repair

* Restore a background cursor-based sweep

* Remove unnecessary comment/report

* coderabbit review

* coderabbit: propagate client.hash() failures

* coderabbit: no more ok().flatten()

* Remove unnecessary reconcile #0..#0
manuelmauro pushed a commit to moonbeam-foundation/frontier that referenced this pull request Mar 17, 2026