Avoid eth_blockNumber returning 0x00 when mapping-sync lags #1832
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Prefer the persisted indexed pointer when recovering the latest indexed canonical block; replace min-block-hint recovery with a bounded backward scan.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 3 passed
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
client/db/src/kv/mod.rs (1)
134-171: ⚠️ Potential issue | 🟠 Major: stale persisted pointer + large indexing lag still falls back to genesis.

When the persisted pointer is itself stale (e.g. reorged) and the indexed frontier is far behind `best_number`, neither the single-block persisted check nor the walk-back from `best_number` will find an indexed canonical block. Example:

- `persisted_number = 5000` (stale due to reorg at that height)
- Blocks 1–4999 are correctly indexed
- `best_number = 50000`

Flow: (1) fast path at best → miss, (2) persisted check at 5000 → stale → miss, (3) bounded scan `[41808, 49999]` → nothing indexed → miss, (4) rare scan `[17232, 49999]` → nothing indexed → miss, (5) returns genesis.

Walking back from `persisted_number - 1` when the persisted check fails would cover this scenario.

Proposed fix:

```diff
 if let Ok(Some(persisted_number)) = self.mapping.latest_canonical_indexed_block_number() {
     if persisted_number <= best_number {
         if let Some(canonical_hash) = self.indexed_canonical_hash_at(persisted_number)? {
             return Ok(canonical_hash);
         }
+        // Persisted pointer is stale (e.g. reorg); walk back from it.
+        if persisted_number > 0 {
+            if let Some((recovered_number, recovered_hash)) = self
+                .find_latest_indexed_canonical_block(
+                    persisted_number.saturating_sub(1),
+                    INDEXED_RECOVERY_SCAN_LIMIT,
+                )?
+            {
+                self.mapping
+                    .set_latest_canonical_indexed_block(recovered_number)?;
+                return Ok(recovered_hash);
+            }
+        }
     }
 }
```
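To make the scenario concrete, here is a self-contained toy model of it. The type, constant, and method names below are illustrative stand-ins for the real `client/db` code, not the crate's actual API:

```rust
use std::collections::HashMap;

// Toy model: persisted pointer stale at 5000, blocks 1..=4999 indexed,
// best_number = 50000.
const INDEXED_RECOVERY_SCAN_LIMIT: u64 = 8192;

struct ToyIndex {
    indexed: HashMap<u64, u64>, // block number -> canonical "hash"
}

impl ToyIndex {
    fn indexed_canonical_hash_at(&self, n: u64) -> Option<u64> {
        self.indexed.get(&n).copied()
    }

    // Bounded backward scan probing at most `scan_limit` blocks.
    fn find_latest_indexed_canonical_block(&self, start: u64, scan_limit: u64) -> Option<(u64, u64)> {
        if scan_limit == 0 {
            return None;
        }
        let min = start.saturating_sub(scan_limit - 1);
        (min..=start)
            .rev()
            .find_map(|n| self.indexed_canonical_hash_at(n).map(|h| (n, h)))
    }
}

fn main() {
    let idx = ToyIndex {
        indexed: (1u64..=4999).map(|n| (n, n)).collect(),
    };
    let best_number = 50_000u64;
    let persisted_number = 5_000u64;

    // Persisted check misses: block 5000 was reorged away.
    assert!(idx.indexed_canonical_hash_at(persisted_number).is_none());
    // Bounded scan below the tip misses: [41808, 49999] holds nothing indexed.
    assert!(idx
        .find_latest_indexed_canonical_block(best_number - 1, INDEXED_RECOVERY_SCAN_LIMIT)
        .is_none());
    // Walking back from persisted_number - 1 recovers block 4999.
    let recovered = idx.find_latest_indexed_canonical_block(
        persisted_number - 1,
        INDEXED_RECOVERY_SCAN_LIMIT,
    );
    assert_eq!(recovered, Some((4999, 4999)));
}
```

This reproduces the five-step miss flow described above and shows that the extra walk-back from the stale pointer is what rescues the call.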
🧹 Nitpick comments (2)
client/db/src/kv/mod.rs (2)
134-143: Silently swallowing errors from `latest_canonical_indexed_block_number`.

The `if let Ok(Some(..))` on line 137 silently ignores read errors from the meta DB. A transient I/O error here would be masked, and the function would proceed to the expensive walk-back instead of surfacing the failure. Consider logging at `debug` or `warn` level when the read fails, to aid operator diagnostics alongside the existing rare-path warning.
156-171: Nit: rare-path scan overlaps with the bounded scan.

The exhaustive fallback re-scans the entire range already covered by the first pass (8192 blocks). Since this is documented as rare and already logs a warning, the overhead is acceptable. If you want to eliminate the redundancy later, you could start the second scan at `best_number.saturating_sub(INDEXED_RECOVERY_SCAN_LIMIT + 1)` with the remaining budget.
Actionable comments posted: 1
🧹 Nitpick comments (2)
client/db/src/kv/mod.rs (2)
168-175: Consider throttling this rare-path warning to prevent log flooding.

If `latest_block_hash` is polled heavily while indexing remains behind, this can emit a warning on every call. A rate-limited or state-transition warning would keep signal quality higher.
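One way to implement the suggested rate limit is a shared atomic timestamp. A minimal sketch (the function name and 60-second interval are illustrative choices, not part of the crate):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// At most one warning per interval; concurrent callers race on the
// compare-exchange, so only one of them wins the right to log.
static LAST_WARN_SECS: AtomicU64 = AtomicU64::new(0);
const WARN_INTERVAL_SECS: u64 = 60;

fn should_warn_now() -> bool {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    let last = LAST_WARN_SECS.load(Ordering::Relaxed);
    if now.saturating_sub(last) < WARN_INTERVAL_SECS {
        return false;
    }
    LAST_WARN_SECS
        .compare_exchange(last, now, Ordering::Relaxed, Ordering::Relaxed)
        .is_ok()
}

fn main() {
    assert!(should_warn_now());  // first occurrence is allowed to log
    assert!(!should_warn_now()); // an immediate re-poll is suppressed
}
```

At the call site this would simply gate the existing `log::warn!` without changing its message or target.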
299-307: Defensively handle `scan_limit == 0` to match the probe-count contract.

Line 305 turns `0` into an effective single probe. An explicit zero guard would make semantics unambiguous.

Proposed hardening:

```diff
 fn find_latest_indexed_canonical_block(
     &self,
     start_block: u64,
     scan_limit: u64,
 ) -> Result<Option<(u64, Block::Hash)>, String> {
+    if scan_limit == 0 {
+        return Ok(None);
+    }
     let scan_limit = scan_limit.saturating_sub(1);
     let min_block = start_block.saturating_sub(scan_limit);
```
Actionable comment (client/db/src/kv/mod.rs, lines 137-155):

The current code silently ignores `Err` from `self.mapping.latest_canonical_indexed_block_number()`, which can hide DB read failures. Propagate the error instead of dropping it: replace the `if let Ok(Some(...))` pattern with a `match`, or use the `?` operator. Keep the existing logic that checks `indexed_canonical_hash_at()`, calls `find_latest_indexed_canonical_block()`, and updates `set_latest_canonical_indexed_block()` when recovery succeeds.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
client/db/src/kv/mod.rs (1)
300-314: ⚠️ Potential issue | 🟡 Minor: `scan_limit = 0` silently scans 1 block instead of 0.

`scan_limit.saturating_sub(1)` when `scan_limit = 0` yields `0`, making the range `[start_block, start_block]`: a single probe. The parameter name implies that `0` means "scan nothing." This is not exploitable with the current callers (`8192` and `32768`), but the semantics are surprising and could cause subtle bugs if the function is called with a computed limit.

♻️ Proposed guard (if zero-limit semantics should mean "no scan"):

```diff
 fn find_latest_indexed_canonical_block(
     &self,
     start_block: u64,
     scan_limit: u64,
 ) -> Result<Option<(u64, Block::Hash)>, String> {
+    if scan_limit == 0 {
+        return Ok(None);
+    }
     let scan_limit = scan_limit.saturating_sub(1);
```
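The probe-count semantics can be demonstrated in isolation. The sketch below models only the range arithmetic against a list of probed block numbers (the function shape is illustrative, not the crate's signature):

```rust
// Probe-count semantics of the bounded backward scan, with the zero guard.
fn scan_probes(start_block: u64, scan_limit: u64) -> Vec<u64> {
    if scan_limit == 0 {
        return Vec::new(); // zero really means "scan nothing"
    }
    // Without the guard above, scan_limit = 0 would saturate to 0 here and
    // the range would still probe start_block once.
    let scan_limit = scan_limit.saturating_sub(1);
    let min_block = start_block.saturating_sub(scan_limit);
    (min_block..=start_block).rev().collect()
}

fn main() {
    assert_eq!(scan_probes(100, 0), Vec::<u64>::new()); // no probes at all
    assert_eq!(scan_probes(100, 1), vec![100]);         // exactly one probe
    assert_eq!(scan_probes(100, 3), vec![100, 99, 98]); // newest-first order
    // saturating_sub keeps the range valid near genesis.
    assert_eq!(scan_probes(1, 8), vec![1, 0]);
}
```

This makes the contract explicit: `scan_limit` is the maximum number of probes, scanned from `start_block` downward.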
♻️ Duplicate comments (1)
client/db/src/kv/mod.rs (1)
134-155: Past review concern about error propagation is addressed.

Line 137 now uses `?` to propagate errors from `latest_canonical_indexed_block_number()`, resolving the previously flagged silent `Err` drop.

Redundant `indexed_canonical_hash_at` probe when `persisted_number == best_number`.

When `persisted_number == best_number`, line 139 re-issues `indexed_canonical_hash_at(persisted_number)` even though line 128 already established that `indexed_canonical_hash_at(best_number)` returned `None`. This incurs an extra (unnecessary) DB round-trip on every call where the persisted pointer is at the chain tip.

♻️ Proposed fix:

```diff
-        if let Some(persisted_number) = self.mapping.latest_canonical_indexed_block_number()? {
-            if persisted_number <= best_number {
-                if let Some(canonical_hash) = self.indexed_canonical_hash_at(persisted_number)? {
-                    return Ok(canonical_hash);
-                }
+        if let Some(persisted_number) = self.mapping.latest_canonical_indexed_block_number()? {
+            if persisted_number < best_number {
+                // best_number was already checked in the fast path above; only probe
+                // if the persisted pointer is strictly below the chain tip.
+                if let Some(canonical_hash) = self.indexed_canonical_hash_at(persisted_number)? {
+                    return Ok(canonical_hash);
+                }
```
🧹 Nitpick comments (3)
client/db/src/kv/mod.rs (3)
168-183: `log::warn!` fires before determining whether the exhaustive fallback succeeds.

The warning is emitted unconditionally at line 171 regardless of whether the fallback at line 176 actually finds a block. An operator who sees the warning cannot tell from it whether service degraded (returned genesis) or recovered successfully: a noisy alert that warrants investigation every time, even when the node is fine.

♻️ Proposed restructuring:

```diff
-        // Rare path: both persisted pointer and bounded scan missed. Try exhaustive
-        // fallback with a larger scan limit. Log when this triggers so operators can
-        // observe indexing lag or other anomalies.
-        log::warn!(
-            target: "frontier-db",
-            "latest_block_hash: rare-path exhaustive fallback triggered (best_number={}, persisted and bounded scan missed)",
-            best_number,
-        );
-        if let Some((recovered_number, recovered_hash)) = self.find_latest_indexed_canonical_block(
-            best_number.saturating_sub(1),
-            INDEXED_RECOVERY_SCAN_LIMIT * 4,
-        )? {
-            self.mapping
-                .set_latest_canonical_indexed_block(recovered_number)?;
-            return Ok(recovered_hash);
-        }
+        // Rare path: both persisted pointer and bounded scan missed. Try exhaustive
+        // fallback with a larger scan limit and log so operators can observe lag.
+        log::warn!(
+            target: "frontier-db",
+            "latest_block_hash: rare-path exhaustive fallback triggered (best_number={}); \
+             indexing may be lagging significantly",
+            best_number,
+        );
+        if let Some((recovered_number, recovered_hash)) = self.find_latest_indexed_canonical_block(
+            best_number.saturating_sub(1),
+            INDEXED_RECOVERY_SCAN_LIMIT * 4,
+        )? {
+            log::info!(
+                target: "frontier-db",
+                "latest_block_hash: exhaustive fallback recovered at block {}",
+                recovered_number,
+            );
+            self.mapping
+                .set_latest_canonical_indexed_block(recovered_number)?;
+            return Ok(recovered_hash);
+        }
```
176-183: Exhaustive fallback redundantly re-scans the first `INDEXED_RECOVERY_SCAN_LIMIT` blocks.

Both the bounded scan (line 159) and the rare-path fallback (line 176) start from `best_number - 1`. The fallback re-checks the same 8192 already-proven-empty blocks before reaching new territory. Since this path is rare, the runtime cost is low, but it's cleaner to start from where the bounded scan left off.

♻️ Proposed fix to avoid the redundant overlap:

```diff
         if let Some((recovered_number, recovered_hash)) = self.find_latest_indexed_canonical_block(
-            best_number.saturating_sub(1),
-            INDEXED_RECOVERY_SCAN_LIMIT * 4,
+            best_number.saturating_sub(INDEXED_RECOVERY_SCAN_LIMIT + 1),
+            INDEXED_RECOVERY_SCAN_LIMIT * 3,
         )? {
```
622-726: Test suite is missing coverage for the rare-path exhaustive fallback and the `persisted_number > best_number` branch.

The single test validates the main reorg-recovery scenario well, but two important paths introduced by this PR remain untested:

- Rare path (lines 168-183): Neither the bounded scan failure nor the subsequent `INDEXED_RECOVERY_SCAN_LIMIT * 4` scan is exercised. A test could index a block at, say, block `N - INDEXED_RECOVERY_SCAN_LIMIT - 1` with no persisted pointer and confirm the fallback finds it.
- Persisted pointer ahead of best block (`persisted_number > best_number`): The guard at line 138 silently skips the pointer in this case; a regression test would confirm the code falls through to the bounded scan correctly rather than returning a future block.
- Pointer update after recovery: the test asserts the returned hash is correct but does not assert that `latest_canonical_indexed_block_number()` was subsequently updated to `3`.

Would you like me to draft the additional test cases covering these scenarios?
librelois
left a comment
latest_block_hash() now always emits warn! and runs an additional 32k-block scan whenever both persisted pointer and 8k scan miss. That condition is expected during startup/initial indexing (or after pointer loss), so this can become frequent rather than rare under RPC polling. Result: repetitive warning logs plus extra storage lookups per call until indexing catches up.
Please gate the warn! so it only logs on a throttled cadence (or first occurrence) and/or only after confirming there is at least one indexed block somewhere.
Actionable comments posted: 2
Actionable comments (client/db/src/kv/mod.rs):

- Lines 197-202: The `log::warn!` invocation (the "latest_block_hash: exhaustive fallback triggered..." message) uses space-based indentation on the wrapped string lines, which violates EditorConfig. Update the continuation lines to the project's expected indentation style (matching the surrounding lines in this file) so editorconfig-checker passes.
- Lines 337-339: The loop still performs one probe when `scan_limit == 0` because of the `saturating_sub(1)` and the `(min_block..=start_block)` range. Add an explicit check for `scan_limit == 0` before computing `min_block` and entering the loop, so that zero probes are performed.
🧹 Nitpick comments (1)
client/db/src/kv/mod.rs (1)
205-213: Consider starting the exhaustive scan past the already-scanned range.

The exhaustive fallback re-scans the first `INDEXED_RECOVERY_SCAN_LIMIT` blocks that were already checked by the bounded scan at lines 167-170. Since this is a rare path, the overhead may be acceptable, but you could avoid redundant lookups by starting lower:

```diff
         if let Some((recovered_number, recovered_hash)) = self
             .find_latest_indexed_canonical_block(
-                best_number.saturating_sub(1),
+                best_number.saturating_sub(INDEXED_RECOVERY_SCAN_LIMIT + 1),
                 INDEXED_RECOVERY_SCAN_LIMIT * 4,
             )? {
```

This would scan the next ~32k blocks below the bounded scan's range instead of overlapping it.
librelois
left a comment
latest_block_hash can now return a stale block even when newer canonical indexed blocks exist.
Evidence: mod.rs (lines 145-149) now returns immediately if latest_canonical_indexed_block_number is canonical, without checking whether a higher indexed canonical block exists between persisted_number+1 and best_number-1.
librelois
left a comment
Exhaustive recovery is skipped when the persisted pointer is absent, even if indexed history exists, so latest_block_hash() can still fall back to genesis in a recoverable state.
Evidence: has_indexed_history is defined as latest_canonical_indexed_block_number().is_some() and gates the wide scan, so None pointer means no exhaustive scan at all (mod.rs:183, mod.rs:188).
Risk: If LATEST_CANONICAL_INDEXED_BLOCK is missing/stale after restart/corruption while older indexed mappings still exist beyond the 8k window, RPC can still return genesis.
Thanks @librelois, lmk if the workflow is good now.
librelois
left a comment
latest_block_hash() can still return genesis on non-genesis chains when indexed data exists but sits outside both scan windows and pointer fallback is unavailable/invalid.
Evidence: fallback is unconditional at client/db/src/kv/mod.rs:220, and this behavior is explicitly codified in tests client/db/src/kv/mod.rs:886 and client/db/src/kv/mod.rs:901.
Suggestion:
Replace final fallback logic with a non-genesis recovery path when best_number > 0 (for example, deeper chunked backward scan and/or durable “last known non-genesis indexed canonical” pointer).
librelois
left a comment
Pointer fallback is ordered before deep recovery, so stale-but-valid pointers can mask newer indexed blocks in the deep range.
Location: client/db/src/kv/mod.rs (line 202), client/db/src/kv/mod.rs (line 223)
Suggestions:
- Move deep recovery before persisted-pointer return, or only accept pointer after proving no newer hit exists in deep range.
- Add a regression test for “stale canonical pointer + newer deep-window indexed block” (with test limit 8, e.g. best 40, pointer 1, indexed 3).
librelois
left a comment
tokio is added as a dev-dependency without explicit features in client/db/Cargo.toml:48, while tests use #[tokio::test] in client/db/src/kv/mod.rs (for example around line 768).
Workspace tokio has default-features = false in Cargo.toml:92, so non-default feature builds can become brittle depending on feature unification.
Recommendation: make test-required features explicit in client/db dev-dependencies (at least macros plus a runtime feature).
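A possible shape for the explicit dev-dependency (the version and exact feature set are illustrative and should match what the workspace and `#[tokio::test]` actually require):

```toml
[dev-dependencies]
tokio = { version = "1", default-features = false, features = ["macros", "rt"] }
```

With `macros` and a runtime feature pinned here, the tests no longer depend on feature unification from other crates in the workspace.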
@arturgontijo can you fix integration tests?
Locally it passes 🤔 I'm investigating what's going on...
…lkadot-evm#1832)

* Add latest_block_hash_scans_past_stale_reorg_window test
* Add rare-path exhaustive fallback
* Use persisted_number-1 as start when it is stale
* Coderabbit review
* Throttled exhaustive fallback log and scan
* lint + scan_limit == 0
* editorconfig + better starting point to continue scanning
* Bounded scan before persisted-pointer
* 4 layers before genesis fallback + better tests
* Add Layer 4: deep recovery
* Deep recovery before persisted pointer
* Remove redundant tokio from dev deps
…-evm#1832) Cherry-pick of upstream polkadot-evm#1832
Problem

`eth_blockNumber` can return `0x00` (genesis) on busy chains because `latest_block_hash()` falls back to genesis when:

1 - The chain head is not indexed yet.
2 - The persisted `LATEST_CANONICAL_INDEXED_BLOCK` check fails.
3 - The bounded 8k-block scan does not find an indexed block.

If mapping-sync lags by more than ~8k blocks, the scan never reaches indexed blocks and the RPC returns genesis.

Solution

Use persisted pointer before walk-back

When the fast path fails, try the stored `latest_canonical_indexed_block_number`. If it is still indexed and canonical, use it instead of running the walk-back. This avoids returning genesis when the chain head has moved beyond the scan window.

Rare-path fallback

If both the persisted pointer and the bounded scan fail, run a wider scan (`INDEXED_RECOVERY_SCAN_LIMIT * 4`). A `log::warn!` is emitted when this path runs so operators can monitor indexing lag or other anomalies.

Refactor

`find_latest_indexed_canonical_block` now takes a `scan_limit` parameter so both normal and rare paths use a shared implementation.
find_latest_indexed_canonical_blocknow takes a scan_limit parameter so both normal and rare paths use a shared implementation.