Fix eth_getBlockByNumber("latest") returning null intermittently#1816
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughConverted Changes
Sequence Diagram(s)sequenceDiagram
participant C as Client
participant S as EthApiServer
participant I as EthApiImpl
participant B as Backend
participant DB as MappingDB
C->>S: eth_author / eth_blockNumber
S->>I: call async author()/block_number() and await
I->>B: await latest_block_hash()
B-->>I: latest indexed block hash
I->>DB: lookup eth->substrate hash / number
DB-->>I: substrate hash / block number
I->>I: canonical verification (if number > 0)
I-->>S: RpcResult(author / number)
S-->>C: JSON-RPC response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
Ensures eth_getFilterChanges, eth_getFilterLogs, and eth_getLogs use the
latest indexed block number instead of Substrate's best_number when
resolving "latest" block references.
This prevents inconsistencies where these filter RPCs could reference
blocks not yet visible via eth_getBlockByNumber("latest").
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
client/rpc/src/eth/filter.rs (1)
285-305:⚠️ Potential issue | 🟠 MajorAvoid advancing
last_pollpast the indexed range.With
current_numbercapped tolatest_indexed_number, keepinglast_poll = best_number + 1can skip logs when the indexer lags (blocks between latest indexed and best are never revisited).🐛 Suggested fix
- // Update filter `last_poll`. - locked.insert( - key, - FilterPoolItem { - last_poll: BlockNumberOrHash::Num(best_number + 1), - filter_type: pool_item.filter_type.clone(), - at_block: pool_item.at_block, - pending_transaction_hashes: HashSet::new(), - }, - ); - // Either the filter-specific `to` block or latest indexed block. // Use latest indexed block to ensure consistency with other RPCs. let mut current_number = filter .to_block .and_then(|v| v.to_min_block_num()) .map(|s| s.unique_saturated_into()) .unwrap_or(latest_indexed_number); if current_number > latest_indexed_number { current_number = latest_indexed_number; } + + // Update filter `last_poll` based on the capped range. + let next_poll = + UniqueSaturatedInto::<u64>::unique_saturated_into(current_number) + .saturating_add(1); + locked.insert( + key, + FilterPoolItem { + last_poll: BlockNumberOrHash::Num(next_poll), + filter_type: pool_item.filter_type.clone(), + at_block: pool_item.at_block, + pending_transaction_hashes: HashSet::new(), + }, + );
There was a problem hiding this comment.
Findings
-
High:
latestcan regress to genesis on KV backend instead of returning the most recent indexed head
The newlatestpath now depends onbackend.latest_block_hash()inclient/rpc/src/eth/mod.rs:156,client/rpc/src/eth/client.rs:75, andclient/rpc/src/eth/client.rs:99.
In the KV backend,client/db/src/kv/mod.rs:102returns genesis wheneverbest_numberis not yet indexed (:134-135). During the same lag window that currently producesnull, this change can return block0(andeth_blockNumber = 0) transiently.
So this likely removesnull, but may replace it with a worse semantic error on KV deployments. -
Medium: new error surface for
eth_blockNumber/eth_coinbasewhen no indexed canonical block exists yet
eth_blockNumberandeth_coinbaseare now async and calllatest_block_hash()directly inclient/rpc/src/eth/client.rs:71andclient/rpc/src/eth/client.rs:95.
If backend returns error (for example SQL path when no canonical indexed row yet), these now return RPC internal errors rather than previous best-chain values.
Open Questions / Assumptions
- Are affected operators on SQL backend or KV backend?
If SQL: this fix is much more likely to address the reportednull latestbehavior correctly.
If KV: I would not merge as-is without changing KVlatest_block_hash()semantics.
Merge Risk Summary
- For SQL-backed nodes, the
eth_getBlockByNumber("latest")race is likely fixed by using indexed latest hash instead ofclient.info().best_number. - For KV-backed nodes, this is not safe yet because
latestcan transiently jump to genesis under indexing lag.
Recommended Next Actions
- Adjust KV
client/db/src/kv/mod.rs:102to return the highest indexed canonical block<= best_number(not genesis fallback on tip miss). - Add regression tests for
eth_getBlockByNumber("latest")andeth_blockNumberunder simulated mapping lag (both SQL and KV). - Re-test operator scenario after that; expected behavior should be “latest indexed block”, never
null, never sudden block0unless truly at genesis.
@librelois addressed in e1177f8 |
The SQL backend is not affected by this issue because its latest_block_hash() implementation queries the database directly: SELECT substrate_block_hash FROM blocks WHERE is_canon = 1 ORDER BY block_number DESC LIMIT 1 |
librelois
left a comment
There was a problem hiding this comment.
Suggested improvements:
- Add fallback/backfill when the meta key is absent (derive from currently indexed canonical best instead of hard-falling to genesis).
- Add an integration test for upgrade/restart behavior: existing synced DB + no new imports yet +
eth_getBlockByNumber("latest"). - on stale cached number (reorg case), walk back to prior indexed canonical block instead of returning genesis.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@ts-tests/tests/test-filter-api.ts`:
- Around line 149-172: The test step "should not skip log filter changes when
best head is ahead of indexed head" calls waitForBlock(..., 10000) which may
wait up to 10s, so add an explicit Mocha timeout at the start of the step's
async function to avoid Mocha's default 2s timeout; inside the step(...)
callback (the async function where waitForBlock is called) call
this.timeout(15000) (or similar >=10000) to ensure the test has enough time to
complete while keeping the function signature (async function () { ... }) so
this is available.
🧹 Nitpick comments (2)
client/rpc/src/eth/filter.rs (1)
230-244: Consider aligning Block filter with Log filter for consistency.The
FilterType::Blockbranch still usesbest_numberfor the upper bound (line 232), while theFilterType::Logbranch useslatest_indexed_number. This creates potential inconsistency:
- Block filters may iterate over unindexed blocks and silently skip them (lines 356-359 return no hash when
blockisNone)- Log filters explicitly cap at the latest indexed block
This may be intentional since block filters gracefully handle missing blocks, but documenting this distinction would help maintainability. If consistent behavior is preferred, consider capping the Block branch similarly.
client/db/src/kv/mod.rs (1)
495-504: Consider adding write lock for consistency with other write methods.
set_latest_canonical_indexed_blockdoesn't acquirewrite_lock, unlike other write methods (e.g.,write_hashesat line 408). While atomic single-key writes are safe at the database level, and last-write-wins is acceptable for this metadata, using the lock would maintain consistency with the codebase patterns.The current implementation is functionally correct—concurrent calls would just race to set valid values—but adding the lock would align with the defensive coding style used elsewhere.
♻️ Optional: Add write lock for consistency
pub fn set_latest_canonical_indexed_block(&self, block_number: u64) -> Result<(), String> { + let _lock = self.write_lock.lock(); let mut transaction = sp_database::Transaction::new(); transaction.set( columns::META, static_keys::LATEST_CANONICAL_INDEXED_BLOCK, &block_number.encode(), ); self.db.commit(transaction).map_err(|e| e.to_string()) }
…kadot-evm#1816) * fix: 🐛 do not rely on substrate state for latest block info * fix: 🐛 use latest indexed block for eth_blockNumber and eth_coinbase * fix: 🐛 use latest indexed block for eth_getLogs and filter RPCs Ensures eth_getFilterChanges, eth_getFilterLogs, and eth_getLogs use the latest indexed block number instead of Substrate's best_number when resolving "latest" block references. This prevents inconsistencies where these filter RPCs could reference blocks not yet visible via eth_getBlockByNumber("latest"). * fix: 🐛 track latest canonical indexed block in mapping-sync * fix: 🐛 walk back to find indexed canonical block instead of returning genesis * fix: 🐛 walk back on stale cached block during reorg instead of returning genesis * test: ✅ add integration tests * backfill and persist latest indexed canonical block on miss/stale meta * rustfmt * prevent filter log skips under indexing lag & harden lat pointer recover * explicit Mocha timeout * add more tests * prettier --------- Co-authored-by: Éloïs <c@elo.tf>
…kadot-evm#1816) * fix: 🐛 do not rely on substrate state for latest block info * fix: 🐛 use latest indexed block for eth_blockNumber and eth_coinbase * fix: 🐛 use latest indexed block for eth_getLogs and filter RPCs Ensures eth_getFilterChanges, eth_getFilterLogs, and eth_getLogs use the latest indexed block number instead of Substrate's best_number when resolving "latest" block references. This prevents inconsistencies where these filter RPCs could reference blocks not yet visible via eth_getBlockByNumber("latest"). * fix: 🐛 track latest canonical indexed block in mapping-sync * fix: 🐛 walk back to find indexed canonical block instead of returning genesis * fix: 🐛 walk back on stale cached block during reorg instead of returning genesis * test: ✅ add integration tests * backfill and persist latest indexed canonical block on miss/stale meta * rustfmt * prevent filter log skips under indexing lag & harden lat pointer recover * explicit Mocha timeout * add more tests * prettier --------- Co-authored-by: Éloïs <c@elo.tf>
Goal of the changes
Fix intermittent
null/ inconsistentlatestEthereum RPC responses by makinglatest-based RPCs read from the latest indexed canonical block instead of the client best head, especially while mapping-sync lags or after reorgs.What reviewers need to know
client/db/src/kv/mod.rsLATEST_CANONICAL_INDEXED_BLOCK.latest_block_hash()now:MAX_WALKBACK_DEPTH = 16) to recover from stale pointers/reorgs,latestqueries.client/mapping-sync/src/kv/mod.rsLATEST_CANONICAL_INDEXED_BLOCKwhen syncing a new best canonical block.client/rpc/src/eth/mod.rs,client/rpc/src/eth/client.rs,client/rpc/src/eth/filter.rs,client/rpc-core/src/eth.rseth_getBlockByNumber("latest"),eth_blockNumber,eth_coinbase, and log/filter paths now usebackend.latest_block_hash()/ latest indexed number for consistent behavior.eth_blockNumberandeth_coinbasemethods become async in RPC core/impl to support indexed-head lookup.last_pollfrom the same indexed-capped head it queries, preventing skipped logs when best head is ahead of indexed head.Risks / compatibility notes
latestsemantics now track indexed canonical visibility (consistent across RPCs) rather than raw client best head.16blocks). Very deep stale windows still degrade gracefully (eventual catch-up behavior covered by tests).Testing
ts-tests/tests/test-latest-block-consistency.tseth_blockNumber/eth_getBlockByNumber("latest")consistency,eth_coinbaseavailability,ts-tests/tests/test-filter-api.tsReferences
eth_getTransactionReceiptrace condition #1802Summary by CodeRabbit
Bug Fixes
New Features
Tests