
Conversation

Contributor

@Cordtus Cordtus commented Jan 9, 2026

Summary

Simplifies pruned node handling by probing at startup instead of recovering reactively during extraction.

Changes

  • Probe the earliest available block height during startup
  • Adjust the start height to match the earliest available height

Details

There is currently no gRPC method that exposes the required value directly; one arrives once cosmos/cosmos-sdk #25647 is merged.
In the meantime, we can fetch that same value by deliberately forcing a particular error as a silly workaround (see the sketch after this list):

  1. If the DB has data, resume from the latest indexed height + 1 (no probe)
  2. If the DB is empty, attempt to fetch block height 1; if it no longer exists, the node's error reports the earliest available height
  3. If the response is a "lowest height is X" error, start indexing from height X
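
The PR implements this parsing as ParseLowestHeightFromError in internal/utils/block.go. A minimal sketch of the idea, assuming the error text carries the usual "lowest height is N" marker (the helper name below is illustrative, not the PR's actual code):

```go
package utils

import (
	"regexp"
	"strconv"
)

// lowestHeightRe matches the "lowest height is N" marker that pruned
// Cosmos SDK nodes embed in their error messages.
var lowestHeightRe = regexp.MustCompile(`lowest height is (\d+)`)

// parseLowestHeight extracts the height from a pruning error message,
// returning ok=false when the message carries no such marker.
func parseLowestHeight(msg string) (height uint64, ok bool) {
	m := lowestHeightRe.FindStringSubmatch(msg)
	if m == nil {
		return 0, false
	}
	h, err := strconv.ParseUint(m[1], 10, 64)
	if err != nil {
		return 0, false
	}
	return h, true
}
```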

Test

Unit test for error message parsing in internal/utils/block_test.go.

Tested manually using a pruned Juno node - behaves as expected.

When connecting to a pruned CosmosSDK node, the extractor now automatically
detects "lowest height is X" errors and restarts extraction from the lowest
available height instead of failing.

Key changes:
- Add ErrHeightNotAvailable error type to signal pruned block detection
- Add prunedNodeSignal for thread-safe signaling without errgroup race
- Parse lowest available height from gRPC error messages
- Retry loop in extractBlocksAndTransactions auto-restarts from new height
- Unit tests for regex parsing, error type, and concurrent signaling

The implementation uses a separate signaling mechanism instead of returning
errors from goroutines to avoid errgroup's context cancellation race condition
where multiple workers could trigger "Processing cancelled by user" before
the restart logic could capture the lowest available height.
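
For reference, a minimal sketch of the kind of signal this describes (illustrative only, not the PR's actual code; this design was later replaced by the startup probe, see below):

```go
package extractor

import "sync"

// prunedSignal records the lowest available height reported by the first
// worker that hits a pruning error, without routing the error through the
// errgroup, so context cancellation can't race the capture.
type prunedSignal struct {
	mu     sync.Mutex
	fired  bool
	lowest uint64
}

// signal stores the first reported lowest available height.
func (s *prunedSignal) signal(lowest uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.fired {
		s.fired = true
		s.lowest = lowest
	}
}

// get returns the captured height and whether any worker signaled.
func (s *prunedSignal) get() (uint64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.lowest, s.fired
}
```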
Contributor

@fmorency fmorency left a comment


Thanks @Cordtus!

As discussed on TG, you could do it in multiple stages.

  1. Try extracting at block 1. If it works, start the full extraction routine from block 1.
  2. If it fails, detect the block from the error message (N)
  3. Try extracting at block N. If it works, start the full extraction routine from block N
  4. If it fails, detect the block from the error message again (N'). If N' == N, exit with error.
  5. Try extracting at block N'. If it works, start the full extraction routine from block N'. N = N'

Repeat 3-5. You could even add some heuristic if it fails too many times, like N' = N + 100.

E.g.

> Can I extract from block 1?
No, the lowest is block 12345
> Can I extract from block 12345?
No, the lowest is block 15555 (we hit another node, that happens)
> Can I extract from block 15555?
No, the lowest is block 23456 (we hit yet another node, that happens)
> Can I extract from block 23456?
No, the lowest is block 23666 (we hit yet another node, that happens)
> Aight, can I extract from block 23766 (heuristic N + 100)
Yes, that works, let's go!

Phase 1 is "detect what the minimum block is". It can fail even if you get the minimum block from the error message, because we're never sure which node we'll hit and they could all be at a different height, so repeat at most M times and/or fall back to a heuristic if we can't determine the minimum block after 3 tries.
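
A minimal sketch of this staged probe, reusing parseLowestHeight from the earlier sketch (the names here are illustrative, not the PR's actual code):

```go
package utils

import "fmt"

// findStartHeight walks the pruning boundary upward until a probe succeeds.
// probe(h) attempts to fetch block h and returns nil when it exists.
func findStartHeight(probe func(uint64) error, maxTries int) (uint64, error) {
	h := uint64(1) // step 1: try block 1 first
	for i := 0; i < maxTries; i++ {
		err := probe(h)
		if err == nil {
			return h, nil // block available; start the full extraction here
		}
		lowest, ok := parseLowestHeight(err.Error())
		if !ok {
			return 0, err // not a pruning error; propagate
		}
		if lowest == h {
			// N' == N: a node reports h as its lowest height, yet fetching
			// h still failed. Nudge forward with the N+100 heuristic rather
			// than retrying the same height forever.
			lowest = h + 100
		}
		h = lowest // retry from the newly reported boundary
	}
	return 0, fmt.Errorf("no available start height after %d probes", maxTries)
}
```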

You wouldn't need the new signal at all and should be able to re-use existing code without (much) modification.

Phase 2 is the full extraction loop.

You could reuse the approach above once Cosmos merges cosmos/cosmos-sdk#25647 and cosmos/cosmos-sdk#25648. Instead of parsing the error message to find the minimum block, you'd poke the gRPC endpoint. Same retry logic.

WDYT?

Replace reactive recovery during extraction with a single probe at startup.
The extractor now always verifies the start height is available before
spawning workers, automatically adjusting if the node is pruned.

Changes:
- Remove prunedNodeSignal struct and retry loop from block.go
- Add GetEarliestBlockHeight() to utils/block.go
- Always probe earliest available height in setBlockRange()
- Remove block_test.go (it tested the removed code)

This approach is simpler (~215 fewer lines), handles all scenarios
(fresh start, resume, node re-synced higher), and avoids thread
synchronization complexity.
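
A minimal sketch of the height selection this describes, assuming latestIndexed is 0 for an empty database (illustrative names, not the PR's exact code):

```go
package extractor

// startHeight picks where extraction begins: resume after the last
// indexed block when the DB has data, otherwise block 1, and in either
// case never below the node's earliest available height.
func startHeight(latestIndexed, earliest uint64) uint64 {
	start := uint64(1)
	if latestIndexed > 0 {
		start = latestIndexed + 1 // resume where the previous run stopped
	}
	if start < earliest {
		start = earliest // node is pruned (or re-synced) past our target
	}
	return start
}
```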
@Cordtus Cordtus force-pushed the feat/pruned-node-recovery branch from 5babe96 to d922277 Compare January 10, 2026 23:28
@Cordtus Cordtus requested a review from fmorency January 11, 2026 01:10
Contributor Author

Cordtus commented Jan 11, 2026

Reworked this one, and updated the description accordingly.

Contributor

@fmorency fmorency left a comment


Thank you! Much better! A single minor comment.

nit: Add more tests, but we can do that in another PR. Would love e2e tests to cover some edge cases like the ones I described in my other review.

// It probes block 1 to check if the node is an archive node or pruned.
// For archive nodes, returns 1. For pruned nodes, parses the error message
// to extract the lowest available height.
func GetEarliestBlockHeight(gRPCClient *client.GRPCClient, maxRetries uint) (uint64, error) {
Contributor


Can we make this function a little more robust, as suggested in the first review?

I saw cases where the lowest height from the error wasn't working because the query hit another node that didn't have that height. I.e., the other node has a lowest height higher than previously reported.

Contributor Author


Ah, I didn't consider the case of load balancers using nodes with varying heights... is this common?

Contributor


> Ah, I didn't consider the case of load balancers using nodes with varying heights... is this common?

I encountered this issue multiple times while building this project, primarily with Osmosis and the Hub. I'm not sure if it's common, but I believe it's common enough to address. I'm surprised you didn't encounter this issue during your tests.

Contributor Author


I typically use my own nodes for dev/testing (mainly because of issues like this with public ones).
I will have something for this early tomorrow.


@Cordtus Cordtus force-pushed the feat/pruned-node-recovery branch from 3fa0312 to c5502c4 Compare January 16, 2026 06:12
Adds fallback to startup probe: if extraction hits a higher pruning
boundary than initially detected, adjust start height and retry.
Handles load-balanced endpoints where backend nodes vary.
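
A minimal sketch of that fallback, shown in the same package as the earlier parsing sketch so it compiles as-is (illustrative names, not the PR's exact code):

```go
package utils

import "fmt"

// extractWithPrunedFallback runs batches starting at from; when a batch
// fails because a backend node is pruned above the height the startup
// probe reported, it moves the start height up and retries.
func extractWithPrunedFallback(extract func(uint64) error, from uint64, maxRetries int) error {
	for i := 0; i <= maxRetries; i++ {
		err := extract(from)
		if err == nil {
			return nil
		}
		lowest, ok := parseLowestHeight(err.Error())
		if !ok || lowest <= from {
			return err // not a pruning error, or no forward progress possible
		}
		from = lowest // boundary moved higher than first detected; retry
	}
	return fmt.Errorf("pruning boundary kept moving after %d retries", maxRetries)
}
```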
@Cordtus Cordtus force-pushed the feat/pruned-node-recovery branch from c5502c4 to 2b1bea4 Compare January 16, 2026 06:14
@Cordtus Cordtus requested a review from fmorency January 16, 2026 06:29
Contributor

Copilot AI left a comment


Pull request overview

This PR implements automatic recovery from pruned node errors during blockchain data extraction by probing for the earliest available block at startup and adjusting extraction parameters accordingly.

Changes:

  • Added utility functions to detect earliest available block height by probing block 1 and parsing pruning error messages
  • Modified extraction startup logic to query earliest available block when starting with an empty database
  • Implemented retry loop in batch extraction to handle pruned node errors during extraction

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

Summary per file:

  • internal/utils/block.go: added GetEarliestBlockHeight and ParseLowestHeightFromError functions to detect and parse pruning boundaries
  • internal/utils/block_test.go: added unit tests for ParseLowestHeightFromError error message parsing
  • internal/extractor/extractor.go: updated setBlockRange to probe the earliest block for empty databases; added a retry loop for pruned node recovery during batch extraction
  • internal/extractor/block.go: refactored error handling for context cancellation; standardized import ordering
  • go.mod: updated Go version to 1.25.5
  • README.md: updated Go version requirement to 1.25.5
  • .github/workflows/release.yml: updated GO_VERSION to 1.25.5
  • .github/workflows/ci.yml: updated GO_VERSION to 1.25.5
  • internal/utils/grpc.go, internal/metrics/server.go, internal/metrics/server_test.go, internal/client/client.go, cmd/yaci/postgres.go, cmd/yaci/extract.go: standardized import ordering



codecov bot commented Jan 16, 2026

Codecov Report

❌ Patch coverage is 42.42424% with 38 lines in your changes missing coverage. Please review.

Files with missing lines:

  • internal/extractor/extractor.go: 25.64% patch coverage (24 lines missing, 5 partials) ⚠️
  • internal/extractor/block.go: 0.00% patch coverage (9 lines missing) ⚠️


@fmorency fmorency merged commit 5e84e49 into manifest-network:main Jan 16, 2026
3 of 4 checks passed