Endpoint failover not working - Indexer stuck on unhealthy endpoint when multiple endpoints configured #3034

@99Kies

Description:

I'm experiencing an issue with endpoint failover when configuring multiple endpoints in the SubQuery Cosmos indexer. The indexer does not properly rotate to healthy endpoints when one becomes unavailable, instead getting stuck trying to connect to the failing endpoint.

Configuration:

I have configured two endpoints in my project:

endpoint: [
	'https://vota-rpc.dorafactory.org:443',
	'http://54.169.53.97:26657',
],

Expected Behavior:

When one endpoint becomes unhealthy (e.g., connection timeout), the indexer should automatically failover to the next configured endpoint and continue indexing.
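To make the expected behavior concrete, here is a minimal sketch of the rotation described above. It is an illustration only and not SubQuery's actual connection-pool code: the function name `nextHealthyIndex` and the idea of tracking unhealthy endpoints in a `Set` are assumptions made for this example.

```typescript
// Hypothetical helper, not part of SubQuery: returns the index of the next
// endpoint that is not marked unhealthy. The search starts at the endpoint
// after `current` and wraps around, so `current` itself is checked last.
// Returns undefined when every endpoint is marked unhealthy.
function nextHealthyIndex(
  endpoints: readonly string[],
  current: number,
  unhealthy: ReadonlySet<string>,
): number | undefined {
  for (let step = 1; step <= endpoints.length; step++) {
    const index = (current + step) % endpoints.length;
    if (!unhealthy.has(endpoints[index])) return index;
  }
  return undefined;
}

// The two endpoints from this report, with the second one marked unhealthy.
const endpoints = [
  "https://vota-rpc.dorafactory.org:443",
  "http://54.169.53.97:26657",
];
const unhealthy = new Set(["http://54.169.53.97:26657"]);

// The indexer is currently on index 1 (the failing endpoint).
const next = nextHealthyIndex(endpoints, 1, unhealthy);
console.log(next === undefined ? "no healthy endpoint" : endpoints[next]);
// prints "https://vota-rpc.dorafactory.org:443"
```

With this kind of selection, a timeout on `54.169.53.97:26657` would move subsequent requests to `vota-rpc.dorafactory.org` instead of retrying the same address.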

Actual Behavior:

The indexer gets stuck on the second endpoint (54.169.53.97:26657) and continuously attempts to connect to it, showing timeout errors. The error log indicates that while the endpoint is marked as unhealthy, the indexer doesn't attempt to switch to the other configured endpoint.

Error Logs:

2026-03-30T08:20:50.220Z <FetchService> ERROR Having a problem when getting finalized block AxiosError: connect ETIMEDOUT 54.169.53.97:26657
2026-03-30T08:20:51.214Z <FetchService> ERROR Having a problem when getting best block AxiosError: connect ETIMEDOUT 54.169.53.97:26657
2026-03-30T08:20:52.157Z <FetchService> ERROR Having a problem when getting finalized block AxiosError: connect ETIMEDOUT 54.169.53.97:26657
2026-03-30T08:20:53.159Z <FetchService> ERROR Having a problem when getting best block AxiosError: connect ETIMEDOUT 54.169.53.97:26657
2026-03-30T08:20:54.104Z <health> ERROR undefined Error: Endpoint is not healthy
2026-03-30T08:20:54.106Z <health> ERROR undefined Error: Endpoint is not healthy
2026-03-30T08:20:54.316Z <FetchService> ERROR Having a problem when getting finalized block AxiosError: connect ETIMEDOUT 54.169.53.97:26657
2026-03-30T08:21:04.119Z <health> ERROR undefined Error: Endpoint is not healthy
2026-03-30T08:21:04.120Z <health> ERROR undefined Error: Endpoint is not healthy
2026-03-30T08:21:04.939Z <benchmark> INFO INDEXING: Fully synced, waiting for new blocks
2026-03-30T08:21:14.132Z <health> ERROR undefined Error: Endpoint is not healthy
2026-03-30T08:21:14.134Z <health> ERROR undefined Error: Endpoint is not healthy
2026-03-30T08:21:24.144Z <health> ERROR undefined Error: Endpoint is not healthy

Environment:

  • Image: onfinality/subql-node-cosmos:v5.3.0

Steps to Reproduce:

  1. Configure two endpoints in project.yaml, where one is potentially unstable
  2. Run the indexer with subql-node-cosmos:v5.3.0
  3. When the active endpoint becomes unhealthy, observe that the indexer doesn't failover to the healthy endpoint

Additional Context:

Once an endpoint is marked unhealthy, the indexer does not appear to cycle through the other configured endpoints. The health check keeps reporting the endpoint as unhealthy, but the indexer never switches away from it.