Conversation

@jmaicher (Contributor) commented Aug 29, 2025

When reading from *cluster table functions (e.g. s3Cluster or deltaLakeCluster) while all replicas are unavailable, we currently return an empty result instead of an error:

:) SELECT * FROM s3Cluster('test_cluster_multiple_nodes_all_unavailable', 'http://minio:9023/test/a.parquet');

Query id: 8c5e3f30-44c1-47e0-9d58-9d802f29d209

Ok.

0 rows in set. Elapsed: 0.013 sec. 

Logs:

2025.08.29 16:20:20.280297 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Trace> Connection (127.0.0.1:1234): Connecting. Database: (not specified). User: default. Bind_Host: (not specified)
2025.08.29 16:20:20.280472 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Warning> ConnectionPoolWithFailover: Connection failed at try №1, reason: Code: 210. DB::NetException: Connection refused (127.0.0.1:1234). (NETWORK_ERROR) (version 25.9.1.1)
2025.08.29 16:20:20.280476 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Trace> Connection (127.0.0.1:1234): Connecting. Database: (not specified). User: default. Bind_Host: (not specified)
2025.08.29 16:20:20.280547 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Warning> ConnectionPoolWithFailover: Connection failed at try №2, reason: Code: 210. DB::NetException: Connection refused (127.0.0.1:1234). (NETWORK_ERROR) (version 25.9.1.1)
2025.08.29 16:20:20.280551 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Trace> Connection (127.0.0.1:1234): Connecting. Database: (not specified). User: default. Bind_Host: (not specified)
2025.08.29 16:20:20.280624 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Warning> ConnectionPoolWithFailover: Connection failed at try №3, reason: Code: 210. DB::NetException: Connection refused (127.0.0.1:1234). (NETWORK_ERROR) (version 25.9.1.1)
2025.08.29 16:20:20.280631 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Trace> Connection (127.0.0.2:1234): Connecting. Database: (not specified). User: default. Bind_Host: (not specified)
2025.08.29 16:20:20.280711 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Warning> ConnectionPoolWithFailover: Connection failed at try №1, reason: Code: 210. DB::NetException: Connection refused (127.0.0.2:1234). (NETWORK_ERROR) (version 25.9.1.1)
2025.08.29 16:20:20.280716 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Trace> Connection (127.0.0.2:1234): Connecting. Database: (not specified). User: default. Bind_Host: (not specified)
2025.08.29 16:20:20.280783 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Warning> ConnectionPoolWithFailover: Connection failed at try №2, reason: Code: 210. DB::NetException: Connection refused (127.0.0.2:1234). (NETWORK_ERROR) (version 25.9.1.1)
2025.08.29 16:20:20.280786 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Trace> Connection (127.0.0.2:1234): Connecting. Database: (not specified). User: default. Bind_Host: (not specified)
2025.08.29 16:20:20.280850 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Warning> ConnectionPoolWithFailover: Connection failed at try №3, reason: Code: 210. DB::NetException: Connection refused (127.0.0.2:1234). (NETWORK_ERROR) (version 25.9.1.1)
2025.08.29 16:20:20.280858 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Trace> Connection (127.0.0.3:1234): Connecting. Database: (not specified). User: default. Bind_Host: (not specified)
2025.08.29 16:20:20.280924 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Warning> ConnectionPoolWithFailover: Connection failed at try №1, reason: Code: 210. DB::NetException: Connection refused (127.0.0.3:1234). (NETWORK_ERROR) (version 25.9.1.1)
2025.08.29 16:20:20.280927 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Trace> Connection (127.0.0.3:1234): Connecting. Database: (not specified). User: default. Bind_Host: (not specified)
2025.08.29 16:20:20.280991 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Warning> ConnectionPoolWithFailover: Connection failed at try №2, reason: Code: 210. DB::NetException: Connection refused (127.0.0.3:1234). (NETWORK_ERROR) (version 25.9.1.1)
2025.08.29 16:20:20.280994 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Trace> Connection (127.0.0.3:1234): Connecting. Database: (not specified). User: default. Bind_Host: (not specified)
2025.08.29 16:20:20.281069 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Warning> ConnectionPoolWithFailover: Connection failed at try №3, reason: Code: 210. DB::NetException: Connection refused (127.0.0.3:1234). (NETWORK_ERROR) (version 25.9.1.1)
2025.08.29 16:20:20.281458 [ 1444699 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Test> PipelineExecutor: Thread finished. Total time: 1.7942e-05 sec. Execution time: 4.231e-06 sec. Processing time: 6.24e-06 sec. Wait time: 7.471e-06 sec.
2025.08.29 16:20:20.281585 [ 1444017 ] {8c5e3f30-44c1-47e0-9d58-9d802f29d209} <Debug> TCPHandler: Processed in 0.013430705 sec.
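The log pattern above (three connection attempts per replica before moving on to the next one) matches a generic retry-with-failover loop. A minimal standalone sketch of that pattern, with hypothetical names rather than ClickHouse's actual implementation:

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch of retry-with-failover: try each replica up to
// max_tries times; if every attempt on every replica fails, surface the
// failure as an error. The bug described in this PR was that the empty
// result of such a loop could be silently treated as "0 rows".
bool connectWithFailover(const std::vector<std::string> & replicas,
                         const std::function<bool(const std::string &)> & try_connect,
                         size_t max_tries = 3)
{
    for (const auto & replica : replicas)
        for (size_t attempt = 1; attempt <= max_tries; ++attempt)
            if (try_connect(replica))
                return true;  // got a usable connection

    // All replicas unavailable: fail loudly instead of returning nothing.
    throw std::runtime_error("All connection tries failed");
}
```

With three replicas and three tries each, this makes up to nine attempts before giving up, which is the shape visible in the log above.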

It seems the problem was introduced here, when ConnectionPoolWithFailover::getMany was allowed to skip unavailable endpoints.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fail if all replicas are unavailable when reading from *cluster functions

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)


```cpp
auto pipe = Pipe::unitePipes(std::move(pipes));
if (pipe.empty())
    pipe = Pipe(std::make_shared<NullSource>(getOutputHeader()));
```
@jmaicher (Contributor Author) commented on this change:
Haven't found a valid reason for the NullSource fallback here.
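One way to drop that fallback is to fail when uniting the per-replica pipes produces nothing because every replica was unavailable. A minimal standalone sketch of that direction, using simplified Pipe and Source stand-ins and a hypothetical error message, not the actual ClickHouse classes:

```cpp
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Simplified stand-ins for ClickHouse's Pipe machinery, for illustration only.
struct Source { };

struct Pipe
{
    std::vector<std::shared_ptr<Source>> sources;
    bool empty() const { return sources.empty(); }
};

// Concatenate the sources of all per-replica pipes into one pipe.
Pipe unitePipes(std::vector<Pipe> pipes)
{
    Pipe result;
    for (auto & p : pipes)
        for (auto & s : p.sources)
            result.sources.push_back(std::move(s));
    return result;
}

// Before the fix, an empty united pipe was silently replaced by a
// NullSource, so "all replicas down" looked like "0 rows". The fix
// direction: treat an empty result as an error instead.
Pipe unitePipesOrThrow(std::vector<Pipe> pipes)
{
    auto pipe = unitePipes(std::move(pipes));
    if (pipe.empty())
        throw std::runtime_error("Cannot connect to any replica");
    return pipe;
}
```

The design point is that an empty pipe set is no longer a valid query plan for a *cluster read; it becomes a hard failure that propagates to the client.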

@GrigoryPervakov added the can be tested (Allows running workflows for external contributors) label Aug 29, 2025
clickhouse-gh bot commented Aug 29, 2025

Workflow [PR], commit [d3124d2]

Summary:

  • Stateless tests (amd_binary, old analyzer, s3 storage, DatabaseReplicated, parallel): failure
      02443_detach_attach_partition: FAIL
  • Stress test (amd_ubsan): failure
      Server died: FAIL
      Hung check failed, possible deadlock found (see hung_check.log): FAIL
      Killed by signal (in clickhouse-server.log): FAIL
      Fatal message in clickhouse-server.log (see fatal_messages.txt): FAIL
      Killed by signal (output files): FAIL
      Found signal in gdb.log: FAIL
  • Upgrade check (amd_asan): failure
      Killed by signal (in clickhouse-server.log): FAIL
      Fatal message in clickhouse-server.log (see fatal_messages.txt): FAIL
      Killed by signal (output files): FAIL
      Found signal in gdb.log: FAIL
  • Upgrade check (amd_msan): failure
      Killed by signal (in clickhouse-server.log): FAIL
      Fatal message in clickhouse-server.log (see fatal_messages.txt): FAIL
      Killed by signal (output files): FAIL
      Found signal in gdb.log: FAIL

@clickhouse-gh bot added the pr-bugfix (Pull request with bugfix, not backported by default) label Aug 29, 2025
@nickitat self-assigned this Aug 29, 2025
@nickitat added this pull request to the merge queue Aug 29, 2025
Merged via the queue into ClickHouse:master with commit dc19962 Aug 29, 2025
118 of 122 checks passed
@robot-clickhouse-ci-1 added the pr-synced-to-cloud (The PR is synced to the cloud repo) label Aug 29, 2025
Enmk added a commit to Altinity/ClickHouse that referenced this pull request Sep 12, 2025
Backport of ClickHouse#86414 Fail when all replicas are unavailable for *cluster functions
@jmaicher (Contributor Author) commented:

Can we backport this to 25.8 LTS?

@robot-ch-test-poll1 added the pr-backports-created-cloud (deprecated label, NOOP) and pr-must-backport-synced (The `*-must-backport` labels are synced into the cloud Sync PR) labels Sep 18, 2025
robot-clickhouse-ci-1 added a commit that referenced this pull request Sep 18, 2025
Cherry pick #86414 to 25.8: Fail if all replicas are unavailable when reading from *cluster functions
robot-clickhouse added a commit that referenced this pull request Sep 18, 2025
@robot-ch-test-poll3 added the pr-backports-created (Backport PRs are successfully created, it won't be processed by CI script anymore) label Sep 18, 2025
nickitat added a commit that referenced this pull request Sep 24, 2025
Backport #86414 to 25.8: Fail if all replicas are unavailable when reading from *cluster functions

Labels

  • can be tested: Allows running workflows for external contributors
  • pr-backports-created: Backport PRs are successfully created, it won't be processed by CI script anymore
  • pr-backports-created-cloud: deprecated label, NOOP
  • pr-bugfix: Pull request with bugfix, not backported by default
  • pr-must-backport-synced: The `*-must-backport` labels are synced into the cloud Sync PR
  • pr-synced-to-cloud: The PR is synced to the cloud repo
  • v25.8-must-backport
