Read optimization using Iceberg metadata #1019

ianton-ru · 2025-09-17T09:52:49Z

Changelog category (leave one):

Experimental Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

WIP: Read optimization using Iceberg metadata

Documentation entry for user-facing changes

Solved #1000
Iceberg metadata stores the minimum and maximum value of each column for every data file. With this information, ClickHouse can skip reading a column from a specific file when min = max, and instead fill it as a constant with the value from the Iceberg metadata.
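
A minimal sketch of the per-file check, for illustration only. `ColumnStats` and `constantValue` are hypothetical names, not types from this PR; the fields loosely mirror the lower bound, upper bound, and null count that Iceberg manifests record per column, and values are kept as strings to stay type-agnostic.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical per-column statistics for one data file (not a ClickHouse type).
struct ColumnStats
{
    std::optional<std::string> min_value;   // lower bound, if recorded
    std::optional<std::string> max_value;   // upper bound, if recorded
    std::optional<int64_t> null_count;      // number of NULLs, if recorded
};

// Returns the constant value when every row of the file is known to hold the
// same non-NULL value for this column; otherwise returns std::nullopt.
std::optional<std::string> constantValue(const ColumnStats & stats)
{
    if (!stats.min_value || !stats.max_value || !stats.null_count)
        return std::nullopt;    // statistics missing: the column must be read
    if (*stats.null_count != 0)
        return std::nullopt;    // NULLs present, or the count is not trustworthy
    if (*stats.min_value != *stats.max_value)
        return std::nullopt;    // more than one distinct value is possible
    return stats.min_value;
}
```

Anything that is not provably constant falls back to a normal read, so the check can only skip work, never change results, as long as the statistics themselves are exact.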

Main change:
Metadata for each file is passed to StorageObjectStorageSource::createReader, which then checks whether some columns are constant in the current file (min value = max value and no NULLs).
These columns are removed from the requested_columns list.
Later, in StorageObjectStorageSource::generate, they are inserted back with the value from the metadata.
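
The two steps can be sketched roughly as follows. This is a toy model, not the code in this PR: `Column`, `Chunk`, `pruneConstantColumns`, and `restoreConstantColumns` are hypothetical stand-ins, and a chunk is modelled as a list of (name, values) pairs rather than ClickHouse's own column classes.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins: a column is a vector of strings, a chunk is a list of named columns.
using Column = std::vector<std::string>;
using Chunk = std::vector<std::pair<std::string, Column>>;

// Step 1 (reader creation): drop columns that the file metadata reports as constant.
std::vector<std::string> pruneConstantColumns(
    const std::vector<std::string> & requested,
    const std::map<std::string, std::string> & constants)
{
    std::vector<std::string> to_read;
    for (const auto & name : requested)
        if (constants.find(name) == constants.end())
            to_read.push_back(name);
    return to_read;
}

// Step 2 (chunk generation): rebuild the chunk in the originally requested
// order, materializing each pruned column from its metadata value.
Chunk restoreConstantColumns(
    const std::vector<std::string> & requested,
    const std::map<std::string, std::string> & constants,
    const Chunk & read_chunk,
    size_t num_rows)
{
    Chunk result;
    size_t next_read = 0;   // position in read_chunk, which keeps the pruned order
    for (const auto & name : requested)
    {
        auto it = constants.find(name);
        if (it != constants.end())
            result.emplace_back(name, Column(num_rows, it->second));
        else
            result.push_back(read_chunk.at(next_read++));
    }
    return result;
}
```

Keeping the original column positions is the delicate part: the restored chunk must match the header the rest of the pipeline expects. When every requested column is constant, `num_rows` has to come from somewhere other than the data that was read, for example the file's row count in the metadata.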

Other important change:
This metadata is sent along with the file name to other hosts during cluster requests. The CommandInTaskResponse class from PR #866 is reused for this. Serialization/deserialization is ugly for now, but it works.
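
To illustrate the idea of shipping metadata together with the file name, here is a self-contained round-trip sketch. It is not the format used by CommandInTaskResponse: `FileTask`, `serialize`, and `deserialize` are hypothetical, and the length-prefixed text encoding is chosen only because it is easy to follow. Error handling for malformed input is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical payload: a file path plus the metadata a remote reader needs.
struct FileTask
{
    std::string path;
    int64_t record_count = 0;
    std::map<std::string, std::string> constant_columns;   // column name -> value
};

// Writes a string as "<length>:<bytes>" so that any byte may appear inside it.
static void writeString(std::ostream & out, const std::string & s)
{
    out << s.size() << ':' << s;
}

// Reads a string written by writeString.
static std::string readString(std::istream & in)
{
    size_t size = 0;
    in >> size;
    in.get();   // skip the ':' separator
    std::string s(size, '\0');
    in.read(s.data(), static_cast<std::streamsize>(size));
    return s;
}

std::string serialize(const FileTask & task)
{
    std::ostringstream out;
    writeString(out, task.path);
    out << task.record_count << ' ' << task.constant_columns.size() << ' ';
    for (const auto & [name, value] : task.constant_columns)
    {
        writeString(out, name);
        writeString(out, value);
    }
    return out.str();
}

FileTask deserialize(const std::string & data)
{
    std::istringstream in(data);
    FileTask task;
    task.path = readString(in);
    size_t count = 0;
    in >> task.record_count >> count;
    in.get();   // skip the space written after the column count
    for (size_t i = 0; i < count; ++i)
    {
        std::string name = readString(in);
        task.constant_columns[name] = readString(in);
    }
    return task;
}
```

Length prefixes avoid any escaping rules for paths or values that contain spaces or separators, at the cost of a format that is not human-editable.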

Tests are coming soon.

Exclude tests:

github-actions · 2025-09-17T09:53:39Z

Workflow [PR], commit [fd23354]

ianton-ru · 2025-09-17T12:19:34Z

TODO:

tests

Optional TODO:

refactor to keep structures/classes (at least DataFileInfo) in separate header files to reduce file dependencies

hodgesrm · 2025-09-19T14:26:54Z

@ianton-ru what's the speedup you are seeing with this optimization? Does it match the query response of a simple SELECT count() as described in #1000?

ianton-ru · 2025-09-22T15:30:58Z

> @ianton-ru what's the speedup you are seeing with this optimization? Does it match the query response of a simple SELECT count() as described in #1000?

Yes, partially. Two cases are not covered yet: columns that were renamed, and columns that are constant NULL. Those require ClickHouse#85829. But for constant non-NULL values in non-renamed columns, ClickHouse should take the count and the values of constant columns from Iceberg metadata.
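
As a rough illustration of taking the count from metadata, the sketch below sums per-file row counts and gives up if any file lacks a usable count. `FileInfo` and `countFromMetadata` are hypothetical names, and the sketch assumes no row-level delete files apply to the data files, since deletes would make the recorded counts too high.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-file metadata: the row count recorded for the file, if any.
struct FileInfo
{
    std::optional<int64_t> record_count;
};

// Returns the total row count if every file has a valid recorded count;
// otherwise std::nullopt, meaning the files have to be read.
std::optional<int64_t> countFromMetadata(const std::vector<FileInfo> & files)
{
    int64_t total = 0;
    for (const auto & file : files)
    {
        if (!file.record_count || *file.record_count < 0)
            return std::nullopt;
        total += *file.record_count;
    }
    return total;
}
```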

ianton-ru · 2025-09-22T15:32:38Z

Test 03413_experimental_settings_cannot_be_enabled_by_default failed...

Read optimization based on Iceberg metadata

5af7474

Merge branch 'antalya-25.6.5' into feature/optimize_count_in_datalake

5a8aba7

ianton-ru added 4 commits September 17, 2025 15:57

Better range serialization

de7545d

Setting allow_experimental_iceberg_read_optimization, 0 by default

dd98094

Fix column indexes

8fb2aa2

Optimization for NULLs, count form metadata, test

bb2da6b

ianton-ru changed the title ~~WIP: Read optimization using Iceberg metadata~~ Read optimization using Iceberg metadata Sep 18, 2025

ianton-ru marked this pull request as ready for review September 18, 2025 16:26

Remove optimiation for NULLs

f8caa7f

ianton-ru and others added 4 commits September 29, 2025 12:12

Ignore negative nulls count

825fe68

Merge branch 'antalya-25.6.5' into feature/optimize_count_in_datalake

e2357c1

Remove debug record on each chunk

3c77ee2

Fix build

fd23354

ianton-ru added antalya antalya-25.6 antalya-25.6.5 labels Oct 1, 2025

Enmk merged commit a1c4e5e into antalya-25.6.5 Oct 1, 2025
111 of 136 checks passed

ianton-ru mentioned this pull request Oct 8, 2025

25.8 Antalya ports: Read optimization using Iceberg metadata #1069

Merged

13 tasks

4 participants
