-
Notifications
You must be signed in to change notification settings - Fork 9
engine=Hybrid improvements #1156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
test failures look unrelated |
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/Storages/StorageDistributed.cpp
Outdated
| segment.predicate_ast, | ||
| columns_to_cast_ptr, | ||
| &metadata_columns, | ||
| segment_actual_columns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunatelly this approach also seem to misbahave.
When doing those rewrites (col -> CAST(col,...) ) it seems like in may potentially affect the order of the columns coming back from the subplans. And when different segments get processed differently the reorder somehow can also be different. And there is a match by positions near converting actions, which then may lead to situation then it try to align / mix together 2 different column. :|
It seems like moving that rewrite out (make it global for the main table) solves the issue (plus reduces the number of query tree traversals)
I've moved that logic to the dedicated Analyzer Pass (which seems to be a more logical place for such global rewrites of the query)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record - it looked like that:
2025.11.21 17:27:03.478289 [ 1817274 ] {82a3c4b5-345b-4373-8591-e731c192f8e6} <Error> executeQuery: Code: 38. DB::Exception: Cannot parse date: value is too short: Cannot parse Date from String: while converting source column `_CAST(__table1.name, 'String'_String)` to destination column `__table1.date`: while executing 'FUNCTION
_CAST(_CAST(__table1.name, 'String'_String) :: 2, Date :: 5) -> _CAST(_CAST(__table1.name, 'String'_String), Date) Date : 4'. (CANNOT_PARSE_DATE) (version 25.8.9.20000.altinityantalya) (from 127.0.0.1:39696) (query 1, line 1) (in query: SELECT * FROM test_tiered_watermark ORDER BY id SETTINGS enable_analyzer = 1
Plan
› Expression (Project names)
Header: id UInt32
name String
date Date
value UInt32
Actions: INPUT : 0 -> __table1.id UInt32 : 0
INPUT : 1 -> __table1.name String : 1
INPUT : 2 -> __table1.date Date : 2
INPUT : 3 -> __table1.value UInt32 : 3
ALIAS __table1.id :: 0 -> id UInt32 : 4
ALIAS __table1.name :: 1 -> name String : 0
ALIAS __table1.date :: 2 -> date Date : 1
ALIAS __table1.value :: 3 -> value UInt32 : 2
Positions: 4 0 1 2
Sorting (Merge sorted streams after aggregation stage for ORDER BY)
Header: __table1.id UInt32
__table1.name String
__table1.date Date
__table1.value UInt32
Sort description: __table1.id ASC
Union
Header: __table1.id UInt32
__table1.name String
__table1.date Date
__table1.value UInt32
Expression (Change remote column names to local column names)
Header: __table1.id UInt32
__table1.name String
__table1.date Date
__table1.value UInt32
Actions: INPUT : 0 -> _CAST(__table1.id, 'UInt32'_String) UInt32 : 0
INPUT : 1 -> __table1.date Date : 1
INPUT : 2 -> _CAST(__table1.name, 'String'_String) String : 2
INPUT : 3 -> _CAST(__table1.value, 'UInt32'_String) UInt32 : 3
COLUMN Const(String) -> String String : 4
COLUMN Const(String) -> Date String : 5
ALIAS _CAST(__table1.id, 'UInt32'_String) :: 0 -> __table1.id UInt32 : 6
ALIAS _CAST(__table1.value, 'UInt32'_String) :: 3 -> __table1.value UInt32 : 0
FUNCTION _CAST(__table1.date :: 1, String :: 4) -> _CAST(__table1.date, String) String : 3
FUNCTION _CAST(_CAST(__table1.name, 'String'_String) :: 2, Date :: 5) -> _CAST(_CAST(__table1.name, 'String'_String), Date) Date : 4
ALIAS _CAST(__table1.date, String) :: 3 -> __table1.name String : 5
ALIAS _CAST(_CAST(__table1.name, 'String'_String), Date) :: 4 -> __table1.date Date : 3
Positions: 6 5 3 0
ReadFromRemote (Read from remote replica)
Header: _CAST(__table1.id, 'UInt32'_String) UInt32
__table1.date Date
_CAST(__table1.name, 'String'_String) String
_CAST(__table1.value, 'UInt32'_String) UInt32
Expression (Change remote column names to local column names)
Header: __table1.id UInt32
__table1.name String
__table1.date Date
__table1.value UInt32
Actions: INPUT : 0 -> _CAST(__table1.id, 'UInt32'_String) UInt32 : 0
INPUT : 1 -> __table1.date Date : 1
INPUT : 2 -> _CAST(__table1.name, 'String'_String) String : 2
INPUT : 3 -> _CAST(__table1.value, 'UInt32'_String) UInt32 : 3
COLUMN Const(String) -> String String : 4
COLUMN Const(String) -> Date String : 5
ALIAS _CAST(__table1.id, 'UInt32'_String) :: 0 -> __table1.id UInt32 : 6
ALIAS _CAST(__table1.value, 'UInt32'_String) :: 3 -> __table1.value UInt32 : 0
FUNCTION _CAST(__table1.date :: 1, String :: 4) -> _CAST(__table1.date, String) String : 3
FUNCTION _CAST(_CAST(__table1.name, 'String'_String) :: 2, Date :: 5) -> _CAST(_CAST(__table1.name, 'String'_String), Date) Date : 4
ALIAS _CAST(__table1.date, String) :: 3 -> __table1.name String : 5
ALIAS _CAST(_CAST(__table1.name, 'String'_String), Date) :: 4 -> __table1.date Date : 3
Positions: 6 5 3 0
Sorting (Merge sorted streams after aggregation stage for ORDER BY)
Header: _CAST(__table1.id, 'UInt32'_String) UInt32
__table1.date Date
_CAST(__table1.name, 'String'_String) String
_CAST(__table1.value, 'UInt32'_String) UInt32
Sort description: _CAST(__table1.id, 'UInt32'_String) ASC
ReadFromRemote (Read from remote replica)
Header: _CAST(__table1.id, 'UInt32'_String) UInt32
__table1.date Date
_CAST(__table1.name, 'String'_String) String
_CAST(__table1.value, 'UInt32'_String) UInt32
maybe, it comes from positional matching here (i'm not sure)
ClickHouse/src/Planner/PlannerJoinTree.cpp
Line 1410 in 25ed9aa
| ActionsDAG::MatchColumnsMode::Position, |
…d make the expected non-null columns explicit
…vide the required columns
ee309f7 to
0896a85
Compare
|
@codex review |
arthurpassos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the extents of my knowledge, it looks good. Plus, it is experimental.
|
Codex Review: Didn't find any major issues. 🎉 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
… (anyway it works only if experimetal hybrid tables are enabled)
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Added experimental support in the Hybrid table engine to automatically reconcile column-type mismatches across segments via
hybrid_table_auto_cast_columns(analyzer only). This allows queries to return consistent headers even when the underlying tables use different physical types.Documentation entry for user-facing changes
Introduced hybrid_table_auto_cast_columns (requires allow_experimental_analyzer = 1) so Hybrid segments can auto-insert CAST operations and keep their headers aligned when schemas diverge.
Reiterated that engine = Hybrid remains experimental and must be enabled with
allow_experimental_hybrid_table.Updated terminology and wording in the Hybrid docs (tiered → hybrid, layers → segments) for consistency.
CI/CD Options
Exclude tests:
Regression jobs to run: