Skip to content

Conversation

@filimonov
Copy link
Member

@filimonov filimonov commented Nov 19, 2025

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Added experimental support in the Hybrid table engine to automatically reconcile column-type mismatches across segments via hybrid_table_auto_cast_columns (analyzer only). This allows queries to return consistent headers even when the underlying tables use different physical types.

Documentation entry for user-facing changes

  • Introduced hybrid_table_auto_cast_columns (requires allow_experimental_analyzer = 1) so Hybrid segments can auto-insert CAST operations and keep their headers aligned when schemas diverge.

  • Reiterated that engine = Hybrid remains experimental and must be enabled with allow_experimental_hybrid_table.

  • Updated terminology and wording in the Hybrid docs (tiered → hybrid, layers → segments) for consistency.

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@github-actions
Copy link

github-actions bot commented Nov 19, 2025

Workflow [PR], commit [8c0d982]

@filimonov
Copy link
Member Author

test failures look unrelated

@arthurpassos
Copy link
Collaborator

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

segment.predicate_ast,
columns_to_cast_ptr,
&metadata_columns,
segment_actual_columns);
Copy link
Member Author

@filimonov filimonov Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunatelly this approach also seem to misbahave.

When doing those rewrites (col -> CAST(col,...) ) it seems like in may potentially affect the order of the columns coming back from the subplans. And when different segments get processed differently the reorder somehow can also be different. And there is a match by positions near converting actions, which then may lead to situation then it try to align / mix together 2 different column. :|

It seems like moving that rewrite out (make it global for the main table) solves the issue (plus reduces the number of query tree traversals)

I've moved that logic to the dedicated Analyzer Pass (which seems to be a more logical place for such global rewrites of the query)...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record - it looked like that:

2025.11.21 17:27:03.478289 [ 1817274 ] {82a3c4b5-345b-4373-8591-e731c192f8e6} <Error> executeQuery: Code: 38. DB::Exception: Cannot parse date: value is too short: Cannot parse Date from String: while converting source column `_CAST(__table1.name, 'String'_String)` to destination column `__table1.date`: while executing 'FUNCTION
  _CAST(_CAST(__table1.name, 'String'_String) :: 2, Date :: 5) -> _CAST(_CAST(__table1.name, 'String'_String), Date) Date : 4'. (CANNOT_PARSE_DATE) (version 25.8.9.20000.altinityantalya) (from 127.0.0.1:39696) (query 1, line 1) (in query: SELECT * FROM test_tiered_watermark ORDER BY id SETTINGS enable_analyzer = 1

Plan

› Expression (Project names)
  Header: id UInt32
          name String
          date Date
          value UInt32
  Actions: INPUT : 0 -> __table1.id UInt32 : 0
           INPUT : 1 -> __table1.name String : 1
           INPUT : 2 -> __table1.date Date : 2
           INPUT : 3 -> __table1.value UInt32 : 3
           ALIAS __table1.id :: 0 -> id UInt32 : 4
           ALIAS __table1.name :: 1 -> name String : 0
           ALIAS __table1.date :: 2 -> date Date : 1
           ALIAS __table1.value :: 3 -> value UInt32 : 2
  Positions: 4 0 1 2
    Sorting (Merge sorted streams after aggregation stage for ORDER BY)
    Header: __table1.id UInt32
            __table1.name String
            __table1.date Date
            __table1.value UInt32
    Sort description: __table1.id ASC
      Union
      Header: __table1.id UInt32
              __table1.name String
              __table1.date Date
              __table1.value UInt32
        Expression (Change remote column names to local column names)
        Header: __table1.id UInt32
                __table1.name String
                __table1.date Date
                __table1.value UInt32
        Actions: INPUT : 0 -> _CAST(__table1.id, 'UInt32'_String) UInt32 : 0
                 INPUT : 1 -> __table1.date Date : 1
                 INPUT : 2 -> _CAST(__table1.name, 'String'_String) String : 2
                 INPUT : 3 -> _CAST(__table1.value, 'UInt32'_String) UInt32 : 3
                 COLUMN Const(String) -> String String : 4
                 COLUMN Const(String) -> Date String : 5
                 ALIAS _CAST(__table1.id, 'UInt32'_String) :: 0 -> __table1.id UInt32 : 6
                 ALIAS _CAST(__table1.value, 'UInt32'_String) :: 3 -> __table1.value UInt32 : 0
                 FUNCTION _CAST(__table1.date :: 1, String :: 4) -> _CAST(__table1.date, String) String : 3
                 FUNCTION _CAST(_CAST(__table1.name, 'String'_String) :: 2, Date :: 5) -> _CAST(_CAST(__table1.name, 'String'_String), Date) Date : 4
                 ALIAS _CAST(__table1.date, String) :: 3 -> __table1.name String : 5
                 ALIAS _CAST(_CAST(__table1.name, 'String'_String), Date) :: 4 -> __table1.date Date : 3
        Positions: 6 5 3 0
          ReadFromRemote (Read from remote replica)
          Header: _CAST(__table1.id, 'UInt32'_String) UInt32
                  __table1.date Date
                  _CAST(__table1.name, 'String'_String) String
                  _CAST(__table1.value, 'UInt32'_String) UInt32
        Expression (Change remote column names to local column names)
        Header: __table1.id UInt32
                __table1.name String
                __table1.date Date
                __table1.value UInt32
        Actions: INPUT : 0 -> _CAST(__table1.id, 'UInt32'_String) UInt32 : 0
                 INPUT : 1 -> __table1.date Date : 1
                 INPUT : 2 -> _CAST(__table1.name, 'String'_String) String : 2
                 INPUT : 3 -> _CAST(__table1.value, 'UInt32'_String) UInt32 : 3
                 COLUMN Const(String) -> String String : 4
                 COLUMN Const(String) -> Date String : 5
                 ALIAS _CAST(__table1.id, 'UInt32'_String) :: 0 -> __table1.id UInt32 : 6
                 ALIAS _CAST(__table1.value, 'UInt32'_String) :: 3 -> __table1.value UInt32 : 0
                 FUNCTION _CAST(__table1.date :: 1, String :: 4) -> _CAST(__table1.date, String) String : 3
                 FUNCTION _CAST(_CAST(__table1.name, 'String'_String) :: 2, Date :: 5) -> _CAST(_CAST(__table1.name, 'String'_String), Date) Date : 4
                 ALIAS _CAST(__table1.date, String) :: 3 -> __table1.name String : 5
                 ALIAS _CAST(_CAST(__table1.name, 'String'_String), Date) :: 4 -> __table1.date Date : 3
        Positions: 6 5 3 0
          Sorting (Merge sorted streams after aggregation stage for ORDER BY)
          Header: _CAST(__table1.id, 'UInt32'_String) UInt32
                  __table1.date Date
                  _CAST(__table1.name, 'String'_String) String
                  _CAST(__table1.value, 'UInt32'_String) UInt32
          Sort description: _CAST(__table1.id, 'UInt32'_String) ASC
            ReadFromRemote (Read from remote replica)
            Header: _CAST(__table1.id, 'UInt32'_String) UInt32
                    __table1.date Date
                    _CAST(__table1.name, 'String'_String) String
                    _CAST(__table1.value, 'UInt32'_String) UInt32

maybe, it comes from positional matching here (i'm not sure)

ActionsDAG::MatchColumnsMode::Position,

@arthurpassos
Copy link
Collaborator

@codex review

arthurpassos
arthurpassos previously approved these changes Nov 21, 2025
Copy link
Collaborator

@arthurpassos arthurpassos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the extents of my knowledge, it looks good. Plus, it is experimental.

@chatgpt-codex-connector
Copy link

Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

… (anyway it works only if experimetal hybrid tables are enabled)
@Enmk Enmk merged commit 807c012 into antalya-25.8 Nov 25, 2025
403 of 430 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants