@ianton-ru ianton-ru commented Oct 15, 2025

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix joins with Iceberg tables
Solved #1063 (I hope)

Documentation entry for user-facing changes

  • Joins work with Iceberg tables, not only with table functions
  • Fix for local columns in ORDER BY/GROUP BY
  • WHERE/PREWHERE condition for subquery
  • More tests

github-actions bot commented Oct 15, 2025

Workflow [PR], commit [2513c29]

ianton-ru commented Oct 16, 2025

Failed tests:
03572_export_replicated_merge_tree_part_to_object_storage - looks like an interaction between object_storage_cluster and part export
02995_new_settings_history:

2025-10-15 14:20:45 +PLEASE ADD THE NEW SETTING TO SettingsChangesHistory.cpp: export_merge_tree_part_overwrite_file_if_exists WAS ADDED
2025-10-15 14:20:45 +PLEASE ADD THE NEW SETTING TO SettingsChangesHistory.cpp: allow_experimental_export_merge_tree_part WAS ADDED
2025-10-15 14:20:45 +PLEASE ADD THE NEW SETTING TO SettingsChangesHistory.cpp: allow_experimental_export_merge_tree_part WAS ADDED

01079_parallel_alter_detach_table_zookeeper - looks flaky

if (query_node.hasWhere() || query_node.hasPrewhere())
{
CollectUsedColumnsForSourceVisitor collector_where(table_function_node, context, true);
if (query_node.hasPrewhere())
Collaborator

It's quite odd that the original code didn't take prewhere into account...

If those are actually needed, we need to place a mental pin here that after backporting prewhere+row_policy we'll need to add hasRowLevelFilter here as well.

assert node.query(f"SELECT * FROM {CATALOG_NAME}.`{root_namespace}.{table_encoded_name}`") == "\\N\tAAPL\t193.24\t193.31\t('bot')\n"


def test_cluster_joins(started_cluster):
Collaborator

This test seems to be passing on the current Antalya branch:

test_database_iceberg/test.py::test_cluster_joins PASSED  

Is it expected?

Author

Nice catch, the test doesn't cover the case with a local right table, thanks!

Author

And not on cluster...

Collaborator

Can you add a test for the CROSS JOIN case as well?

Author

Done.

@ianton-ru ianton-ru changed the title 25.8 Antalya: Fix joins with Iceberg tables [WIP] 25.8 Antalya: Fix joins with Iceberg tables Oct 24, 2025
@ianton-ru ianton-ru force-pushed the feature/antalya-25.8/s3cluster_global_join_fixes branch from fba187a to 39882f5 Compare November 6, 2025 13:39
@ianton-ru
Author

Remove unused code

@ianton-ru ianton-ru changed the title [WIP] 25.8 Antalya: Fix joins with Iceberg tables 25.8 Antalya: Fix joins with Iceberg tables Nov 13, 2025
@arthurpassos
Collaborator

@ianton-ru can you give me a bit more context on what the issue was in each specific bullet point and how it was fixed?

For example:

Joins work with Iceberg tables, not only with table functions

Why wasn't it working? What was missing? And what did you do to get it working?

Same for the others.

@ianton-ru
Author

Joins work with Iceberg tables, not only with table functions

Previously (in #972), when searching the query tree for the node to turn into the subquery sent to replicas, I looked only for QueryTreeNodeType::TABLE_FUNCTION.
Now QueryTreeNodeType::TABLE is handled too.
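
To illustrate the difference, here is a minimal sketch in Python using a toy tree rather than the real ClickHouse query tree. `Node`, `NodeType`, and `find_first` are hypothetical names invented for this example; only the idea of searching for a set of node types instead of a single type comes from the comment above.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class NodeType(Enum):
    QUERY = auto()
    JOIN = auto()
    TABLE = auto()
    TABLE_FUNCTION = auto()


@dataclass
class Node:
    """Toy stand-in for a query tree node (not the ClickHouse class)."""
    type: NodeType
    name: str = ""
    children: list = field(default_factory=list)


def find_first(root: Node, wanted: set) -> Optional[Node]:
    """Depth-first, left-to-right search for the first node of a wanted type."""
    if root.type in wanted:
        return root
    for child in root.children:
        found = find_first(child, wanted)
        if found is not None:
            return found
    return None


# Models "iceberg_db.tbl JOIN local_tbl": the left side is a plain TABLE node.
tree = Node(NodeType.QUERY, children=[
    Node(NodeType.JOIN, children=[
        Node(NodeType.TABLE, "iceberg_db.tbl"),
        Node(NodeType.TABLE, "local_tbl"),
    ]),
])

# Searching only for table functions finds nothing in this tree.
old = find_first(tree, {NodeType.TABLE_FUNCTION})
# Searching for both kinds returns the leftmost table expression.
new = find_first(tree, {NodeType.TABLE, NodeType.TABLE_FUNCTION})
```

In this toy model the old search returns nothing for a join whose left side is a table, which matches the failure described above.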

Fix for local columns in ORDER BY/GROUP BY

The ORDER BY expression was sent to the replica nodes and failed with an unknown-identifier error when it referenced local columns. Now the expression is applied only on the local node.

WHERE/PREWHERE condition for subquery
In the first implementation I removed the WHERE/PREWHERE condition from the subquery whenever it contained local columns.

SELECT ice.x, local.y
  FROM iceberg.table AS ice
  JOIN local.table AS local
  ON ice.key=local.key
WHERE
  ice.x >= 10 AND local.y >= 20

Before, the subquery looked like this:

SELECT ice.x
  FROM iceberg.table AS ice

Now removeExpressionsThatDoNotDependOnTableIdentifiers is called, and the subquery keeps the relevant part of the WHERE condition:

SELECT ice.x
  FROM iceberg.table AS ice
WHERE
  ice.x >= 10
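
The splitting step can be sketched roughly as follows. This is a toy Python model, not the logic of removeExpressionsThatDoNotDependOnTableIdentifiers itself: `split_where` is a hypothetical helper, conjuncts are represented as plain (text, aliases) pairs, and the rule "keep a conjunct only if it references the left table alone" is an assumption made for the example.

```python
def split_where(conjuncts, left_alias):
    """Return the texts of conjuncts that reference only `left_alias`.

    `conjuncts` is a list of (sql_text, referenced_aliases) pairs that are
    joined by AND at the top level of the WHERE clause.
    """
    return [sql for sql, aliases in conjuncts if aliases == {left_alias}]


# WHERE ice.x >= 10 AND local.y >= 20
where = [
    ("ice.x >= 10", {"ice"}),
    ("local.y >= 20", {"local"}),
]

pushed = split_where(where, "ice")

# Build the subquery for the left table with the pushed-down part only.
subquery = "SELECT ice.x FROM iceberg.table AS ice"
if pushed:
    subquery += " WHERE " + " AND ".join(pushed)
```

Dropping conjuncts this way is only safe for a top-level AND: each remaining conjunct is still a necessary condition for a row to match, so the subquery can return extra rows but never loses a matching one.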

Also fixed bugs found by Alsu:

  • CROSS JOIN (has separate node type in query tree)
  • Lost strings (forgotten convertToFullColumnIfConst in StorageObjectStorageSource::generate)
  • Incorrect behavior when a table function is joined with itself:
SELECT * FROM iceberg('path') JOIN iceberg('path') ...

The storage is reused in this case, but I had tried to save some per-query info (has_join) inside the storage object. As a result, the right table was processed as if it were the left table, with strange errors.
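
The pitfall of keeping per-query state on a shared object can be shown with a small Python sketch. Everything here is hypothetical (`Storage`, `get_storage`, `TableExpressionInfo`); it only models the situation described above, where both sides of a self-join resolve to the same storage object, and it assumes the remedy is to keep the flag outside the storage.

```python
from dataclasses import dataclass


class Storage:
    """Toy stand-in for a storage object shared by every reference to one source."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.has_join = False  # per-query flag on a shared object: the problem


_storages: dict = {}


def get_storage(path: str) -> Storage:
    """Return the cached storage for `path`, creating it on first use."""
    if path not in _storages:
        _storages[path] = Storage(path)
    return _storages[path]


@dataclass
class TableExpressionInfo:
    """Per-table-expression state kept outside the shared storage."""
    storage: Storage
    has_join: bool


# iceberg('path') JOIN iceberg('path'): both sides get the same object.
left = get_storage("path")
right = get_storage("path")

left.has_join = True       # intended for the left side only...
leaked = right.has_join    # ...but the right side sees it too

# Keeping the flag in a separate per-expression record avoids the leak.
left_info = TableExpressionInfo(left, has_join=True)
right_info = TableExpressionInfo(right, has_join=False)
```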

Collaborator

@arthurpassos arthurpassos left a comment

Given my ignorance on the subject, I did my best for the 1st pass. Mostly questions so I can better understand the code and do a follow up. I beg for patience :)

auto table_function_node = table_function_searcher.getNode();
if (!table_function_node)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Can't find table function node");
SearcherVisitor table_function_searcher({QueryTreeNodeType::TABLE, QueryTreeNodeType::TABLE_FUNCTION}, context);
Collaborator

Nit: is this still a table_FUNCTION_visitor?

Author

It covers both tables and table functions. I'll try to come up with a better name. What I'm actually getting here is the left expression of the join, so something like left_table_expression_searcher.

auto & query_node = query_tree->as<QueryNode &>();
if (query_node.hasWhere() || query_node.hasPrewhere())
{
CollectUsedColumnsForSourceVisitor collector_where(table_function_node, context, true);
Collaborator

I understand CollectUsedColumnsForSourceVisitor is meant to collect all columns for a given "source". As I understand, the source is provided in the constructor. But what makes it local? Doesn't the query_tree object you are passing represent the entire query tree? I don't get it, please educate me

Author

Yes, query_tree contains everything, but in different places.
Columns can appear in the select list, in the WHERE condition, in the JOIN condition, etc.

SELECT table1.column1,... FROM table1 JOIN table2 ON table1.column2=table2.column2 WHERE table1.column3=...

The collector traverses the tree and gathers all of these into a single list.
All these columns need to be added to the select list of the left table subquery.

Collaborator

I understand the columns can be in different places, but let's say there are no local columns. Wouldn't collector_where.getColumns() be non-empty regardless? Btw, what makes a table local or not local assuming you can't check for its existence on the remote node?

Author

Actually, here CollectUsedColumnsForSourceVisitor finds all columns from other sources (that is what the last boolean constructor argument, collect_columns_from_other_sources_, does).
Columns without a source are skipped in any case.
So when the WHERE is something like table1.column1 = 1 AND table2.column2 = 2 AND (table1.column3 + table2.column4) = 3 AND randUniform(1,6) = 4, the result contains only table2.column2 and table2.column4. I don't actually know whether table2 is local or not, but the author of the query may know. If they set object_storage_cluster_join_mode='local', it needs to be processed as if it were local.
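
The behavior described for that example can be reproduced with a toy collector in Python. This is a sketch under assumptions, not the ClickHouse visitor: `Column`, `Func`, and `collect_columns` are hypothetical names, and the expression tree is a simplified model of the WHERE condition quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: str
    source: Optional[str]  # None models a column without a source


@dataclass(frozen=True)
class Func:
    name: str
    args: tuple


def collect_columns(expr, source, from_other_sources):
    """Collect columns of `source`, or of every other source if the flag is set.

    Columns without a source are skipped in both modes.
    """
    out = []

    def visit(node):
        if isinstance(node, Column):
            if node.source is None:
                return
            is_other = node.source != source
            if is_other == from_other_sources and node not in out:
                out.append(node)
        elif isinstance(node, Func):
            for arg in node.args:
                visit(arg)
        # Constants (plain ints here) carry no columns and are ignored.

    visit(expr)
    return out


# table1.column1 = 1 AND table2.column2 = 2
#   AND (table1.column3 + table2.column4) = 3 AND randUniform(1,6) = 4
where = Func("and", (
    Func("equals", (Column("column1", "table1"), 1)),
    Func("equals", (Column("column2", "table2"), 2)),
    Func("equals", (Func("plus", (Column("column3", "table1"),
                                  Column("column4", "table2"))), 3)),
    Func("equals", (Func("randUniform", (1, 6)), 4)),
))

other = collect_columns(where, "table1", from_other_sources=True)
names = [f"{c.source}.{c.name}" for c in other]
```

With the flag set, `names` holds only the table2 columns, so a non-empty result signals that the condition touches something other than the left table.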


if (info.has_join || info.has_cross_join)
{
if (table_function_searcher.getType().value() == QueryTreeNodeType::TABLE_FUNCTION)
Collaborator

Do I get it right that you are extracting ONLY the table function from the original AST? Why doesn't it have to be done for tables as well?

Collaborator

I mean, I understand it is being done for tables when grabbing the left side. But why is it different? Is the query tree different for table functions vs regular tables?

Author

Good point. Looks like the last two blocks work for table functions too!
But now I need to retest everything.

auto & table_function_ast = table_function->as<ASTFunction &>();
query_tree_distributed->setAlias(table_function_ast.alias);
}
else if (info.has_join)
Collaborator

Do I get it right that the iceberg table is always the left one?

Author

Yes (actually not only Iceberg, s3Cluster too). In other cases the code does not get into IStorageCluster.
When the join has a local table on the left and Iceberg on the right, ClickHouse processes the query as a local join and reads the right table only at the final stage, without trying to send the left table to remote nodes as part of the subquery, so I did not touch that case.

Collaborator

Hmm... So in a nutshell, there are two cases: iceberg / s3cluster table on the left, in which we need to prevent local tables from reaching remote nodes, and the other case, in which the local table is resolved locally before reaching this code path?

Author

Yes. In this code the left table is always s3Cluster/icebergCluster or an Iceberg table.

}
else
{
SearcherVisitor join_searcher({QueryTreeNodeType::CROSS_JOIN}, context);
Collaborator

Shouldn't this become a method in QueryNode? For example: QueryNode::getCrossJoinTree

Author

All the other methods in QueryNode don't search on request; they return pre-filled fields from the children vector. This is a very specific case, and I'm not sure it is worth adding such a slot and trying to fill it for all queries just for this one case.
Maybe it makes sense, but it would require reviewing all the cross-join-related code and refactoring it.

auto cross_join_node = join_searcher.getNode();
if (!cross_join_node)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Can't find CROSS JOIN node");
query_tree_distributed = cross_join_node->as<CrossJoinNode>()->getTableExpressions()[0]->clone();
Collaborator

Does [0] mean left?

Author

Yes. CROSS JOIN has a vector of expressions and can have more than two sources, so element 0 is the leftmost one. It can't be empty, as far as I understand.

Collaborator

Perhaps just add a comment, but not required

Author

Added a comment.

@ianton-ru ianton-ru force-pushed the feature/antalya-25.8/s3cluster_global_join_fixes branch from 7255bda to 6ceecc4 Compare November 17, 2025 11:43
Collaborator

@arthurpassos arthurpassos left a comment

LGTM

@Enmk Enmk merged commit 7799454 into antalya-25.8 Nov 17, 2025
105 of 108 checks passed
6 participants