-
Notifications
You must be signed in to change notification settings - Fork 11
25.8 Antalya: Fix joins with Iceberg tables #1082
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
25.8 Antalya: Fix joins with Iceberg tables #1082
Conversation
|
Failed tests:
|
src/Storages/IStorageCluster.cpp
Outdated
| if (query_node.hasWhere() || query_node.hasPrewhere()) | ||
| { | ||
| CollectUsedColumnsForSourceVisitor collector_where(table_function_node, context, true); | ||
| if (query_node.hasPrewhere()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's quite odd that the original code didn't take prewhere into account...
If those are actually needed, we need to place a mental pin here that after backporting prewhere+row_policy we'll need to add hasRowLevelFilter here as well.
| assert node.query(f"SELECT * FROM {CATALOG_NAME}.`{root_namespace}.{table_encoded_name}`") == "\\N\tAAPL\t193.24\t193.31\t('bot')\n" | ||
|
|
||
|
|
||
| def test_cluster_joins(started_cluster): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems to be passing on the current Antalya branch:
test_database_iceberg/test.py::test_cluster_joins PASSED
Is it expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, test doesn't cover case with local right table, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And not on cluster...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for the CROSS JOIN case as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
…alya-25.8/s3cluster_global_join_fixes
fba187a to
39882f5
Compare
This reverts commit bff8bba.
|
Remove unused code |
|
@ianton-ru can you give me a bit more context on what the issue was in each specific bullet point and how it was fixed? For example:
Why wasn't it working? What was missing? And what did you do to get it working? Same for the others. |
When I search tree node to make subquery to replicas, I have searched only
Befor subquery was like Now Also fixed bugs founded by Alsu:
storage reused in this case, but I tried to save some info about query ( |
arthurpassos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given my ignorance on the subject, I did my best for the 1st pass. Mostly questions so I can better understand the code and do a follow up. I beg for patience :)
src/Storages/IStorageCluster.cpp
Outdated
| auto table_function_node = table_function_searcher.getNode(); | ||
| if (!table_function_node) | ||
| throw Exception(ErrorCodes::LOGICAL_ERROR, "Can't find table function node"); | ||
| SearcherVisitor table_function_searcher({QueryTreeNodeType::TABLE, QueryTreeNodeType::TABLE_FUNCTION}, context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: is this still a table_FUNCTION_visitor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Table and table function. Try to make better name. Actually I try to get left expression of join here, something like left_table_expression_searcher.
| auto & query_node = query_tree->as<QueryNode &>(); | ||
| if (query_node.hasWhere() || query_node.hasPrewhere()) | ||
| { | ||
| CollectUsedColumnsForSourceVisitor collector_where(table_function_node, context, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand CollectUsedColumnsForSourceVisitor is meant to collect all columns for a given "source". As I understand, the source is provided in the constructor. But what makes it local? Doesn't the query_tree object you are passing represent the entire query tree? I don't get it, please educate me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, query_tree contains all, But in different places.
Columns can be in selected list, in where condition, in join condition, etc.
SELECT table1.column1,... FROM table1 JOIN table2 ON table1.column2=table2.column2 WHERE table1.column3=...
and collector traverses the tree and collect all cases in single list.
Need to all these columns to select list for left table subquery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the columns can be in different places, but let's say there are no local columns. Wouldn't collector_where.getColumns() be non-empty regardless? Btw, what makes a table local or not local assuming you can't check for its existence on the remote node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually CollectUsedColumnsForSourceVisitor find here all columns from other sources (last boolean flag collect_columns_from_other_sources_ in constructor).
Columns without source skipped in any case.
So when if where something like table1.column1 =1 AND table2.column2 = 2 AND (table1.column3 + table2.column4) = 3 AND randUniform(1,6) = 4 result contains only table2.column2 and table2.column4. Actually I don't know is table2 local or not, but author of query can know. If he used object_storage_cluster_join_mode='local', need to process as if it local.
src/Storages/IStorageCluster.cpp
Outdated
|
|
||
| if (info.has_join || info.has_cross_join) | ||
| { | ||
| if (table_function_searcher.getType().value() == QueryTreeNodeType::TABLE_FUNCTION) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I get it right that you are extracting ONLY the table function from the original AST? Why doesn't it have to be done for tables as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, I understand it is being done for tables when grabbing the left side. But why is it different. Is the query tree different for table function vs regular tables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Looks like lats two blocks work for table function too!
But now need to retest everything.
src/Storages/IStorageCluster.cpp
Outdated
| auto & table_function_ast = table_function->as<ASTFunction &>(); | ||
| query_tree_distributed->setAlias(table_function_ast.alias); | ||
| } | ||
| else if (info.has_join) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I get it right that the iceberg table is always the left one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes (actually not only iceberg, s3Cluster too). In other cases code does not come into IStorageCluster.
When join with local table on the left and iceberg on the right clickhouse processes query as local join and only on final stage reads right table without attempts to send left to remote nodes as part of the subquery, so I did not touch that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... So in a nutshell, there are two cases: iceberg / s3cluster table on the left, in which we need to prevent local tables from reaching remote nodes. and the other case, in which the local is resolved locally before reaching this code path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. In this code left table always s3Cluster/icebergCluster or Iceberg table.
src/Storages/IStorageCluster.cpp
Outdated
| } | ||
| else | ||
| { | ||
| SearcherVisitor join_searcher({QueryTreeNodeType::CROSS_JOIN}, context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this become a method in QueryNode? For example: QueryNode::getCrossJoinTree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All others methods in QueryNode don't search on request, but returns pre-filled fields for children vector. Now is very specific case, not sure that for this single case need to make and try to fill this position in all queries.
May be it make sense, but need to review all cross-join-related code and make refactoring.
src/Storages/IStorageCluster.cpp
Outdated
| auto cross_join_node = join_searcher.getNode(); | ||
| if (!cross_join_node) | ||
| throw Exception(ErrorCodes::LOGICAL_ERROR, "Can't find CROSS JOIN node"); | ||
| query_tree_distributed = cross_join_node->as<CrossJoinNode>()->getTableExpressions()[0]->clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does [0] mean left?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. CROSS JOIN has a vector of expression, can have more than two sources. So element 0 is a left. Can't be empty, as I understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps just add a comment, but not required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment
7255bda to
6ceecc4
Compare
arthurpassos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix joins with Iceberg tables
Solved #1063 (I hope)
Documentation entry for user-facing changes
CI/CD Options
Exclude tests:
Regression jobs to run: