Don't preserve functional dependency when generating UNION logical plan #44
Which issue does this PR close?
Rationale for this change
This PR discards functional dependencies (FDs) when generating the UNION logical plan, so that later operations, e.g. aggregation, cannot rely on FDs that no longer hold.
When the DataFusion logical planner builds the `AGGREGATE` plan, it adds additional columns to the `group_expr` based on the functional dependencies. However, for queries that aggregate over a table obtained through a `UNION` operation, the functional dependency is still preserved in the schema of the `UNION` plan, even though it no longer holds after the `UNION`.
Table 1:
Table 2:
In both Table 1 and Table 2, the functional dependency `col1 -> col2` holds. However, for `select * from table1 UNION select * from table2`, the functional dependency `col1 -> col2` no longer holds.
This causes trouble in further aggregation based on the UNION results. Consider the following query:
Due to the wrongly preserved functional dependency, the query generates a wrong logical plan in the final aggregation step.
Without the changes made in this PR, the result of the added test would contain duplicated groups:

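The rationale above can be illustrated with a small standalone sketch. It does not use DataFusion, and the two tables are hypothetical sample data, not the tables from this PR's description or test. The helper names (`fd_holds`, `union_distinct`, `groups_by_col1`, `groups_by_col1_col2`) are likewise made up for illustration. The sketch shows that `col1 -> col2` can hold in each input yet fail after a UNION, and that grouping by `(col1, col2)` instead of `col1` then yields more groups than there are distinct `col1` values.

```rust
use std::collections::{BTreeSet, HashMap};

/// Returns true if the functional dependency `col1 -> col2` holds,
/// i.e. every `col1` value is paired with exactly one `col2` value.
fn fd_holds(rows: &[(i32, i32)]) -> bool {
    let mut seen: HashMap<i32, i32> = HashMap::new();
    for &(col1, col2) in rows {
        // `insert` returns the value previously stored for this key, if any.
        if let Some(prev) = seen.insert(col1, col2) {
            if prev != col2 {
                return false;
            }
        }
    }
    true
}

/// Mimics `UNION` (distinct) of two tables with columns (col1, col2).
fn union_distinct(a: &[(i32, i32)], b: &[(i32, i32)]) -> Vec<(i32, i32)> {
    let set: BTreeSet<(i32, i32)> = a.iter().chain(b.iter()).copied().collect();
    set.into_iter().collect()
}

/// Number of groups produced by `GROUP BY col1`.
fn groups_by_col1(rows: &[(i32, i32)]) -> usize {
    rows.iter().map(|r| r.0).collect::<BTreeSet<i32>>().len()
}

/// Number of groups produced by `GROUP BY col1, col2`, which is what the
/// grouping becomes if `col2` is appended because of a stale FD.
fn groups_by_col1_col2(rows: &[(i32, i32)]) -> usize {
    rows.iter().copied().collect::<BTreeSet<(i32, i32)>>().len()
}

fn main() {
    // Hypothetical sample data: `col1 -> col2` holds in each table on its own.
    let table1 = [(1, 10), (2, 20)];
    let table2 = [(1, 11), (3, 30)];
    let unioned = union_distinct(&table1, &table2);

    println!("FD holds in table1: {}", fd_holds(&table1));
    println!("FD holds in table2: {}", fd_holds(&table2));
    println!("FD holds after UNION: {}", fd_holds(&unioned));
    println!("groups by col1: {}", groups_by_col1(&unioned));
    println!("groups by (col1, col2): {}", groups_by_col1_col2(&unioned));
}
```

With this sample data, `col1 = 1` maps to both `10` and `11` after the union, so a grouping that was silently extended with `col2` reports the key `1` twice. That is the kind of duplicated group described above.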
What changes are included in this PR?
Functional dependencies are discarded when generating the `UNION` logical plan.
Are these changes tested?
Yes
Are there any user-facing changes?
The target columns described by an FD will no longer be wrongly included in the aggregate `group_by` columns.