[SPARK-34200][SQL] Ambiguous column reference should consider attribute availability #31287
Ngone51
left a comment
LGTM!
viirya
left a comment
lgtm
BTW, the
Kubernetes integration test starting
Ah, actually this fixes a 3.1 regression. Previously we merged a bug fix, #30488, which makes sure the ambiguous self-join check is always applied. That said, in 3.0 the ambiguous self-join check is skipped in some cases, which hides the bug this PR is fixing. For the query below, it works in 3.0, but fails in 3.1. After this PR, it works again. cc @HyukjinKwon @dongjoon-hyun I think it's a 3.1.1 blocker.
Kubernetes integration test status success
Test build #134353 has finished for PR 31287 at commit
…te availability
### What changes were proposed in this pull request?
This is a long-standing bug that has existed since the ambiguous self-join check was introduced. A column reference is not ambiguous if it can only come from one join side (e.g. the other side has a project that picks only a few columns). An example is
```
Join(b#1 = 3)
  TableScan(t, [a#0, b#1])
  Project(a#2)
    TableScan(t, [a#2, b#3])
```
It's a self-join, but `b#1` is not ambiguous because it can't come from the right side, which only has column `a`.
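The rule above can be illustrated with a small toy model. This is only a sketch of the idea, not the analyzer code changed by this PR: `JoinSide`, `is_ambiguous`, the integer dataset id and the `consider_availability` flag are all hypothetical names introduced for illustration. Each join side records which dataset it was derived from, which columns its scan produces, and which columns are still visible in its output.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class JoinSide:
    """Toy stand-in for one child of a join (hypothetical, not a Spark class)."""
    source_dataset: int        # id of the dataset this side was derived from
    scanned: FrozenSet[str]    # columns produced by the underlying table scan
    output: FrozenSet[str]     # columns still visible after any Project


def is_ambiguous(column: str, dataset: int, sides: Sequence[JoinSide],
                 consider_availability: bool = True) -> bool:
    """Return True if a reference to `column` of `dataset` could bind to more than one side."""
    # Sides derived from the referenced dataset whose scan contains the column.
    candidates = [s for s in sides
                  if s.source_dataset == dataset and column in s.scanned]
    if consider_availability:
        # Only count a side if the column is actually available in its output.
        candidates = [s for s in candidates if column in s.output]
    return len(candidates) > 1


# Mirrors the plan above: both sides scan t, but the right side projects only `a`.
left = JoinSide(0, frozenset({"a", "b"}), frozenset({"a", "b"}))
right = JoinSide(0, frozenset({"a", "b"}), frozenset({"a"}))

print(is_ambiguous("b", 0, [left, right]))                               # False
print(is_ambiguous("b", 0, [left, right], consider_availability=False))  # True
print(is_ambiguous("a", 0, [left, right]))                               # True
```

In this model, ignoring availability flags `b` as ambiguous even though only the left side exposes it, which corresponds to the false positive described here. A reference to `a` remains ambiguous either way because both sides output it.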
### Why are the changes needed?
To avoid failing valid self-join queries.
### Does this PR introduce _any_ user-facing change?
Yes, as a bug fix.
### How was this patch tested?
A new test.
Closes #31287 from cloud-fan/self-join.
Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit b8a6906)
Signed-off-by: HyukjinKwon <[email protected]>
|
Merged to master, branch-3.1 and branch-3.0. |