
Commit c400519

cloud-fan authored and dongjoon-hyun committed
[SPARK-31956][SQL] Do not fail if there is no ambiguous self join
### What changes were proposed in this pull request?

This is a follow-up of #28695, to fix the problem completely. The root cause is that `df("col").as("name")` is no longer a column reference and should not carry the special column metadata. However, this was broken in ba7adc4#diff-ac415c903887e49486ba542a65eec980L1050-L1053. This PR fixes the regression by stripping the special column metadata in `Column.name`, which restores the behavior before #28326.

### Why are the changes needed?

Fix a regression. We shouldn't fail if there is no ambiguous self-join.

### Does this PR introduce _any_ user-facing change?

Yes, the query in the test can run now.

### How was this patch tested?

Updated test.

Closes #28783 from cloud-fan/self-join.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
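For illustration, here is a minimal sketch of the kind of query this fix unblocks (assuming a local SparkSession; the object and app names are made up for the example). Aliasing `df("a")` and using `df("a")` under a window involves no self join at all, so the ambiguous-self-join check should not reject it:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

object NoAmbiguousSelfJoinExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("no-ambiguous-self-join")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(1, 1, 2, 2).toDF("a")
    val w = Window.partitionBy(df("a"))

    // `df("a").alias("x")` is an Alias, not a plain column reference, so after this
    // fix it no longer carries the special column metadata that the
    // ambiguous-self-join check looks for, and the query runs instead of failing.
    df.select(df("a").alias("x"), sum(df("a")).over(w)).show()

    spark.stop()
  }
}
```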
1 parent: 43063e2

2 files changed, 7 additions and 2 deletions


sql/core/src/main/scala/org/apache/spark/sql/Column.scala

Lines changed: 1 addition & 1 deletion
@@ -1042,7 +1042,7 @@ class Column(val expr: Expression) extends Logging {
    * @since 2.0.0
    */
   def name(alias: String): Column = withExpr {
-    Alias(expr, alias)()
+    Alias(normalizedExpr(), alias)()
   }
 
   /**
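For context, metadata on Catalyst attributes can be removed with `MetadataBuilder`. The sketch below is a hypothetical illustration of stripping one metadata key from attribute references in an expression tree; it is not Spark's actual `normalizedExpr()` implementation, and the helper and object names are made up here.

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
import org.apache.spark.sql.types.MetadataBuilder

object MetadataStripSketch {
  // Hypothetical helper: drop one metadata key from every attribute reference in an
  // expression tree, so an aliased column no longer looks like a plain column reference.
  def stripMetadataKey(e: Expression, key: String): Expression = e.transform {
    case a: AttributeReference if a.metadata.contains(key) =>
      a.withMetadata(new MetadataBuilder().withMetadata(a.metadata).remove(key).build())
  }
}
```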

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala

Lines changed: 6 additions & 1 deletion
@@ -204,14 +204,19 @@ class DataFrameSelfJoinSuite extends QueryTest with SharedSparkSession {
     }
   }
 
-  test("SPARK-28344: don't fail as ambiguous self join when there is no join") {
+  test("SPARK-28344: don't fail if there is no ambiguous self join") {
     withSQLConf(
       SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "true") {
       val df = Seq(1, 1, 2, 2).toDF("a")
       val w = Window.partitionBy(df("a"))
       checkAnswer(
         df.select(df("a").alias("x"), sum(df("a")).over(w)),
         Seq((1, 2), (1, 2), (2, 4), (2, 4)).map(Row.fromTuple))
+
+      val joined = df.join(spark.range(1)).select($"a")
+      checkAnswer(
+        joined.select(joined("a").alias("x"), sum(joined("a")).over(w)),
+        Seq((1, 2), (1, 2), (2, 4), (2, 4)).map(Row.fromTuple))
     }
   }
 }
