Description
Describe the bug
If you use the DataFrame API or LogicalPlanBuilder to group by an expression that is aliased to the same name as an existing column (which you can't do directly via SQL), planning fails with: Schema error: Schema contains duplicate unqualified field name
To Reproduce
#[tokio::test]
async fn test_aggregate_alias() -> Result<()> {
    let df = test_table().await?;

    let df = df
        // GROUP BY `c2 + 1`
        .aggregate(vec![col("c2") + lit(1)], vec![])?
        // SELECT `c2 + 1` as c2
        .select(vec![(col("c2") + lit(1)).alias("c2")])?
        // GROUP BY c2 as "c2" (alias in expr is not supported by SQL)
        .aggregate(vec![col("c2").alias("c2")], vec![])?;

    let df_results = df.collect().await?;

    #[rustfmt::skip]
    assert_batches_sorted_eq!([
            "+----+",
            "| c2 |",
            "+----+",
            "| 2  |",
            "| 3  |",
            "| 4  |",
            "| 5  |",
            "| 6  |",
            "+----+",
        ],
        &df_results
    );

    Ok(())
}

Will error with
Error: SchemaError(DuplicateUnqualifiedField { name: "c2" })
This happens because the code in XXX introduces a duplicate copy of c2.
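To illustrate why a second copy of c2 is fatal, here is a minimal sketch of a duplicate-name check over unqualified fields. It is not DataFusion's actual implementation: `Field`, `SchemaError`, and `check_unqualified_names` are hypothetical names made up for this example, and the real schema validation may differ in detail.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a schema field: an optional table qualifier plus a name.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    qualifier: Option<String>,
    name: String,
}

/// Hypothetical error type, loosely modeled on the error shown in this issue.
#[derive(Debug, PartialEq)]
enum SchemaError {
    DuplicateUnqualifiedField { name: String },
}

/// Rejects a field list in which two unqualified fields share the same name.
/// Qualified fields are ignored in this simplified sketch.
fn check_unqualified_names(fields: &[Field]) -> Result<(), SchemaError> {
    let mut seen = HashSet::new();
    for field in fields.iter().filter(|f| f.qualifier.is_none()) {
        // `insert` returns false when the name was already present.
        if !seen.insert(field.name.as_str()) {
            return Err(SchemaError::DuplicateUnqualifiedField {
                name: field.name.clone(),
            });
        }
    }
    Ok(())
}
```

Under these assumptions, a field list holding both the aliased group expression c2 and the underlying column c2 (both unqualified) is rejected, while a list with a single c2 passes.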
Expected behavior
The test should pass
Additional context
This is a regression introduced in #8356. The test passes prior to that PR and fails afterwards.
Error querying: Tonic(Status { code: InvalidArgument, message: "Error while planning query: Schema error: Schema contains duplicate unqualified field name time", metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Fri, 22 Dec 2023 15:46:50 GMT", "content-length": "0"} }, source: None })