Fix return type for sum(REAL) Spark aggregate #9818
@mbasmanova Could you help review this? Thanks.
@mbasmanova This is a quick bug fix for the window operator. The bug causes Gluten failures for some users.
}

TEST_F(SumAggregationTest, sumFloat) {
  auto data = makeRowVector({makeFlatVector<float>({2.00, 1.00})});
Did this test fail before the change?
@FelixYBW I wonder what the failure was. I tried running this test on 'main' and it passed.
:( @JkSelf Did you run the test? It looks like the UT can't detect the type mismatch.
The PR does solve the customer issue, but the issue is caused by window function validation.
When running the sum(float) aggregate window function with Gluten, the following error occurs:
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Unexpected return type for window function sum(REAL). Expected REAL. Got DOUBLE.
Retriable: False
This is due to a mismatch between the function signatures in Spark and Velox. In Spark, sum(real) is expected to return double, whereas in Velox the same function is registered to return real, so the return types are incompatible. The exception is hard to reproduce in Velox alone, so I have modified the unit test to trigger an overflow error when the current patch is not applied. Please review again. Thanks.
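To illustrate why an overflow can separate the two signatures, here is a minimal standalone sketch. It does not use Velox; `sumAsFloat` and `sumAsDouble` are hypothetical helpers that only mimic accumulating REAL input into a REAL result versus a DOUBLE result, and the input values are an assumption chosen to exceed the float range.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper: accumulates in float, mimicking sum(REAL) -> REAL.
float sumAsFloat(const std::vector<float>& values) {
  float acc = 0.0f;
  for (float v : values) {
    acc += v;  // Result is rounded back to float on every step.
  }
  return acc;
}

// Hypothetical helper: accumulates in double, mimicking sum(REAL) -> DOUBLE.
double sumAsDouble(const std::vector<float>& values) {
  double acc = 0.0;
  for (float v : values) {
    acc += static_cast<double>(v);  // Widen each input before adding.
  }
  return acc;
}
```

With two inputs equal to `std::numeric_limits<float>::max()`, the float accumulator overflows to infinity while the double accumulator stays finite, so a test built on such inputs can tell the two return types apart.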
}

TEST_F(SumAggregationTest, sumFloat) {
  auto data =
Does this test fail without the change? It looks the same as the Presto test, whose result type is float.
I wonder whether we need to port SumTest to sparksql: https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/tests/SumTest.cpp#L78
@jinchengchenghh
Without this patch, the current test throws an overflow error.
I believe there is no need to repeat all of the SumTest cases, since the only differences between SumAggregate in Spark SQL and Presto are now the decimal type and the sum(real) -> double conversion in this PR. The registration of the other functions is the same. If deemed necessary, we can open another PR later to add separate tests.
@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Conbench analyzed the 1 benchmark run on this commit. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
The return type of `sum(real)` in Spark SQL should be `double`, not `real`. See https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala#L81
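
As a rough sketch of the rule described above and of the kind of check that reported the mismatch, the standalone C++ below maps input types to Spark-style sum result types and compares them against a registered signature. It is not Velox code: `TypeKind`, `sparkSumResultType`, and `checkWindowReturnType` are hypothetical names, decimal and interval inputs are left out, and the message format is only copied from the error quoted in this thread.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical, simplified type enum (decimal and interval types omitted).
enum class TypeKind { TINYINT, SMALLINT, INTEGER, BIGINT, REAL, DOUBLE };

std::string typeName(TypeKind kind) {
  switch (kind) {
    case TypeKind::TINYINT: return "TINYINT";
    case TypeKind::SMALLINT: return "SMALLINT";
    case TypeKind::INTEGER: return "INTEGER";
    case TypeKind::BIGINT: return "BIGINT";
    case TypeKind::REAL: return "REAL";
    case TypeKind::DOUBLE: return "DOUBLE";
  }
  throw std::logic_error("Unknown type kind");
}

// Spark-style result type of sum: integral inputs widen to BIGINT,
// floating-point inputs (including REAL) widen to DOUBLE.
TypeKind sparkSumResultType(TypeKind input) {
  switch (input) {
    case TypeKind::TINYINT:
    case TypeKind::SMALLINT:
    case TypeKind::INTEGER:
    case TypeKind::BIGINT:
      return TypeKind::BIGINT;
    case TypeKind::REAL:
    case TypeKind::DOUBLE:
      return TypeKind::DOUBLE;
  }
  throw std::logic_error("Unknown type kind");
}

// Hypothetical validation: 'registered' is the return type in the function
// registry, 'planned' is the return type requested by the query plan.
void checkWindowReturnType(
    const std::string& name,
    TypeKind input,
    TypeKind registered,
    TypeKind planned) {
  if (registered != planned) {
    throw std::invalid_argument(
        "Unexpected return type for window function " + name + "(" +
        typeName(input) + "). Expected " + typeName(registered) + ". Got " +
        typeName(planned) + ".");
  }
}
```

Under these assumptions, a registry entry of sum(REAL) -> REAL checked against a plan that asks for DOUBLE throws, while an entry of sum(REAL) -> DOUBLE passes.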