[SPARK-26535][SQL] Parse literals as DOUBLE instead of DECIMALS #23468
Conversation
Test build #100796 has finished for PR 23468 at commit
This argument is not enough. For example, Presto even treats ... I would prefer documenting our current behavior first. Update our Spark SQL doc? Below are some references:
@gatorsmile thanks for your comment. Let me also cc @cloud-fan, @dilipbiswal and @rxin, who participated in the related discussion too. I agree on improving the doc on this. If you are fine with that, I'll create a JIRA and a PR for it ASAP. The main argument for this change is: we are basing our decimal implementation on what Hive and MSSQL do, but at the moment we have two main differences from them:
Hi, @mgaido91 and @gatorsmile. First of all, Hive starts to use
hive> select version();
OK
3.1.1 rf4e0529634b6231a0072295da48af466cf2f10b7
Time taken: 0.089 seconds, Fetched: 1 row(s)
hive> explain select 2.3;
OK
STAGE DEPENDENCIES:
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: 2.3 (type: decimal(2,1))
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: COMPLETE
ListSink
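For comparison, here is a minimal spark-shell sketch (not part of this PR or the comment above) of how the same check could be done on the Spark side; it assumes a spark-shell session where spark is the predefined SparkSession, and the commented output reflects the behavior described in this discussion rather than verified results.

// Illustrative only: inspect how Spark parses the same literals.
// `spark` is the SparkSession predefined by spark-shell.
spark.sql("SELECT 2.3").printSchema()   // per the discussion, currently reported as a decimal type
spark.sql("SELECT 1E10").printSchema()  // per the PR description, currently a decimal with negative scale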
Test build #100807 has finished for PR 23468 at commit
Thanks for your comment @dongjoon-hyun. I wasn't aware of the behavior change in Hive. Then, if we don't want to do this, I think #21599 becomes even more important, and we cannot forbid negative scales in decimals, because we would no longer be able to represent numbers such as ... We may instead try here to reduce the cases in which we use negative scales, so that ...
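As an illustrative side note (not code from this comment): the negative scale for a literal like 1e10 falls out of java.math.BigDecimal semantics, which keep the unscaled value minimal so that the exponent shows up as a negative scale, while forcing the scale back to zero requires a much larger precision. A small standalone Scala sketch:

// Illustrative only: why 1e10 naturally carries a negative scale.
val d = new java.math.BigDecimal("1E10")
println(s"precision=${d.precision}, scale=${d.scale}")    // precision=1, scale=-10
// Representing the same value with a non-negative scale needs precision 11.
val d0 = d.setScale(0)
println(s"precision=${d0.precision}, scale=${d0.scale}")  // precision=11, scale=0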
If we are going to support negative scale, isn't ...
@cloud-fan it depends on what we consider "better". It is better because it requires less precision, avoiding potential truncation in subsequent operations. It is worse because it doesn't work if the output is written to a datasource which doesn't support decimals with negative scales. Let me close this and let's address #21599 first; then we can get back to this later if needed. Thanks.
What changes were proposed in this pull request?
In Spark 2.x, literal values are parsed as DECIMALs. Many RDBMS, instead, treat them as DOUBLE. Among those, we can name Presto, Hive and MSSQL. The last 2 are particularly important for us, because they are the 2 which we used as a reference for our implementation of decimal operations.
In the current scenario, specific constants, e.g. 1e10, are parsed as DECIMALs with negative scale. This is a case which is currently not handled properly by Spark, and there is an ongoing PR for fixing the operation rules for this case. Even though that PR doesn't completely forbid decimals with negative scale, it considerably reduces the cases in which this can happen, naturally resolving the problem mentioned above.
This PR introduces the config option spark.sql.legacy.literals.asDecimal, which can be used to restore the previous behavior.
How was this patch tested?
Modified/enhanced UTs.
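For reference, a hedged sketch of how the proposed legacy flag could be used if this change were merged; the option name comes from the description above, and since this PR was ultimately closed the snippet is purely illustrative.

// Illustrative only: opt back into the legacy behavior proposed here,
// i.e. keep parsing literals as DECIMAL instead of DOUBLE.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("legacy-literal-parsing")
  .master("local[*]")
  .config("spark.sql.legacy.literals.asDecimal", "true")
  .getOrCreate()

spark.sql("SELECT 2.3").printSchema()  // with the flag on, expected to stay a decimal type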