-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-16323][SQL] Add IntegralDivide expression #22395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
649b458
a0c0849
315bb86
02a2369
fca5e62
71255a1
c471bef
3550d29
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -314,6 +314,32 @@ case class Divide(left: Expression, right: Expression) extends DivModLike { | |
| override def evalOperation(left: Any, right: Any): Any = div(left, right) | ||
| } | ||
|
|
||
| @ExpressionDescription( | ||
| usage = "expr1 _FUNC_ expr2 - Divide `expr1` by `expr2` rounded to the long integer.", | ||
| examples = """ | ||
| Examples: | ||
| > SELECT 3 _FUNC_ 2; | ||
| 1 | ||
| """, | ||
| since = "3.0.0") | ||
|
||
| case class IntegralDivide(left: Expression, right: Expression) extends DivModLike { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shall we add this to
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think so, please see the discussion at #14036 (comment)
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ur, sorry, but why not? As @viirya suggested, without that, the description added here is not meaningless. spark-sql> describe function 'div';
Function: div not found.
Time taken: 0.016 seconds, Fetched 1 row(s)Also, Hive accepts that like the following. (from Hive 3.1.0) 0: jdbc:hive2://ctr-e138-1518143905142-429335> describe function div;
+----------------------------------------------------+
| tab_name |
+----------------------------------------------------+
| a div b - Divide a by b rounded to the long integer |
+----------------------------------------------------+
0: jdbc:hive2://ctr-e138-1518143905142-429335> select 3 / 2, 3 div 2, `/`(3,2), `div`(3,2);
+------+------+------+------+
| _c0 | _c1 | _c2 | _c3 |
+------+------+------+------+
| 1.5 | 1 | 1.5 | 1 |
+------+------+------+------+
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @dongjoon-hyun because if we add it there, we can write:
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mgaido91 . I gave you the example of Hive in the above. :)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah, sorry I missed the back-ticks. I am adding it, sorry. Thanks. |
||
|
|
||
| override def inputType: AbstractDataType = IntegralType | ||
| override def dataType: DataType = LongType | ||
|
|
||
| override def symbol: String = "/" | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the reason we are using
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. used in
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, exactly, it is used there |
||
| override def sqlOperator: String = "div" | ||
|
|
||
| private lazy val div: (Any, Any) => Long = left.dataType match { | ||
| case i: IntegralType => | ||
| val divide = i.integral.asInstanceOf[Integral[Any]].quot _ | ||
| val toLong = i.integral.asInstanceOf[Integral[Any]].toLong _ | ||
| (x, y) => toLong(divide(x, y)) | ||
| } | ||
|
|
||
| override def evalOperation(left: Any, right: Any): Any = div(left, right) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry I may not recall it very clearly. Can you check Hive and other databases and see if the result type of
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, so:
So the behavior is not homogeneous among the RDBMs
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Then I'd prefer always returning long, since it was the behavior before. We can consider changing the behavior in another PR.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 for @cloud-fan 's suggestion.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I think it is reasonable as that is what we defined: |
||
| } | ||
|
|
||
| @ExpressionDescription( | ||
| usage = "expr1 _FUNC_ expr2 - Returns the remainder after `expr1`/`expr2`.", | ||
| examples = """ | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -143,16 +143,14 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper | |
| } | ||
| } | ||
|
|
||
| // By fixing SPARK-15776, Divide's inputType is required to be DoubleType of DecimalType. | ||
| // TODO: in future release, we should add a IntegerDivide to support integral types. | ||
| ignore("/ (Divide) for integral type") { | ||
| checkEvaluation(Divide(Literal(1.toByte), Literal(2.toByte)), 0.toByte) | ||
| checkEvaluation(Divide(Literal(1.toShort), Literal(2.toShort)), 0.toShort) | ||
| checkEvaluation(Divide(Literal(1), Literal(2)), 0) | ||
| checkEvaluation(Divide(Literal(1.toLong), Literal(2.toLong)), 0.toLong) | ||
| checkEvaluation(Divide(positiveShortLit, negativeShortLit), 0.toShort) | ||
| checkEvaluation(Divide(positiveIntLit, negativeIntLit), 0) | ||
| checkEvaluation(Divide(positiveLongLit, negativeLongLit), 0L) | ||
| test("/ (Divide) for integral type") { | ||
| checkEvaluation(IntegralDivide(Literal(1.toByte), Literal(2.toByte)), 0L) | ||
| checkEvaluation(IntegralDivide(Literal(1.toShort), Literal(2.toShort)), 0L) | ||
| checkEvaluation(IntegralDivide(Literal(1), Literal(2)), 0L) | ||
| checkEvaluation(IntegralDivide(Literal(1.toLong), Literal(2.toLong)), 0L) | ||
| checkEvaluation(IntegralDivide(positiveShortLit, negativeShortLit), 0L) | ||
| checkEvaluation(IntegralDivide(positiveIntLit, negativeIntLit), 0L) | ||
| checkEvaluation(IntegralDivide(positiveLongLit, negativeLongLit), 0L) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you add a test case for For now, this PR seems to follow the behavior of Spark scala> sql("select 2 / 0, 2 div 0").show()
+---------------------------------------+---------+
|(CAST(2 AS DOUBLE) / CAST(0 AS DOUBLE))|(2 div 0)|
+---------------------------------------+---------+
| null| null|
+---------------------------------------+---------+0: jdbc:hive2://ctr-e138-1518143905142-477481> select 2 / 0;
+-------+
| _c0 |
+-------+
| NULL |
+-------+
0: jdbc:hive2://ctr-e138-1518143905142-477481> select 2 div 0;
Error: Error while compiling statement: FAILED:
SemanticException [Error 10014]: Line 1:7 Wrong arguments '0':
org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to execute method public org.apache.hadoop.io.LongWritable org.apache.hadoop.hive.ql.udf.UDFOPLongDivide.evaluate(org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.LongWritable)
with arguments {2,0}:/ by zero (state=42000,code=10014)
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. good catch! We should clearly define the behavior in the doc string too.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The test for this case is present in
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we don't really need to change current behavior, but it is worth describing this in the doc string.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree with you @viirya. I updated the doc string with the current behavior. Thanks. |
||
| } | ||
|
|
||
| test("% (Remainder)") { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The failure looks like relevant.