-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-33498][SQL] Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid #30442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@cloud-fan FYI. |
|
Test build #131418 has finished for PR 30442 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
cc @MaxGekk too FYI |
docs/sql-ref-ansi-compliance.md
Outdated
| - `element_at`: This function throws `ArrayIndexOutOfBoundsException` if using invalid indices. | ||
| - `element_at`: This function throws `NoSuchElementException` if key does not exist in map. | ||
| - `elt`: This function throws `ArrayIndexOutOfBoundsException` if using invalid indices. | ||
| - `to_date` This function should fail with Exception if the input string can't be parsed, or the pattern string is invalid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call it an exception instead of Exception.
e6f5634 to
67f2d41
Compare
|
Kubernetes integration test starting |
|
Test build #131529 has finished for PR 30442 at commit
|
|
Kubernetes integration test status success |
docs/sql-ref-ansi-compliance.md
Outdated
| The behavior of some SQL operators can be different under ANSI mode (`spark.sql.ansi.enabled=true`). | ||
| - `array_col[index]`: This operator throws `ArrayIndexOutOfBoundsException` if using invalid indices. | ||
| - `map_col[key]`: This operator throws `NoSuchElementException` if key does not exist in map. | ||
| - `cast to timestamp`: This operator should fail with an exception if the input string can't be parsed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: CAST(string_col AS TIMESTAMP) to be user-facing.
| } | ||
| } | ||
|
|
||
| def stringToTimestamp(s: UTF8String, timeZoneId: ZoneId, ansiEnabled: Boolean): Option[Long] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to add a stringToTimestampAnsi method. Non-ansi code path should call the original stringToTimestamp method.
| } | ||
| } | ||
|
|
||
| test("SPARK-33498: TimestampType cast with parseError") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it duplicated with CastSuiteBase?
| UnixTimestamp(Literal("2020-01-27T20:06:11.847"), Literal("yyyy-MM-dd HH:mm:ss.SSS")), | ||
| UnixTimestamp(Literal("Unparseable"), Literal("yyyy-MM-dd HH:mm:ss.SSS")), | ||
| ToUnixTimestamp(Literal("2020-01-27T20:06:11.847"), Literal("yyyy-MM-dd HH:mm:ss.SSS")), | ||
| ToUnixTimestamp(Literal("Unparseable"), Literal("yyyy-MM-dd HH:mm:ss.SSS")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we test one more case that throws SparkUpgradeException?
|
Test build #131592 has finished for PR 30442 at commit
|
67f2d41 to
0417977
Compare
|
Test build #131635 has finished for PR 30442 at commit
|
0417977 to
9b76d6a
Compare
|
Test build #131656 has finished for PR 30442 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should call stringToTimestampAnsi when ansi mode is on, and call the original method when ansi mode is off.
… runtime exception when parsing to timestamp fail with ANSI mode on. Change-Id: Ie76d494906c7615871860c89602896c64ed2d9d6
… multi thread and not respect SQLConf.get. Change-Id: I60be4963378de992c763cec952af4c03acbcc99f
Change-Id: Iafb1e1abfbf19e063b44f358be729515bea3a6f0
9b76d6a to
140081d
Compare
Change-Id: Iddbbe72295a0db24df4c020e79d74b8bdaa95082
|
Test build #131739 has finished for PR 30442 at commit
|
|
Test build #131744 has finished for PR 30442 at commit
|
|
@cloud-fan Test passed. |
| } | ||
| } | ||
|
|
||
| def stringToTimestampAnsi(s: UTF8String, timeZoneId: ZoneId): Option[Long] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this method still need to return option?
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
Show resolved
Hide resolved
| ctx.addReferenceObj("zoneId", zoneId, zoneIdClass.getName), | ||
| zoneIdClass) | ||
| val longOpt = ctx.freshVariable("longOpt", classOf[Option[Long]]) | ||
| val stringToTimestampFunc = if (ansiEnabled) "stringToTimestampAnsi" else "stringToTimestamp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If stringToTimestampAnsi returns Long directly, the codegen for it will be largely simplified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
| s"Cannot cast $str to TimestampType.") | ||
| } | ||
|
|
||
| withSQLConf(SQLConf.ANSI_ENABLED.key -> currentAnsiEnabled.toString) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be more robust:
val activeConf = conf
new ParVector...
SQLConf.withExistingConf(activeConf) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| } | ||
| } | ||
|
|
||
| test("ANSI mode: timestamp type casting with parse error") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a separate test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merged in one test
| struct<> | ||
| -- !query output | ||
|
|
||
| org.apache.spark.sql.AnalysisException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR but it's weird that the data preparation step for a test is broken. We should fix it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems this datatimes table was not used at all. I could refine this test case in new PR.
|
Test build #131865 has finished for PR 30442 at commit
|
Change-Id: I2bfe150923e5a4d14ac6a44f9a2acff9698d5e5d
4706576 to
e4fe5ee
Compare
| checkCastWithParseError("2015-03-18T12:03:17-0:70") | ||
|
|
||
| val input = "abdef" | ||
| checkExceptionInExpression[DateTimeException]( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it just checkCastWithParseError("abdef")?
|
GA passed, merging to master! |
|
Test build #131879 has finished for PR 30442 at commit
|
What changes were proposed in this pull request?
Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid, when ANSI mode is enable. This patch should update GetTimeStamp, UnixTimeStamp, ToUnixTimeStamp and Cast.
Why are the changes needed?
For ANSI mode.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added UT and Existing UT.