Commit 17974e2

MaxGekk authored and dongjoon-hyun committed
[SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
Fix `stringToDate()` for the formats `yyyy` and `yyyy-[m]m`, which assumed that no additional characters follow the last component (`yyyy` or `[m]m`). This PR checks that the entire input was consumed for these formats. After the fix, the input `1999 08 01` is invalid: it matches the pattern `yyyy`, but the string contains the additional characters ` 08 01`.

From Spark 1.6.3 through 2.4.3, the behavior has been the same:
```
spark-sql> SELECT CAST('1999 08 01' AS DATE);
1999-01-01
```
This PR makes it return NULL, like Hive:
```
spark-sql> SELECT CAST('1999 08 01' AS DATE);
NULL
```
Added new checks to `DateTimeUtilsSuite` for the `1999 08 01` and `1999 08` inputs.

Closes #25097 from MaxGekk/spark-28015-invalid-date-format.

Authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
1 parent 1abac14 commit 17974e2

2 files changed

Lines changed: 10 additions & 0 deletions

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala

Lines changed: 4 additions & 0 deletions
@@ -498,6 +498,10 @@ object DateTimeUtils {
       // year should have exact four digits
       return None
     }
+    if (i < 2 && j < bytes.length) {
+      // For the `yyyy` and `yyyy-[m]m` formats, entire input must be consumed.
+      return None
+    }
     segments(i) = currentSegmentValue
     if (isInvalidDate(segments(0), segments(1), segments(2))) {
       return None
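
To see why the new guard is keyed on `i < 2 && j < bytes.length`, the following is a simplified, self-contained sketch of a segment-by-segment date parser. It is not Spark's actual `stringToDate()`: the object name `StringToDateSketch`, the method `parse`, the `checkConsumed` flag, the `(year, month, day)` tuple result, and the coarse range check standing in for `isInvalidDate` are all assumptions made for illustration. Here `i` is the index of the segment being parsed (0 = year, 1 = month, 2 = day) and `j` is the position in the input. The scan stops at a space or `T` so that a time part may follow a full `yyyy-[m]m-[d]d` date, which is why leftover input is only rejected when fewer than three segments were read.

```scala
object StringToDateSketch {
  // Hypothetical, simplified stand-in for DateTimeUtils.stringToDate().
  // Returns (year, month, day) rather than Spark's internal date value.
  // `checkConsumed = false` mimics the behavior before this commit.
  def parse(input: String, checkConsumed: Boolean = true): Option[(Int, Int, Int)] = {
    val bytes = input.trim.getBytes("UTF-8")
    val segments = Array(1, 1, 1) // a missing month or day defaults to 1
    var i = 0       // index of the segment being parsed
    var j = 0       // current position in the input
    var current = 0 // numeric value of the segment so far
    var digits = 0  // digits seen in the current segment
    // Stop at ' ' or 'T': either may start a time part after a full date.
    while (j < bytes.length && i < 3 && bytes(j) != ' ' && bytes(j) != 'T') {
      val b = bytes(j)
      if (i < 2 && b == '-') {
        if (i == 0 && digits != 4) return None // year needs exactly four digits
        segments(i) = current
        current = 0
        digits = 0
        i += 1
      } else {
        val d = b - '0'
        if (d < 0 || d > 9) return None
        current = current * 10 + d
        digits += 1
      }
      j += 1
    }
    if (i == 0 && digits != 4) return None
    // The guard added by this commit: for `yyyy` and `yyyy-[m]m`,
    // nothing may remain after the last parsed segment.
    if (checkConsumed && i < 2 && j < bytes.length) return None
    segments(i) = current
    val (y, m, d) = (segments(0), segments(1), segments(2))
    // Coarse range check; an assumption standing in for isInvalidDate.
    if (m < 1 || m > 12 || d < 1 || d > 31) None else Some((y, m, d))
  }
}
```

With `checkConsumed = false`, the sketch accepts `1999 08 01` as year 1999 with default month and day, mirroring the `1999-01-01` result described in the commit message. With the guard enabled, the same input is rejected, while `1999-08-01 12:34` is still accepted because all three segments were read before the space.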

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala

Lines changed: 6 additions & 0 deletions
@@ -160,6 +160,9 @@ class DateTimeUtilsSuite extends SparkFunSuite {
     assert(stringToDate(UTF8String.fromString("015-03-18")).isEmpty)
     assert(stringToDate(UTF8String.fromString("015")).isEmpty)
     assert(stringToDate(UTF8String.fromString("02015")).isEmpty)
+    assert(stringToDate(UTF8String.fromString("1999 08 01")).isEmpty)
+    assert(stringToDate(UTF8String.fromString("1999-08 01")).isEmpty)
+    assert(stringToDate(UTF8String.fromString("1999 08")).isEmpty)
   }
 
   test("string to time") {
@@ -336,6 +339,9 @@ class DateTimeUtilsSuite extends SparkFunSuite {
     checkStringToTimestamp("2015-03-18T12:03.17-20:0", None)
     checkStringToTimestamp("2015-03-18T12:03.17-0:70", None)
     checkStringToTimestamp("2015-03-18T12:03.17-1:0:0", None)
+    checkStringToTimestamp("1999 08 01", None)
+    checkStringToTimestamp("1999-08 01", None)
+    checkStringToTimestamp("1999 08", None)
 
     // Truncating the fractional seconds
     c = Calendar.getInstance(TimeZone.getTimeZone("GMT+00:00"))
