
Conversation

@leanken-zz
Contributor

What changes were proposed in this pull request?

Datetime parsing should fail if the input string can't be parsed or the pattern string is invalid when ANSI mode is enabled. This patch updates GetTimestamp, UnixTimestamp, ToUnixTimestamp, and Cast accordingly.
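As a rough illustration of the intended behavior change (a minimal sketch assuming a spark-shell session; the exact exception type and message come from the review discussion below, not from this description):

```scala
// Hypothetical illustration, not code from this patch.
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT CAST('Unparseable' AS TIMESTAMP)").show()  // prints NULL

spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT CAST('Unparseable' AS TIMESTAMP)").show()  // throws a DateTimeException at runtime
```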

Why are the changes needed?

Under ANSI mode, datetime parsing should raise an error on invalid input instead of silently returning NULL.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added new unit tests; existing unit tests also cover this change.

@leanken-zz
Contributor Author

leanken-zz commented Nov 20, 2020

@cloud-fan FYI.

@SparkQA

SparkQA commented Nov 20, 2020

Test build #131418 has finished for PR 30442 at commit e6f5634.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnixTimestamp(

@SparkQA

SparkQA commented Nov 20, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36024/

@SparkQA

SparkQA commented Nov 20, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36024/

@HyukjinKwon
Member

cc @MaxGekk too FYI

- `element_at`: This function throws `ArrayIndexOutOfBoundsException` if using invalid indices.
- `element_at`: This function throws `NoSuchElementException` if key does not exist in map.
- `elt`: This function throws `ArrayIndexOutOfBoundsException` if using invalid indices.
- `to_date` This function should fail with Exception if the input string can't be parsed, or the pattern string is invalid.
Member

I would call it an exception instead of Exception.

@SparkQA

SparkQA commented Nov 23, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36131/

@SparkQA

SparkQA commented Nov 23, 2020

Test build #131529 has finished for PR 30442 at commit 67f2d41.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 23, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36131/

The behavior of some SQL operators can be different under ANSI mode (`spark.sql.ansi.enabled=true`).
- `array_col[index]`: This operator throws `ArrayIndexOutOfBoundsException` if using invalid indices.
- `map_col[key]`: This operator throws `NoSuchElementException` if key does not exist in map.
- `cast to timestamp`: This operator should fail with an exception if the input string can't be parsed.
Contributor

nit: CAST(string_col AS TIMESTAMP) to be user-facing.

}
}

def stringToTimestamp(s: UTF8String, timeZoneId: ZoneId, ansiEnabled: Boolean): Option[Long] = {
Contributor

It's better to add a stringToTimestampAnsi method. The non-ANSI code path should call the original stringToTimestamp method.
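A minimal sketch of such a wrapper, assuming it lives in DateTimeUtils next to stringToTimestamp; the error message reuses the "Cannot cast ... to TimestampType." text visible further down in this review, and the return type is revisited below:

```scala
import java.time.{DateTimeException, ZoneId}
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical sketch: an ANSI-only wrapper that throws on parse failure and
// leaves the original stringToTimestamp untouched for the non-ANSI path.
def stringToTimestampAnsi(s: UTF8String, timeZoneId: ZoneId): Option[Long] = {
  val parsed = stringToTimestamp(s, timeZoneId)
  if (parsed.isEmpty) {
    throw new DateTimeException(s"Cannot cast $s to TimestampType.")
  }
  parsed
}
```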

}
}

test("SPARK-33498: TimestampType cast with parseError") {
Contributor

is it duplicated with CastSuiteBase?

UnixTimestamp(Literal("2020-01-27T20:06:11.847"), Literal("yyyy-MM-dd HH:mm:ss.SSS")),
UnixTimestamp(Literal("Unparseable"), Literal("yyyy-MM-dd HH:mm:ss.SSS")),
ToUnixTimestamp(Literal("2020-01-27T20:06:11.847"), Literal("yyyy-MM-dd HH:mm:ss.SSS")),
ToUnixTimestamp(Literal("Unparseable"), Literal("yyyy-MM-dd HH:mm:ss.SSS"))
Contributor

can we test one more case that throws SparkUpgradeException?
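For example, something along these lines could exercise that path (the literal, pattern, and expected message fragment here are assumptions for illustration: the idea is an input the legacy parser accepts, e.g. by ignoring trailing characters, but the new parser rejects, which surfaces as SparkUpgradeException under the default EXCEPTION time-parser policy):

```scala
// Hypothetical sketch, not necessarily the exact case added to the suite.
checkExceptionInExpression[SparkUpgradeException](
  UnixTimestamp(Literal("2020-01-27T20:06:11.847!!!"), Literal("yyyy-MM-dd'T'HH:mm:ss.SSS")),
  "Fail to parse")
```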

@SparkQA

SparkQA commented Nov 24, 2020

Test build #131592 has finished for PR 30442 at commit 67f2d41.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 24, 2020

Test build #131635 has finished for PR 30442 at commit 0417977.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 24, 2020

Test build #131656 has finished for PR 30442 at commit 9b76d6a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

We should call stringToTimestampAnsi when ANSI mode is on, and the original stringToTimestamp method when it is off.
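At the call site this is essentially a two-way branch; a minimal sketch written as a standalone helper, assuming the stringToTimestampAnsi variant that throws and returns the parsed microseconds directly (as discussed further down):

```scala
import java.time.ZoneId

import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical helper mirroring the branch Cast needs to take; in the real
// expression, ansiEnabled comes from SQLConf and zoneId from the cast's time zone.
def castStringToTimestamp(s: UTF8String, zoneId: ZoneId, ansiEnabled: Boolean): Any = {
  if (ansiEnabled) {
    DateTimeUtils.stringToTimestampAnsi(s, zoneId)     // throws on unparseable input
  } else {
    DateTimeUtils.stringToTimestamp(s, zoneId).orNull  // old behavior: NULL on bad input
  }
}
```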

… runtime exception when parsing to timestamp fail with ANSI mode on.

Change-Id: Ie76d494906c7615871860c89602896c64ed2d9d6
… multi thread and not respect SQLConf.get.

Change-Id: I60be4963378de992c763cec952af4c03acbcc99f
Change-Id: Iafb1e1abfbf19e063b44f358be729515bea3a6f0
Change-Id: Iddbbe72295a0db24df4c020e79d74b8bdaa95082
@SparkQA

SparkQA commented Nov 25, 2020

Test build #131739 has finished for PR 30442 at commit 140081d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 25, 2020

Test build #131744 has finished for PR 30442 at commit 5ab59d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@leanken-zz
Contributor Author

@cloud-fan Tests passed.

}
}

def stringToTimestampAnsi(s: UTF8String, timeZoneId: ZoneId): Option[Long] = {
Contributor

Does this method still need to return an Option?

ctx.addReferenceObj("zoneId", zoneId, zoneIdClass.getName),
zoneIdClass)
val longOpt = ctx.freshVariable("longOpt", classOf[Option[Long]])
val stringToTimestampFunc = if (ansiEnabled) "stringToTimestampAnsi" else "stringToTimestamp"
Contributor

If stringToTimestampAnsi returns Long directly, the codegen for it will be largely simplified.
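A sketch of the Long-returning variant (same assumptions as the earlier sketch; only the return type changes):

```scala
// Hypothetical sketch: returning the microseconds directly lets the generated
// code assign the result without unwrapping an Option in the ANSI branch.
def stringToTimestampAnsi(s: UTF8String, timeZoneId: ZoneId): Long = {
  stringToTimestamp(s, timeZoneId).getOrElse {
    throw new DateTimeException(s"Cannot cast $s to TimestampType.")
  }
}
```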

Contributor Author

done.

s"Cannot cast $str to TimestampType.")
}

withSQLConf(SQLConf.ANSI_ENABLED.key -> currentAnsiEnabled.toString) {
Contributor

to be more robust:

val activeConf = conf
new ParVector...
  SQLConf.withExistingConf(activeConf) ...
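Spelled out, the suggestion is roughly the following (the exprSeq collection, the check body, and the expected message fragment are placeholders):

```scala
import scala.collection.parallel.immutable.ParVector

// Hypothetical expansion of the snippet above: capture the active SQLConf once,
// then re-install it on each thread of the parallel collection so that
// SQLConf.get inside the check sees the intended ANSI setting.
val activeConf = conf
new ParVector(exprSeq.toVector).foreach { expr =>
  SQLConf.withExistingConf(activeConf) {
    checkExceptionInExpression[DateTimeException](expr, "Unparseable")
  }
}
```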

Contributor Author

done

}
}

test("ANSI mode: timestamp type casting with parse error") {
Contributor

Why is this a separate test?

Contributor Author

Merged into one test.

struct<>
-- !query output

org.apache.spark.sql.AnalysisException
Contributor

Not related to this PR but it's weird that the data preparation step for a test is broken. We should fix it later.

Contributor Author

Seems this datatimes table was not used at all. I could refine this test case in a new PR.

@SparkQA

SparkQA commented Nov 27, 2020

Test build #131865 has finished for PR 30442 at commit 4706576.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Change-Id: I2bfe150923e5a4d14ac6a44f9a2acff9698d5e5d
checkCastWithParseError("2015-03-18T12:03:17-0:70")

val input = "abdef"
checkExceptionInExpression[DateTimeException](
Contributor

is it just checkCastWithParseError("abdef")?

@cloud-fan
Contributor

GA passed, merging to master!

cloud-fan closed this in b9f2f78 on Nov 27, 2020
@SparkQA

SparkQA commented Nov 27, 2020

Test build #131879 has finished for PR 30442 at commit e4fe5ee.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
