Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Apr 19, 2020

What changes were proposed in this pull request?

Convert java.time.LocalDate to java.sql.Date in pushed down filters to ORC datasource when Java 8 time API enabled.

Closes #28272

Why are the changes needed?

The changes fix the exception raised while pushing date filters when spark.sql.datetime.java8API.enabled is set to true:

Wrong value class java.time.LocalDate for DATE.EQUALS leaf
java.lang.IllegalArgumentException: Wrong value class java.time.LocalDate for DATE.EQUALS leaf
	at org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl.checkLiteralType(SearchArgumentImpl.java:192)
	at org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl.<init>(SearchArgumentImpl.java:75)
	at org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$BuilderImpl.equals(SearchArgumentImpl.java:352)
	at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:229)

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Added tests to OrcFilterSuite.

@SparkQA
Copy link

SparkQA commented Apr 19, 2020

Test build #121482 has finished for PR 28261 at commit 392c0e6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Apr 19, 2020

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Apr 20, 2020

Test build #121485 has finished for PR 28261 at commit 392c0e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Apr 20, 2020

@dongjoon-hyun @cloud-fan @HyukjinKwon Could you take a look at the PR, please.

val dates = Seq("2017-08-18", "2017-08-19", "2017-08-20", "2017-08-21").map { day =>
LocalDate.parse(day)
}
withSQLConf(SQLConf.DATETIME_JAVA8API_ENABLED.key -> "true") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes me realize that this config can break data source v1/v2 as it changes the value in Filter.

Shall we fix DataSourceAnalysis.translateLeafNodeFilter instead to always return java 7 Date/Timestamp?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan Here is the PR #28272

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun ^^ The PR covers this PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks, @MaxGekk !

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Apr 20, 2020

Thank you for pinging me, @MaxGekk .
Could you handle the following together?

  • sql/core/v1.2/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
  • sql/core/v1.2/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-31489][SQL] Fix pushing down filters with java.time.LocalDate values in ORC [SPARK-31489][SQL][test-hive1.2] Fix pushing down filters with java.time.LocalDate values in ORC Apr 20, 2020
@dongjoon-hyun
Copy link
Member

Retest this please.

}
}

test("filter pushdown - local date") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. test("SPARK-31489: filter.

@dongjoon-hyun
Copy link
Member

For this PR, please pass the PR builders with both Hive 1.2 and Hive 2.3.

@SparkQA
Copy link

SparkQA commented Apr 20, 2020

Test build #121523 has finished for PR 28261 at commit 392c0e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Apr 23, 2020

jenkins, retest this, please

@MaxGekk MaxGekk changed the title [SPARK-31489][SQL][test-hive1.2] Fix pushing down filters with java.time.LocalDate values in ORC [SPARK-31489][SQL] Fix pushing down filters with java.time.LocalDate values in ORC Apr 23, 2020
@SparkQA
Copy link

SparkQA commented Apr 23, 2020

Test build #121696 has finished for PR 28261 at commit e5952e1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 23, 2020

Test build #121685 has finished for PR 28261 at commit 392c0e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Apr 24, 2020

Test build #121719 has finished for PR 28261 at commit e5952e1.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Apr 24, 2020

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Apr 24, 2020

Test build #121735 has finished for PR 28261 at commit e5952e1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Apr 24, 2020

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Apr 24, 2020

Test build #121754 has finished for PR 28261 at commit e5952e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk MaxGekk changed the title [SPARK-31489][SQL] Fix pushing down filters with java.time.LocalDate values in ORC [SPARK-31489][SQL][test-hive1.2] Fix pushing down filters with java.time.LocalDate values in ORC Apr 25, 2020
@MaxGekk
Copy link
Member Author

MaxGekk commented Apr 25, 2020

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Apr 25, 2020

Test build #121813 has finished for PR 28261 at commit e5952e1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Apr 26, 2020

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Apr 26, 2020

Test build #121833 has finished for PR 28261 at commit e5952e1.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 26, 2020

Test build #121836 has finished for PR 28261 at commit aa57be0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk MaxGekk changed the title [SPARK-31489][SQL][test-hive1.2] Fix pushing down filters with java.time.LocalDate values in ORC [SPARK-31489][SQL] Fix pushing down filters with java.time.LocalDate values in ORC Apr 26, 2020
@MaxGekk
Copy link
Member Author

MaxGekk commented Apr 26, 2020

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Apr 26, 2020

Test build #121847 has finished for PR 28261 at commit aa57be0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Apr 26, 2020

This PR has been tested for hive-1.2 and hive-2.3, and it is ready for review. @cloud-fan @dongjoon-hyun Please, take a look at it.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @MaxGekk and @cloud-fan .
Merged to master/3.0.

dongjoon-hyun pushed a commit that referenced this pull request Apr 26, 2020
…` values in ORC

### What changes were proposed in this pull request?
Convert `java.time.LocalDate` to `java.sql.Date` in pushed down filters to ORC datasource when Java 8 time API enabled.

Closes #28272

### Why are the changes needed?
The changes fix the exception raised while pushing date filters when `spark.sql.datetime.java8API.enabled` is set to `true`:
```
Wrong value class java.time.LocalDate for DATE.EQUALS leaf
java.lang.IllegalArgumentException: Wrong value class java.time.LocalDate for DATE.EQUALS leaf
	at org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl.checkLiteralType(SearchArgumentImpl.java:192)
	at org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl.<init>(SearchArgumentImpl.java:75)
	at org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$BuilderImpl.equals(SearchArgumentImpl.java:352)
	at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:229)
```

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
Added tests to `OrcFilterSuite`.

Closes #28261 from MaxGekk/orc-date-filter-pushdown.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit bd139bd)
Signed-off-by: Dongjoon Hyun <[email protected]>
@MaxGekk MaxGekk deleted the orc-date-filter-pushdown branch June 5, 2020 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants