[SPARK-32346][SQL] Support filters pushdown in Avro datasource #29145
@gengliangwang @dongjoon-hyun @HyukjinKwon @cloud-fan Please take a look at this PR.
Thank you for pinging me, @MaxGekk.
@dongjoon-hyun I am looking forward to your review comments.
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/OrderedFilters.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/OrderedFilters.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/OrderedFilters.scala
…checking out

### What changes were proposed in this pull request?
Refactoring of `JsonFilters`:
- Add an assert to the `skipRow` method to check the input `index`.
- Move the check of the SQL config `spark.sql.json.filterPushdown.enabled` from `JsonFilters` to `JacksonParser`.

### Why are the changes needed?
1. The assert should catch incorrect usage of `JsonFilters`.
2. Moving the config check out of `JsonFilters` makes it consistent with `OrderedFilters` (see #29145).
3. `JsonFilters` can be used by other datasources in the future and does not depend on the JSON configs.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By existing test suites:
```
$ build/sbt "sql/test:testOnly org.apache.spark.sql.execution.datasources.json.*"
$ build/sbt "test:testOnly org.apache.spark.sql.catalyst.json.*"
```

Closes #29206 from MaxGekk/json-filters-pushdown-followup.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
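
The `skipRow` assert mentioned in the commit message above might be sketched roughly as follows. This is a simplified illustration, not Spark's `JsonFilters`: the class name `IndexedFilters`, the `Array[Any]` row model, and the per-index predicate layout are all assumptions made for the example.

```scala
// Hypothetical, simplified stand-in for a filters class; not Spark's JsonFilters.
// predicates(i) holds an optional check that can run once field i is available.
final class IndexedFilters(predicates: Array[Option[Array[Any] => Boolean]]) {

  // Returns true when the row should be skipped, i.e. a predicate is
  // registered at `index` and it evaluates to false on the row.
  def skipRow(row: Array[Any], index: Int): Boolean = {
    // Guard against incorrect usage: the index must address a known field.
    assert(0 <= index && index < predicates.length,
      s"The index $index is out of the valid range [0, ${predicates.length}).")
    predicates(index).exists(p => !p(row))
  }
}
```

With the guard in place, a caller that passes an index outside the schema fails fast instead of silently reading the wrong predicate slot.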
@cloud-fan Please review this PR.
@gengliangwang Are you OK with this PR?
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroCatalystDataConversionSuite.scala
gengliangwang
left a comment
LGTM except for one comment
Thanks, merging to master
What changes were proposed in this pull request?
In the PR, I propose to support pushed down filters in Avro datasource V1 and V2.
1. Add the SQL config `spark.sql.avro.filterPushdown.enabled` to control filters pushdown to the Avro datasource. It is on by default.
2. Rename `CSVFilters` to `OrderedFilters`.
3. `OrderedFilters` is used in `AvroFileFormat` (DSv1) and in `AvroPartitionReaderFactory` (DSv2).
4. Modify `AvroDeserializer` to return `None` from the `deserialize` method when pushdown filters return `false`.

Why are the changes needed?
The changes improve performance on synthetic benchmarks by up to 2 times on JDK 11:
Does this PR introduce any user-facing change?
No
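
As a rough illustration of the deserializer change proposed in this PR, the sketch below models a `deserialize` method that returns `None` when a pushed-down filter rejects a record. It is not the actual `AvroDeserializer` code: the object name, the `Record` and `PushedFilter` types, `readAll`, and the `pushdownEnabled` flag (standing in for `spark.sql.avro.filterPushdown.enabled`) are assumptions made for the example.

```scala
// Simplified, hypothetical model; these are not Spark's actual classes.
object FilterPushdownSketch {
  // A record is modeled as a map from field name to value.
  type Record = Map[String, Any]

  // A pushed-down filter is modeled as a plain predicate over a record.
  type PushedFilter = Record => Boolean

  // Returns None when any pushed-down filter rejects the record,
  // so the caller can skip it without producing a row.
  def deserialize(record: Record, filters: Seq[PushedFilter]): Option[Record] =
    if (filters.forall(f => f(record))) Some(record) else None

  // When pushdown is disabled, no filters reach the deserializer and
  // every record is returned; filtering is then left to the caller.
  def readAll(
      records: Seq[Record],
      filters: Seq[PushedFilter],
      pushdownEnabled: Boolean): Seq[Record] = {
    val effective = if (pushdownEnabled) filters else Seq.empty[PushedFilter]
    records.flatMap(r => deserialize(r, effective))
  }
}
```

Returning an `Option` lets the reader drop rejected records with a single `flatMap`, so rows that fail a filter never leave the reader.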
How was this patch tested?
1. `AvroCatalystDataConversionSuite` and `AvroSuite`
2. `AvroReadBenchmark` using Amazon EC2: `sudo add-apt-repository ppa:openjdk-r/ppa` & `sudo apt install openjdk-11-jdk` and `./dev/run-benchmarks`: