Skip to content

Conversation

@LuciferYang
Copy link
Contributor

What changes were proposed in this pull request?

This PR adds a corresponding Parquet Data Page V2 test scenario for each Parquet Data Page V1 test scenario to DataSourceReadBenchmark.

Why are the changes needed?

Add micro benchmark for Parquet Data Page V2.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass GA

@github-actions github-actions bot added the SQL label Jan 17, 2022
@LuciferYang
Copy link
Contributor Author

cc @dongjoon-hyun @sunchao @viirya I give this pr to adds a corresponding Parquet Data Page V2 test scenario for each Parquet Data Page V1 test scenario to DataSourceReadBenchmark, if @parthchandra plans to do this part of work in SPARK-37912, I will close this pr.

@parthchandra
Copy link
Contributor

I was planning to add these in subsequent PR)s) which includes implementation of the remaining encodings for Parquet V2.
But since this is done, and supersedes my PR, let's merge this one and I can close my PR (or rename it to just address the review comments for the original delta binary PR).

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

val enableOffHeapColumnVector = spark.sessionState.conf.offHeapColumnVectorEnabled
val vectorizedReaderBatchSize = spark.sessionState.conf.parquetVectorizedReaderBatchSize
// Decoding in vectorized but having the reader return rows.
parquetReaderBenchmark
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we can just put this in the first withParquetVersions above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4611bbd fix this

@LuciferYang
Copy link
Contributor Author

I was planning to add these in subsequent PR)s) which includes implementation of the remaining encodings for Parquet V2.
But since this is done, and supersedes my PR, let's merge this one and I can close my PR (or rename it to just address the review comments for the original delta binary PR).

thanks ~ @parthchandra

@sunchao sunchao closed this in c13d019 Jan 19, 2022
@sunchao
Copy link
Member

sunchao commented Jan 19, 2022

Thanks @LuciferYang , merged to master!

@LuciferYang
Copy link
Contributor Author

thanks all

@dongjoon-hyun
Copy link
Member

+1, late LGTM. Thank you all!

@LuciferYang LuciferYang deleted the SPARK-37928 branch October 22, 2023 07:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants