Ensure page encoding statistics are written to Parquet file #7643
cc @JigaoLuo
Thanks, I will try it once merged
alamb
left a comment
Thank you @etseidl
I think the code looks good, but I don't think the test covers the code.
Specifically, when I reverted the code change:
diff --git a/parquet/src/file/writer.rs b/parquet/src/file/writer.rs
index 5961f10ec5..c05bd2d5c8 100644
--- a/parquet/src/file/writer.rs
+++ b/parquet/src/file/writer.rs
@@ -689,9 +689,6 @@ impl<'a, W: Write + Send> SerializedRowGroupWriter<'a, W> {
     if let Some(statistics) = metadata.statistics() {
         builder = builder.set_statistics(statistics.clone())
     }
-    if let Some(page_encoding_stats) = metadata.page_encoding_stats() {
-        builder = builder.set_page_encoding_stats(page_encoding_stats.clone())
-    }
     builder = self.set_column_crypto_metadata(builder, &metadata);
     close.metadata = builder.build()?;

The test still passes 🤔
$ cargo test -p parquet --features=arrow -- test_page_encoding_statistics_roundtrip
Compiling parquet v55.1.0 (/Users/andrewlamb/Software/arrow-rs/parquet)
Finished `test` profile [unoptimized + debuginfo] target(s) in 5.74s
Running unittests src/lib.rs (target/debug/deps/parquet-1f71ccbaab8d67d7)
running 1 test
test file::writer::tests::test_page_encoding_statistics_roundtrip ... ok
...
😱 Let me see if I can get something to trigger the bug then. Odd.
Ugh. I need to figure out how
alamb
left a comment
Thank you @etseidl -- looks great to me and thank you for the fix
}

#[test]
fn test_page_encoding_statistics_roundtrip() {
Now when I run this test without the code change, it fails like this (as expected):
assertion failed: chunk_meta.encoding_stats.is_some()
thread 'arrow::arrow_writer::tests::test_page_encoding_statistics_roundtrip' panicked at parquet/src/arrow/arrow_writer/mod.rs:3865:9:
assertion failed: chunk_meta.encoding_stats.is_some()
stack backtrace:
0: __rustc::rust_begin_unwind
at /rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/std/src/panicking.rs:697:5
1: core::panicking::panic_fmt
at /rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/core/src/panicking.rs:75:14
2: core::panicking::panic
at /rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/core/src/panicking.rs:145:5
3: parquet::arrow::arrow_writer::tests::test_page_encoding_statistics_roundtrip
at ./src/arrow/arrow_writer/mod.rs:3865:9
4: parquet::arrow::arrow_writer::tests::test_page_encoding_statistics_roundtrip::{{closure}}
at ./src/arrow/arrow_writer/mod.rs:3841:49
5: core::ops::function::FnOnce::call_once
at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
6: core::ops::function::FnOnce::call_once
at /rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
👍
Which issue does this PR close?
encoding_stats not present in Parquet generated by parquet-rewrite #7616.
Rationale for this change
Page encoding statistics are not copied from the column chunk result to the final output metadata.
What changes are included in this PR?
Make sure stats are copied and add a round trip test to make sure they end up in the Parquet file.
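For readers who want to see the shape of the problem in isolation, here is a minimal sketch of the copy step. It does not use the real parquet types: `PageEncodingStats`, `ColumnChunkMeta`, `ColumnChunkMetaBuilder`, `finalize`, and `encoding_stats_survive` are all hypothetical stand-ins, and the `copy_encoding_stats` flag exists only to switch between the behaviour before and after the fix.

```rust
// Hypothetical stand-ins for the real metadata types. Only the fields
// needed to illustrate the copy step are modelled here.
#[derive(Clone, Debug, PartialEq)]
struct PageEncodingStats {
    encoding: &'static str,
    count: i32,
}

#[derive(Clone, Debug, Default, PartialEq)]
struct ColumnChunkMeta {
    statistics: Option<String>,
    page_encoding_stats: Option<Vec<PageEncodingStats>>,
}

#[derive(Default)]
struct ColumnChunkMetaBuilder {
    inner: ColumnChunkMeta,
}

impl ColumnChunkMetaBuilder {
    fn set_statistics(mut self, statistics: String) -> Self {
        self.inner.statistics = Some(statistics);
        self
    }

    fn set_page_encoding_stats(mut self, stats: Vec<PageEncodingStats>) -> Self {
        self.inner.page_encoding_stats = Some(stats);
        self
    }

    fn build(self) -> ColumnChunkMeta {
        self.inner
    }
}

/// Rebuilds the output metadata from the column writer's result.
/// Passing `copy_encoding_stats = false` mimics the behaviour before the
/// fix: every optional field not copied explicitly is silently dropped.
fn finalize(result: &ColumnChunkMeta, copy_encoding_stats: bool) -> ColumnChunkMeta {
    let mut builder = ColumnChunkMetaBuilder::default();
    if let Some(statistics) = &result.statistics {
        builder = builder.set_statistics(statistics.clone());
    }
    if copy_encoding_stats {
        if let Some(stats) = &result.page_encoding_stats {
            builder = builder.set_page_encoding_stats(stats.clone());
        }
    }
    builder.build()
}

/// Round-trip style check: do the stats produced by the column writer
/// survive into the metadata that would be written out?
fn encoding_stats_survive(copy_encoding_stats: bool) -> bool {
    let result = ColumnChunkMeta {
        statistics: Some("min=1,max=9".to_string()),
        page_encoding_stats: Some(vec![PageEncodingStats {
            encoding: "PLAIN",
            count: 1,
        }]),
    };
    let output = finalize(&result, copy_encoding_stats);
    output.page_encoding_stats == result.page_encoding_stats
}
```

In this sketch `encoding_stats_survive(true)` holds and `encoding_stats_survive(false)` does not, which is the property a round trip test needs: it must fail when the copy is removed, otherwise it is not exercising the code path that was fixed.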
Are there any user-facing changes?
No