Commit 4344f3f
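[SPARK-55273][SQL] Replace ParquetFileReader.open().getFooter() with readFooter() to avoid unnecessary operations in ParquetFooterReader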
### What changes were proposed in this pull request?
This PR replaces `ParquetFileReader.open().getFooter()` with `ParquetFileReader.readFooter()` to avoid unnecessary operations in `ParquetFooterReader`.
### Why are the changes needed?
Compared to `ParquetFileReader.readFooter()`, `ParquetFileReader.open()` performs the following additional operations:
```
this.converter = new ParquetMetadataConverter(options);
this.file = file;
this.f = f;
this.options = options;
this.footer = footer;
this.fileMetaData = footer.getFileMetaData();
this.fileDecryptor = fileMetaData.getFileDecryptor(); // must be called before filterRowGroups!
if (null != fileDecryptor && fileDecryptor.plaintextFile()) {
  this.fileDecryptor = null; // Plaintext file. No need in decryptor
}
try {
  this.blocks = filterRowGroups(footer.getBlocks());
} catch (Exception e) {
  // In case that filterRowGroups throws an exception in the constructor, the new stream
  // should be closed. Otherwise, there's no way to close this outside.
  f.close();
  throw e;
}
this.blockIndexStores = listWithNulls(this.blocks.size());
this.blockRowRanges = listWithNulls(this.blocks.size());
for (ColumnDescriptor col : footer.getFileMetaData().getSchema().getColumns()) {
  paths.put(ColumnPath.get(col.getPath()), col);
}
if (options.usePageChecksumVerification()) {
  this.crc = new CRC32();
  this.crcAllocator = ReusingByteBufferAllocator.strict(options.getAllocator());
} else {
  this.crc = null;
  this.crcAllocator = null;
}
```
https://github.com/apache/parquet-java/blob/fac0c746532e133beb928a7f6a7e57b510b477a1/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L960-L1016
None of these operations is needed when the caller only wants the footer, so `readFooter` can skip them.
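The difference can be illustrated with a small, self-contained toy model. This is only a sketch: `Footer`, `Reader`, `parseFooter`, `viaOpen`, `viaReadFooter` and the `setupSteps` counter are hypothetical names invented for illustration and are not part of parquet-java or Spark. The `Reader` constructor stands in for the per-reader setup quoted above, and the counter shows that this setup is pure overhead when only the footer is returned.

```java
import java.util.ArrayList;
import java.util.List;

public class FooterReadSketch {
    /** Hypothetical stand-in for a parsed footer. */
    static final class Footer {
        final List<String> columns;
        final int rowGroups;

        Footer(List<String> columns, int rowGroups) {
            this.columns = columns;
            this.rowGroups = rowGroups;
        }
    }

    /** Counts per-reader setup work, purely for illustration. */
    static int setupSteps = 0;

    /** Hypothetical stand-in for a full file reader. */
    static final class Reader implements AutoCloseable {
        private final Footer footer;
        private final List<Object> blockIndexStores = new ArrayList<>();
        private final List<String> paths = new ArrayList<>();

        Reader(Footer footer) {
            this.footer = footer;
            // Setup that only matters if row groups are read later.
            for (int i = 0; i < footer.rowGroups; i++) {
                blockIndexStores.add(null);
                setupSteps++;
            }
            for (String column : footer.columns) {
                paths.add(column);
                setupSteps++;
            }
        }

        Footer getFooter() {
            return footer;
        }

        @Override
        public void close() {
            // Nothing to release in this toy model.
        }
    }

    /** Pretends to decode the footer bytes of a file. */
    static Footer parseFooter() {
        return new Footer(List.of("id", "name", "ts"), 4);
    }

    /** Old pattern: build a full reader, then ask it for the footer. */
    static Footer viaOpen() {
        try (Reader reader = new Reader(parseFooter())) {
            return reader.getFooter();
        }
    }

    /** New pattern: decode the footer directly, with no reader setup. */
    static Footer viaReadFooter() {
        return parseFooter();
    }

    public static void main(String[] args) {
        setupSteps = 0;
        Footer a = viaOpen();
        int openSteps = setupSteps;

        setupSteps = 0;
        Footer b = viaReadFooter();
        int directSteps = setupSteps;

        System.out.println(a.columns.equals(b.columns) && a.rowGroups == b.rowGroups); // prints "true"
        System.out.println(openSteps + " vs " + directSteps); // prints "7 vs 0"
    }
}
```

Both paths return an equivalent footer; only the first pays for setup that is discarded as soon as the reader is closed.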
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GitHub Actions
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #54055 from LuciferYang/ParquetFooterReader-readFooter.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
1 parent b3cbff3 commit 4344f3f
1 file changed
Lines changed: 2 additions & 2 deletions
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetFooterReader.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 64 | 64 | |
| 65 | 65 | |
| 66 | 66 | |
| 67 | | - |
| 68 | | - |
| | 67 | + |
| | 68 | + |
| 69 | 69 | |
| 70 | 70 | |
| 71 | 71 | |