
Commit 4344f3f

[SPARK-55273][SQL] Replace ParquetFileReader.open().getFooter() with readFooter() to avoid unnecessary operations in ParquetFooterReader
### What changes were proposed in this pull request?
This PR replaces `ParquetFileReader.open().getFooter()` with `ParquetFileReader.readFooter()` to avoid unnecessary operations in `ParquetFooterReader`.

### Why are the changes needed?
Compared to `ParquetFileReader.readFooter()`, `ParquetFileReader.open()` performs additional operations, as follows:

```
this.converter = new ParquetMetadataConverter(options);
this.file = file;
this.f = f;
this.options = options;
this.footer = footer;
this.fileMetaData = footer.getFileMetaData();
this.fileDecryptor = fileMetaData.getFileDecryptor(); // must be called before filterRowGroups!
if (null != fileDecryptor && fileDecryptor.plaintextFile()) {
  this.fileDecryptor = null; // Plaintext file. No need in decryptor
}

try {
  this.blocks = filterRowGroups(footer.getBlocks());
} catch (Exception e) {
  // In case that filterRowGroups throws an exception in the constructor, the new stream
  // should be closed. Otherwise, there's no way to close this outside.
  f.close();
  throw e;
}
this.blockIndexStores = listWithNulls(this.blocks.size());
this.blockRowRanges = listWithNulls(this.blocks.size());
for (ColumnDescriptor col : footer.getFileMetaData().getSchema().getColumns()) {
  paths.put(ColumnPath.get(col.getPath()), col);
}

if (options.usePageChecksumVerification()) {
  this.crc = new CRC32();
  this.crcAllocator = ReusingByteBufferAllocator.strict(options.getAllocator());
} else {
  this.crc = null;
  this.crcAllocator = null;
}
```

https://github.com/apache/parquet-java/blob/fac0c746532e133beb928a7f6a7e57b510b477a1/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L960-L1016

These operations are not needed when only the footer is read, so they can be omitted.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #54055 from LuciferYang/ParquetFooterReader-readFooter.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
1 parent b3cbff3 commit 4344f3f

1 file changed

Lines changed: 2 additions & 2 deletions

sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetFooterReader.java

@@ -64,8 +64,8 @@ public static ParquetMetadata readFooter(
     ParquetReadOptions readOptions = HadoopReadOptions
       .builder(inputFile.getConfiguration(), inputFile.getPath())
       .withMetadataFilter(filter).build();
-    try (var fileReader = ParquetFileReader.open(inputFile, readOptions)) {
-      return fileReader.getFooter();
+    try (var stream = inputFile.newStream()) {
+      return ParquetFileReader.readFooter(inputFile, readOptions, stream);
     }
   }
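
The pattern behind this change can be illustrated with a small, dependency-free sketch. Everything in it is a hypothetical stand-in: `FooterReadSketch`, `ToyReader`, `newStream`, and the comma-separated "footer" are invented for illustration and are not part of parquet-java or Spark. It only assumes what the commit message states, namely that constructing the full reader does extra setup that a footer-only caller never uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins only: none of these types belong to parquet-java or Spark.
final class FooterReadSketch {

  /** Counts how often the heavyweight reader was constructed. */
  static int readerSetups = 0;

  /** Parses the toy footer (a comma-separated column list); the caller owns the stream. */
  static List<String> readFooter(InputStream in) throws IOException {
    String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
    return List.of(text.split(","));
  }

  /** Heavyweight reader whose constructor prepares state used only for reading data. */
  static final class ToyReader implements AutoCloseable {
    private final InputStream in;
    private final List<String> footer;
    private final List<Object> perColumnState = new ArrayList<>();

    ToyReader(InputStream in) throws IOException {
      this.in = in;
      this.footer = readFooter(in);
      // Extra per-column setup that a footer-only caller never touches.
      for (int i = 0; i < footer.size(); i++) {
        perColumnState.add(new Object());
      }
      readerSetups++;
    }

    List<String> getFooter() {
      return footer;
    }

    @Override
    public void close() throws IOException {
      in.close();
    }
  }

  /** Hypothetical source of a fresh stream over the toy "file". */
  static InputStream newStream() {
    return new ByteArrayInputStream("id,name,price".getBytes(StandardCharsets.UTF_8));
  }

  /** Before: build the full reader just to ask for its footer. */
  static List<String> footerViaReader() throws IOException {
    try (ToyReader reader = new ToyReader(newStream())) {
      return reader.getFooter();
    }
  }

  /** After: open the stream directly and parse only the footer. */
  static List<String> footerDirect() throws IOException {
    try (InputStream stream = newStream()) {
      return readFooter(stream);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(footerViaReader().equals(footerDirect()));
    System.out.println("reader setups: " + readerSetups);
  }
}
```

Both paths return the same footer, but only the first pays for reader construction. In both versions the try-with-resources block is what closes the stream, which mirrors how the patched code manages the stream returned by `inputFile.newStream()` itself.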
