@LuciferYang LuciferYang commented Jan 29, 2026

What changes were proposed in this pull request?

This PR replaces ParquetFileReader.open().getFooter() with ParquetFileReader.readFooter() to avoid unnecessary operations in ParquetFooterReader.

Why are the changes needed?

Compared to ParquetFileReader.readFooter(), ParquetFileReader.open() constructs a full ParquetFileReader, whose constructor performs the following additional operations:

    this.converter = new ParquetMetadataConverter(options);
    this.file = file;
    this.f = f;
    this.options = options;
    this.footer = footer;

    this.fileMetaData = footer.getFileMetaData();
    this.fileDecryptor = fileMetaData.getFileDecryptor(); // must be called before filterRowGroups!
    if (null != fileDecryptor && fileDecryptor.plaintextFile()) {
      this.fileDecryptor = null; // Plaintext file. No need in decryptor
    }

    try {
      this.blocks = filterRowGroups(footer.getBlocks());
    } catch (Exception e) {
      // In case that filterRowGroups throws an exception in the constructor, the new stream
      // should be closed. Otherwise, there's no way to close this outside.
      f.close();
      throw e;
    }
    this.blockIndexStores = listWithNulls(this.blocks.size());
    this.blockRowRanges = listWithNulls(this.blocks.size());
    for (ColumnDescriptor col : footer.getFileMetaData().getSchema().getColumns()) {
      paths.put(ColumnPath.get(col.getPath()), col);
    }

    if (options.usePageChecksumVerification()) {
      this.crc = new CRC32();
      this.crcAllocator = ReusingByteBufferAllocator.strict(options.getAllocator());
    } else {
      this.crc = null;
      this.crcAllocator = null;
    }

https://github.com/apache/parquet-java/blob/fac0c746532e133beb928a7f6a7e57b510b477a1/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L960-L1016

None of these operations are needed when only the footer is read, so they can be skipped.
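The difference can be illustrated with a toy model. The sketch below does not use parquet-java: ToyFile, ToyReader, Footer, and the setupSteps counter are hypothetical names invented for illustration, and the "setup" only loosely mirrors the per-row-group and per-column bookkeeping in the constructor excerpt above. It shows that both paths return the same footer, while only the open() path pays for reader state that a footer-only caller never uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (NOT parquet-java): full reader construction vs. a footer-only read. */
final class FooterReadSketch {

    /** Hypothetical stand-in for a file on disk. */
    static final class ToyFile {
        final List<String> columns;
        final int rowGroups;

        ToyFile(List<String> columns, int rowGroups) {
            this.columns = columns;
            this.rowGroups = rowGroups;
        }
    }

    /** Hypothetical stand-in for parsed footer metadata. */
    static final class Footer {
        final List<String> columns;
        final int rowGroups;

        Footer(List<String> columns, int rowGroups) {
            this.columns = columns;
            this.rowGroups = rowGroups;
        }
    }

    /** Counts extra setup work so the difference between the two paths is observable. */
    static int setupSteps = 0;

    /** Footer-only path: parses the metadata and builds nothing else. */
    static Footer readFooter(ToyFile file) {
        return new Footer(file.columns, file.rowGroups);
    }

    /** Hypothetical stand-in for a full file reader. */
    static final class ToyReader implements AutoCloseable {
        private final Footer footer;
        private final List<Object> blockIndexStores = new ArrayList<>();
        private final List<String> paths = new ArrayList<>();

        private ToyReader(Footer footer) {
            this.footer = footer;
            // Per-row-group state, unused by a caller that only wants the footer.
            for (int i = 0; i < footer.rowGroups; i++) {
                blockIndexStores.add(null);
                setupSteps++;
            }
            // Per-column lookup state, also unused by a footer-only caller.
            for (String column : footer.columns) {
                paths.add(column);
                setupSteps++;
            }
        }

        /** Full path: reads the footer, then builds all reader state. */
        static ToyReader open(ToyFile file) {
            return new ToyReader(readFooter(file));
        }

        Footer getFooter() {
            return footer;
        }

        @Override
        public void close() {
            // Nothing to release in this toy model.
        }
    }
}
```

In this model, calling FooterReadSketch.readFooter(file) leaves setupSteps untouched, whereas ToyReader.open(file).getFooter() increments it once per row group and once per column before returning an equivalent footer.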

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass GitHub Actions

Was this patch authored or co-authored using generative AI tooling?

No

@LuciferYang LuciferYang marked this pull request as draft January 29, 2026 08:52
@github-actions github-actions bot added the SQL label Jan 29, 2026
github-actions bot commented Jan 29, 2026

JIRA Issue Information

=== Improvement SPARK-55273 ===
Summary: Replace ParquetFileReader.open().getFooter() with readFooter() to avoid unnecessary operations in ParquetFooterReader
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

@LuciferYang LuciferYang changed the title from "Replace ParquetFileReader.open().getFooter() with readFooter() to avoid unnecessary operational overhead in ParquetFooterReader" to "[SPARK-55273][SQL] Replace ParquetFileReader.open().getFooter() with readFooter() to avoid unnecessary operations in ParquetFooterReader" Jan 29, 2026
@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @LuciferYang .

@LuciferYang LuciferYang marked this pull request as ready for review January 29, 2026 11:53
@LuciferYang
Contributor Author

Merged into master. Thanks @dongjoon-hyun and @pan3793
