@thuongle2210 thuongle2210 commented Nov 1, 2025

Summary

Inspired by: https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/
My benchmark implementation: https://github.com/thuongle2210/parquet_decoding

Improvement: in my benchmark, 1.84 times faster with Parquet version 56.0.0, and approximately 3 times faster with Parquet version 57.0.0.
This PR stays on Parquet 56.0.0 because other dependencies constrain the version (for example, the latest version of deltalake targets Parquet 56).

Related Issues

Related issue: #2192

Changes

  • Replace the custom implementation with the Parquet library's built-in implementation
  • Use the built-in method to read statistics from the Parquet file metadata

Checklist

  • Code builds correctly
  • Tests have been added or updated
  • Documentation updated if necessary
  • I have reviewed my own changes

@thuongle2210
Author

Hi @dentiny, could you please help review this PR?

@github-actions

This PR has been inactive for 14 days and is now marked as stale. If this is still being worked on, please comment to keep it open.

@github-actions github-actions bot added the stale label Nov 28, 2025