Filter manifest files based on partition summaries #938
Merged
scott-routledge2 merged 11 commits into main on Nov 24, 2025
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #938      +/-   ##
==========================================
+ Coverage   66.68%   68.86%   +2.18%
==========================================
  Files         186      195       +9
  Lines       66795    67643     +848
  Branches     9507     9611     +104
==========================================
+ Hits        44543    46584    +2041
+ Misses      19572    18227    -1345
- Partials     2680     2832     +152
ehsantn (Collaborator) approved these changes on Nov 22, 2025
Thanks @scott-routledge2.
Changes included in this PR
Filter manifest files based on their partition summaries when constructing parquet infos. This avoids much of the overhead of reading from slow storage (e.g. S3) when a table has many manifest files but only a few of them can match the filter.
Tested on S3 Tables with ~2700 data files and 48 manifest files, only 1 of which actually matches the filter: time to read a small section of the table went from 10s to 5s.
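For reviewers less familiar with the idea, here is a minimal sketch of pruning manifests by partition summary. It is not the code in this PR: `FieldSummary`, `ManifestFile`, `may_match_equality` and `prune_manifests` are hypothetical names, and it assumes each manifest list entry carries a lower bound, an upper bound and a null flag per partition field, and that the filter is a simple equality on one partition field.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FieldSummary:
    """Hypothetical per-partition-field summary stored with a manifest."""
    lower_bound: Optional[Any]
    upper_bound: Optional[Any]
    contains_null: bool = False


@dataclass(frozen=True)
class ManifestFile:
    """Hypothetical manifest list entry: a path plus its partition summaries."""
    path: str
    partitions: dict[str, FieldSummary]


def may_match_equality(summary: FieldSummary, value: Any) -> bool:
    """Return False only when `field == value` provably matches nothing."""
    if value is None:
        return summary.contains_null
    if summary.lower_bound is None or summary.upper_bound is None:
        # Bounds are unavailable, so stay conservative and keep the manifest.
        return True
    return summary.lower_bound <= value <= summary.upper_bound


def prune_manifests(
    manifests: list[ManifestFile], field: str, value: Any
) -> list[ManifestFile]:
    """Keep the manifests whose summary for `field` could contain `value`."""
    kept = []
    for manifest in manifests:
        summary = manifest.partitions.get(field)
        # A manifest without a summary for this field cannot be ruled out.
        if summary is None or may_match_equality(summary, value):
            kept.append(manifest)
    return kept


manifests = [
    ManifestFile("m0.avro", {"day": FieldSummary("2025-01-01", "2025-01-31")}),
    ManifestFile("m1.avro", {"day": FieldSummary("2025-02-01", "2025-02-28")}),
    ManifestFile("m2.avro", {"day": FieldSummary("2025-03-01", "2025-03-31")}),
]
kept = prune_manifests(manifests, "day", "2025-02-14")
print([m.path for m in kept])
```

Only the surviving manifests then need to be opened to list data files. The check is deliberately conservative: it may keep a manifest that turns out to have no matching files, but it should never drop one that does.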
Testing strategy
S3 Tables benchmark, CI
User facing changes
Checklist
[run CI] in your commit message.