
Filter manifest files based on partition summaries #938

Merged
scott-routledge2 merged 11 commits into main from scott/filter_manifest_files
Nov 24, 2025

@scott-routledge2 scott-routledge2 commented Nov 21, 2025

Changes included in this PR

Filter manifest files using their partition summaries when constructing Parquet infos. This avoids much of the overhead of reading from slow storage (e.g. S3) when a table has many manifest files but only a few of them match the filter.

Tested on S3 Tables with ~2,700 data files and 48 manifest files, of which only 1 matches the filter: the time to read a small section of the table dropped from 10 s to 5 s.
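
For readers unfamiliar with the idea, here is a minimal sketch of pruning manifests by partition summary. It is not the code in this PR: `FieldSummary`, `ManifestEntry`, `may_match_range`, and `prune_manifests` are hypothetical names, and the sketch assumes integer partition values and a simple closed-range filter.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FieldSummary:
    """Hypothetical summary of one partition field across a manifest's files."""
    lower_bound: Optional[int]
    upper_bound: Optional[int]
    contains_null: bool = False


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical manifest-list entry: a path plus one summary per partition field."""
    path: str
    summaries: Sequence[FieldSummary]


def may_match_range(summary: FieldSummary, lo: int, hi: int) -> bool:
    """Return False only when no partition value in the manifest can lie in [lo, hi]."""
    if summary.lower_bound is None or summary.upper_bound is None:
        # Assumption: with missing bounds we conservatively keep the manifest.
        return True
    # Two closed intervals overlap iff each one starts before the other ends.
    return summary.lower_bound <= hi and summary.upper_bound >= lo


def prune_manifests(manifests, field_idx: int, lo: int, hi: int):
    """Keep only manifests whose summary for one partition field may overlap [lo, hi]."""
    return [m for m in manifests if may_match_range(m.summaries[field_idx], lo, hi)]


manifests = [
    ManifestEntry("m0.avro", (FieldSummary(0, 9),)),
    ManifestEntry("m1.avro", (FieldSummary(10, 19),)),
    ManifestEntry("m2.avro", (FieldSummary(None, None, contains_null=True),)),
]
kept = prune_manifests(manifests, field_idx=0, lo=12, hi=15)
print([m.path for m in kept])  # ['m1.avro', 'm2.avro']
```

Only the manifests that survive this check need to be fetched and opened, which is where the savings on slow storage would come from. The check is deliberately conservative: it may keep a manifest that turns out to contain no matching files, but it should never drop one that does.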

Testing strategy

S3 Tables benchmark, CI

User facing changes

Checklist

  • Pipelines passed before requesting review. To run CI you must include [run CI] in your commit message.
  • I am familiar with the Contributing Guide
  • I have installed and run pre-commit hooks.

@scott-routledge2 scott-routledge2 changed the title from "Filter manifest files based on partition spec" to "Filter manifest files based on partition summaries" Nov 21, 2025

codecov bot commented Nov 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.86%. Comparing base (c33fbb5) to head (df612a9).
⚠️ Report is 131 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #938      +/-   ##
==========================================
+ Coverage   66.68%   68.86%   +2.18%     
==========================================
  Files         186      195       +9     
  Lines       66795    67643     +848     
  Branches     9507     9611     +104     
==========================================
+ Hits        44543    46584    +2041     
+ Misses      19572    18227    -1345     
- Partials     2680     2832     +152     

@scott-routledge2 scott-routledge2 marked this pull request as ready for review November 21, 2025 17:50

@IsaacWarren IsaacWarren left a comment


Nice! Thanks Scott


@ehsantn ehsantn left a comment


@scott-routledge2 scott-routledge2 merged commit d9df9e8 into main Nov 24, 2025
30 checks passed
@scott-routledge2 scott-routledge2 deleted the scott/filter_manifest_files branch November 24, 2025 15:17