Cache Parquet Metadata for Queries

We want to cache parsed metadata for Parquet files in the write buffer to speed up queries, particularly single-series queries. Reparsing the metadata on every query can be quite expensive. If we know we'll look at a file again fairly often, or can even predict that ahead of time (say, the last N segments of data), it would be nice to cache just the parsed metadata. This would let us query the ObjectStore without the overhead of fetching *and* parsing the metadata each time, and would also avoid reparsing it for files stored in the upcoming cache mentioned in #24897.

We thought this would require significant changes to DataFusion, but it looks like we [already have what we need available](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/parquet/trait.ParquetFileReaderFactory.html)! The docs for this feature are a bit sparse right now, and @alamb is planning to update them upstream in DataFusion.

To close this ticket:
- [ ] Store parsed metadata in the Write Buffer
- [ ] Provide a new QueryChunk type that uses this parsed metadata for queries
- [ ] Hook the new QueryChunk type into the query path
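
To make the first item concrete, here is a rough sketch of what a bounded metadata cache could look like. It is only an illustration under assumptions: `ParsedMetadata`, `MetadataCache`, and `get_or_parse` are hypothetical names, `ParsedMetadata` stands in for whatever parsed Parquet footer type we end up storing, and the oldest-first eviction is just one simple way to model "the last N segments". The actual hookup to DataFusion's `ParquetFileReaderFactory` is not shown.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the parsed Parquet footer metadata.
#[derive(Debug)]
pub struct ParsedMetadata {
    pub num_rows: u64,
}

struct Inner {
    /// Parsed metadata keyed by object store path.
    entries: HashMap<String, Arc<ParsedMetadata>>,
    /// Paths in insertion order, oldest first, used for eviction.
    order: VecDeque<String>,
}

/// Hypothetical cache that keeps parsed metadata for the most
/// recently added `capacity` files.
pub struct MetadataCache {
    capacity: usize,
    inner: Mutex<Inner>,
}

impl MetadataCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self {
            capacity,
            inner: Mutex::new(Inner {
                entries: HashMap::new(),
                order: VecDeque::new(),
            }),
        }
    }

    /// Return the cached metadata for `path`, or run `parse` once and
    /// cache its result. For simplicity the lock is held while parsing.
    pub fn get_or_parse<F>(&self, path: &str, parse: F) -> Arc<ParsedMetadata>
    where
        F: FnOnce() -> ParsedMetadata,
    {
        let mut inner = self.inner.lock().unwrap();
        if let Some(meta) = inner.entries.get(path) {
            return Arc::clone(meta);
        }
        let meta = Arc::new(parse());
        // Evict the oldest entry once the cache is full.
        if inner.order.len() >= self.capacity {
            if let Some(oldest) = inner.order.pop_front() {
                inner.entries.remove(&oldest);
            }
        }
        inner.entries.insert(path.to_string(), Arc::clone(&meta));
        inner.order.push_back(path.to_string());
        meta
    }

    /// Number of files whose metadata is currently cached.
    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().entries.len()
    }
}
```

Handing out `Arc<ParsedMetadata>` means a query can keep using the metadata even if the entry is evicted while the query is running.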



Cache Parquet Metadata for Queries #24903

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Cache Parquet Metadata for Queries #24903

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions