Description
We want to cache parsed metadata for Parquet files in the write buffer to speed up queries, particularly single-series queries. Reparsing the data on every query can be quite expensive. If we know we'll want to look at a file again fairly often, or can tell ahead of time that we will (say, the last N segments of data), it would be nice to cache just the parsed metadata. This would let us query the ObjectStore without the overhead of fetching and parsing the data each time, and would also avoid reparsing the data for files stored in the upcoming cache mentioned in #24897.
We thought this would require significant changes to DataFusion, but it looks like we already have what we need available! The docs for this feature are a bit sparse right now, and @alamb is planning to update them upstream in DataFusion.
To close this ticket:
- Store parsed metadata in the Write Buffer
- Provide a new QueryChunk type that uses this parsed metadata for queries
- Hook this into the query path
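
As a rough illustration of the first item, here is a minimal sketch of a bounded metadata cache keyed by object store path. Everything in it is hypothetical: `ParsedMetadata` is a stand-in for whatever parsed Parquet metadata type we end up storing, `MetadataCache` and `get_or_parse` are made-up names, and the "evict the oldest inserted entry" policy is just one simple way to approximate "keep the last N segments". It uses only the standard library and does not reflect the actual write buffer or DataFusion APIs.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for parsed Parquet metadata (not a real library type).
#[derive(Debug, PartialEq)]
pub struct ParsedMetadata {
    pub num_rows: u64,
    pub num_row_groups: usize,
}

/// Map plus insertion order, kept together behind one lock.
struct Inner {
    entries: HashMap<String, Arc<ParsedMetadata>>,
    order: VecDeque<String>,
}

/// Hypothetical cache holding parsed metadata for at most `capacity` files.
pub struct MetadataCache {
    capacity: usize,
    inner: Mutex<Inner>,
}

impl MetadataCache {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            inner: Mutex::new(Inner {
                entries: HashMap::new(),
                order: VecDeque::new(),
            }),
        }
    }

    /// Return the cached metadata for `path`, or run `parse` once and cache the result.
    /// Note: the lock is held while parsing, which keeps the sketch simple
    /// but would serialize concurrent parses in a real implementation.
    pub fn get_or_parse<F>(&self, path: &str, parse: F) -> Arc<ParsedMetadata>
    where
        F: FnOnce() -> ParsedMetadata,
    {
        let mut inner = self.inner.lock().unwrap();
        if let Some(meta) = inner.entries.get(path) {
            return Arc::clone(meta);
        }

        let meta = Arc::new(parse());
        if self.capacity == 0 {
            // Caching disabled: hand back the parsed value without storing it.
            return meta;
        }

        // Evict the oldest inserted entry once the cache is full.
        if inner.entries.len() >= self.capacity {
            if let Some(oldest) = inner.order.pop_front() {
                inner.entries.remove(&oldest);
            }
        }
        inner.entries.insert(path.to_string(), Arc::clone(&meta));
        inner.order.push_back(path.to_string());
        meta
    }

    /// Number of files whose metadata is currently cached.
    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().entries.len()
    }
}
```

A query-side chunk type could then hold an `Arc<ParsedMetadata>` obtained from `get_or_parse` instead of reparsing the file, with the closure doing the real fetch and parse only on a cache miss.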