Description
Is your feature request related to a problem or challenge?
- part of [EPIC] ListingTable object store usage improvements #17214
- follow-on to Adds memory-bound DefaultListFilesCache #18855
@BlakeOrth added a cache that avoids re-listing all files, which is great. It includes a maximum size and a TTL (time to live) for its entries.
The default TTL is infinite (to ensure stability). However, this is likely not what all users want, so it would be useful to be able to change these parameters in the same way other runtime options can be configured.
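To make the two parameters concrete, here is a toy cache with a size cap and an optional per-entry TTL. It is only an illustration, not the `DefaultListFilesCache` implementation: `ToyListFilesCache` is a hypothetical name, `max_entries` stands in for the real memory limit, and the oldest-inserted (FIFO) eviction policy is chosen purely for brevity. A `None` TTL models the current infinite default.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Toy listing cache mapping a table path to the file names found under it.
/// `max_entries` stands in for a memory limit; a `ttl` of `None` means entries never expire.
struct ToyListFilesCache {
    max_entries: usize,
    ttl: Option<Duration>,
    entries: HashMap<String, (Instant, Vec<String>)>,
}

impl ToyListFilesCache {
    fn new(max_entries: usize, ttl: Option<Duration>) -> Self {
        Self { max_entries, ttl, entries: HashMap::new() }
    }

    /// Returns the cached listing, dropping it first if it has outlived the TTL.
    fn get(&mut self, path: &str, now: Instant) -> Option<&Vec<String>> {
        let expired = match (self.entries.get(path), self.ttl) {
            (Some((inserted_at, _)), Some(ttl)) => now.duration_since(*inserted_at) > ttl,
            _ => false,
        };
        if expired {
            self.entries.remove(path);
            return None;
        }
        self.entries.get(path).map(|(_, files)| files)
    }

    /// Inserts a listing, evicting the oldest-inserted entry when the cache is full.
    fn put(&mut self, path: &str, files: Vec<String>, now: Instant) {
        if !self.entries.contains_key(path) && self.entries.len() >= self.max_entries {
            let oldest = self
                .entries
                .iter()
                .min_by_key(|(_, (inserted_at, _))| *inserted_at)
                .map(|(key, _)| key.clone());
            if let Some(key) = oldest {
                self.entries.remove(&key);
            }
        }
        self.entries.insert(path.to_string(), (now, files));
    }
}
```

With a 60 second TTL, a lookup 30 seconds after insertion hits and one 90 seconds after insertion misses and forces a re-list; with `ttl: None`, the entry stays until it is evicted for space.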
Describe the solution you'd like
I think we should (as a follow-on PR) add runtime configuration settings for the maximum cache size and the entry TTL, alongside the ones listed in https://datafusion.apache.org/user-guide/configs.html#runtime-configuration-settings
This would mean supporting statements such as:
-- set list files cache limit to 5MB
SET datafusion.runtime.list_files_cache_limit = '5M';
-- set time to live for each entry to 1 minute 30 seconds
SET datafusion.runtime.list_files_cache_ttl = '1m30s'; -- or would a format like `1:30` be better?
Describe alternatives you've considered
I suggest adding two new runtime configuration options, following the model of metadata_cache_limit:
- list_files_cache_limit: size of the cache
- list_files_cache_ttl: TTL duration of entries
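The TTL option needs a parser for duration strings such as the '1m30s' example. Below is a minimal sketch, assuming a hypothetical `parse_duration_setting` helper (not an existing DataFusion function) that accepts a sequence of integer components, each followed by an `h`, `m`, or `s` unit.

```rust
use std::time::Duration;

/// Hypothetical parser for TTL strings such as "1m30s", "90s", or "2h".
/// The input is a sequence of `<integer><unit>` components, where unit is h, m, or s.
fn parse_duration_setting(value: &str) -> Result<Duration, String> {
    let value = value.trim();
    if value.is_empty() {
        return Err("empty duration".to_string());
    }
    let mut total_secs: u64 = 0;
    let mut digits = String::new();
    for c in value.chars() {
        if c.is_ascii_digit() {
            digits.push(c);
            continue;
        }
        // A non-digit ends a component: the digits collected so far are its amount.
        let amount: u64 = digits
            .parse()
            .map_err(|_| format!("expected a number before '{c}' in '{value}'"))?;
        let unit_secs: u64 = match c {
            'h' => 3600,
            'm' => 60,
            's' => 1,
            _ => return Err(format!("unknown unit '{c}' in '{value}'")),
        };
        let component = amount
            .checked_mul(unit_secs)
            .ok_or_else(|| format!("duration '{value}' is too large"))?;
        total_secs = total_secs
            .checked_add(component)
            .ok_or_else(|| format!("duration '{value}' is too large"))?;
        digits.clear();
    }
    if !digits.is_empty() {
        return Err(format!("number without a unit at the end of '{value}'"));
    }
    Ok(Duration::from_secs(total_secs))
}
```

Under this sketch, '1m30s' and '90s' both yield 90 seconds, while '1:30' and a bare '90' are rejected, which keeps the unit explicit. Accepting the `1:30` form would be a separate design choice for the PR.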
That would mean roughly adding support here (and elsewhere in that file):
datafusion/datafusion/core/src/execution/context/mod.rs
Lines 1160 to 1163 in 838e1de
"metadata_cache_limit" => {
    let limit = Self::parse_memory_limit(value)?;
    builder.with_metadata_cache_limit(limit)
}
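The match arm above could gain two sibling arms for the new settings. The self-contained sketch below shows that shape; `RuntimeSettings`, `set_runtime_variable`, and this `parse_memory_limit` are hypothetical stand-ins, not DataFusion's actual builder or helper, and the memory parser assumes binary (1024-based) `K`/`M`/`G` suffixes. For brevity it reads the TTL as whole seconds, since the duration syntax is still an open question.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the runtime builder; names are illustrative only.
#[derive(Debug, Default, PartialEq)]
struct RuntimeSettings {
    list_files_cache_limit: Option<usize>,
    list_files_cache_ttl: Option<Duration>,
}

/// Simplified stand-in for a memory-limit parser: "5M" -> 5 * 1024 * 1024 bytes.
/// Assumes binary (1024-based) K/M/G suffixes.
fn parse_memory_limit(value: &str) -> Result<usize, String> {
    let unit = value
        .chars()
        .last()
        .ok_or_else(|| "empty memory limit".to_string())?;
    let number = &value[..value.len() - unit.len_utf8()];
    let multiplier: f64 = match unit {
        'K' => 1024.0,
        'M' => 1024.0 * 1024.0,
        'G' => 1024.0 * 1024.0 * 1024.0,
        _ => return Err(format!("unsupported unit in '{value}', expected K, M or G")),
    };
    let n: f64 = number
        .parse()
        .map_err(|_| format!("invalid number in '{value}'"))?;
    if !(n.is_finite() && n >= 0.0) {
        return Err(format!("memory limit must be a non-negative number: '{value}'"));
    }
    Ok((n * multiplier) as usize)
}

/// Applies `SET datafusion.runtime.<variable> = '<value>'`, with the prefix already stripped.
fn set_runtime_variable(
    settings: &mut RuntimeSettings,
    variable: &str,
    value: &str,
) -> Result<(), String> {
    match variable {
        "list_files_cache_limit" => {
            let limit = parse_memory_limit(value)?;
            settings.list_files_cache_limit = Some(limit);
        }
        "list_files_cache_ttl" => {
            // Simplification: whole seconds only (e.g. '90'); the real syntax is undecided.
            let secs: u64 = value
                .trim()
                .parse()
                .map_err(|_| format!("invalid TTL '{value}', expected whole seconds"))?;
            settings.list_files_cache_ttl = Some(Duration::from_secs(secs));
        }
        _ => return Err(format!("unknown runtime variable '{variable}'")),
    }
    Ok(())
}
```

Keeping each setting in its own arm mirrors the existing `metadata_cache_limit` arm, so unknown names still fall through to a single error path.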
And add tests like this one:
datafusion/datafusion/sqllogictest/test_files/set_variable.slt
Lines 314 to 316 in c8d26ba
# reset runtime variables
statement ok
SET datafusion.runtime.memory_limit = '1M'
Then add a note to the upgrade guide.
Additional context
No response