
Add a way to dynamically configure / update the ListFilesCache settings via RuntimeConfiguration #19056

@alamb


Is your feature request related to a problem or challenge?

@BlakeOrth added a cache that avoids re-listing all files, which is great. It includes a maximum size and a TTL (time to live) for its entries.

The default TTL is infinite (to ensure stability). However, this is likely not what all users want, so it would be useful to be able to change these parameters in the same way the other runtime options can be configured.
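Here is a minimal sketch of the TTL semantics. It assumes a simplified entry type: `Entry` and `get_if_fresh` are hypothetical names, not the actual cache implementation. The infinite default is modeled as `None`, and a configured TTL as `Some(duration)`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cache entry: a listing result plus the instant it was cached.
/// This is NOT the actual DataFusion cache type.
struct Entry<V> {
    value: V,
    inserted_at: Instant,
}

/// Returns the cached value only if it has not outlived `ttl` at time `now`.
/// `ttl == None` models the current default: an infinite TTL, so the entry
/// never expires on age alone (a size limit could still evict it).
fn get_if_fresh<V>(entry: &Entry<V>, ttl: Option<Duration>, now: Instant) -> Option<&V> {
    let fresh = match ttl {
        None => true,
        Some(ttl) => now.saturating_duration_since(entry.inserted_at) < ttl,
    };
    fresh.then(|| &entry.value)
}
```

With a 90 second TTL, an entry read 60 seconds after insertion is returned, and the same entry read at 90 seconds is treated as expired and would need to be re-listed.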

Describe the solution you'd like

I think that, as a follow-on PR, we should add runtime configuration settings for the maximum cache size and the entry TTL, alongside the existing ones documented at https://datafusion.apache.org/user-guide/configs.html#runtime-configuration-settings

This would mean supporting statements such as:

```sql
-- set the list files cache limit to 5MB
SET datafusion.runtime.list_files_cache_limit = '5M';
-- set the time to live for each entry to 1 minute 30 seconds
SET datafusion.runtime.list_files_cache_ttl = '1m30s'; -- would a format like `1:30` be better?
```
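To help weigh the format question, the sketch below parses both candidate spellings into a `std::time::Duration`. The function names `parse_suffixed` and `parse_colon`, and the `h`/`m`/`s` unit set, are assumptions for illustration only. They are not an existing DataFusion parser.

```rust
use std::time::Duration;

/// Hypothetical parser for unit-suffixed TTLs such as "1m30s", "90s" or "2h".
fn parse_suffixed(value: &str) -> Option<Duration> {
    let (mut total, mut digits) = (0u64, String::new());
    for c in value.chars() {
        let unit: u64 = match c {
            '0'..='9' => {
                digits.push(c);
                continue;
            }
            'h' => 3600,
            'm' => 60,
            's' => 1,
            _ => return None,
        };
        // A unit must follow at least one digit; `parse` fails on "".
        let n: u64 = digits.parse().ok()?;
        total = total.checked_add(n.checked_mul(unit)?)?;
        digits.clear();
    }
    // Reject empty input and trailing digits with no unit (e.g. "90").
    (!value.is_empty() && digits.is_empty()).then(|| Duration::from_secs(total))
}

/// Hypothetical parser for the alternative "minutes:seconds" form, e.g. "1:30".
fn parse_colon(value: &str) -> Option<Duration> {
    let (min, sec) = value.split_once(':')?;
    let (min, sec): (u64, u64) = (min.parse().ok()?, sec.parse().ok()?);
    if sec >= 60 {
        return None;
    }
    Some(Duration::from_secs(min.checked_mul(60)?.checked_add(sec)?))
}
```

Both spellings yield 90 seconds for the example above. However, the colon form is ambiguous because `1:30` could also be read as hours:minutes. That ambiguity is one argument for explicit unit suffixes.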

Describe alternatives you've considered

I suggest adding two new runtime configuration options, following the model of `metadata_cache_limit`:

  1. `list_files_cache_limit`: maximum size of the cache
  2. `list_files_cache_ttl`: time to live (TTL) of each cache entry

That would roughly mean adding support here (and elsewhere in that file):

```rust
"metadata_cache_limit" => {
    let limit = Self::parse_memory_limit(value)?;
    builder.with_metadata_cache_limit(limit)
}
```
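Extending that pattern, the two new arms might look roughly like the sketch below. Everything here is hypothetical. `Builder`, `with_list_files_cache_limit`, `with_list_files_cache_ttl`, `parse_duration` and `set_runtime_option` are illustrative names, not existing DataFusion APIs. `parse_memory_limit` is a simplified stand-in that assumes 1024-based `K`/`M`/`G` suffixes. The sketch also assumes the `datafusion.runtime.` prefix has already been stripped from the key, as the existing arm seems to imply.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the runtime builder (not DataFusion's real type).
#[derive(Debug, Default)]
struct Builder {
    list_files_cache_limit: Option<usize>,
    list_files_cache_ttl: Option<Duration>,
}

impl Builder {
    fn with_list_files_cache_limit(mut self, limit: usize) -> Self {
        self.list_files_cache_limit = Some(limit);
        self
    }

    fn with_list_files_cache_ttl(mut self, ttl: Duration) -> Self {
        self.list_files_cache_ttl = Some(ttl);
        self
    }
}

/// Simplified size parser: "<number><K|M|G>" with 1024-based units, e.g. "5M".
fn parse_memory_limit(value: &str) -> Result<usize, String> {
    let unit = value.chars().last().ok_or("empty size")?;
    let number: usize = value[..value.len() - unit.len_utf8()]
        .parse()
        .map_err(|_| format!("invalid size '{value}'"))?;
    let multiplier: usize = match unit {
        'K' => 1 << 10,
        'M' => 1 << 20,
        'G' => 1 << 30,
        _ => return Err(format!("unsupported unit in '{value}'")),
    };
    number
        .checked_mul(multiplier)
        .ok_or_else(|| format!("size '{value}' overflows"))
}

/// Hypothetical TTL parser for unit-suffixed values such as "1m30s" or "90s".
fn parse_duration(value: &str) -> Result<Duration, String> {
    let err = || format!("invalid duration '{value}'");
    let (mut total, mut digits) = (0u64, String::new());
    for c in value.chars() {
        let unit: u64 = match c {
            '0'..='9' => {
                digits.push(c);
                continue;
            }
            'h' => 3600,
            'm' => 60,
            's' => 1,
            _ => return Err(err()),
        };
        let n: u64 = digits.parse().map_err(|_| err())?;
        total = n
            .checked_mul(unit)
            .and_then(|secs| total.checked_add(secs))
            .ok_or_else(|| err())?;
        digits.clear();
    }
    if value.is_empty() || !digits.is_empty() {
        return Err(err());
    }
    Ok(Duration::from_secs(total))
}

/// Dispatches a runtime option, mirroring the shape of the existing
/// `"metadata_cache_limit"` arm.
fn set_runtime_option(builder: Builder, key: &str, value: &str) -> Result<Builder, String> {
    Ok(match key {
        "list_files_cache_limit" => {
            let limit = parse_memory_limit(value)?;
            builder.with_list_files_cache_limit(limit)
        }
        "list_files_cache_ttl" => {
            let ttl = parse_duration(value)?;
            builder.with_list_files_cache_ttl(ttl)
        }
        _ => return Err(format!("unknown runtime option '{key}'")),
    })
}
```

This sketch stores the TTL in the builder as a `Duration` rather than a raw string. With that choice, a malformed value fails when the `SET` statement runs, not later when the cache is consulted.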

And add tests like the following:

```
# reset runtime variables
statement ok
SET datafusion.runtime.memory_limit = '1M'
```

Finally, add a note to the upgrade guide.

Additional context

No response


Labels: enhancement (New feature or request)
