@lhoestq lhoestq commented Jul 20, 2021

data_dir=None was treated as a dataset config parameter, which created a special config_id for every dataset being loaded.
Since the config_id is used to name the cache directories, this led to datasets being regenerated for users.

I fixed this by ignoring data_dir when computing the config_id if its value is None.
I also added a test to make sure the cache directories are not unexpectedly renamed in the future.
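
For illustration, here is a minimal sketch of the idea, not the actual code from this PR. The helper name `make_config_id`, the hashing scheme, and the suffix length are all assumptions made for the example; only the rule "skip data_dir when it is None" comes from the description above.

```python
import hashlib
import json


def make_config_id(config_name, config_kwargs):
    """Hypothetical helper: derive a cache-directory name from config kwargs."""
    # Work on a copy so the caller's dict is left untouched.
    kwargs = dict(config_kwargs)
    # data_dir=None means "not set", so it must not influence the id.
    if kwargs.get("data_dir") is None:
        kwargs.pop("data_dir", None)
    # No remaining parameters: keep the plain config name,
    # so the cache directory name stays the same as before.
    if not kwargs:
        return config_name
    # Otherwise append a short, deterministic hash of the parameters.
    payload = json.dumps(kwargs, sort_keys=True, default=str)
    suffix = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"{config_name}-{suffix}"
```

With this sketch, `make_config_id("default", {"data_dir": None})` and `make_config_id("default", {})` both give the plain name, which is the kind of invariant a regression test on cache directory names could check.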

Fix #2683

@lhoestq lhoestq merged commit a15d145 into master Jul 20, 2021
@lhoestq lhoestq deleted the dont-use-data_dir-if-None-for-config_id branch July 20, 2021 16:27

Successfully merging this pull request may close these issues.

Cache directories changed due to recent changes in how config kwargs are handled
