@lhoestq lhoestq commented Jul 19, 2021

Since #2628 we use the ETag of the remote data files to compute the directory in the cache where a dataset is saved. This lets us reload the dataset from the cache only if the remote files haven't changed.

In this PR I made the ETag requests parallel using multithreading. A tqdm progress bar also shows up if there are more than 16 data files.
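
For readers who want a feel for the approach, here is a minimal sketch of parallel ETag requests with a thread pool. It is not the code merged in this PR: `head_etag`, `fetch_etags`, `PROGRESS_THRESHOLD` and the `max_workers` default are hypothetical names and values chosen for illustration, and a plain printed counter stands in for the tqdm progress bar so that the example only needs the standard library. Only the threshold of 16 files comes from the description above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional
from urllib.request import Request, urlopen

# Taken from the PR description: show progress only beyond 16 data files.
PROGRESS_THRESHOLD = 16


def head_etag(url: str) -> Optional[str]:
    """Hypothetical helper: send a HEAD request and return the ETag header, if any."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as response:
        return response.headers.get("ETag")


def fetch_etags(
    urls: List[str],
    fetch: Callable[[str], Optional[str]] = head_etag,
    max_workers: int = 16,  # assumed value, not taken from the PR
) -> List[Optional[str]]:
    """Fetch one ETag per URL concurrently, returning them in input order."""
    show_progress = len(urls) > PROGRESS_THRESHOLD
    etags: List[Optional[str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the requests in worker threads and yields results
        # in the same order as the input URLs.
        for done, etag in enumerate(pool.map(fetch, urls), start=1):
            etags.append(etag)
            if show_progress:
                # Simple stand-in for a tqdm progress bar.
                print(f"\rETags fetched: {done}/{len(urls)}", end="", flush=True)
    if show_progress:
        print()
    return etags
```

Threads fit here because each request spends most of its time waiting on the network, so the requests overlap even under the GIL. Passing `fetch` as a parameter keeps the sketch testable without network access.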

@lhoestq lhoestq merged commit 8814b39 into master Jul 19, 2021
@lhoestq lhoestq deleted the parallelize-etag-requests branch July 19, 2021 19:33
