Skip to content

Conversation

@lhoestq
Copy link
Member

@lhoestq lhoestq commented Nov 16, 2020

Previously the locking system of the lib was based on the file_lock package. However as noticed in #812 there were too many logs printed even when the datasets logging was set to warnings or errors.

For example

import logging
logging.basicConfig(level=logging.INFO)

import datasets
datasets.set_verbosity_warning()
datasets.load_dataset("squad")

would still log the file lock events:

INFO:filelock:Lock 5737989232 acquired on /Users/quentinlhoest/.cache/huggingface/datasets/44801f118d500eff6114bfc56ab4e6def941f1eb14b70ac1ecc052e15cdac49d.85f43de978b9b25921cb78d7a2f2b350c04acdbaedb9ecb5f7101cd7c0950e68.py.lock
INFO:filelock:Lock 5737989232 released on /Users/quentinlhoest/.cache/huggingface/datasets/44801f118d500eff6114bfc56ab4e6def941f1eb14b70ac1ecc052e15cdac49d.85f43de978b9b25921cb78d7a2f2b350c04acdbaedb9ecb5f7101cd7c0950e68.py.lock
INFO:filelock:Lock 4393489968 acquired on /Users/quentinlhoest/.cache/huggingface/datasets/_Users_quentinlhoest_.cache_huggingface_datasets_squad_plain_text_1.0.0_1244d044b266a5e4dbd4174d23cb995eead372fbca31a03edc3f8a132787af41.lock
INFO:filelock:Lock 4393489968 released on /Users/quentinlhoest/.cache/huggingface/datasets/_Users_quentinlhoest_.cache_huggingface_datasets_squad_plain_text_1.0.0_1244d044b266a5e4dbd4174d23cb995eead372fbca31a03edc3f8a132787af41.lock
INFO:filelock:Lock 4393490808 acquired on /Users/quentinlhoest/.cache/huggingface/datasets/_Users_quentinlhoest_.cache_huggingface_datasets_squad_plain_text_1.0.0_1244d044b266a5e4dbd4174d23cb995eead372fbca31a03edc3f8a132787af41.lock
Reusing dataset squad (/Users/quentinlhoest/.cache/huggingface/datasets/squad/plain_text/1.0.0/1244d044b266a5e4dbd4174d23cb995eead372fbca31a03edc3f8a132787af41)
INFO:filelock:Lock 4393490808 released on /Users/quentinlhoest/.cache/huggingface/datasets/_Users_quentinlhoest_.cache_huggingface_datasets_squad_plain_text_1.0.0_1244d044b266a5e4dbd4174d23cb995eead372fbca31a03edc3f8a132787af41.lock

With the integration of file_lock in the library, the ouput is much cleaner:

Reusing dataset squad (/Users/quentinlhoest/.cache/huggingface/datasets/squad/plain_text/1.0.0/1244d044b266a5e4dbd4174d23cb995eead372fbca31a03edc3f8a132787af41)

Since the file_lock package is only a 450 lines file I think it's fine to have it inside the lib.

Fix #812

Copy link
Member

@thomwolf thomwolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed!

@lhoestq lhoestq merged commit 5092092 into master Nov 16, 2020
@lhoestq lhoestq deleted the integreate-file_lock branch November 16, 2020 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Too much logging

3 participants