@mariosasko
Collaborator

This PR replaces tqdm from the tqdm lib with tqdm from datasets.utils.tqdm_utils. With this change, progress bars can be disabled just by calling disable_progress_bar. Note that this doesn't work on Windows when using multiprocessing because of how global variables are shared between processes. Currently, there is no easy way to disable progress bars in a multiprocess setting on Windows (patching logging with datasets.utils.logging.get_verbosity = lambda: datasets.utils.logging.NOTSET doesn't seem to work either), so adding support for this is a future goal.

Additionally, this PR adds a unit ("ba" for batches) to the bar printed by Dataset.to_json (this change is motivated by #2657).
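To illustrate the idea of a single switch that silences every progress bar, here is a minimal sketch in plain Python. It is not the code from datasets.utils.tqdm_utils: the flag name, the progress helper, and its write parameter are hypothetical and only stand in for a tqdm wrapper.

```python
import sys

# Module-level switch (hypothetical name); every bar consults it.
_progress_bars_enabled = True


def disable_progress_bar():
    """Turn off all progress output produced through `progress`."""
    global _progress_bars_enabled
    _progress_bars_enabled = False


def enable_progress_bar():
    """Turn progress output back on."""
    global _progress_bars_enabled
    _progress_bars_enabled = True


def progress(iterable, desc="", unit="it", write=None):
    """Yield items from `iterable`, reporting a running count when enabled.

    `write` is a hypothetical hook for the output sink; it defaults to stderr.
    """
    write = sys.stderr.write if write is None else write
    count = 0
    for item in iterable:
        count += 1
        if _progress_bars_enabled:
            write(f"\r{desc}: {count}{unit}")
        yield item
    if _progress_bars_enabled and count:
        write("\n")
```

Because the switch is an ordinary module global, a worker process started with the spawn method re-imports the module and sees the default value again, which is consistent with the Windows limitation described above.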

  return function(data_struct)

- disable_tqdm = bool(logger.getEffectiveLevel() > INFO)
+ disable_tqdm = bool(logger.getEffectiveLevel() > logging.INFO)
Collaborator Author

@mariosasko mariosasko Jul 18, 2021

Replacing this line with disable_tqdm = bool(logging.get_verbosity() == logging.NOTSET) (this check was introduced in #2534) causes 2 tests to fail on Windows, which is very strange.

UPDATE:
It took me some time to find the bug. This is the PR with the fix I've opened in the tqdm repo. I'll update the line in a separate PR if it gets merged.
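For readers unfamiliar with the check being discussed, here is a small sketch of how a getEffectiveLevel() > INFO test behaves. It uses the standard library logging module as a stand-in, which is an assumption (in the PR, logging most likely refers to the library's own logging utilities), and the logger names and helper function are made up for the example.

```python
import logging


def should_disable_tqdm(logger: logging.Logger) -> bool:
    """Hide progress bars when the logger is quieter than INFO."""
    return logger.getEffectiveLevel() > logging.INFO


# Hypothetical logger hierarchy used only for this demonstration.
parent = logging.getLogger("demo_pkg")
child = logging.getLogger("demo_pkg.utils")

# The child has no level of its own (NOTSET), so it inherits from the parent.
parent.setLevel(logging.WARNING)
quiet = should_disable_tqdm(child)    # effective level is WARNING

parent.setLevel(logging.INFO)
verbose = should_disable_tqdm(child)  # effective level is INFO
```

The point of the sketch is that getEffectiveLevel() walks up the logger hierarchy, so a logger whose own level is NOTSET can still report WARNING or INFO. A check that compares a verbosity value directly against NOTSET asks a different question.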

Member

Good job discovering what caused this issue!

@mariosasko
Collaborator Author

The current CI failure is due to modifications in the dataset script.

Member

@lhoestq lhoestq left a comment

Nice, thanks :)

I just have one comment about the change in src/datasets/__init__.py

Member

@lhoestq lhoestq left a comment

Thanks a lot :)

@lhoestq
Member

lhoestq commented Jul 19, 2021

Merging since the CI is only failing because of dataset card issues, which are unrelated to this PR.

@lhoestq lhoestq merged commit 575fe8e into huggingface:master Jul 19, 2021
@mariosasko mariosasko deleted the tqdm-consistency branch July 19, 2021 17:39