-
Notifications
You must be signed in to change notification settings - Fork 3k
Print absolute local paths in load_dataset error messages #2684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Print absolute local paths in load_dataset error messages #2684
Conversation
… improve-load-dataset-messages
| if script_version is not None: | ||
| raise FileNotFoundError( | ||
| "Couldn't find remote file with version {} at {}. Please provide a valid version and a valid {} name".format( | ||
| "Couldn't find remote file with version {} at {}. Please provide a valid version and a valid {} name.".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lhoestq, do you guys plan to keep the same style as transformers? If so, the latter fully switched to f"" strings from format.
This could be a good https://github.com/huggingface/datasets/contribute Issue if you choose to do so.
If not, please ignore my comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we prefer f-strings than using format, and when it's possible we try to follow the same style as transformers
The changes can be done in another PR :)
lhoestq
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks !
I just did a change to avoid showing twice the same path when the users pass a path to a directory, and to avoid showing something like dataset_name.py/dataset_name.py when a path to the dataset script is passed.
| m_path = re.search(r"\S*_dummy\b", str(exc_info.value)) | ||
| assert m_path is not None and os.path.isabs(m_path.group()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lhoestq Actually, this check doesn't do anything (m_path returns a substring of m_combined_path without .py file extension). We can replace this check with a check which verifies that the error message returns a remote URL.
m_paths = re.findall(r"\S*_dummy/_dummy.py\b", str(exc_info.value)) # on Linux this will match an URL as well as a local_path due to different os.sep, so take the last element (an URL always comes last in the list)
assert len(m_paths) > 0 and is_remote_url(m_paths[-1]) # is_remote_url comes from datasets.utils.file_utils
Use absolute local paths in the error messages of
load_datasetas per @stas00's suggestion in #2500 (comment)