@mariosasko
Collaborator

Use absolute local paths in the error messages of load_dataset as per @stas00's suggestion in #2500 (comment)

  if script_version is not None:
      raise FileNotFoundError(
-         "Couldn't find remote file with version {} at {}. Please provide a valid version and a valid {} name".format(
+         "Couldn't find remote file with version {} at {}. Please provide a valid version and a valid {} name.".format(
Contributor

@stas00 Jul 20, 2021

@lhoestq, do you guys plan to keep the same style as transformers? If so, note that transformers has fully switched from format to f"" strings.

This could be a good https://github.com/huggingface/datasets/contribute Issue if you choose to do so.

If not, please ignore my comment.

Member

@lhoestq Jul 22, 2021

Yes, we prefer f-strings over format, and when possible we try to follow the same style as transformers.

The changes can be done in another PR :)
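
For reference, a minimal sketch of the style change being discussed, using the message from the diff above. The arguments of the original call are cut off in the diff, so `version`, `url` and `kind` are placeholder names chosen for illustration, not the variables used in the library.

```python
def message_with_format(version, url, kind):
    # Style used in the diff: positional placeholders filled by str.format
    return "Couldn't find remote file with version {} at {}. Please provide a valid version and a valid {} name.".format(
        version, url, kind
    )


def message_with_fstring(version, url, kind):
    # Equivalent f-string: each value is named at the point where it is used
    return (
        f"Couldn't find remote file with version {version} at {url}. "
        f"Please provide a valid version and a valid {kind} name."
    )
```

Both functions return the same text for the same arguments, so a follow-up PR of this kind should not change any error message.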

Member

@lhoestq left a comment

Thanks!

I just made a change to avoid showing the same path twice when users pass a path to a directory, and to avoid showing something like dataset_name.py/dataset_name.py when a path to the dataset script is passed.
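
A rough sketch of the two cases described above, not the code from the commit. `local_script_path` and `unique_in_order` are hypothetical helpers, and the `<directory>/<directory name>.py` layout is an assumption inferred from the dataset_name.py/dataset_name.py example.

```python
import os


def local_script_path(path):
    """Absolute path of the script to report in an error message (hypothetical helper)."""
    path = os.path.normpath(path)  # drops a trailing separator, if any
    if path.endswith(".py"):
        # The user already pointed at the script itself: do not append the name again
        return os.path.abspath(path)
    # The user pointed at a directory: assume the script is <dir>/<dir name>.py
    return os.path.abspath(os.path.join(path, os.path.basename(path) + ".py"))


def unique_in_order(paths):
    """Drop repeated paths while keeping the order in which they were first seen."""
    return list(dict.fromkeys(paths))
```

With these, a directory path and a script path both resolve to a single absolute location, and a list of candidate locations can be de-duplicated before it is formatted into the message.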

@lhoestq merged commit 28f928d into huggingface:master Jul 22, 2021
@mariosasko deleted the improve-load-dataset-messages branch July 22, 2021 14:05
Comment on lines +230 to +231
m_path = re.search(r"\S*_dummy\b", str(exc_info.value))
assert m_path is not None and os.path.isabs(m_path.group())
Collaborator Author

@mariosasko Jul 22, 2021

@lhoestq Actually, this check is redundant: m_path matches a substring of m_combined_path, just without the .py file extension. We can replace it with a check that verifies that the error message contains a remote URL.

# On Linux os.sep is "/", so this matches the local path as well as the URL; take the last element (the URL always comes last in the list)
m_paths = re.findall(r"\S*_dummy/_dummy.py\b", str(exc_info.value))
# is_remote_url comes from datasets.utils.file_utils
assert len(m_paths) > 0 and is_remote_url(m_paths[-1])
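
To make the point concrete, here is a small self-contained sketch that runs both regexes against a made-up error message. The message text, the example.com URL and the `looks_like_remote_url` stand-in (used so the snippet does not need datasets installed) are all assumptions; the regex for m_combined_path is not shown in the thread and is assumed to be the same pattern as in the proposed check.

```python
import posixpath
import re
from urllib.parse import urlparse


def looks_like_remote_url(text):
    # Stand-in for is_remote_url; only checks http(s) schemes (assumption)
    return urlparse(text).scheme in ("http", "https")


# Made-up error message with a POSIX-style local path and a placeholder URL
message = (
    "Couldn't find file locally at /tmp/_dummy/_dummy.py, "
    "or remotely at https://example.com/datasets/_dummy/_dummy.py."
)

m_combined_path = re.search(r"\S*_dummy/_dummy.py\b", message).group()
m_path = re.search(r"\S*_dummy\b", message).group()

# The old check inspects the same text minus ".py", so it cannot fail on its own
old_check_is_redundant = (
    m_path + ".py" == m_combined_path
    and posixpath.isabs(m_path) == posixpath.isabs(m_combined_path)
)

# The proposed check: the last match is the remote URL
m_paths = re.findall(r"\S*_dummy/_dummy.py\b", message)
new_check_passes = len(m_paths) > 0 and looks_like_remote_url(m_paths[-1])
```

In this sample, the greedy `\S*` makes m_path stop right before `.py` of the local path, which is why the absolute-path assertion on it only repeats what the m_combined_path check already covers.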
