@albertvillanova
Member

This PR extends the support in streaming mode for datasets that use pathlib.Path.

Related to: #2866.
CC: @severo
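
For context, here is a minimal sketch of why dataset scripts that use pathlib.Path can break in streaming mode, where paths are often remote URLs. It is not the implementation in this PR: `is_remote` and `url_joinpath` are hypothetical helpers introduced only for illustration, and the URL is a placeholder.

```python
import posixpath
from pathlib import PurePosixPath


def is_remote(path: str) -> bool:
    """Hypothetical check: treat anything with a URL scheme as remote."""
    return "://" in path


def url_joinpath(base: str, *parts: str) -> str:
    """Hypothetical URL-aware replacement for Path.joinpath.

    Local paths go through pathlib as usual; remote URLs keep their
    scheme separator intact and only the remainder is joined.
    """
    if not is_remote(base):
        return str(PurePosixPath(base, *parts))
    scheme, rest = base.split("://", 1)
    return f"{scheme}://{posixpath.join(rest, *parts)}"


base = "https://example.com/data"  # placeholder URL

# Plain pathlib collapses the double slash after the scheme.
naive = str(PurePosixPath(base) / "train.csv")
print(naive)    # https:/example.com/data/train.csv

# The URL-aware helper preserves it.
patched = url_joinpath(base, "train.csv")
print(patched)  # https://example.com/data/train.csv
```

Each pathlib function a dataset script relies on (joining, opening, globbing, and so on) would need a similar URL-aware counterpart, which is why support has to be added function by function.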

Member

@lhoestq lhoestq left a comment

Thanks, this is very nice! It should fix a lot of datasets in streaming mode :)

I added a few suggestions, mostly about the tests.

Member

@lhoestq lhoestq left a comment

LGTM, thank you!

@lhoestq lhoestq merged commit 486e7ba into huggingface:master Sep 7, 2021
@severo
Collaborator

severo commented Sep 7, 2021

I've tried #2866 again, and I get the same error.

import datasets as ds
ds.load_dataset('counter', split="train", streaming=False)

@albertvillanova
Member Author

@severo Issue #2866 is not fully fixed yet: multiple patches need to be implemented for pathlib, as that dataset uses quite a lot of pathlib functions... 😅

@severo
Collaborator

severo commented Sep 7, 2021

No worries and no stress, I just wanted to check for that case :) I'm very happy that you're working on issues I'm interested in!
