-
Notifications
You must be signed in to change notification settings - Fork 3k
Extend support for streaming datasets that use pathlib.Path.glob #2876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend support for streaming datasets that use pathlib.Path.glob #2876
Conversation
|
I am thinking that ideally we should call |
lhoestq
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome ! Good idea to add a mock filesystem for the tests :)
Should we add rglob as well ?
| fs, fs_token, globbed_paths = fsspec.get_fs_token_paths(xjoin(posix_path, pattern)) | ||
| if "*" not in pattern: | ||
| globbed_paths = fs.glob(globbed_paths[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment to mention that get_fs_token_paths does pattern matching only if there's a * in the pattern. Otherwise it could be unclear why you would need to glob when there's no * in the pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have finally refactored this, so that it always calls fs.glob because:
get_fs_token_pathsdoes pattern matching only if there is a*- The
*inget_fs_token_pathsonly matches file names, not directory names get_fs_token_pathsdoesn't do pattern matching if there is**
|
Thanks, @lhoestq: the idea of adding the mock filesystem is to avoid network calls and reduce testing time ;) I have added |
This PR extends the support in streaming mode for datasets that use
pathlib, by patching the methodpathlib.Path.glob.Related to #2874, #2866.
CC: @severo