Member

@albertvillanova albertvillanova commented Sep 7, 2021

This PR extends streaming-mode support to datasets that use pathlib, by patching the pathlib.Path.glob method.
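As a rough illustration of what patching pathlib.Path.glob means in general, here is a minimal sketch using only the standard library. It is not the implementation in this PR: REMOTE_FILES, fake_remote_glob and list_text_files are hypothetical names, and the in-memory list merely stands in for files that would live on a remote host.

```python
import fnmatch
import pathlib
from unittest import mock

# Hypothetical listing standing in for files hosted remotely.
REMOTE_FILES = ["data/train/a.txt", "data/train/b.csv", "data/test/c.txt"]


def fake_remote_glob(self, pattern):
    """Hypothetical replacement for Path.glob that reads REMOTE_FILES, not the local disk."""
    prefix = self.as_posix().rstrip("/") + "/"
    for name in REMOTE_FILES:
        if name.startswith(prefix):
            relative = name[len(prefix):]
            # Keep direct children only, since fnmatch's "*" would also cross "/".
            if "/" not in relative and fnmatch.fnmatch(relative, pattern):
                yield pathlib.PurePosixPath(name)


def list_text_files(root):
    # Dataset-style code that only knows about pathlib.
    return sorted(str(p) for p in pathlib.Path(root).glob("*.txt"))


# While the patch is active, unmodified pathlib code sees the "remote" listing.
with mock.patch.object(pathlib.Path, "glob", fake_remote_glob):
    patched = list_text_files("data/train")

print(patched)
```

The point of the sketch is that the calling code stays untouched; only the method it relies on is swapped out for the duration of the patch.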

Related to #2874, #2866.

CC: @severo

@albertvillanova
Member Author

I am thinking that ideally we should call fs.glob() instead...

Member

@lhoestq lhoestq left a comment

Awesome! Good idea to add a mock filesystem for the tests :)
Should we add rglob as well?

Comment on lines 198 to 200
fs, fs_token, globbed_paths = fsspec.get_fs_token_paths(xjoin(posix_path, pattern))
if "*" not in pattern:
    globbed_paths = fs.glob(globbed_paths[0])
Member

@lhoestq lhoestq Sep 9, 2021

I added a comment to mention that get_fs_token_paths does pattern matching only if there's a * in the pattern. Otherwise it could be unclear why you would need to glob when there's no * in the pattern.

Member Author

@albertvillanova albertvillanova Sep 10, 2021

In the end I refactored this so that it always calls fs.glob, because:

  • get_fs_token_paths does pattern matching only if there is a *
  • The * in get_fs_token_paths only matches file names, not directory names
  • get_fs_token_paths doesn't do pattern matching if there is **

@albertvillanova
Member Author

albertvillanova commented Sep 10, 2021

Thanks, @lhoestq: the idea of adding the mock filesystem is to avoid network calls and reduce testing time ;)

I have added rglob as well and fixed some bugs.
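For reference, the local pathlib behavior that a patched glob and rglob would presumably need to mirror can be checked with a small standard-library sketch. The temporary directory layout and variable names below are made up for illustration and are not taken from this PR's tests.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("a")
    (root / "sub" / "b.txt").write_text("b")

    def names(paths):
        # Report paths relative to the temporary root, with "/" separators.
        return sorted(p.relative_to(root).as_posix() for p in paths)

    top_level = names(root.glob("*.txt"))      # direct children only
    recursive = names(root.rglob("*.txt"))     # walks subdirectories too
    explicit = names(root.glob("**/*.txt"))    # same result as rglob("*.txt")

print(top_level, recursive, explicit)
```

In plain pathlib, rglob(pattern) behaves like glob with "**/" prepended to the pattern, which is why the last two results agree.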
