lhoestq (Member) commented Jun 30, 2022

xisfile works in a private repository when passed a chained URL to a file inside an archive, e.g. zip://a.txt::https://huggingface/datasets/username/dataset_name/resolve/main/data.zip. However, it does not work when passed a simple file URL such as https://huggingface/datasets/username/dataset_name/resolve/main/data.zip.

This is because the authentication headers are not passed correctly in this case.

This causes dataset streaming to fail in private Parquet repositories, as noted in #4605.

I fixed xisfile and the other functions that behave the same way: xgetsize, xisdir and xlistdir
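
To illustrate the idea (this is not the code from the diff), here is a minimal sketch of attaching authentication headers to every HTTP(S) hop of a URL, whether it is chained or simple. The helper name `build_storage_options`, the bearer-token header, and the protocol-keyed options layout are assumptions made for this example only.

```python
from urllib.parse import urlparse


def build_storage_options(urlpath, token=None):
    """Hypothetical helper: return protocol-keyed options with auth headers.

    Every hop of a chained URL ("a::b") is inspected, so a simple URL
    (a single hop) gets the same headers as the remote hop of a chained one.
    """
    if token is None:
        # Nothing to authenticate with: no extra options.
        return {}
    headers = {"authorization": f"Bearer {token}"}  # assumed header format
    options = {}
    for hop in urlpath.split("::"):
        protocol = urlparse(hop).scheme
        if protocol in ("http", "https"):
            options[protocol] = {"headers": headers}
    return options


chained = "zip://a.txt::https://huggingface/datasets/username/dataset_name/resolve/main/data.zip"
simple = "https://huggingface/datasets/username/dataset_name/resolve/main/data.zip"

# Both forms end up with the same authenticated options.
chained_options = build_storage_options(chained, token="my-token")
simple_options = build_storage_options(simple, token="my-token")
```

The point of the sketch is that the header logic must not depend on whether the URL has more than one hop; the simple-URL case should go through the same path as the chained one.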

TODO:

  • tests

fix #4605

HuggingFaceDocBuilderDev commented Jun 30, 2022

The documentation is not available anymore as the PR was closed or merged.

lhoestq (Member, Author) commented Jul 5, 2022

Added tests for xisfile, xgetsize, xlistdir and xglob for private repos, and also tests for xwalk, which was previously untested.

lhoestq marked this pull request as ready for review July 5, 2022 09:20
lhoestq requested a review from mariosasko July 5, 2022 09:20

mariosasko (Collaborator) left a comment

Thanks for fixing this!

lhoestq merged commit 0702cb0 into master Jul 6, 2022
lhoestq deleted the fix-xisfile branch July 6, 2022 12:34
Development

Successfully merging this pull request may close these issues.

Dataset Viewer issue for boris/gis_filtered
