@hkjeon13

I fixed an error that occurs when streaming a private dataset from the Hugging Face Hub.

from datasets import load_dataset

dataset = load_dataset(<repo_id>, use_auth_token=<private_token>, streaming=True)
for d in dataset['train']:
    print(d)
    break # this is for checking

This code is an example of streaming a private dataset.
With datasets 2.2.2 it works well, but with datasets>2.2.2 it raises an error like this:

/usr/local/lib/python3.7/dist-packages/aiohttp/client_reqrep.py in raise_for_status(self)
1007 status=self.status,
1008 message=self.reason,
→ 1009 headers=self.headers,
1010 )
1011

ClientResponseError: 401, message='Unauthorized', url=URL('https://huggingface.co/datasets/.../train-00000-of-00001-168b451062c67c34.parquet')

(This example comes from a dataset stored with the parquet extension.)
It seems that the xisfile function in download/streaming_download_manager.py cannot recognize files under "https://huggingface.co/~". The 401 status suggests that the request is sent without the auth token.

So I added three lines.
With this change, the error no longer occurs (although the fix is ad hoc).
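
To illustrate the general idea of attaching a token only to requests for Hub-hosted files, here is a minimal sketch. It is not the three lines from this PR, nor the fix merged in #4608; `build_auth_headers` and `HF_HOST` are hypothetical names introduced for illustration, and restricting the token to the `huggingface.co` host is an assumption.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Assumption: only this host should ever receive the private token.
HF_HOST = "huggingface.co"


def build_auth_headers(url: str, token: Optional[str]) -> Dict[str, str]:
    """Hypothetical helper: return request headers for `url`.

    A Bearer token is added only when a token is given and the URL
    points at the Hub over HTTPS; otherwise no headers are added.
    """
    if not token:
        return {}
    parsed = urlparse(url)
    if parsed.scheme == "https" and parsed.hostname == HF_HOST:
        return {"Authorization": f"Bearer {token}"}
    return {}
```

A file-existence check such as xisfile would then need to send these headers with its request; without them, a private file would presumably answer 401 as in the traceback above.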

@hkjeon13 hkjeon13 changed the title fix-auth-error-private-dataset Fix Authentification Error while streaming Jul 19, 2022
@mariosasko
Collaborator

Hi, thanks for working on this, but the fix for this has already been merged in #4608.

@mariosasko mariosasko closed this Jul 20, 2022