Private datasets do not load when passing token #6126

@albertvillanova

Describe the bug

Since the release of datasets 2.14, private/gated datasets do not load when passing token: they raise EmptyDatasetError.

This is an unplanned, backward-incompatible breaking change.

Note that private datasets do load if the token is instead passed through download_config:

from datasets import DownloadConfig, load_dataset

ds = load_dataset("albertvillanova/tmp-private", split="train", download_config=DownloadConfig(token="<MY-TOKEN>"))
ds

gives

Dataset({
    features: ['text'],
    num_rows: 4
})
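Until the regression is fixed, the workaround above can be wrapped in a small helper so that existing call sites keep passing token. This is only a sketch: `token_to_download_config` and its `make_config` parameter are hypothetical names introduced here, not part of the datasets API. The only library call it relies on is `DownloadConfig(token=...)`, as used in the workaround above.

```python
from typing import Any, Callable, Dict, Optional


def token_to_download_config(
    kwargs: Dict[str, Any],
    make_config: Optional[Callable[..., Any]] = None,
) -> Dict[str, Any]:
    """Return a copy of `kwargs` with `token` moved into `download_config`.

    Hypothetical helper: if the caller already supplied a `download_config`,
    or did not supply a token, the keyword arguments are returned unchanged.
    """
    out = dict(kwargs)
    if out.get("token") is None or out.get("download_config") is not None:
        return out

    if make_config is None:
        # Imported lazily so the helper can be exercised without datasets installed.
        from datasets import DownloadConfig

        make_config = DownloadConfig

    token = out.pop("token")
    out["download_config"] = make_config(token=token)
    return out


# Intended use (needs network access and a valid token, so it is left commented out):
# from datasets import load_dataset
# ds = load_dataset(
#     "albertvillanova/tmp-private",
#     **token_to_download_config({"split": "train", "token": "<MY-TOKEN>"}),
# )
```

The helper leaves an explicit download_config untouched, so it should not override a configuration the caller built on purpose.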

Steps to reproduce the bug

from datasets import load_dataset

ds = load_dataset("albertvillanova/tmp-private", split="train", token="<MY-TOKEN>")

gives

---------------------------------------------------------------------------
EmptyDatasetError                         Traceback (most recent call last)
<ipython-input-2-25b48732107a> in <cell line: 3>()
      1 from datasets import load_dataset
      2 
----> 3 ds = load_dataset("albertvillanova/tmp-private", split="train", token="<MY-TOKEN>")

5 frames
/usr/local/lib/python3.10/dist-packages/datasets/load.py in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
   2107 
   2108     # Create a dataset builder
-> 2109     builder_instance = load_dataset_builder(
   2110         path=path,
   2111         name=name,

/usr/local/lib/python3.10/dist-packages/datasets/load.py in load_dataset_builder(path, name, data_dir, data_files, cache_dir, features, download_config, download_mode, revision, token, use_auth_token, storage_options, **config_kwargs)
   1793         download_config = download_config.copy() if download_config else DownloadConfig()
   1794         download_config.storage_options.update(storage_options)
-> 1795     dataset_module = dataset_module_factory(
   1796         path,
   1797         revision=revision,

/usr/local/lib/python3.10/dist-packages/datasets/load.py in dataset_module_factory(path, revision, download_config, download_mode, dynamic_modules_path, data_dir, data_files, **download_kwargs)
   1484                     raise ConnectionError(f"Couldn't reach the Hugging Face Hub for dataset '{path}': {e1}") from None
   1485                 if isinstance(e1, EmptyDatasetError):
-> 1486                     raise e1 from None
   1487                 if isinstance(e1, FileNotFoundError):
   1488                     raise FileNotFoundError(

/usr/local/lib/python3.10/dist-packages/datasets/load.py in dataset_module_factory(path, revision, download_config, download_mode, dynamic_modules_path, data_dir, data_files, **download_kwargs)
   1474                     download_config=download_config,
   1475                     download_mode=download_mode,
-> 1476                 ).get_module()
   1477         except (
   1478             Exception

/usr/local/lib/python3.10/dist-packages/datasets/load.py in get_module(self)
   1030             sanitize_patterns(self.data_files)
   1031             if self.data_files is not None
-> 1032             else get_data_patterns(base_path, download_config=self.download_config)
   1033         )
   1034         data_files = DataFilesDict.from_patterns(

/usr/local/lib/python3.10/dist-packages/datasets/data_files.py in get_data_patterns(base_path, download_config)
    457         return _get_data_files_patterns(resolver)
    458     except FileNotFoundError:
--> 459         raise EmptyDatasetError(f"The directory at {base_path} doesn't contain any data files") from None
    460 
    461 

EmptyDatasetError: The directory at hf://datasets/albertvillanova/tmp-private@79b9e4fe79670a9a050d6ebc385464891915a71d doesn't contain any data files
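The last frame of the traceback shows get_data_patterns catching a FileNotFoundError and re-raising it as EmptyDatasetError with `from None`, which hides the original exception. One plausible reading, which is an assumption and not something the traceback proves, is that the token does not reach file resolution, so the private files simply look absent. The sketch below simulates that pattern with local stand-ins. `resolve_patterns`, its `token` parameter, and the returned pattern are hypothetical, and the `EmptyDatasetError` class here is defined locally, not imported from datasets.

```python
class EmptyDatasetError(Exception):
    """Local stand-in for the exception named in the traceback."""


def resolve_patterns(base_path, token=None):
    # Hypothetical resolver: pretend private files are only visible with a token.
    if token is None:
        raise FileNotFoundError(f"no files visible at {base_path}")
    return ["data/train-*"]  # hypothetical pattern, for illustration only


def get_data_patterns(base_path, token=None):
    # Mirrors the shape of the frame in the traceback: the lookup failure is
    # converted into EmptyDatasetError, and `from None` suppresses the cause.
    try:
        return resolve_patterns(base_path, token=token)
    except FileNotFoundError:
        raise EmptyDatasetError(
            f"The directory at {base_path} doesn't contain any data files"
        ) from None


try:
    get_data_patterns("hf://datasets/albertvillanova/tmp-private")
except EmptyDatasetError as err:
    # The original FileNotFoundError is not chained as the cause.
    print(type(err).__name__, "cause:", err.__cause__)
```

In this simulation a missing credential and a genuinely empty repository produce the same message, which would explain why the error text points at missing data files and not at authentication.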

Expected behavior

The dataset should load.

Environment info

  • datasets version: 2.14.3
  • Platform: Linux-5.15.109+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.16.4
  • PyArrow version: 9.0.0
  • Pandas version: 1.5.3
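To check whether a given environment falls in the range described above ("since the release of datasets 2.14"), the installed version can be read with the standard library. This is a rough sketch: `installed_version`, `release_tuple` and `may_be_affected` are hypothetical helper names, and the check only encodes the lower bound stated in this report, since no fixed version is named here.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version):
    """Leading numeric components of a version string as a tuple of ints."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component such as "dev0"
        parts.append(int(piece))
    return tuple(parts)


def may_be_affected(version):
    # Lower bound taken from this report; an upper bound is not known here.
    return release_tuple(version) >= (2, 14)


version = installed_version("datasets")
if version is None:
    print("datasets is not installed")
else:
    print(version, "may be affected:", may_be_affected(version))
```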

Labels

bug: Something isn't working
