Closed
Labels
bug: Something isn't working
Description
Describe the bug
Since the release of datasets 2.14, private/gated datasets do not load when passing token: they raise EmptyDatasetError.
This is an unplanned, backward-incompatible breaking change.
Note that private datasets do load if instead download_config is passed:
from datasets import DownloadConfig, load_dataset
ds = load_dataset("albertvillanova/tmp-private", split="train", download_config=DownloadConfig(token="<MY-TOKEN>"))
ds
gives
Dataset({
features: ['text'],
num_rows: 4
})
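Until the regression is fixed, the workaround above can be wrapped in a small helper. The following is only a minimal sketch: `load_with_token_fallback`, `loader`, and `config_factory` are hypothetical names introduced here for illustration, and matching the exception by class name is an assumption made so the helper can be exercised without the library installed.

```python
from typing import Any, Callable, Optional


def load_with_token_fallback(
    path: str,
    token: str,
    loader: Optional[Callable[..., Any]] = None,
    config_factory: Optional[Callable[..., Any]] = None,
    **kwargs: Any,
) -> Any:
    """Try `token=` first; on EmptyDatasetError, retry via `download_config`.

    `loader` and `config_factory` are hypothetical injection points; when
    omitted they default to `datasets.load_dataset` and
    `datasets.DownloadConfig`.
    """
    if loader is None or config_factory is None:
        # Imported lazily so stand-ins can be supplied instead.
        from datasets import DownloadConfig, load_dataset

        loader = loader or load_dataset
        config_factory = config_factory or DownloadConfig

    try:
        return loader(path, token=token, **kwargs)
    except Exception as err:
        # Assumption: identify the failure reported in this issue by its
        # class name rather than importing the exception type.
        if type(err).__name__ != "EmptyDatasetError":
            raise
        # Workaround from this issue: pass the token through DownloadConfig.
        return loader(path, download_config=config_factory(token=token), **kwargs)
```

With `datasets` installed, `load_with_token_fallback("albertvillanova/tmp-private", "<MY-TOKEN>", split="train")` would first try the `token` argument and fall back to `download_config` only if the first call raises `EmptyDatasetError`.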
Steps to reproduce the bug
from datasets import load_dataset
ds = load_dataset("albertvillanova/tmp-private", split="train", token="<MY-TOKEN>")
gives
---------------------------------------------------------------------------
EmptyDatasetError Traceback (most recent call last)
<ipython-input-2-25b48732107a> in <cell line: 3>()
1 from datasets import load_dataset
2
----> 3 ds = load_dataset("albertvillanova/tmp-private", split="train", token="<MY-TOKEN>")
5 frames
/usr/local/lib/python3.10/dist-packages/datasets/load.py in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
2107
2108 # Create a dataset builder
-> 2109 builder_instance = load_dataset_builder(
2110 path=path,
2111 name=name,
/usr/local/lib/python3.10/dist-packages/datasets/load.py in load_dataset_builder(path, name, data_dir, data_files, cache_dir, features, download_config, download_mode, revision, token, use_auth_token, storage_options, **config_kwargs)
1793 download_config = download_config.copy() if download_config else DownloadConfig()
1794 download_config.storage_options.update(storage_options)
-> 1795 dataset_module = dataset_module_factory(
1796 path,
1797 revision=revision,
/usr/local/lib/python3.10/dist-packages/datasets/load.py in dataset_module_factory(path, revision, download_config, download_mode, dynamic_modules_path, data_dir, data_files, **download_kwargs)
1484 raise ConnectionError(f"Couldn't reach the Hugging Face Hub for dataset '{path}': {e1}") from None
1485 if isinstance(e1, EmptyDatasetError):
-> 1486 raise e1 from None
1487 if isinstance(e1, FileNotFoundError):
1488 raise FileNotFoundError(
/usr/local/lib/python3.10/dist-packages/datasets/load.py in dataset_module_factory(path, revision, download_config, download_mode, dynamic_modules_path, data_dir, data_files, **download_kwargs)
1474 download_config=download_config,
1475 download_mode=download_mode,
-> 1476 ).get_module()
1477 except (
1478 Exception
/usr/local/lib/python3.10/dist-packages/datasets/load.py in get_module(self)
1030 sanitize_patterns(self.data_files)
1031 if self.data_files is not None
-> 1032 else get_data_patterns(base_path, download_config=self.download_config)
1033 )
1034 data_files = DataFilesDict.from_patterns(
/usr/local/lib/python3.10/dist-packages/datasets/data_files.py in get_data_patterns(base_path, download_config)
457 return _get_data_files_patterns(resolver)
458 except FileNotFoundError:
--> 459 raise EmptyDatasetError(f"The directory at {base_path} doesn't contain any data files") from None
460
461
EmptyDatasetError: The directory at hf://datasets/albertvillanova/tmp-private@79b9e4fe79670a9a050d6ebc385464891915a71d doesn't contain any data files
Expected behavior
The dataset should load.
Environment info
- datasets version: 2.14.3
- Platform: Linux-5.15.109+-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.16.4
- PyArrow version: 9.0.0
- Pandas version: 1.5.3