Closed
Labels
bug: Something isn't working
Description
Describe the bug
For at least 'reclor', 'telugu_books', 'turkish_movie_sentiment', 'ubuntu_dialogs_corpus' and 'wikihow', calling load_dataset in streaming mode raises a TypeError whose message gives no indication of why it fails.
Steps to reproduce the bug
from datasets import load_dataset
dataset = load_dataset("reclor", streaming=True)
Expected results
Ideally: raise a specific exception, something like ManualDownloadError.
Or at least give the reason in the message, as when we load in normal mode:
from datasets import load_dataset
dataset = load_dataset("reclor")
AssertionError: The dataset reclor with config default requires manual data.
Please follow the manual download instructions: to use ReClor you need to download it manually. Please go to its homepage (http://whyu.me/reclor/) fill the google
form and you will receive a download link and a password to extract it.Please extract all files in one folder and use the path folder in datasets.load_dataset('reclor', data_dir='path/to/folder/folder_name')
.
Manual data can be loaded with `datasets.load_dataset(reclor, data_dir='<path/to/manual/data>')
Actual results
TypeError: expected str, bytes or os.PathLike object, not NoneType
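To illustrate the behaviour asked for under "Expected results", here is a minimal sketch of an explicit check. It assumes the TypeError comes from a missing (None) manual data directory reaching a path function, which this report does not confirm. `ManualDownloadError` is the name suggested above; `require_manual_dir` and the instructions string are hypothetical helpers for illustration only, not part of the datasets API.

```python
import os
from typing import Optional


class ManualDownloadError(Exception):
    """Hypothetical exception: the dataset needs manually downloaded data."""


def require_manual_dir(name: str, data_dir: Optional[str], instructions: str) -> str:
    """Return data_dir as a path string, or raise an explanatory error.

    Hypothetical helper: checks for a missing directory up front instead of
    letting None reach a path function, which fails with an opaque TypeError.
    """
    if data_dir is None:
        raise ManualDownloadError(
            f"The dataset {name} requires manual data. "
            f"Please follow the manual download instructions: {instructions}"
        )
    return os.fspath(data_dir)


# Without the check, passing None to a path function gives the opaque error.
try:
    os.fspath(None)
except TypeError as err:
    opaque_message = str(err)

# With the check, the failure names the dataset and the reason.
try:
    require_manual_dir("reclor", None, "see the dataset homepage")
except ManualDownloadError as err:
    message = str(err)
    print(message)
```

A dedicated exception type would let callers catch this case specifically, in both streaming and normal mode, rather than matching on the text of an AssertionError or TypeError.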
Environment info
- datasets version: 1.11.0
- Platform: macOS-11.5-x86_64-i386-64bit
- Python version: 3.8.11
- PyArrow version: 4.0.1