Closed
Description
Describe the bug
I'm building train, dev, and test splits using Dataset.from_generator; however, in all three cases the logger prints "Generating train split:".
It's not possible to change the split name since it seems to be hardcoded: https://github.com/huggingface/datasets/blob/main/src/datasets/packaged_modules/generator/generator.py
Steps to reproduce the bug
In [1]: from datasets import Dataset
In [2]: def gen():
   ...:     yield {"pokemon": "bulbasaur", "type": "grass"}
   ...:
In [3]: ds = Dataset.from_generator(gen)
Generating train split: 1 examples [00:00, 133.89 examples/s]
Expected behavior
It should be possible to specify any split name
Environment info
- datasets version: 2.19.2
- Platform: macOS-10.16-x86_64-i386-64bit
- Python version: 3.8.5
- huggingface_hub version: 0.23.3
- PyArrow version: 15.0.0
- Pandas version: 2.0.3
- fsspec version: 2023.10.0
albertvillanova