Describe the bug
I am trying to load the timit_asr data set. I have tried with a copy from the LDC, and a copy from deepai. In both cases they fail with a "duplicate key" error. With the LDC version I have to convert the file extensions all to upper-case before I can load it at all.
Steps to reproduce the bug
timit = datasets.load_dataset("timit_asr", data_dir = "/path/to/dataset")
# Sample code to reproduce the bug
Expected results
The data set should load without error. It worked for me before the LDC url change.
Actual results
datasets.keyhash.DuplicatedKeysError: FAILURE TO GENERATE DATASET !
Found duplicate Key: SA1
Keys should be unique and deterministic in nature
Environment info
datasets version:
datasets version: 2.2.2
- Platform: Linux-5.4.0-90-generic-x86_64-with-glibc2.17
- Python version: 3.8.12
- PyArrow version: 8.0.0
- Pandas version: 1.4.2
Describe the bug
I am trying to load the timit_asr data set. I have tried with a copy from the LDC, and a copy from deepai. In both cases they fail with a "duplicate key" error. With the LDC version I have to convert the file extensions all to upper-case before I can load it at all.
Steps to reproduce the bug
Expected results
The data set should load without error. It worked for me before the LDC url change.
Actual results
Environment info
datasetsversion:datasetsversion: 2.2.2