Cannot load timit_asr data set #4422

@bhaddow

Describe the bug

I am trying to load the timit_asr data set. I have tried a copy from the LDC and a copy from deepai; both fail with a "duplicate key" error. With the LDC version, I also have to convert all the file extensions to upper-case before it will load at all.
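For anyone hitting the same extension problem, the renaming step mentioned above can be sketched roughly as follows. This is only an illustration: `uppercase_extensions` is a hypothetical helper, not part of `datasets`, and it assumes the only change needed is the case of the final file extension. It defaults to a dry run so nothing is renamed until the result has been checked.

```python
from pathlib import Path


def uppercase_extensions(root, dry_run=True):
    """Upper-case the final extension of every file under `root`.

    Hypothetical helper, not part of the `datasets` library.
    Returns a list of (old_path, new_path) pairs; files are only
    renamed on disk when dry_run is False.
    """
    renamed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        upper = path.suffix.upper()
        # Skip files with no extension or an already upper-case one.
        if path.suffix and path.suffix != upper:
            target = path.with_suffix(upper)
            if not dry_run:
                path.rename(target)
            renamed.append((path, target))
    return renamed
```

Calling `uppercase_extensions("/path/to/dataset")` first lists what would change; passing `dry_run=False` performs the renames. Working on a copy of the data is safer, since the renames are done in place.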

Steps to reproduce the bug

import datasets

timit = datasets.load_dataset("timit_asr", data_dir="/path/to/dataset")

Expected results

The data set should load without error. It worked for me before the LDC url change.

Actual results

datasets.keyhash.DuplicatedKeysError: FAILURE TO GENERATE DATASET !
Found duplicate Key: SA1
Keys should be unique and deterministic in nature
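One way such a collision could arise is sketched below, in case it helps narrow things down. TIMIT has every speaker read the same SA1 sentence, so the file name alone repeats across speaker directories. Whether the timit_asr loading script actually derives its keys this way is an assumption; `stem_key` and `composite_key` are hypothetical helpers, and the three paths are made-up examples of the `<split>/<dialect>/<speaker>/<utterance>.WAV` layout.

```python
from collections import Counter
from pathlib import PurePosixPath

# Illustrative paths only, following the TIMIT directory layout.
paths = [
    "TRAIN/DR1/FCJF0/SA1.WAV",
    "TRAIN/DR1/FDAW0/SA1.WAV",
    "TRAIN/DR2/FAEM0/SA1.WAV",
]


def stem_key(path):
    """Key built from the utterance ID alone (hypothetical)."""
    return PurePosixPath(path).stem


def composite_key(path):
    """Key built from dialect region, speaker and utterance ID (hypothetical)."""
    p = PurePosixPath(path)
    dialect, speaker = p.parts[-3], p.parts[-2]
    return f"{dialect}_{speaker}_{p.stem}"


def duplicates(keys):
    """Return the keys that occur more than once, sorted."""
    return sorted(k for k, n in Counter(keys).items() if n > 1)


print(duplicates([stem_key(p) for p in paths]))       # ['SA1']
print(duplicates([composite_key(p) for p in paths]))  # []
```

With the utterance ID alone, SA1 is reported as a duplicate; adding the dialect region and speaker makes each key unique for these paths.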

Environment info

  • datasets version: 2.2.2
  • Platform: Linux-5.4.0-90-generic-x86_64-with-glibc2.17
  • Python version: 3.8.12
  • PyArrow version: 8.0.0
  • Pandas version: 1.4.2
