Closed
Labels: bug (Something isn't working)
Description
Describe the bug
Loading the code_search_net dataset currently fails with a DuplicatedKeysError while the dataset is being prepared.
Steps to reproduce the bug
>>> from datasets import load_dataset
>>> load_dataset('code_search_net')
Downloading: 8.50kB [00:00, 3.09MB/s]
Downloading: 19.1kB [00:00, 10.1MB/s]
No config specified, defaulting to: code_search_net/all
Downloading and preparing dataset code_search_net/all (download: 4.77 GiB, generated: 5.99 GiB, post-processed: Unknown size, total: 10.76 GiB) to /Users/thomwolf/.cache/huggingface/datasets/code_search_net/all/1.0.0/b3e8278faf5d67da1d06981efbeac3b76a2900693bd2239bbca7a4a3b0d6e52a...
Traceback (most recent call last):
  File "/Users/thomwolf/Documents/GitHub/datasets/src/datasets/builder.py", line 1067, in _prepare_split
    writer.write(example, key)
  File "/Users/thomwolf/Documents/GitHub/datasets/src/datasets/arrow_writer.py", line 343, in write
    self.check_duplicate_keys()
  File "/Users/thomwolf/Documents/GitHub/datasets/src/datasets/arrow_writer.py", line 354, in check_duplicate_keys
    raise DuplicatedKeysError(key)
datasets.keyhash.DuplicatedKeysError: FAILURE TO GENERATE DATASET !
Found duplicate Key: 48
Keys should be unique and deterministic in nature
Environment info
- datasets version: 1.8.1.dev0
- Platform: macOS-10.15.7-x86_64-i386-64bit
- Python version: 3.8.5
- PyArrow version: 2.0.0
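
For context, here is a minimal sketch of one common way a duplicate-key error like the one above can arise. It is not taken from the code_search_net loading script, and it may not be the actual cause here. It assumes a generator that yields (key, example) pairs while reading several files, and that the key is a row index that restarts for each file. The function names and the sample data are hypothetical, and the duplicate check is a plain-Python stand-in, not the library's implementation.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Example = Dict[str, str]


def generate_examples_per_file_keys(
    files: List[List[str]],
) -> Iterator[Tuple[int, Example]]:
    """Hypothetical generator: the key restarts at 0 for every file."""
    for rows in files:
        for row_id, row in enumerate(rows):
            # row_id is only unique within one file, so keys collide across files.
            yield row_id, {"code": row}


def generate_examples_global_keys(
    files: List[List[str]],
) -> Iterator[Tuple[int, Example]]:
    """Hypothetical fix: a single counter shared by all files."""
    key = 0
    for rows in files:
        for row in rows:
            yield key, {"code": row}
            key += 1


def first_duplicate_key(pairs: Iterable[Tuple[int, Example]]) -> Optional[int]:
    """Stand-in duplicate check: return the first repeated key, or None."""
    seen = set()
    for key, _ in pairs:
        if key in seen:
            return key
        seen.add(key)
    return None


# Two made-up "files", each a list of rows.
files = [["def a(): pass", "def b(): pass"], ["def c(): pass"]]

dup_per_file = first_duplicate_key(generate_examples_per_file_keys(files))
dup_global = first_duplicate_key(generate_examples_global_keys(files))
print("per-file keys, first duplicate:", dup_per_file)
print("global keys, first duplicate:", dup_global)
```

If the loading script does follow this pattern, switching to a key that is unique across all files (a global counter as sketched, or a composite of file index and row index) would be one way to keep keys unique and deterministic.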