Description
We have a regression: the order of the splits is no longer preserved. They are sorted alphabetically instead of keeping the original "train", "validation", "test" order.
To check, in branch "main":
In [9]: dataset = load_dataset("adversarial_qa", "adversarialQA")
In [10]: dataset
Out[10]:
DatasetDict({
    test: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers', 'metadata'],
        num_rows: 3000
    })
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers', 'metadata'],
        num_rows: 30000
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers', 'metadata'],
        num_rows: 3000
    })
})
Before (2.15.0) it was:
DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers', 'metadata'],
        num_rows: 30000
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers', 'metadata'],
        num_rows: 3000
    })
    test: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers', 'metadata'],
        num_rows: 3000
    })
})
See issues:
- https://huggingface.co/datasets/adversarial_qa/discussions/3
- https://huggingface.co/datasets/beans/discussions/4
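As a stopgap until the ordering is restored, the expected "train", "validation", "test" order can be re-imposed on the caller's side. The following is only a minimal sketch: `reorder_splits` and `CANONICAL_ORDER` are hypothetical names, not part of the `datasets` API, and a plain dict stands in for the `DatasetDict` so the snippet runs without downloading anything.

```python
# Hypothetical helper, not part of the `datasets` library.
CANONICAL_ORDER = ("train", "validation", "test")


def reorder_splits(splits):
    """Return a new dict with the canonical splits first, in canonical order.

    Any other split names follow, in the order they had in the input.
    """
    ordered = {}
    for name in CANONICAL_ORDER:
        if name in splits:
            ordered[name] = splits[name]
    for name, value in splits.items():
        if name not in ordered:
            ordered[name] = value
    return ordered


# Plain dict standing in for the alphabetically sorted DatasetDict;
# the values are the row counts reported in this issue.
sorted_splits = {"test": 3000, "train": 30000, "validation": 3000}
restored = reorder_splits(sorted_splits)
print(list(restored))  # ['train', 'validation', 'test']
```

Since `DatasetDict` is a `dict` subclass, wrapping the result as `DatasetDict(reorder_splits(dataset))` should give the same effect on a real dataset, though that is an assumption I have not verified against "main".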
This is a regression because it was previously fixed. See: