Skip to content

The order of the splits is not preserved #6521

@albertvillanova

Description

@albertvillanova

We had a regression and the order of the splits is not preserved. They are alphabetically sorted, instead of preserving original "train", "validation", "test" order.

Check: In branch "main"

In [9]: dataset = load_dataset("adversarial_qa", '"adversarialQA")

In [10]: dataset
Out[10]: 
DatasetDict({
    test: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers', 'metadata'],
        num_rows: 3000
    })
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers', 'metadata'],
        num_rows: 30000
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers', 'metadata'],
        num_rows: 3000
    })
})

Before (2.15.0) it was:

DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers', 'metadata'],
        num_rows: 30000
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers', 'metadata'],
        num_rows: 3000
    })
    test: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers', 'metadata'],
        num_rows: 3000
    })
})

See issues:

This is a regression because it was previously fixed. See:

Metadata

Metadata

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions