The order of data split names is nondeterministic

After this CI error: https://github.com/huggingface/datasets/actions/runs/4639528358/jobs/8210492953?pr=5718
```
FAILED tests/test_data_files.py::test_get_data_files_patterns[data_file_per_split4] - AssertionError: assert ['random', 'train'] == ['train', 'random']
  At index 0 diff: 'random' != 'train'
  Full diff:
  - ['train', 'random']
  + ['random', 'train']
```
I checked locally and found that the data split order is nondeterministic.

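This is caused by the use of `set` for sharded splits: a `set` does not preserve insertion order, so the split names can come out in a different order from one run to the next.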
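To illustrate, here is a minimal sketch of the effect and of one possible order-preserving alternative. It is not the actual `datasets` code: the `shard_splits` list and the `unique_in_order` helper are hypothetical names made up for this example.

```python
def unique_in_order(names):
    """Deduplicate while keeping first-seen order.

    Relies on dicts preserving insertion order (guaranteed since Python 3.7).
    """
    return list(dict.fromkeys(names))


# Hypothetical input: one split name per shard file, in discovery order.
shard_splits = ["train", "train", "random", "random"]

# Iteration order of a set of strings depends on string hashing, which is
# randomized per interpreter run unless PYTHONHASHSEED is fixed.
unordered = list(set(shard_splits))

# Same elements, but the order is stable across runs.
ordered = unique_in_order(shard_splits)

print("set order:     ", unordered)
print("preserved order:", ordered)
```

Running the script several times (without setting `PYTHONHASHSEED`) should show the first line flipping between the two orders while the second stays the same. Sorting the names would also be deterministic, but it would discard the order in which the splits were declared.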

The order of data split names is nondeterministic #5728
