Do not sort splits in dataset info #5201

polinaeterna · 2022-11-04T10:47:21Z

I suggest not sorting splits by name in dataset_info in the README, so that they are displayed in the order specified in the loading script. Otherwise the test split is displayed first; see this repo: https://huggingface.co/datasets/paws
What do you think?

However, I added sorting in the tests to fix CI (for the same dataset).
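
To illustrate the ordering problem described above, here is a minimal sketch. The split names are assumed for illustration and are not read from the paws loading script; it only shows that sorting by name moves test ahead of train.

```python
# Hypothetical split names, in the order a loading script might declare them.
declared_splits = ["train", "validation", "test"]

# Sorting by name reorders them alphabetically, so "test" ends up first.
sorted_splits = sorted(declared_splits)

print(declared_splits)
print(sorted_splits)
```

Keeping the declared order means a consumer such as the viewer sees train first, as the loading script intended.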

HuggingFaceDocBuilderDev · 2022-11-04T10:51:37Z

The documentation is not available anymore as the PR was closed or merged.

severo · 2022-11-04T11:28:57Z

It would be consistent with huggingface/dataset-viewer#614 (comment)

albertvillanova · 2022-11-04T12:01:18Z

I think we started working on this issue nearly at the same time... 😅

CI was fixed with this: https://huggingface.co/datasets/paws/discussions/1

Related issue:

CI fails after bulk edit of canonical datasets #5202

polinaeterna · 2022-11-04T12:21:50Z

@albertvillanova yeah, I noticed it right after opening the PR 😄 thank you! The fix to the dataset info YAML fixes the tests on CI, but in general the order of splits in the YAML influences the order in which they are displayed in the viewer, if I understand correctly. So I suggest not sorting splits in the YAML in the first place, to avoid this for other datasets in the future. I think this change should achieve that.

The changes to the tests here can probably be reverted, since the order in the YAML now matches the one in the tests, thanks to your change to the dataset info.

albertvillanova

Thanks for the fix, @polinaeterna.

I agree we should not sort splits alphabetically, but keep them in their original order.

However, I disagree that we should add sorted to our tests: I think we have to test the returned order (see comment below).

albertvillanova · 2022-11-04T12:24:56Z

tests/test_inspect.py

    info = infos[expected_config]
    assert info.config_name == expected_config
-    assert list(info.splits.keys()) == expected_splits_in_first_config
+    assert sorted(info.splits.keys()) == sorted(expected_splits_in_first_config)


I'm not sure we want to avoid testing the order: as already discussed by you @polinaeterna and @severo, splits are not alphabetically sorted.

Therefore, it makes sense to test that the order returned by get_dataset_infos is the expected one.

Anyway, if we do decide to sort them in the end, we should also do it in test_get_dataset_config_info, which was failing too, alongside test_get_dataset_info and test_get_dataset_split_names.

yes, agree with you! reverted sorting in tests a063c6f
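
The difference between the two assertion styles discussed in this thread can be sketched as follows. The split lists are hypothetical values chosen for illustration, not the actual test fixtures.

```python
# Hypothetical expected and returned split names, for illustration only.
expected_splits = ["train", "validation", "test"]
returned_splits = ["test", "train", "validation"]  # e.g. after alphabetical sorting

# Order-insensitive comparison: passes even though the order differs.
order_insensitive_ok = sorted(returned_splits) == sorted(expected_splits)

# Order-sensitive comparison: detects that the splits were reordered.
order_sensitive_ok = list(returned_splits) == expected_splits

print(order_insensitive_ok, order_sensitive_ok)
```

Comparing without sorted is what lets a test catch an unintended reordering of splits.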

albertvillanova · 2022-11-04T12:28:59Z

Hehe, @polinaeterna, we make comments nearly at the same time as well... 😆

This reverts commit fe51b19.

albertvillanova

Thanks for the suggested improvement.

Just a comment below.

albertvillanova · 2022-11-04T12:57:36Z

src/datasets/splits.py


    def get_list_sliced_split_info(self):
-        return list(sorted(self._splits.values(), key=lambda x: x.split_info.name))
+        return self._splits.values()


I think this is not used anywhere else... But, just in case, would it be better to return a list here?

didn't find any usages either... applied your suggestion 6e10c33, thank you!
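
As a rough sketch of why returning a list can be safer than returning the values view directly: the plain dict below only stands in for the internal _splits mapping (an assumption for illustration; the real values are not strings).

```python
# Plain dict standing in for the internal mapping of splits (hypothetical values).
splits = {"train": "train-info", "validation": "validation-info", "test": "test-info"}

view = splits.values()            # dict view: live, and not indexable
as_list = list(splits.values())   # snapshot list, in insertion order

# A later insertion shows up in the view but not in the list.
splits["extra"] = "extra-info"

print(len(view))
print(len(as_list))
print(as_list[0])  # indexing works on the list; view[0] would raise TypeError
```

Both forms keep insertion order, so dropping sorted preserves the original split order either way; the list just keeps the return type that callers got before.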

whatever it is...

polinaeterna added 3 commits November 4, 2022 14:41

do not sort splits in splits info

aaf73c4

sort splits in tests

fe51b19

remove comment

1742bae

polinaeterna changed the title ~~Do not sort splits in Splits info~~ Do not sort splits in dataset info Nov 4, 2022

polinaeterna requested a review from lhoestq November 4, 2022 10:48

polinaeterna requested a review from albertvillanova November 4, 2022 11:17

albertvillanova reviewed Nov 4, 2022


Revert "sort splits in tests"

a063c6f

This reverts commit fe51b19.

albertvillanova approved these changes Nov 4, 2022


Polina Kazakova added 2 commits November 4, 2022 14:59

Merge branch 'huggingface:main' into do-not-sort-splits-fix-ci

b090c3a

return list in sliced split info

6e10c33

whatever it is...

polinaeterna marked this pull request as ready for review November 4, 2022 14:03

Merge branch 'huggingface:main' into do-not-sort-splits-fix-ci

fe57ad1

polinaeterna merged commit 18e1e14 into huggingface:main Nov 4, 2022
