Update: GooAQ - add train/val/test splits #2792

bhavitvyamalik · 2021-08-12T11:40:18Z

The GooAQ dataset was recently updated upstream to add train/val/test splits. This PR updates the GooAQ loading script to expose those splits and updates the README to match.

bhavitvyamalik · 2021-08-12T12:21:52Z

@albertvillanova my tests are failing here:

dataset_name = 'gooaq'

    def test_load_dataset(self, dataset_name):
        configs = self.dataset_tester.load_all_configs(dataset_name, is_local=True)[:1]
>       self.dataset_tester.check_load_dataset(dataset_name, configs, is_local=True, use_local_dummy_data=True)

tests/test_dataset_common.py:234: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/test_dataset_common.py:187: in check_load_dataset
    self.parent.assertTrue(len(dataset[split]) > 0)
E   AssertionError: False is not true

When I load the dataset on my local machine, it works fine. Any suggestions on how I can avoid this error?

albertvillanova

Hi @bhavitvyamalik, thanks a lot for the addition of this dataset! ^^

The error you get is due to the dummy data you generated:

The IDs present in the dummy gooaq.jsonl are: 1, 2, 3, 4 and 5
However, the IDs present in the dummy split.json are: {"dev": [3880119, 1038845, 2069835, 1960624, 2938642], "test": [2022145, 6465663, 2063013, 1139244, 1996513], "train": [[2302335, 0.5], [6028813, 1.0], [2560106, 1.0], [2208050, 0.5], [3073548, 0.6666666666666666]]}

Because of this mismatch, the test generates a dataset which is empty for all the splits:

DatasetDict({
    train: Dataset({
        features: ['id', 'question', 'short_answer', 'answer', 'answer_type'],
        num_rows: 0
    })
    validation: Dataset({
        features: ['id', 'question', 'short_answer', 'answer', 'answer_type'],
        num_rows: 0
    })
    test: Dataset({
        features: ['id', 'question', 'short_answer', 'answer', 'answer_type'],
        num_rows: 0
    })
})

And the test fails, as it checks that the dataset is not empty for all splits: len(dataset[split]) > 0

You should modify one of the files so that the IDs in both files match.

For example by setting split.json to:

{"dev": [1], "test": [2, 3], "train": [[4, 0.5], [5, 1.0]]}
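To make the mismatch concrete, here is a minimal sketch of ID-based split filtering. It assumes the loader keeps only the records whose `id` appears in the corresponding entry of split.json; the helper names (`ids_per_split`, `rows_per_split`) and the placeholder records are hypothetical and are not taken from gooaq.py.

```python
import json

# Placeholder records using the IDs reported for the dummy gooaq.jsonl (1..5).
# The question text is made up for illustration, not real GooAQ data.
records = [{"id": i, "question": f"placeholder question {i}"} for i in range(1, 6)]


def ids_per_split(split_spec):
    """Return {split_name: set_of_ids}; train entries are [id, value] pairs."""
    return {
        "train": {pair[0] for pair in split_spec["train"]},
        "dev": set(split_spec["dev"]),
        "test": set(split_spec["test"]),
    }


def rows_per_split(records, split_spec):
    """Count how many records fall into each split under ID-based filtering."""
    wanted = ids_per_split(split_spec)
    return {
        name: sum(1 for record in records if record["id"] in ids)
        for name, ids in wanted.items()
    }


# A shortened version of the original dummy split.json: none of these IDs exist in records.
mismatched = json.loads('{"dev": [3880119], "test": [2022145], "train": [[2302335, 0.5]]}')
# The suggested replacement: every ID refers to a record in the dummy gooaq.jsonl.
matched = json.loads('{"dev": [1], "test": [2, 3], "train": [[4, 0.5], [5, 1.0]]}')

print(rows_per_split(records, mismatched))  # all splits are empty
print(rows_per_split(records, matched))     # all splits are non-empty
```

Running a check like this against the dummy files before pushing would surface empty splits without waiting for the test suite.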

bhavitvyamalik · 2021-08-13T18:28:27Z

Thanks for the help, @albertvillanova! All tests are passing now.

lhoestq

Nice, thank you! Before we merge, could you update the version of the GooAQ builder class?

datasets/gooaq/gooaq.py

lhoestq

Thanks !

bhavitvyamalik added 3 commits August 12, 2021 17:01

update gooaq

5ae54ec

Merge remote-tracking branch 'origin/master' into gooaq_update

528b956

update README

a5dfc9e

add pretty name

4e5477f

albertvillanova requested changes Aug 13, 2021

dummy data changed

9eacc7b

lhoestq reviewed Aug 18, 2021

datasets/gooaq/gooaq.py

bhavitvyamalik added 2 commits August 18, 2021 21:04

update version

2ec1351

remove 1.1.0 dummy_data

c3850b6

lhoestq approved these changes Aug 27, 2021

lhoestq merged commit e34e5cd into huggingface:master Aug 27, 2021

lhoestq changed the title ~~Update GooAQ~~ Update: GooAQ - add train/val/test splits Aug 27, 2021
