Update: GooAQ - add train/val/test splits #2792
@albertvillanova my tests are failing here: When I try loading the dataset on my local machine, it works fine. Any suggestions on how I can avoid this error?
Hi @bhavitvyamalik, thanks a lot for the addition of this dataset! ^^
The error you get is due to the dummy data you generated:
- The IDs present in the dummy gooaq.jsonl are: 1, 2, 3, 4 and 5
- However, the IDs present in the dummy split.json are: {"dev": [3880119, 1038845, 2069835, 1960624, 2938642], "test": [2022145, 6465663, 2063013, 1139244, 1996513], "train": [[2302335, 0.5], [6028813, 1.0], [2560106, 1.0], [2208050, 0.5], [3073548, 0.6666666666666666]]}
Because of this mismatch, the test generates a dataset which is empty for all the splits:
DatasetDict({
train: Dataset({
features: ['id', 'question', 'short_answer', 'answer', 'answer_type'],
num_rows: 0
})
validation: Dataset({
features: ['id', 'question', 'short_answer', 'answer', 'answer_type'],
num_rows: 0
})
test: Dataset({
features: ['id', 'question', 'short_answer', 'answer', 'answer_type'],
num_rows: 0
})
})
And the test fails, as it checks that the dataset is not empty for all splits: len(dataset[split]) > 0
You should modify one of the files so that the IDs in both files match.
For example by setting split.json to:
{"dev": [1], "test": [2, 3], "train": [[4, 0.5], [5, 1.0]]}
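To make the mismatch concrete, here is a minimal sketch of the ID matching described above. It is not the actual loading script: the helper names `ids_for_split` and `rows_per_split` and the placeholder records are hypothetical, and it only assumes that "dev" and "test" hold plain IDs while "train" holds [id, value] pairs, as in the split.json shown above.

```python
import json

# Placeholder records mirroring the IDs in the dummy gooaq.jsonl (1 to 5).
# The "question" values are made up for illustration, not real GooAQ data.
dummy_records = [{"id": i, "question": f"placeholder question {i}"} for i in range(1, 6)]


def ids_for_split(split_ids):
    """Return the set of IDs listed for one split.

    "dev" and "test" entries are plain IDs; "train" entries are
    [id, value] pairs, so both shapes are handled.
    """
    return {entry[0] if isinstance(entry, list) else entry for entry in split_ids}


def rows_per_split(records, splits):
    """Count how many records land in each split when matched by ID."""
    return {
        name: sum(1 for rec in records if rec["id"] in ids_for_split(split_ids))
        for name, split_ids in splits.items()
    }


# A shortened version of the mismatched dummy split.json (one ID per split).
mismatched = json.loads('{"dev": [3880119], "test": [2022145], "train": [[2302335, 0.5]]}')
# The suggested fix, where the IDs match those in the dummy gooaq.jsonl.
matched = json.loads('{"dev": [1], "test": [2, 3], "train": [[4, 0.5], [5, 1.0]]}')

print(rows_per_split(dummy_records, mismatched))  # {'dev': 0, 'test': 0, 'train': 0}
print(rows_per_split(dummy_records, matched))     # {'dev': 1, 'test': 2, 'train': 2}
```

With the mismatched IDs every split ends up with zero rows, which is what trips the len(dataset[split]) > 0 check; with the matching IDs each split has at least one row.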
Thanks for the help, @albertvillanova! All tests are passing now.
lhoestq
left a comment
Nice, thank you! Before we merge, could you just update the version of the Gooaq builder class?
lhoestq
left a comment
Thanks!
The GooAQ dataset was recently updated with the addition of splits. This PR contains the updated GooAQ with train/val/test splits, as well as an updated README.