Add KLUE dataset #2416
Co-authored-by: Minho Ryu <[email protected]>
I'm not sure why I get the error below when I auto-generate the dummy data for "mrc".
Please check the suggestion below. I think it might be the cause.
The problem was
lhoestq
left a comment
Awesome, thank you! You did a really great job adding the dataset.
The dataset card and the Python scripts are really good.
I just added a few comments.
After some changes regarding features, you will probably need to regenerate the dataset_infos.json file.
Also, I noticed that some of the dummy data files are bigger than 20KB. Could you try to reduce their sizes, please? For example, the mrc dummy data file is 200KB. I think this is because it contains several tens of examples for each split. In the dummy data we expect fewer than 5 examples so that they can be loaded quickly.
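For reference, a quick way to spot oversized dummy data before pushing is to walk the dummy data folder and list every file above the 20KB mark mentioned above. This is only a minimal sketch: the helper name `find_oversized_files` and the `datasets/klue/dummy` path are assumptions for illustration, not part of the `datasets` tooling.

```python
import os

# 20KB threshold taken from the review comment above.
SIZE_LIMIT_BYTES = 20 * 1024


def find_oversized_files(root, limit=SIZE_LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs for files under `root` larger than `limit`.

    Hypothetical helper: results are sorted with the largest file first.
    """
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit:
                oversized.append((path, size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumed location of the dummy data; adjust to your checkout.
    for path, size in find_oversized_files("datasets/klue/dummy"):
        print(f"{size / 1024:.1f} KB  {path}")
```

Any file this reports is a candidate for regenerating with fewer examples per split.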
To fix the CI, you can just merge master into your branch, and it should hopefully be all green :)
Co-authored-by: Quentin Lhoest <[email protected]>
@lhoestq It's harder than I thought to add the dataset card. 😅 The dummy data is a little larger than expected. I'm also not sure why some CI checks keep failing; could you take a look?
Thanks! That makes sense for ner and dp. For mrc, on the other hand, there are still too many examples. Maybe you can generate the dummy data with 5 examples for all tasks except ner and dp?
Yes, I generated the dummy data with the default number of lines in datasets-cli for every task except "dp" and "ner". The CI failed because I forgot to merge master into my branch 😅
lhoestq
left a comment
Thanks a LOT! This all looks good to me now :)
Add the KLUE (Korean Language Understanding Evaluation) dataset, recently released via paper, GitHub, and webpage.
Please let me know if there's anything missing in the code or README.
Thanks!