Add KLUE dataset #2416

jungwhank · 2021-05-27T15:49:51Z

Add the KLUE (Korean Language Understanding Evaluation) dataset, which was recently released along with a paper, GitHub repository, and webpage.
Please let me know if there's anything missing in the code or README.
Thanks!

datasets/klue/klue.py

Co-authored-by: Minho Ryu <[email protected]>

jungwhank · 2021-05-30T14:33:54Z

I'm not sure why I get the error below when I auto-generate the dummy data for "mrc":

datasets.keyhash.DuplicatedKeysError: FAILURE TO GENERATE DATASET !
Found duplicate Key: 0
Keys should be unique and deterministic in nature

bzantium · 2021-05-31T04:36:12Z

Please check out the suggestion below. I think it might be the cause.

datasets/klue/klue.py

Co-authored-by: Minho Ryu <[email protected]>

jungwhank · 2021-05-31T11:26:32Z

The problem was that the id_ yielded for mrc was not unique. (I used the index from enumerate(paragraphs) by mistake.)
I fixed it and updated everything.
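
For anyone hitting the same DuplicatedKeysError, here is a minimal sketch of how this kind of bug can arise and one way to avoid it. The SQuAD-style nesting in `SAMPLE` and the function names are assumptions made for illustration, not the actual code in klue.py.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical SQuAD-style layout; the real KLUE-MRC file may be structured differently.
SAMPLE: List[Dict] = [
    {"paragraphs": [{"context": "c1", "qas": [{"question": "q1"}, {"question": "q2"}]}]},
    {"paragraphs": [{"context": "c2", "qas": [{"question": "q3"}]}]},
]


def generate_buggy(data: List[Dict]) -> Iterator[Tuple[int, Dict]]:
    """Uses the paragraph index as the key, so keys repeat."""
    for article in data:
        for id_, paragraph in enumerate(article["paragraphs"]):
            for qa in paragraph["qas"]:
                # id_ restarts at 0 for every article and is reused for every question.
                yield id_, {"context": paragraph["context"], "question": qa["question"]}


def generate_fixed(data: List[Dict]) -> Iterator[Tuple[int, Dict]]:
    """Uses a single running counter, so every yielded key is unique."""
    id_ = 0
    for article in data:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield id_, {"context": paragraph["context"], "question": qa["question"]}
                id_ += 1


def has_duplicate_keys(examples: Iterator[Tuple[int, Dict]]) -> bool:
    """Return True if any key appears more than once."""
    keys = [key for key, _ in examples]
    return len(keys) != len(set(keys))


print(has_duplicate_keys(generate_buggy(SAMPLE)))  # True
print(has_duplicate_keys(generate_fixed(SAMPLE)))  # False
```

A running counter is only one option; any key that is unique and deterministic across the whole split would also satisfy the check.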

datasets/klue/klue.py

Co-authored-by: Minho Ryu <[email protected]>

lhoestq

Awesome thank you ! You did a really great job adding the dataset.
The dataset card and the python scripts are really good.

I just added a few comments.
After the changes to the features, you will probably need to regenerate the dataset_infos.json file.

Also, I noticed that some of the dummy data files are bigger than 20KB. Could you try to reduce their sizes, please? For example, the mrc dummy data file is 200KB. I think this is because it contains several tens of examples for each split. In the dummy data we expect fewer than 5 examples so that they can be loaded quickly.
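
A quick way to spot oversized files is a small size check like the sketch below. The helper name `find_large_files` and the example path in the usage note are assumptions for illustration; only the 20KB threshold comes from the comment above.

```python
import os
from typing import List, Tuple


def find_large_files(root: str, limit_bytes: int = 20 * 1024) -> List[Tuple[str, int]]:
    """Return (path, size in bytes) for every file under root larger than limit_bytes."""
    large = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit_bytes:
                large.append((path, size))
    return sorted(large)
```

For example, `find_large_files("datasets/klue/dummy")` would list the files to shrink, assuming the dummy data lives under that directory.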

datasets/klue/README.md

datasets/klue/klue.py

lhoestq · 2021-06-04T10:21:14Z

To fix the CI you can just merge master into your branch and it should be all green hopefully :)

Co-authored-by: Quentin Lhoest <[email protected]>

jungwhank · 2021-06-04T14:08:15Z

@lhoestq
Thanks for reviewing!

It's harder than I thought to add a dataset card. 😅
I went through your suggestions and updated the script, README details, and dummy data.

The dummy data is a little larger than expected because the ner dummy file needs about 80 lines and the dp dummy file about 25 lines to avoid ending up with 0 examples.
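
To illustrate why a small line budget can yield zero examples, here is a rough sketch. It assumes a token-per-line file in which each example ends with a blank line; the actual ner and dp file formats may differ, and `complete_examples` is a hypothetical helper.

```python
from typing import List


def complete_examples(lines: List[str], n_lines: int) -> int:
    """Count examples fully contained in the first n_lines lines.

    An example is a run of non-empty lines terminated by a blank line.
    """
    count = 0
    in_example = False
    for line in lines[:n_lines]:
        if line.strip():
            in_example = True
        elif in_example:
            count += 1
            in_example = False
    return count


# Hypothetical file: three examples, each with 30 token lines plus one blank separator.
lines = (["token\tO"] * 30 + [""]) * 3

print(complete_examples(lines, 5))   # 0: the first example is cut off
print(complete_examples(lines, 80))  # 2: two examples fit in 80 lines
```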

I'm not sure why some CI checks keep failing. Could you take a look?

lhoestq · 2021-06-04T14:30:39Z

Thanks ! That makes sense for ner and dp

For mrc on the other hand there are still too many examples, maybe you can generate the dummy data for 5 examples for all tasks except ner and dp ?

datasets/klue/klue.py

jungwhank · 2021-06-04T14:58:53Z

Yes, I generated the dummy data with the default number of lines in datasets-cli for every dataset except "dp" and "ner".
I fixed the mrc dummy data; hope it's fine now :)

The CI failed because I forgot to merge master into my branch 😅

lhoestq

Thanks a LOT ! This looks all good to me now :)

jungwhank added 6 commits May 28, 2021 00:37

add klue

87cff14

apply make style

c58fd5d

apply flake8

be54e2e

update README.md

57b5f11

update README

f8872ff

fix task_ids, source_datasets

a4e27e9

bhavitvyamalik mentioned this pull request May 27, 2021

add utf-8 while reading README #2418

Merged

jungwhank mentioned this pull request May 28, 2021

Homepage statistics typo KLUE-benchmark/KLUE#4

Closed

bzantium reviewed May 29, 2021

jungwhank and others added 8 commits May 29, 2021 23:17

fix category

f88b385

Co-authored-by: Minho Ryu <[email protected]>

fix head feature

44a976e

Co-authored-by: Minho Ryu <[email protected]>

fix index feature

92d4a97

Co-authored-by: Minho Ryu <[email protected]>

delete unnecessary features

57bf1eb

Co-authored-by: Minho Ryu <[email protected]>

delete unnecessary features

317dc81

Co-authored-by: Minho Ryu <[email protected]>

fix ynat features

3e7c799

fix generate examples

20d1ac8

update readme, info, dummy data

8cb8439

make style

b14a47f

bzantium reviewed May 31, 2021

jungwhank and others added 4 commits May 31, 2021 19:31

fix mrc

f8cca1f

Co-authored-by: Minho Ryu <[email protected]>

fix mrc yield

599b440

Co-authored-by: Minho Ryu <[email protected]>

fix mrc unique id

143e4f2

update readme and fix dummy data & infos

1be7a36

bzantium reviewed May 31, 2021

fix guid to id_

0bd8451

Co-authored-by: Minho Ryu <[email protected]>

lhoestq reviewed Jun 4, 2021

jungwhank and others added 2 commits June 4, 2021 21:50

update dp

9b57186

Co-authored-by: Quentin Lhoest <[email protected]>

update script, readme and dummy dataset

e227245

Merge branch 'master' into klue

c87d38f

lhoestq reviewed Jun 4, 2021

jungwhank added 2 commits June 4, 2021 23:39

fix ner features

c7575b9

fix dummy dataset smaller

10ee1df

lhoestq approved these changes Jun 4, 2021

lhoestq merged commit ede1bbd into huggingface:master Jun 4, 2021

jungwhank deleted the klue branch June 9, 2021 15:00

ingyuseong mentioned this pull request Jul 3, 2023

Add KLUE-MRC metrics #6002

Closed

3 tasks
