@iliaschalkidis
Contributor

Fix for a critical bug in the EURLEX dataset label list to make LexGLUE EURLEX results replicable.

In LexGLUE (Chalkidis et al., 2022), the following is mentioned w.r.t. EUR-LEX: "It supports four different label granularities, comprising 21, 127, 567, 7390 EuroVoc concepts, respectively. We use the 100 most frequent concepts from level 2 [...]". The current label list contains all 127 level-2 labels rather than those 100, which leads to different (lower) results, as reported by users.
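As an illustration of what "the 100 most frequent concepts" means in practice, here is a minimal sketch of restricting a multi-label dataset to its k most frequent labels. It is not the LexGLUE preprocessing code: the helper names `most_frequent_labels` and `filter_labels` and the toy label IDs are hypothetical, and counting frequencies over whatever documents are passed in is an assumption.

```python
from collections import Counter


def most_frequent_labels(label_sets, k):
    """Return the k most frequent labels, sorted for a stable label order.

    Hypothetical helper for illustration only.
    """
    counts = Counter(label for labels in label_sets for label in labels)
    # most_common orders by descending count; ties keep first-seen order.
    top = [label for label, _ in counts.most_common(k)]
    return sorted(top)


def filter_labels(label_sets, kept):
    """Drop every label that is not in the kept label list."""
    kept = set(kept)
    return [[label for label in labels if label in kept] for labels in label_sets]


# Toy data standing in for per-document EuroVoc concept lists (made-up IDs).
train_labels = [["a", "b"], ["a", "c"], ["a", "b", "d"], ["b"]]

top2 = most_frequent_labels(train_labels, 2)
filtered = filter_labels(train_labels, top2)
print(top2)
print(filtered)
```

With a label list built this way, a classifier head has k outputs instead of one per concept in the full level, which is why a 127-entry list and a 100-entry list give scores that are not comparable.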

Thanks!

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Sep 30, 2022

The documentation is not available anymore as the PR was closed or merged.

Update code formatting
@iliaschalkidis
Contributor Author

@JamesLYC88 here is the fix! Thanks again!

@albertvillanova changed the title from "Update lex_glue.py" to "Fix bug with labels of eurlex config of lex_glue dataset" on Sep 30, 2022
Member

@albertvillanova left a comment

Thanks for the fix, @iliaschalkidis.

@albertvillanova added the "dataset contribution" label (Contribution to a dataset script) on Sep 30, 2022
@albertvillanova merged commit 9a9df85 into huggingface:main on Sep 30, 2022
@iliaschalkidis
Contributor Author

Thanks, @albertvillanova. When do you expect this change to take effect for someone downloading the dataset?

@albertvillanova
Member

albertvillanova commented Sep 30, 2022

The change is available immediately, thanks to this change we made to our library:
