Commit 9a9df85

Fix bug with labels of eurlex config of lex_glue dataset (#5048)
* Update lex_glue.py

  Fix for a critical bug in the EURLEX dataset label list to make LexGLUE EURLEX results replicable. In LexGLUE (Chalkidis et al., 2022), the following is mentioned w.r.t. EUR-LEX: "It supports four different label granularities, comprising 21, 127, 567, 7390 EuroVoc concepts, respectively. We use the 100 most frequent concepts from level 2 [...]". The current label list has all 127 labels, which leads to different (lower) results, as communicated by users.

* Update lex_glue.py

  Update code formatting

* Update dataset card
* Update metadata JSON

Co-authored-by: Albert Villanova del Moral <[email protected]>
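To illustrate the idea of restricting a label space to its most frequent concepts, here is a minimal sketch. It is not the code from `lex_glue.py`, which is not shown here. The helper names `top_k_label_space` and `remap`, and the toy concept IDs, are hypothetical.

```python
from collections import Counter


def top_k_label_space(label_sets, k):
    """Return the k most frequent labels and a label -> new index mapping."""
    # Count how many documents each label appears in.
    counts = Counter(label for labels in label_sets for label in labels)
    # most_common orders by descending frequency; ties keep first-seen order.
    kept = [label for label, _ in counts.most_common(k)]
    index_of = {label: i for i, label in enumerate(kept)}
    return kept, index_of


def remap(labels, index_of):
    """Map labels to indices in the reduced space, dropping labels outside it."""
    return sorted(index_of[label] for label in labels if label in index_of)


# Toy data with hypothetical concept IDs (not real EuroVoc concepts).
docs = [["a", "b"], ["a", "c"], ["a", "b", "d"]]
kept, index_of = top_k_label_space(docs, k=2)
reduced = [remap(labels, index_of) for labels in docs]
```

With a reduced label space, documents that only carried rare labels end up with an empty label list, and every remaining index shifts. This is why results computed against a 127-label list and a 100-label list are not directly comparable.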
1 parent a4a571a commit 9a9df85

3 files changed: +2 -29 lines changed

datasets/lex_glue/README.md

Lines changed: 1 addition & 1 deletion
@@ -199,7 +199,7 @@ An example of 'train' looks as follows.
 ```json
 {
   "text": "COMMISSION REGULATION (EC) No 1629/96 of 13 August 1996 on an invitation to tender for the refund on export of wholly milled round grain rice to certain third countries ...",
-  "labels": [2, 42, 72, 76, 86]
+  "labels": [4, 20, 21, 35, 68]
 }
 ```
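The `labels` field holds integer indices into the config's label list. For multi-label classification these indices are typically turned into a multi-hot vector. The sketch below assumes a label space of 100 concepts, as stated in the commit message. `NUM_LABELS` and `multi_hot` are hypothetical names, not part of the dataset script.

```python
NUM_LABELS = 100  # assumption: label-space size after the fix, per the commit message


def multi_hot(labels, num_labels=NUM_LABELS):
    """Convert a list of label indices into a 0/1 vector of length num_labels."""
    vec = [0] * num_labels
    for i in labels:
        if not 0 <= i < num_labels:
            raise ValueError(f"label index {i} is outside [0, {num_labels})")
        vec[i] = 1
    return vec


# Label indices taken from the updated README example.
vec = multi_hot([4, 20, 21, 35, 68])
```

The range check makes a mismatch between the label list and the stored indices fail loudly instead of silently producing a wrong target vector.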
