BIORANGE extends the BioRED dataset by annotating NIL entities and extracting n-ary relations. This work highlights the potential of reusing existing datasets resourcefully to improve the handling of edge cases.
Our academic paper describing BIORANGE in detail is coming soon.
This repository provides:
- Annotated Corpus: Includes NIL entities and n-ary relation annotations (Entities/data/annotations/round_3.tar.gz).
- N-ary Relation Corpora: Full corpora for 3-ary and 4-ary relations (RE/nary_dataset).
- Annotation Guidelines: Detailed instructions for dataset expansion (BioRED_Expansion_Guidelines.pdf).
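
To inspect the annotated corpus before unpacking it, the archive contents can be listed with Python's standard `tarfile` module. This is a minimal sketch, assuming the script runs from the repository root and that the archive is a gzip-compressed tar file, as its extension suggests. The function name `list_archive_members` is hypothetical and not part of this repository.

```python
import tarfile
from pathlib import Path

# Path taken from the resource list; assumes the current directory is the repository root.
ARCHIVE = Path("Entities/data/annotations/round_3.tar.gz")


def list_archive_members(archive_path):
    """Return the names of all members in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


if __name__ == "__main__":
    if ARCHIVE.exists():
        for name in list_archive_members(ARCHIVE):
            print(name)
    else:
        print(f"Archive not found: {ARCHIVE}")
```

Listing the members first avoids extracting files into an unexpected location. The internal layout of the archive is not described here, so the sketch makes no assumption about it.
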

The dataset expansion covers the following tasks:
- Annotate NIL entities in the BioRED dataset.
- Annotate CellTypeOrAnatomical entities using the best available identifiers in target knowledge bases.
- Expand binary relations in BioRED into n-ary relations (3-ary and 4-ary).

**Note on Round Labels:** The datasets labeled **Round 2** and **Round 3** correspond to **Round 1** and **Round 2** in the paper, respectively. The original labels were kept to avoid introducing errors when renaming multiple files.
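
When scripting against the files, the label offset can be made explicit so that results are reported with the paper's numbering. The following is an illustrative sketch only: the names `REPO_TO_PAPER_ROUND` and `paper_round` are hypothetical, and the mapping encodes nothing beyond the correspondence stated in the note.

```python
# Repository round label -> round label used in the paper (from the note on round labels).
REPO_TO_PAPER_ROUND = {2: 1, 3: 2}


def paper_round(repo_round):
    """Translate a repository round number into the paper's round number."""
    try:
        return REPO_TO_PAPER_ROUND[repo_round]
    except KeyError:
        raise ValueError(f"No paper round is documented for repository round {repo_round}")


if __name__ == "__main__":
    # round_3.tar.gz holds the data that the paper calls Round 2.
    print(paper_round(3))
```
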
To cite BIORANGE, please use the following:
(Coming soon)
We acknowledge the contributions of:
- Furkan Goz and Filipe Nascimento: Assistance during curation of NER, EL, and RE tasks.
- Authors of BioRED: For making the data and code publicly available.
This work was supported by:
- FCT (Fundação para a Ciência e a Tecnologia) through:
  - PhD Scholarship 2020.05393.BD (awarded to PR).
  - PhD Scholarship UI/BD/153730/2022 (awarded to SIRC).
  - LASIGE Research Unit, reference UID/000408/2025.