Transfer-learning-for-BNER-Bioinformatics-2018

This repository contains supplementary data and links to the model and corpora used in the paper "Transfer learning for biomedical named entity recognition with neural networks".

Code

The corpus pre-processing steps are collected in a single script, with an accompanying Jupyter notebook for ease of use. Both the script and the notebook can be found in code.

Model

The model used in this study is NeuroNER [1], a domain-independent named entity recognizer (NER) based on a bi-directional long short-term memory network-conditional random field (LSTM-CRF). A repository for the model can be found here.

NeuroNER uses standard Python config files to specify hyperparameters. We provide three of these config files for reproducibility (see code/configs):

baseline.ini: config used while training on the target data sets (i.e., the baseline).
source.ini: config used while training on the source data sets.
transfer.ini: config used while transferring a model trained on the source data set for training on a target data set.
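
To see how two of these configs differ, they can be compared option by option. The following is a minimal sketch using Python's standard configparser module. The section and option names in the sample strings (mode, use_pretrained_model, pretrained_model_folder, dataset_text_folder) are illustrative placeholders, not values copied from the files in code/configs, and the helper names load_flat and diff_configs are hypothetical.

```python
import configparser


def load_flat(ini_text):
    """Parse INI text into a flat {(section, option): value} dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {(s, k): v for s in parser.sections() for k, v in parser[s].items()}


def diff_configs(a_text, b_text):
    """Return {(section, option): (value_in_a, value_in_b)} for differing options."""
    a, b = load_flat(a_text), load_flat(b_text)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Hypothetical config contents, for illustration only.
BASELINE = """
[mode]
use_pretrained_model = False
[dataset]
dataset_text_folder = ./data/target
"""

TRANSFER = """
[mode]
use_pretrained_model = True
pretrained_model_folder = ./trained_models/source
[dataset]
dataset_text_folder = ./data/target
"""

for (section, option), (old, new) in diff_configs(BASELINE, TRANSFER).items():
    print(f"[{section}] {option}: {old!r} -> {new!r}")
```

To compare the real files, the text of each config can be read from disk (for example with pathlib.Path.read_text) and passed to diff_configs in place of the sample strings.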

Word Embeddings

The word embeddings used in this study were obtained from here [2]. Code for converting the word vectors to the .txt format necessary for use with NeuroNER can be found in the Jupyter notebook in code, under data cleaning.
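
The notebook contains the conversion actually used in the study. As a rough illustration of what such a conversion involves, the sketch below reads vectors stored in the word2vec binary layout (a text header with vocabulary size and dimension, then each token followed by little-endian float32 values) and writes one "token v1 ... vN" line per token. Both the input layout and the output layout are assumptions here, and the function name word2vec_bin_to_txt is hypothetical.

```python
import struct


def word2vec_bin_to_txt(bin_path, txt_path):
    """Convert word2vec-style binary vectors to plain text, one token per line.

    Assumes a header line "<vocab_size> <dim>" followed by, for each token,
    the token bytes, a space, and <dim> little-endian float32 values.
    """
    with open(bin_path, "rb") as src, open(txt_path, "w", encoding="utf-8") as dst:
        vocab_size, dim = map(int, src.readline().split())
        for _ in range(vocab_size):
            chars = bytearray()
            while True:
                ch = src.read(1)
                if ch == b"":
                    raise EOFError("file ended before all vectors were read")
                if ch == b" ":
                    break
                if ch != b"\n":  # some writers emit a newline after each vector
                    chars.extend(ch)
            token = chars.decode("utf-8", errors="replace")
            vector = struct.unpack(f"<{dim}f", src.read(4 * dim))
            dst.write(token + " " + " ".join(f"{v:.6f}" for v in vector) + "\n")
    return vocab_size, dim
```

The output has no header line, which matches GloVe-style text files. Whether NeuroNER requires exactly this layout should be checked against the notebook.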

Corpora

All corpora used in this study that can be redistributed are in the corpora folder, in Brat standoff format.
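
For readers unfamiliar with Brat standoff, each document is a .txt file paired with an .ann file. Entity annotations are lines that start with "T" and hold an ID, a label with character offsets, and the annotated text, separated by tabs. The sketch below parses those entity lines only. The Entity type, the parse_ann function, and the labels in the sample are hypothetical and do not reflect the label sets of the corpora listed here.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    id: str
    label: str
    spans: List[Tuple[int, int]]  # (start, end) character offsets
    text: str


def parse_ann(ann_text):
    """Extract text-bound (entity) annotations from Brat standoff .ann content."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, events, attributes, and notes
        ann_id, label_and_offsets, surface = line.split("\t", 2)
        label, offsets = label_and_offsets.split(" ", 1)
        # Discontinuous entities list several "start end" pairs separated by ";".
        spans = [tuple(int(n) for n in part.split()) for part in offsets.split(";")]
        entities.append(Entity(ann_id, label, spans, surface))
    return entities


# Hypothetical annotation content, for illustration only.
SAMPLE_ANN = (
    "T1\tDisease 0 13\tbreast cancer\n"
    "T2\tGene 29 34\tBRCA1\n"
    "R1\tAssociation Arg1:T2 Arg2:T1\n"
)

for entity in parse_ann(SAMPLE_ANN):
    print(entity.id, entity.label, entity.spans, entity.text)
```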

Data can be uncompressed with the following command: tar -zxvf <name_of_corpora>.
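
When working from Python (for instance inside the notebook), the same extraction can be done with the standard tarfile module. This is a minimal sketch, assuming the archives are gzip-compressed tarballs as the -z flag above suggests. The function name extract_corpus and the paths in the usage note are hypothetical.

```python
import tarfile
from pathlib import Path


def extract_corpus(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that reject unsafe members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

A call would look like extract_corpus("corpora/<name_of_corpora>", "data/"), with the archive name filled in as in the tar command above.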

Alternatively, the corpora can be publicly accessed at the following links:

Corpus	Text Genre	Standard	Entities	Publication
AZDC	Scientific Article	Gold	diseases	link
BioCreative II GM	Scientific Article	Gold	genes/proteins	link
BioInfer	Scientific Article	Gold	genes/proteins	link
BioSemantics	Patent	Gold	chemicals, diseases	link
CALBC-III-Small	Scientific Article	Silver	chemicals, diseases, species, genes/proteins	link
CDR	Scientific Article	Gold	chemicals, diseases	link
CellFinder	Scientific Article	Gold	species, genes/proteins, cells, anatomy	link
CHEMDNER Patent	Patent	Gold	chemicals	link
DECA	Scientific Article	Gold	genes/proteins	link
FSU-PRGE	Scientific Article	Gold	genes/proteins	link
Linnaeus	Scientific Article	Gold	species	link
LocText	Scientific Article	Gold	species, genes/proteins	link
IEPA	Scientific Article	Gold	genes/proteins	link
miRNA	Scientific Article	Gold	diseases, species, genes/proteins	link
NCBI disease	Scientific Article	Gold	diseases	link
S800	Scientific Article	Gold	species	link
Variome	Scientific Article	Gold	diseases, species, genes/proteins	link

Many of these corpora can also be accessed and visualized in the browser here [3].

Supplementary Information

The supplementary data can be found in the file supplementary/additional_file_1.pdf. Additionally, blacklists used for the silver-standard corpora (SSCs) can be found in supplementary/blacklists.

Citations

[1] Dernoncourt, F., Lee, J. Y., & Szolovits, P. (2017). NeuroNER: an easy-to-use program for named-entity recognition based on neural networks. arXiv preprint arXiv:1705.05487.
[2] Pyysalo, S., Ginter, F., Moen, H., Salakoski, T., & Ananiadou, S. (2013). Distributional semantics resources for biomedical text processing. In Proceedings of the 5th International Symposium on Languages in Biology and Medicine, Tokyo, Japan (pp. 39-43).
[3] Stenetorp, P., Topić, G., Pyysalo, S., Ohta, T., Kim, J. D., & Tsujii, J. I. (2011, June). BioNLP shared task 2011: Supporting resources. In Proceedings of the BioNLP Shared Task 2011 Workshop (pp. 112-120). Association for Computational Linguistics.
