Hi jakelever,
Thanks for this wonderful project.
When I used CIViCmine (http://bionlp.bcgsc.ca/civicmine), I couldn't find "T790M" in any sentence. That seemed odd to me, because EGFR T790M is a very well-known biomarker in cancer treatment.
This is a tokenizer problem: the spaCy language model (en_core_web_sm) splits "T790M" into "T790" and "M": (('T790', 'NOUN'), ('M', 'PROPN')).
I changed the kindred package like this (kindred/Parser.py):
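from spacy.symbols import ORTH  # needed for the special-case rule below

if model not in Parser._models:
    Parser._models[model] = spacy.load(model, disable=['ner'])
self.nlp = Parser._models[model]
# Keep "T790M" as one token instead of splitting it into "T790" and "M"
special_case = [{ORTH: "T790M"}]
self.nlp.tokenizer.add_special_case("T790M", special_case)

With this change, "T790M" is kept as a single token, ('T790M', 'VERB'), so the split is fixed.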
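A special case for "T790M" only covers that one variant; other substitutions written the same way (for example "L858R") would presumably be split too. Below is a rough sketch of how the same special-case approach might be extended to a list of variants. The regular expression and the helper names (`find_variants`, `build_special_cases`, `register_special_cases`) are assumptions made for illustration and are not part of kindred or CIViCmine; only `nlp.tokenizer.add_special_case` is an existing spaCy call.

```python
import re

# Assumption: single-letter amino-acid substitutions look like "T790M"
# (one uppercase letter, digits, one uppercase letter). Illustrative only;
# this pattern is not taken from kindred or CIViCmine.
VARIANT_PATTERN = re.compile(r"\b[A-Z]\d+[A-Z]\b")


def find_variants(text):
    """Return the distinct variant-like strings in text, in order of first appearance."""
    seen = []
    for match in VARIANT_PATTERN.finditer(text):
        if match.group() not in seen:
            seen.append(match.group())
    return seen


def build_special_cases(variants):
    """Map each variant to a one-token rule in the shape add_special_case accepts."""
    return {v: [{"ORTH": v}] for v in variants}


def register_special_cases(nlp, variants):
    """Register each variant on an already loaded spaCy pipeline passed in by the caller."""
    for text, rule in build_special_cases(variants).items():
        nlp.tokenizer.add_special_case(text, rule)


sentence = "EGFR T790M and L858R mutations were detected; T790M confers resistance."
variants = find_variants(sentence)
```

In Parser.py this would amount to calling `register_special_cases(self.nlp, variants)` after the model is loaded, with `variants` coming from whatever variant list is available. A special case only matches the exact string, so variants that are not in the list would still be split.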
Best,
jakelever