Description
Hi, I tried to add and use new entities.
Here is my code.
`
import spacy
nlp = spacy.load('en')

def merge_phrases(matcher, doc, i, matches):
    '''
    Merge a phrase. We have to be careful here because we'll change the token indices.
    To avoid problems, merge all the phrases once we're called on the last match.
    '''
    if i != len(matches)-1:
        return None
    spans = [(ent_id, label, doc[start : end]) for ent_id, label, start, end in matches]
    for ent_id, label, span in spans:
        span.merge('NNP' if label else span.root.tag_, span.text, nlp.vocab.strings[label])

matcher = spacy.matcher.Matcher(nlp.vocab)
matcher.add(entity_key='company-transocean', label='company', attrs={}, specs=[[{spacy.attrs.ORTH: 'Transocean Ltd'}]], on_match=merge_phrases)
matcher.add(entity_key='company-transocean-ltd', label='company', attrs={}, specs=[[{spacy.attrs.ORTH: 'Transocean'}]], on_match=merge_phrases)
doc = nlp(u"""Tell me about Macys Inc in Japan and about Transocean Ltd.""")
matcher(doc)
print(['%s|%s' % (t.orth_, t.ent_type_) for t in doc])
`
Output:
['Tell|', 'me|', 'about|', 'Macys|ORG', 'Inc|ORG', 'in|', 'Japan|GPE', 'and|', 'about|', 'Transocean|company', 'Ltd.|ORG']
It starts to work, but not as I expect.
I have two questions:
- I want to register two names for the same company, "Transocean Ltd" and "Transocean". The system recognizes only "Transocean" and treats "Ltd." as a separate token. I want a single entity: Transocean Ltd|Company
- How can I save the new entities so that spaCy can use them the next time the script starts? I don't want to add the new entities again every time the script starts.
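
To make the first question concrete, here is a minimal plain-Python sketch of the behaviour being asked for. It is not spaCy's API: `PATTERNS` and `label_tokens` are hypothetical names used only for illustration. It assumes the tokens shown in the output above, where "Ltd." is a single token, so the two-word name has to be described as a sequence of two tokens, and the longest matching sequence should win.

```python
# Plain-Python illustration only; none of these names come from spaCy.
# Each pattern is a list of token texts, one entry per token.
PATTERNS = {
    "company": [["Transocean", "Ltd."], ["Transocean"]],
}


def label_tokens(tokens, patterns):
    """Return (text, label) pairs, merging the longest pattern found at each position."""
    result = []
    i = 0
    while i < len(tokens):
        best = None  # (pattern length, label) of the longest match starting at i
        for label, specs in patterns.items():
            for spec in specs:
                n = len(spec)
                if tokens[i:i + n] == spec and (best is None or n > best[0]):
                    best = (n, label)
        if best:
            n, label = best
            # Merge the matched tokens into one labelled entry.
            result.append((" ".join(tokens[i:i + n]), label))
            i += n
        else:
            result.append((tokens[i], ""))
            i += 1
    return result


tokens = ["Tell", "me", "about", "Transocean", "Ltd."]
labelled = label_tokens(tokens, PATTERNS)
print(["%s|%s" % pair for pair in labelled])
```

With these assumptions, "Transocean Ltd." comes out as one entry labelled `company`, while a bare "Transocean" elsewhere in the text would still be labelled on its own.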