Closed
Labels
🌙 nightly: Discussion and contributions related to nightly builds
Description
I have encountered inconsistencies when parsing certain questions, and wanted to try updating the model to fix them. I referred to the NER-training sample and the 2.0 docs, and put together the following:
from spacy.gold import GoldParse
from functools import partial
import random


def reformat_train_data(tokenizer, examples):
    """Reformat data to match JSON format"""
    output = []
    for i, (text, deps, heads) in enumerate(examples):
        doc = tokenizer(text)
        ner_tags = [''] * len(doc)
        words = [w.text for w in doc]
        tags = ['-'] * len(doc)
        sentence = (range(len(doc)), words, tags, heads, deps, ner_tags)
        output.append((text, [(sentence, [])]))
    return output


training_data = [('How long does it run?',
                  ['advmod', 'advmod', 'aux', 'det', 'nsubj', 'ROOT'],
                  [1, 4, 4, 4, 4, 4]),
                 ('How high does it reach?',
                  ['advmod', 'advmod', 'aux', 'det', 'nsubj', 'ROOT'],
                  [1, 4, 4, 4, 4, 4])]

get_training_data = partial(reformat_train_data, nlp.tokenizer, training_data)
optimizer = nlp.begin_training(get_training_data)
for iteration in range(100):
    random.shuffle(training_data)
    for raw_text, deps, heads in training_data:
        doc = nlp.make_doc(raw_text)
        gold = GoldParse(deps=deps, heads=heads)
        nlp.update([doc], [gold], sgd=optimizer)
However, when trying to run this, it crashes with KeyError: 13656873538139661788. It seems like the error is similar to #1052, but the particular bug from that issue appears to have been fixed.
Is there something I'm doing wrong? Or does spaCy 2.0 currently not support training the dependency parser?
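To rule out malformed annotations as a factor, a plain-Python sanity check along these lines could be run before training. This is only a sketch: `check_example` is a hypothetical helper, not part of spaCy, and it assumes that heads are absolute token indices and that the ROOT token is its own head. The token list is written out by hand here; in practice it would come from `nlp.make_doc(text)`.

```python
def check_example(words, deps, heads):
    """Return a list of problems found in one dependency annotation.

    Assumes heads are absolute token indices and the ROOT token
    points at itself. Hypothetical helper, not part of spaCy.
    """
    problems = []
    n = len(words)
    if len(deps) != n or len(heads) != n:
        problems.append('length mismatch: %d words, %d deps, %d heads'
                        % (n, len(deps), len(heads)))
        return problems
    for i, (dep, head) in enumerate(zip(deps, heads)):
        if not 0 <= head < n:
            problems.append('token %d: head %d out of range' % (i, head))
        elif dep == 'ROOT' and head != i:
            problems.append('token %d: ROOT should be its own head' % i)
        elif dep != 'ROOT' and head == i:
            problems.append('token %d: non-ROOT token is its own head' % i)
    if deps.count('ROOT') != 1:
        problems.append('expected exactly one ROOT, found %d'
                        % deps.count('ROOT'))
    return problems


# First training example from above, tokens listed by hand.
words = ['How', 'long', 'does', 'it', 'run', '?']
deps = ['advmod', 'advmod', 'aux', 'det', 'nsubj', 'ROOT']
heads = [1, 4, 4, 4, 4, 4]
for problem in check_example(words, deps, heads):
    print(problem)
```

Under those assumptions the check flags tokens 4 and 5 of the first example, so the labels may be worth double-checking against the heads, although that alone does not obviously explain the KeyError.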
Update
Info about spaCy
- spaCy version: 2.0.0a9
- Platform: Linux-4.4.0-43-Microsoft-x86_64-with-debian-stretch-sid
- Python version: 3.6.1
- Models: en, en_default
Full stack trace
KeyError                                  Traceback (most recent call last)
<ipython-input-2-d6926b2fc85f> in <module>()
     30
     31
---> 32 optimizer = nlp.begin_training(get_training_data)
     33 for iteration in range(100):
     34     random.shuffle(training_data)

/mnt/c/Users/notnami/projects/.../.spacy-nightly/lib/python3.6/site-packages/spacy/language.py in begin_training(self, get_gold_tuples, **cfg)
    367             if hasattr(proc, 'begin_training'):
    368                 context = proc.begin_training(get_gold_tuples(),
--> 369                                               pipeline=self.pipeline)
    370                 contexts.append(context)
    371         learn_rate = util.env_opt('learn_rate', 0.001)

/mnt/c/Users/notnami/projects/..../.spacy-nightly/lib/python3.6/site-packages/spacy/pipeline.pyx in spacy.pipeline.NeuralTagger.begin_training (spacy/pipeline.cpp:16293)()

/mnt/c/Users/notnami/projects/.../.spacy-nightly/lib/python3.6/site-packages/spacy/morphology.pyx in spacy.morphology.Morphology.__init__ (spacy/morphology.cpp:4655)()

/mnt/c/Users/notnami/projects/.../.spacy-nightly/lib/python3.6/site-packages/spacy/morphology.pyx in spacy.morphology.Morphology.add_special_case (spacy/morphology.cpp:5625)()

KeyError: 13656873538139661788
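The key in the KeyError looks like a 64-bit string hash rather than a small index. spaCy 2.0 identifies strings by hash, so one plausible reading is that some table keyed by those hashes has no entry for a string I passed in (perhaps the placeholder '-' tag, though that is a guess). The toy below is not spaCy code: `toy_hash` and `tag_index` are hypothetical stand-ins, and the hash function is an arbitrary choice, used only to show how a missing entry surfaces as a bare integer.

```python
import hashlib


def toy_hash(string):
    """Map a string to a 64-bit integer (stand-in for a real string hash)."""
    digest = hashlib.sha256(string.encode('utf8')).digest()
    return int.from_bytes(digest[:8], 'little')


# A lookup table keyed by hash instead of by the tag string itself.
tag_index = {toy_hash(tag): i for i, tag in enumerate(['NN', 'VB', 'WRB'])}

try:
    tag_index[toy_hash('-')]  # '-' was never registered in the table
except KeyError as err:
    # The message carries only the integer, not the original string.
    print('KeyError:', err)
```

If that reading is right, the integer alone will not say which string was missing, so the tag values passed to `begin_training` would be the first place to look.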