
[2.0] KeyError when attempting to train the dependency parser #1268

@anna-hope

I have encountered inconsistencies when parsing certain questions, and wanted to try updating the model to fix them. I referred to the NER-training sample and the 2.0 docs, and put together the following:

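from functools import partial
import random

import spacy
from spacy.gold import GoldParse

# Assumption: the report lists the 'en' model but does not show how nlp was created.
nlp = spacy.load('en')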

def reformat_train_data(tokenizer, examples):
    """Reformat data to match JSON format"""
    output = []
    for text, deps, heads in examples:
        doc = tokenizer(text)
        words = [w.text for w in doc]
        # No POS tags or entity annotations are available, so use placeholders.
        tags = ['-'] * len(doc)
        ner_tags = [''] * len(doc)
        # One sentence per text: (ids, words, tags, heads, deps, ner)
        sentence = (range(len(doc)), words, tags, heads, deps, ner_tags)
        output.append((text, [(sentence, [])]))
    return output

training_data = [('How long does it run?',
                  ['advmod', 'advmod', 'aux', 'det', 'nsubj', 'ROOT'],
                  [1, 4, 4, 4, 4, 4]),
                 ('How high does it reach?',
                  ['advmod', 'advmod', 'aux', 'det', 'nsubj', 'ROOT'],
                  [1, 4, 4, 4, 4, 4])]

get_training_data = partial(reformat_train_data, nlp.tokenizer, training_data)

optimizer = nlp.begin_training(get_training_data)
for iteration in range(100):
    random.shuffle(training_data)
    for raw_text, deps, heads in training_data:
        doc = nlp.make_doc(raw_text)
        gold = GoldParse(deps=deps, heads=heads)
        nlp.update([doc], [gold], sgd=optimizer)

However, when trying to run this, it crashes with KeyError: 13656873538139661788. It seems like the error is similar to #1052, but the particular bug from that issue appears to have been fixed.

Is there something I'm doing wrong? Or does spaCy 2.0 currently not support training the dependency parser?
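Separately from the crash, the annotations themselves can be sanity-checked before they reach the parser. The following is a minimal sketch, not part of spaCy: `check_annotations` is a hypothetical helper, and it assumes that heads are absolute token indices and that the root token is its own head and carries the 'ROOT' label. The word list is also an assumption about how the tokenizer splits the first example.

```python
def check_annotations(words, deps, heads):
    """Return a list of problems found in one annotated sentence.

    Assumptions (not confirmed by the report): heads are absolute token
    indices, and the root token is its own head and is labelled 'ROOT'.
    """
    problems = []
    n = len(words)
    if len(deps) != n:
        problems.append('expected %d deps, got %d' % (n, len(deps)))
    if len(heads) != n:
        problems.append('expected %d heads, got %d' % (n, len(heads)))
    # Only inspect positions that exist in all three lists.
    for i in range(min(n, len(deps), len(heads))):
        dep, head = deps[i], heads[i]
        if not 0 <= head < n:
            problems.append('token %d: head %d is out of range' % (i, head))
        elif head == i and dep != 'ROOT':
            problems.append('token %d (%r) is its own head but is labelled %r'
                            % (i, words[i], dep))
        elif head != i and dep == 'ROOT':
            problems.append('token %d (%r) is labelled ROOT but its head is %d'
                            % (i, words[i], head))
    return problems


# Assumed tokenization of the first training example.
words = ['How', 'long', 'does', 'it', 'run', '?']
deps = ['advmod', 'advmod', 'aux', 'det', 'nsubj', 'ROOT']
heads = [1, 4, 4, 4, 4, 4]
for problem in check_annotations(words, deps, heads):
    print(problem)
```

Under these assumptions the check flags two positions in the example: the token at index 4 heads itself without being labelled ROOT, and the token at index 5 is labelled ROOT while pointing at index 4. Whether that mismatch is related to the KeyError is not established here.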

Update

Info about spaCy

  • spaCy version: 2.0.0a9
  • Platform: Linux-4.4.0-43-Microsoft-x86_64-with-debian-stretch-sid
  • Python version: 3.6.1
  • Models: en, en_default

Full stack trace

KeyError                                  Traceback (most recent call last)
<ipython-input-2-d6926b2fc85f> in <module>()
     30 
     31 
---> 32 optimizer = nlp.begin_training(get_training_data)
     33 for iteration in range(100):
     34     random.shuffle(training_data)

/mnt/c/Users/notnami/projects/.../.spacy-nightly/lib/python3.6/site-packages/spacy/language.py in begin_training(self, get_gold_tuples, **cfg)
    367             if hasattr(proc, 'begin_training'):
    368                 context = proc.begin_training(get_gold_tuples(),
--> 369                                               pipeline=self.pipeline)
    370                 contexts.append(context)
    371         learn_rate = util.env_opt('learn_rate', 0.001)


/mnt/c/Users/notnami/projects/..../.spacy-nightly/lib/python3.6/site-packages/spacy/pipeline.pyx in spacy.pipeline.NeuralTagger.begin_training (spacy/pipeline.cpp:16293)()

/mnt/c/Users/notnami/projects/.../.spacy-nightly/lib/python3.6/site-packages/spacy/morphology.pyx in spacy.morphology.Morphology.__init__ (spacy/morphology.cpp:4655)()

/mnt/c/Users/notnami/projects/.../.spacy-nightly/lib/python3.6/site-packages/spacy/morphology.pyx in spacy.morphology.Morphology.add_special_case (spacy/morphology.cpp:5625)()

KeyError: 13656873538139661788
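
The integer in the KeyError is smaller than 2**64, which is consistent with spaCy 2.0 identifying strings by 64-bit hash values. The missing key is therefore probably the hash of some string that the morphology table does not contain, although the trace alone does not say which one. The sketch below only illustrates that failure pattern and is not spaCy's code: `hash_string`, `ToyTagMap`, and the tag names are hypothetical, and the hash function is not the one spaCy uses. The string '-' is looked up only because it is the placeholder tag in the report, not because it is known to be the culprit.

```python
import hashlib


def hash_string(text):
    """Map a string to an unsigned 64-bit integer (illustrative hash only)."""
    digest = hashlib.blake2b(text.encode('utf8'), digest_size=8).digest()
    return int.from_bytes(digest, 'little')


class ToyTagMap(object):
    """Hypothetical table keyed by string hashes rather than by strings."""

    def __init__(self, known_tags):
        self._by_hash = {hash_string(tag): tag for tag in known_tags}

    def __getitem__(self, key):
        # An unknown key surfaces as a bare integer, with no hint of the
        # string it was computed from.
        return self._by_hash[key]


tag_map = ToyTagMap(['NN', 'VB', 'WRB'])
try:
    tag_map[hash_string('-')]
except KeyError as err:
    missing_key = err.args[0]
    print('KeyError:', missing_key)
```

When an error of this shape appears, hashing the candidate strings (tags, labels) with the library's own string store and comparing the results against the reported key is one way to identify the missing entry.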


Labels: 🌙 nightly (discussion and contributions related to nightly builds)
