NER model for Armenian #1206

@ShakeHakobyan

Description

Hello! I have trained a NER model for the Armenian language using the ArmTDP dataset and the xlm-roberta-base model.

After that, I attempted to test the model using stanza.Pipeline:

import stanza

config = {
    'processors': 'tokenize, ner',
    'lang': 'hy',
    'ner_model_path': '/Lab/Projects/ner/models/hy_armtdp_nertagger_bert_18.pt',
}

nlp = stanza.Pipeline(**config)

nlp("some text in Armenian")

With the same input text, the output is different each time I load the model through the pipeline.
There is no such problem when I evaluate with the built-in training script. Whenever I run the following command, I get the same output:

python3 -m stanza.utils.training.run_ner hy_armtdp --score_test

What could be the cause of this problem?
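
To make the mismatch easier to reproduce, here is a minimal sketch of the check I have in mind: build the pipeline several times and compare the entities tagged for the same text. The helper names (`extract_entities`, `outputs_are_stable`, `build_hy_pipeline`) are hypothetical and not part of stanza; the sketch only assumes that the returned document exposes `.ents` with `text`, `type`, `start_char` and `end_char`, as stanza documents do.

```python
from typing import Callable, List, Tuple

# (entity text, entity type, start offset, end offset)
Entity = Tuple[str, str, int, int]


def extract_entities(doc) -> List[Entity]:
    """Collect entity spans from a document object that exposes `.ents`."""
    return [(e.text, e.type, e.start_char, e.end_char) for e in doc.ents]


def outputs_are_stable(build_pipeline: Callable[[], Callable],
                       text: str,
                       loads: int = 3) -> bool:
    """Build the pipeline `loads` times and report whether every
    freshly loaded pipeline tags `text` with the same entities."""
    results = [extract_entities(build_pipeline()(text)) for _ in range(loads)]
    return all(r == results[0] for r in results[1:])


def build_hy_pipeline():
    """Hypothetical factory using the same settings as the config above.
    stanza is imported lazily so the helpers can be used without it."""
    import stanza
    return stanza.Pipeline(
        lang='hy',
        processors='tokenize, ner',
        ner_model_path='/Lab/Projects/ner/models/hy_armtdp_nertagger_bert_18.pt',
    )
```

Calling `outputs_are_stable(build_hy_pipeline, "some text in Armenian")` should return `True` if loading is deterministic. In my case I would expect it to return `False`, which would point at something in model loading rather than at the input data.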

Additionally, I have added data conversion and BERT code for Armenian in this pull request (trained model can be downloaded from this drive).

If the problem can be resolved, it would be great to integrate an NER model for Armenian into the main package.

Thanks!
