This package wraps the Trankit library, so you can use Trankit models in a spaCy pipeline.
Using this wrapper, you can access the following annotations, computed by
your pretrained Trankit pipeline/model:
- Statistical tokenization (reflected in the Doc and its tokens)
- Lemmatization (token.lemma and token.lemma_)
- Part-of-speech tagging (token.tag, token.tag_, token.pos, token.pos_)
- Morphological analysis (token.morph)
- Dependency parsing (token.dep, token.dep_, token.head)
- Named entity recognition (doc.ents, token.ent_type, token.ent_type_, token.ent_iob, token.ent_iob_)
- Sentence segmentation (doc.sents)
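As a rough sketch of how these annotations can be read, the helper below collects token attributes per sentence, plus the entity spans, from any spaCy Doc-like object. The name `summarize` is hypothetical and not part of spacy-trankit. The stand-in objects built with `SimpleNamespace` only mimic the attribute names listed above (with made-up values) so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def summarize(doc):
    """Collect per-sentence token annotations and entity spans from a Doc-like object."""
    sentences = [
        [(tok.text, tok.lemma_, tok.pos_, tok.dep_) for tok in sent]
        for sent in doc.sents
    ]
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return {"sentences": sentences, "entities": entities}


def _tok(text, lemma, pos, dep):
    # Hypothetical stand-in for a spaCy Token; values are illustrative only.
    return SimpleNamespace(text=text, lemma_=lemma, pos_=pos, dep_=dep)


# Hypothetical stand-in for a Doc with one sentence and one entity.
fake_doc = SimpleNamespace(
    sents=[[_tok("Obama", "Obama", "PROPN", "nsubj"), _tok("spoke", "speak", "VERB", "root")]],
    ents=[SimpleNamespace(text="Obama", label_="PERSON")],
)
result = summarize(fake_doc)
```

With a real pipeline, you would pass the `doc` returned by `nlp(...)` instead of `fake_doc`.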
As of v0.1.0 spacy-trankit is only compatible with spaCy v3.x. To install
the most recent version:
pip install git+https://github.com/imvladikon/spacy-trankit
or from PyPI:
pip install spacy-trankit
Load a pre-trained Trankit model into a spaCy pipeline:
import spacy_trankit
# Initialize the pipeline
nlp = spacy_trankit.load("en")
doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)
print(doc.ents)
Load a model from a local path:
import spacy_trankit
# Initialize the pipeline
nlp = spacy_trankit.load_from_path(name="en", path="./cache")
doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)
print(doc.ents)
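The token-level entity attributes can also be combined into BIO-style tags. The following is a minimal sketch, assuming tokens expose `ent_iob_` and `ent_type_` as in spaCy. The function `bio_tags` is a hypothetical helper, not part of spacy-trankit, and the stand-in tokens carry made-up values so the snippet runs without a model.

```python
from types import SimpleNamespace


def bio_tags(doc):
    """Return (text, tag) pairs, where tag is 'O' or '<IOB>-<entity type>'."""
    tags = []
    for tok in doc:
        if tok.ent_iob_ in ("B", "I"):
            tags.append((tok.text, f"{tok.ent_iob_}-{tok.ent_type_}"))
        else:
            # "O" (outside any entity) or "" (no entity annotation set)
            tags.append((tok.text, "O"))
    return tags


# Hypothetical stand-in tokens; a real spaCy Doc can be iterated the same way.
fake_doc = [
    SimpleNamespace(text="Barack", ent_iob_="B", ent_type_="PERSON"),
    SimpleNamespace(text="Obama", ent_iob_="I", ent_type_="PERSON"),
    SimpleNamespace(text="was", ent_iob_="O", ent_type_=""),
]
tagged = bio_tags(fake_doc)
```

With a real pipeline, `bio_tags(doc)` would take the `doc` produced by `nlp(...)`; the entity labels you get depend on the loaded model.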