Usability Improvement: Accessing vectors for lemmas of tokens #956

@mirosval

Thanks for an amazing library!

I have run into a small pain point. My workflow is the following:

  1. Parse a sentence
  2. Remove stop words
  3. Compute distance between two sentences

In this workflow I have run into a couple of problems:

  • Lemma does not have a `text` property like Token does. It does have a `lower_` property, which is equivalent, so maybe `text` could just be aliased to it.
  • There is no simple way to obtain the vector of a lemma given a Token. I used `nlp.vocab[token.lemma].vector` as a workaround, but I think it could be cleaner. I think it makes a lot of sense for people to use the lemma's vector rather than the token's.
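To make the workflow concrete, here is a minimal sketch of the three steps. Only the `nlp.vocab[token.lemma].vector` lookup and the use of stop-word filtering come from the description above. The helper names (`lemma_vector`, `mean_lemma_vector`, `cosine_distance`), averaging the vectors per sentence, and cosine as the distance measure are assumptions made for illustration, not part of spaCy.

```python
import math


def lemma_vector(vocab, token):
    """Look up the vector stored in the vocabulary for the token's lemma."""
    # Same workaround as described above: nlp.vocab[token.lemma].vector
    return vocab[token.lemma].vector


def mean_lemma_vector(vocab, doc):
    """Average the lemma vectors of all tokens that are not stop words.

    Averaging is an assumption; any other pooling could be used instead.
    """
    vectors = [lemma_vector(vocab, tok) for tok in doc if not tok.is_stop]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(float(v[i]) for v in vectors) / len(vectors) for i in range(dim)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is empty or all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


# Hypothetical usage with a model that ships word vectors:
#
#   import spacy
#   nlp = spacy.load('en_core_web_md')
#   a = mean_lemma_vector(nlp.vocab, nlp(u'The cats were sleeping'))
#   b = mean_lemma_vector(nlp.vocab, nlp(u'A cat sleeps'))
#   print(cosine_distance(a, b))
```

The helpers only rely on `token.lemma`, `token.is_stop` and the vocabulary lookup, so they should work with any object that exposes those attributes.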

I hope you find this useful in improving the future API. I would be willing to contribute if there is interest.


## Info about spaCy

* **spaCy version:** 1.7.3
* **Platform:** Darwin-16.5.0-x86_64-i386-64bit
* **Python version:** 3.6.1
* **Installed models:** en, en_core_web_md

## Info about model en_core_web_md

* **lang:** en
* **name:** core_web_md
* **license:** CC BY-SA 3.0
* **author:** Explosion AI
* **url:** https://explosion.ai
* **version:** 1.2.1
* **spacy_version:** >=1.7.0,<2.0.0
* **email:** [email protected]
* **description:** General-purpose English model, with tagging, parsing, entities and word vectors
* **source:** /Users/miroslav.zoricak/.local/share/virtualenvs/intent-classification-tNT6Vqf2/lib/python3.6/site-packages/en_core_web_md/en_core_web_md-1.2.1

Labels: usage (General spaCy usage)