Analyzing Multilingual French and Russian Text: Error message when using Language.factory("language_detector") #3713

@christofs

I've gone through this wonderful lesson with some students, and we noticed what appears to be a small error in the Jupyter notebook cell that uses spaCy's "Language.factory()". I'm not sure whether this error occurs consistently or is due to a recent change in how spaCy works, but it might be worth checking. We experienced it both on Linux (with Python 3.13 and spaCy 3.8.11) and in a Google Colab environment (using Python 3.12 and spaCy 3.8.11).

This is the cell as currently shown in the lesson:

# Let’s try spaCy first. First, we install the spacy_langdetect package from the Python Package Index:

pip install spacy_langdetect

# Then we import it and use it to detect our languages:

from spacy.language import Language
from spacy_langdetect import LanguageDetector

# setting up our pipeline
Language.factory("language_detector")
nlp.add_pipe('language_detector', last=True)

# running the language detection on each sentence and printing the results
rus_doc = nlp(spacy_rus_sent)
print(rus_doc._.language)

fre_doc = nlp(spacy_fre_sent)
print(fre_doc._.language)

multi_doc = nlp(spacy_multi_sent)
print(multi_doc._.language)

This appears to produce the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[14], line 6
      4 # setting up our pipeline
      5 Language.factory("language_detector")
----> 6 nlp.add_pipe('language_detector', last=True)
      8 # running the language detection on each sentence and printing the results
      9 rus_doc = nlp(spacy_rus_sent)

AttributeError: 'MultilingualPipeline' object has no attribute 'add_pipe'

The following code worked for us, though there are probably other, more elegant solutions:

# First, we install the spacy_langdetect package from the Python Package Index (with the leading `!` in a notebook, or without it on the command line)
!pip install spacy_langdetect

# Then we import spaCy and the language detector
import spacy
from spacy.language import Language
from spacy_langdetect import LanguageDetector

# We create a new, blank multilingual spaCy pipeline
nlp = spacy.blank("xx")

# Register the factory (do this only once per session: comment it out before re-running the cell)
Language.factory(
    "language_detector",
    func=lambda nlp, name: LanguageDetector()
)

# Now the `.add_pipe()` method works, but it needs both the sentencizer and the language detector, added in this order
nlp.add_pipe("sentencizer")
nlp.add_pipe("language_detector", last=True)

# Now this works as well: running the language detection on each sentence and printing the results
rus_doc = nlp(spacy_rus_sent)
print(rus_doc._.language)
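
One possible way to avoid commenting out the registration on later runs might be to guard it with spaCy's `Language.has_factory()`. The following is only a sketch under some assumptions: `register_once`, `DummyDetector` and the factory name `demo_language_detector` are hypothetical stand-ins, used so that the snippet runs without spacy_langdetect installed. In the lesson, one would presumably return `LanguageDetector()` from the factory and use the name "language_detector" instead.

```python
import spacy
from spacy.language import Language

FACTORY_NAME = "demo_language_detector"  # hypothetical name, chosen for this sketch


class DummyDetector:
    """Stand-in for spacy_langdetect's LanguageDetector: returns the doc unchanged."""

    def __call__(self, doc):
        return doc


def register_once(name):
    """Register the factory only if spaCy does not already know the name.

    Returns True if a registration happened, False if it was skipped.
    """
    if Language.has_factory(name):
        return False
    Language.factory(name, func=lambda nlp, name: DummyDetector())
    return True


# Intended to be safe to run repeatedly, e.g. when re-executing a notebook cell
register_once(FACTORY_NAME)

# Same pipeline layout as above: sentencizer first, then the detector
nlp = spacy.blank("xx")
nlp.add_pipe("sentencizer")
nlp.add_pipe(FACTORY_NAME, last=True)
print(nlp.pipe_names)
```

With a guard like this, the registration call can stay in the cell, because a second call is skipped rather than attempting to register the same name again.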