Analyzing Multilingual French and Russian Text: Error message when using Language.factory("language_detector") #3713

I've gone through this wonderful lesson with some students, and we noticed what appears to be a small error in the Jupyter notebook cell that uses spaCy's `Language.factory()`. We are not sure whether the error is consistent or is due to a recent change in how spaCy works, but it might be worth checking. We experienced it both on Linux (Python 3.13, spaCy 3.8.11) and in Google Colab (Python 3.12, spaCy 3.8.11).

This is the cell as currently shown in the lesson: 

```
# Let’s try spaCy first. First, we install the spacy_langdetect package from the Python Package Index:

pip install spacy_langdetect

# Then we import it and use it to detect our languages:

from spacy.language import Language
from spacy_langdetect import LanguageDetector

# setting up our pipeline
Language.factory("language_detector")
nlp.add_pipe('language_detector', last=True)

# running the language detection on each sentence and printing the results
rus_doc = nlp(spacy_rus_sent)
print(rus_doc._.language)

fre_doc = nlp(spacy_fre_sent)
print(fre_doc._.language)

multi_doc = nlp(spacy_multi_sent)
print(multi_doc._.language)
```

Running this cell produces the following error:

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[14], line 6
      4 # setting up our pipeline
      5 Language.factory("language_detector")
----> 6 nlp.add_pipe('language_detector', last=True)
      8 # running the language detection on each sentence and printing the results
      9 rus_doc = nlp(spacy_rus_sent)

AttributeError: 'MultilingualPipeline' object has no attribute 'add_pipe'
```
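As far as we can tell from the traceback, the name `nlp` refers to a `MultilingualPipeline` object at the point where this cell runs, rather than to a spaCy pipeline, so there is no `add_pipe` method to call. A minimal sketch of that mechanism is below. Both classes are stand-ins written only for illustration (they are assumptions, not the real spaCy or lesson classes), so the snippet runs without any extra packages:

```python
class SpacyLikePipeline:
    """Stand-in for a spaCy pipeline: it offers an add_pipe method."""

    def __init__(self):
        self.pipe_names = []

    def add_pipe(self, name, last=True):
        self.pipe_names.append(name)


class MultilingualPipeline:
    """Stand-in for the object named in the traceback: no add_pipe method."""


def try_add_pipe(pipeline, name):
    """Return None on success, or the AttributeError message on failure."""
    try:
        pipeline.add_pipe(name, last=True)
    except AttributeError as err:
        return str(err)
    return None


nlp = SpacyLikePipeline()
print(try_add_pipe(nlp, "language_detector"))  # add_pipe exists, so no error

nlp = MultilingualPipeline()  # the name nlp now points at a different kind of object
print(try_add_pipe(nlp, "language_detector"))  # the same kind of message as in the traceback
```

If this reading is right, any fix has to make sure that `nlp` is a spaCy pipeline again before `add_pipe` is called.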

The following code worked for us, though there may well be other, more elegant solutions:

```
# First, we install the spacy_langdetect package from the Python Package Index
# (with the leading `!` in a notebook, or without it on the command line)
!pip install spacy_langdetect

# Then we import spaCy and the language detector
import spacy
from spacy.language import Language
from spacy_langdetect import LanguageDetector

# We create a new, blank multilingual pipeline
nlp = spacy.blank("xx")

# Register the factory (do this only once per session; comment it out when re-running the cell)
Language.factory(
    "language_detector",
    func=lambda nlp, name: LanguageDetector()
)

# Now `.add_pipe()` works, but it needs both the sentencizer and the language detector, in this order
nlp.add_pipe("sentencizer")
nlp.add_pipe("language_detector", last=True)

# Running the language detection on each sentence and printing the results
rus_doc = nlp(spacy_rus_sent)
print(rus_doc._.language)
```


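One possible way around having to comment out the factory registration on repeated runs is to check whether the name is already registered first. The sketch below uses spaCy's `Language.has_factory()` for that check. The names `register_once`, `passthrough_factory` and `demo_language_detector` are hypothetical and exist only for this illustration, and the pass-through component is a stand-in so the snippet runs without `spacy_langdetect`; we have not tested this variant with the detector inside the lesson notebook:

```python
import spacy
from spacy.language import Language

FACTORY_NAME = "demo_language_detector"  # hypothetical name, chosen to avoid clashes


def passthrough_factory(nlp, name):
    # Stand-in for `lambda nlp, name: LanguageDetector()`:
    # builds a component that hands the Doc back unchanged.
    def component(doc):
        return doc

    return component


def register_once(name, func):
    """Register a spaCy factory only if the name is not registered yet."""
    if not Language.has_factory(name):
        Language.factory(name, func=func)


# Calling this repeatedly is harmless, e.g. when re-running a notebook cell.
register_once(FACTORY_NAME, passthrough_factory)
register_once(FACTORY_NAME, passthrough_factory)

nlp = spacy.blank("xx")
nlp.add_pipe("sentencizer")
nlp.add_pipe(FACTORY_NAME, last=True)
print(nlp.pipe_names)
```

With the real detector, the call would presumably become `register_once("language_detector", lambda nlp, name: LanguageDetector())`, followed by the same two `add_pipe` calls as in our cell above.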