I've gone through this wonderful lesson with some students, and we noticed what appears to be a small error in the Jupyter notebook cell that calls spaCy's `Language.factory()`. We are not sure whether the error occurs consistently or is due to a recent change in how spaCy works, but it might be worth checking. We ran into it both on Linux (Python 3.13, spaCy 3.8.11) and in a Google Colab environment (Python 3.12, spaCy 3.8.11).
This is the cell as currently shown in the lesson:
# Let’s try spaCy first. First, we install the spacy_langdetect package from the Python Package Index:
pip install spacy_langdetect
# Then we import it and use it to detect our languages:
from spacy.language import Language
from spacy_langdetect import LanguageDetector
# setting up our pipeline
Language.factory("language_detector")
nlp.add_pipe('language_detector', last=True)
# running the language detection on each sentence and printing the results
rus_doc = nlp(spacy_rus_sent)
print(rus_doc._.language)
fre_doc = nlp(spacy_fre_sent)
print(fre_doc._.language)
multi_doc = nlp(spacy_multi_sent)
print(multi_doc._.language)
This appears to produce the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[14], line 6
4 # setting up our pipeline
5 Language.factory("language_detector")
----> 6 nlp.add_pipe('language_detector', last=True)
8 # running the language detection on each sentence and printing the results
9 rus_doc = nlp(spacy_rus_sent)
AttributeError: 'MultilingualPipeline' object has no attribute 'add_pipe'
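The traceback suggests that, by the time this cell runs, `nlp` is bound to a `MultilingualPipeline` object rather than a spaCy `Language` pipeline, since every spaCy pipeline has an `add_pipe` method. A quick check along the lines of the sketch below can confirm what `nlp` currently refers to before adding components. The helper name `describe_pipeline` and the stand-in class are hypothetical and only serve the illustration; in the notebook one would pass the real `nlp` instead.

```python
def describe_pipeline(obj):
    """Report which class an object belongs to and whether it offers add_pipe."""
    cls = type(obj)
    return {
        "class": f"{cls.__module__}.{cls.__qualname__}",
        "has_add_pipe": callable(getattr(obj, "add_pipe", None)),
    }


# Hypothetical stand-in for whatever `nlp` was bound to in the failing cell.
# It only mimics the relevant property: it has no `add_pipe` method.
class MultilingualPipeline:
    pass


info = describe_pipeline(MultilingualPipeline())
print(info)
```

If `has_add_pipe` comes back `False` for the real `nlp`, the variable was most likely reassigned earlier in the notebook, and a fresh spaCy pipeline is needed before calling `add_pipe`.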
The code below worked for us, though there may well be other, more elegant solutions:
# First, we install the spacy_langdetect package from the Python Package Index (With the `!`, or on the command line)
!pip install spacy_langdetect
# then we import spaCy and the detector
import spacy
from spacy.language import Language
from spacy_langdetect import LanguageDetector
# We create a new, empty multilingual pipeline
nlp = spacy.blank("xx")
# Register the factory (do this only once per session; comment it out when re-running the cell)
Language.factory(
"language_detector",
func=lambda nlp, name: LanguageDetector()
)
# Now the `.add_pipe()` method works, but the pipeline needs both the sentencizer and the language detector (in this order!)
nlp.add_pipe("sentencizer")
nlp.add_pipe("language_detector", last=True)
# Now this works as well: running the language detection on each sentence and printing the results
rus_doc = nlp(spacy_rus_sent)
print(rus_doc._.language)