Closed
Labels
bug: Bugs and behaviour differing from documentation
Description
The following code block induces a hung state in the NLP module. The input is text from a real-world corpus; similar errors happen on long Java namespaces (e.g. org.apache.x.y.z).
import spacy

nlp = spacy.load("en_core_web_sm")
nlp("0.1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23.24.25.26.27.28.29.30.31.32.33.34.35.36.37.38.39.40.41.42.43.44.45.46.47.48")
Our inspection suggests the tokenizer is thrashing, perhaps owing to a regex of exploding complexity. We've routed around the damage with a context manager that uses signal.SIGALRM to time out if spaCy takes too long, but this issue caused much confusion, because seemingly simple jobs were running for extended periods of time.
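For anyone hitting the same hang, here is a rough sketch of the kind of SIGALRM-based guard described above. It is not our exact code: the names `ParseTimeout`, `time_limit`, and `safe_parse` are made up for illustration, and the 5-second default is an arbitrary assumption. It only works on Unix and only in the main thread.

```python
import signal
from contextlib import contextmanager


class ParseTimeout(Exception):
    """Raised when the guarded block exceeds its time budget (hypothetical name)."""


@contextmanager
def time_limit(seconds):
    """Abort the enclosed block after `seconds` (Unix, main thread only)."""

    def _on_alarm(signum, frame):
        raise ParseTimeout(f"timed out after {seconds}s")

    # Install our handler and remember the previous one so it can be restored.
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)


def safe_parse(nlp, text, seconds=5.0):
    """Return nlp(text), or None if the call does not finish in time."""
    try:
        with time_limit(seconds):
            return nlp(text)
    except ParseTimeout:
        return None
```

With a loaded pipeline this would be called as `doc = safe_parse(nlp, text)`, skipping or logging the document when `None` comes back. One caveat: Python only runs signal handlers between bytecode instructions, so how promptly the timeout fires depends on the tokenizer returning control to the interpreter.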
cedias