Skip to content

Conversation

@rmitsch
Copy link
Contributor

@rmitsch rmitsch commented May 11, 2022

Goals

Introduce enable argument to spacy.load(), behaving analogously to and disable.

Description

enable takes precedence over their negative counterparts. I.e.: if enable is set, disable has to be consistent. Otherwise an error is raised. Internally, enable is resolved to disable by disabling those components in the pipeline that are not part of include.

Example:

spacy.load("en_core_web_sm", enable=["tagger", "parser"])

This disables all components other than the tagger and the parser.

Types of change

New feature, new tests.

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@svlandeg svlandeg added enhancement Feature requests and improvements feat / pipeline Feature: Processing pipeline and components labels May 11, 2022
Copy link
Contributor

@svlandeg svlandeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to hear some thoughts/reviews by other colleagues as well, but in principle I think this will be really useful. It makes the code & docs slightly more complex, but it will have a huge impact on many use-cases where you really just want one or 2 components and disable everything else. It feels like such a common use-case that we might as well cater for it...

@rmitsch rmitsch force-pushed the feat/inclusive-spacy-load-flags branch from e6acdd2 to 9ae4189 Compare May 13, 2022 15:37
@rmitsch
Copy link
Contributor Author

rmitsch commented May 16, 2022

The Python38Mac tests fail due to a config mismatch (the new enabled property seems to be missing). IMO that shouldn't be the case though, and the other jobs in the build matrix succeed...ideas? 🤔

@adrianeboyd
Copy link
Contributor

adrianeboyd commented May 16, 2022

The Python 3.8 mac build includes some additional CLI tests that aren't run in the other builds (just to avoid excessive downloads/time), so these failures do matter, and aren't particular to python 3.8 / macos.

I didn't realize that the plan was to add this setting to the config? I thought this was just adding new keyword arguments to spacy.load?

Edited to add: I don't think it makes sense to have two conflicting features in the config like disable and include. Someone writing a config can figure out what the right settings are with pipeline and disable. Someone loading a model with spacy.load has a good reason to want an include option that makes it easier to selectively load things.

@rmitsch
Copy link
Contributor Author

rmitsch commented May 16, 2022

The Python 3.8 mac build includes some additional CLI tests that aren't run in the other builds (just to avoid excessive downloads/time), so these failures do matter, and aren't particular to python 3.8 / macos.

I didn't realize that the plan was to add this setting to the config? I thought this was just adding new keyword arguments to spacy.load?

Edited to add: I don't think it makes sense to have two conflicting features in the config like disable and include. Someone writing a config can figure out what the right settings are with pipeline and disable. Someone loading a model with spacy.load has a good reason to want an include option that makes it easier to selectively load things.

Adding it to the config isn't necessary. It was my assumption we'd want to handle enable the same way as disable, which would mean adding it to the config. I think you have a point there, I'd be ok with removing it from the config again. What does @svlandeg think?

@svlandeg
Copy link
Contributor

Agreed with Adriane's points!

@rmitsch rmitsch marked this pull request as ready for review May 17, 2022 07:32
@rmitsch
Copy link
Contributor Author

rmitsch commented May 17, 2022

Removed enabled from the config. This is the last remaining open question from my side.

@rmitsch
Copy link
Contributor Author

rmitsch commented May 25, 2022

Removed include. Updated website docs.

@rmitsch rmitsch changed the title enable/include arguments for spacy.load() enable arguments for spacy.load() May 25, 2022
@rmitsch rmitsch changed the title enable arguments for spacy.load() enable argument for spacy.load() May 25, 2022
@svlandeg svlandeg self-requested a review May 31, 2022 22:46
@polm
Copy link
Contributor

polm commented Jun 6, 2022

I'm kind of late here, but I agree this would be a great feature to have - for short bits of code this can be clearer and more succint than disable.

Copy link
Contributor

@svlandeg svlandeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Just had a few more minor comments but should otherwise be good to merge :-)

@svlandeg
Copy link
Contributor

Nice work, Raphael!

@svlandeg svlandeg merged commit 4c058eb into explosion:master Jun 17, 2022
danieldk added a commit that referenced this pull request Jun 27, 2022
* Add "Aim-spaCy" to spaCy Universe (#10943)

* Add Aim-spaCy to spaCy universe

* Update Aim thumbnail

* Fix author links

Co-authored-by: Paul O'Leary McCann <[email protected]>

* Auto-format code with black (#10945)

Co-authored-by: explosion-bot <[email protected]>

* precomputable_biaffine: avoid concatenation (#10911)

The `forward` of `precomputable_biaffine` performs matrix multiplication
and then `vstack`s the result with padding. This creates a temporary
array used for the output of matrix concatenation.

This change avoids the temporary by pre-allocating an array that is
large enough for the output of matrix multiplication plus padding and
fills the array in-place.

This gave me a small speedup (a bit over 100 WPS) on de_core_news_lg on
M1 Max (after changing thinc-apple-ops to support in-place gemm as BLIS
does).

* Add failing test: `test_matcher_extension_in_set_predicate` (#10948)

* vectors: remove use of float as row number (#10955)

The float -1 was returned rather than the integer -1 as the row for
unknown keys. This doesn't introduce a realy bug, since such floats
cast (without issues) to int in the conversion to NumPy arrays. Still,
it's nice to to do the correct thing :).

* Update for CBlas changes in Thinc 8.1.0.dev2 (#10970)

* Workaround for Typer optional default values with Python calls (#10788)

* Workaround for Typer optional default values with Python calls: added test and workaround.

* @rmitsch Workaround for Typer optional default values with Python calls: reverting some black formatting changes.

Co-authored-by: Sofie Van Landeghem <[email protected]>

* @rmitsch Workaround for Typer optional default values with Python calls: removing return type hint.

Co-authored-by: Sofie Van Landeghem <[email protected]>

* Workaround for Typer optional default values with Python calls: fixed imports, added GitHub issue marker.

* Workaround for Typer optional default values with Python calls: removed forcing of default values for optional arguments in init_config_cli(). Added default values for init_config(). Synchronized default values for init_config_cli() and init_config().

* Workaround for Typer optional default values with Python calls: removed unused import.

* Workaround for Typer optional default values with Python calls: fixed usage of optimize in init_config_cli().

* Workaround for Typer optional default values with Pythhon calls: remove output_file from InitDefaultValues.

* Workaround for Typer optional default values with Python calls: rename class for default init values.

* Workaround for Typer optional default values with Python calls: remove newline.

* remove introduced newlines

* Remove test_init_config_from_python_without_optional_args().

* remove leftover import

* reformat import

* remove duplicate

Co-authored-by: Sofie Van Landeghem <[email protected]>

* Made _initialize_X() methods private. (#10978)

* Auto-format code with black (#10977)

Co-authored-by: explosion-bot <[email protected]>

* account for NER labels with a hyphen in the name (#10960)

* account for NER labels with a hyphen in the name

* cleanup

* fix docstring

* add return type to helper method

* shorter method and few more occurrences

* user helper method across repo

* fix circular import

* partial revert to avoid circular import

* `enable` argument for spacy.load() (#10784)

* Enable flag on spacy.load: foundation for include, enable arguments.

* Enable flag on spacy.load: fixed tests.

* Enable flag on spacy.load: switched from pretrained model to empty model with added pipes for tests.

* Enable flag on spacy.load: switched to more consistent error on misspecification of component activity. Test refactoring. Added  to default config.

* Enable flag on spacy.load: added support for fields not in pipeline.

* Enable flag on spacy.load: removed serialization fields from supported fields.

* Enable flag on spacy.load: removed 'enable' from config again.

* Enable flag on spacy.load: relaxed checks in _resolve_component_activation_status() to allow non-standard pipes.

* Enable flag on spacy.load: fixed relaxed checks for _resolve_component_activation_status() to allow non-standard pipes. Extended tests.

* Enable flag on spacy.load: comments w.r.t. resolution workarounds.

* Enable flag on spacy.load: remove include fields. Update website docs.

* Enable flag on spacy.load: updates w.r.t. changes in master.

* Implement Doc.from_json(): update docstrings.

Co-authored-by: Adriane Boyd <[email protected]>

* Implement Doc.from_json(): remove newline.

Co-authored-by: Adriane Boyd <[email protected]>

* Implement Doc.from_json(): change error message for E1038.

Co-authored-by: Adriane Boyd <[email protected]>

* Enable flag on spacy.load: wrapped docstring for _resolve_component_status() at 80 chars.

* Enable flag on spacy.load: changed exmples for enable flag.

* Remove newline.

Co-authored-by: Sofie Van Landeghem <[email protected]>

* Fix docstring for Language._resolve_component_status().

* Rename E1038 to E1042.

Co-authored-by: Adriane Boyd <[email protected]>
Co-authored-by: Sofie Van Landeghem <[email protected]>

* add counts to verbose list of NER labels (#10957)

* Update linguistic-features.md (#10993)

Change link for downloading fasttext word vectors

* Use thinc-apple-ops>=0.1.0.dev0 with `apple` extras (#10904)

* Use thinc-apple-ops>=0.1.0.dev0 with `apple` extras

Also test with thinc-apple-ops that is at least 0.1.0.dev0.

* Check thinc-apple-ops on macOS with Python 3.10

Co-authored-by: Adriane Boyd <[email protected]>

* Use `pip install --pre` for installing thinc-apple-ops in CI

Co-authored-by: Adriane Boyd <[email protected]>

Co-authored-by: Gor Arakelyan <[email protected]>
Co-authored-by: Paul O'Leary McCann <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: explosion-bot <[email protected]>
Co-authored-by: Madeesh Kannan <[email protected]>
Co-authored-by: Raphael Mitsch <[email protected]>
Co-authored-by: Sofie Van Landeghem <[email protected]>
Co-authored-by: Adriane Boyd <[email protected]>
Co-authored-by: Victoria <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Feature requests and improvements feat / pipeline Feature: Processing pipeline and components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants