Commit 4149618
* Update `TextCatBOW` to use the fixed `SparseLinear` layer
A while ago, we fixed the `SparseLinear` layer to use all available
parameters: explosion/thinc#754
This change updates `TextCatBOW` to `v3` which uses the new
`SparseLinear_v2` layer. This results in a sizeable improvement on a
text categorization task that was tested.
While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent`
option to make it possible to change the hidden size. Ideally, we'd just
have an option called `length`. But the way that `TextCatBOW` uses
hashes results in a non-uniform distribution of parameters when the
length is not a power of two.
* Replace TexCatBOW `length_exponent` parameter by `length`
We now round up the length to the next power of two if it isn't
a power of two.
* Remove some tests for TextCatBOW.v2
* Fix missing import
1 parent ad4da43 commit 4149618
File tree
3 files changed
+44
-7
lines changed- spacy
- tests/pipeline
- website/docs/api
3 files changed
+44
-7
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
974 | 974 | | |
975 | 975 | | |
976 | 976 | | |
977 | | - | |
978 | | - | |
979 | | - | |
980 | 977 | | |
981 | 978 | | |
982 | 979 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
499 | 499 | | |
500 | 500 | | |
501 | 501 | | |
502 | | - | |
503 | | - | |
504 | | - | |
| 502 | + | |
| 503 | + | |
| 504 | + | |
505 | 505 | | |
506 | 506 | | |
507 | 507 | | |
| |||
749 | 749 | | |
750 | 750 | | |
751 | 751 | | |
752 | | - | |
| 752 | + | |
753 | 753 | | |
754 | 754 | | |
755 | 755 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1020 | 1020 | | |
1021 | 1021 | | |
1022 | 1022 | | |
| 1023 | + | |
| 1024 | + | |
| 1025 | + | |
| 1026 | + | |
| 1027 | + | |
| 1028 | + | |
| 1029 | + | |
| 1030 | + | |
| 1031 | + | |
| 1032 | + | |
| 1033 | + | |
| 1034 | + | |
| 1035 | + | |
| 1036 | + | |
| 1037 | + | |
| 1038 | + | |
| 1039 | + | |
| 1040 | + | |
| 1041 | + | |
| 1042 | + | |
| 1043 | + | |
| 1044 | + | |
| 1045 | + | |
| 1046 | + | |
| 1047 | + | |
| 1048 | + | |
| 1049 | + | |
| 1050 | + | |
| 1051 | + | |
| 1052 | + | |
| 1053 | + | |
| 1054 | + | |
| 1055 | + | |
| 1056 | + | |
| 1057 | + | |
| 1058 | + | |
| 1059 | + | |
| 1060 | + | |
| 1061 | + | |
| 1062 | + | |
1023 | 1063 | | |
1024 | 1064 | | |
1025 | 1065 | | |
| |||
0 commit comments