Merged
Commits
79 commits
fd33e07
Fixed typing problems mostly in with... methods
richardpaulhudson Feb 22, 2022
fa5a6bb
Sorted out flatten and unflatten
richardpaulhudson Feb 23, 2022
66470fb
Iterable and Concatenatable types
richardpaulhudson Feb 23, 2022
e630ca1
Corrections
richardpaulhudson Feb 23, 2022
abf1d99
More corrections
richardpaulhudson Feb 23, 2022
a60b94f
Merge branch 'explosion:master' into feature/mypy-fixes
richardpaulhudson Feb 23, 2022
83161aa
Corrections to layers
richardpaulhudson Feb 24, 2022
62c833c
Moved type definitions from types to ops
richardpaulhudson Feb 24, 2022
b01117e
Updated documentation
richardpaulhudson Feb 24, 2022
192d3cb
Simplified ops type declarations
richardpaulhudson Feb 24, 2022
d02fe36
Fixed mypy backwards compatibility issue
richardpaulhudson Feb 24, 2022
cf28895
Correct type-ignore comment
richardpaulhudson Feb 24, 2022
b144a17
Updated Mypy version in azure-pipelines
richardpaulhudson Feb 24, 2022
d269e00
Added CI checks with Python 3.7
richardpaulhudson Feb 24, 2022
aa55834
Any as first parameter of with_... layers
richardpaulhudson Feb 25, 2022
c5805aa
Revert "Any as first parameter of with_... layers"
richardpaulhudson Feb 25, 2022
8340792
Tidied up init methods
richardpaulhudson Feb 25, 2022
cc829d0
Removed unnecessary imports
richardpaulhudson Mar 1, 2022
36cc1ba
Put import statement on one line
richardpaulhudson Mar 1, 2022
851edd0
Changes based on PR review comments
richardpaulhudson Mar 1, 2022
3df418c
Improvements after PR feedback
richardpaulhudson Mar 2, 2022
0f0089b
Went through ignore statements in layers
richardpaulhudson Mar 2, 2022
ac4e14b
Removed unnecessary covariance
richardpaulhudson Mar 2, 2022
f3afd69
Merge branch 'master' into feature/mypy-fixes
richardpaulhudson Mar 4, 2022
1706fac
Improvements based on PR review
richardpaulhudson Mar 4, 2022
ffbab02
Remove Python 3.7 additions
richardpaulhudson Mar 14, 2022
ca29e5e
Reverted lstm_tagger.py changes
richardpaulhudson Mar 14, 2022
c1a90f7
Added ArrayTXd_co
richardpaulhudson Mar 14, 2022
e38d26f
Final changes before review
richardpaulhudson Mar 14, 2022
6048dcc
Cast in main rather than in type-specific forward methods
richardpaulhudson Mar 15, 2022
14f65dc
Added empty line
richardpaulhudson Mar 15, 2022
de789e6
Merge branch 'master' of https://github.com/explosion/thinc into feat…
richardpaulhudson Mar 15, 2022
9617d33
Corrections
richardpaulhudson Mar 15, 2022
19d749a
More corrections
richardpaulhudson Mar 15, 2022
a436794
Corrections
richardpaulhudson Mar 16, 2022
6b50d3e
Merge branch 'explosion:master' into feature/mypy-fixes
richardpaulhudson Mar 16, 2022
9e32595
Returned to ListXd types
richardpaulhudson Mar 16, 2022
2232a9f
Merge branch 'feature/mypy-fixes' of https://github.com/richardpaulhu…
richardpaulhudson Mar 16, 2022
069cd8b
More corrections
richardpaulhudson Mar 16, 2022
7bb4af0
Further corrections
richardpaulhudson Mar 16, 2022
0c657b9
Corrected model typing
richardpaulhudson Mar 16, 2022
bb25884
Further corrections
richardpaulhudson Mar 16, 2022
8f26c8b
Corrections
richardpaulhudson Mar 17, 2022
7d96a55
Tidying up
richardpaulhudson Mar 17, 2022
0496371
Corrections
richardpaulhudson Mar 17, 2022
2e62b7f
Removed line
richardpaulhudson Mar 17, 2022
a1595d1
Made imports clearer
richardpaulhudson Mar 17, 2022
8bf132d
Readded line
richardpaulhudson Mar 17, 2022
ba2cc5c
Reformatted
richardpaulhudson Mar 17, 2022
cb6c274
Readded line
richardpaulhudson Mar 17, 2022
4a75d94
Corrected residual.py
richardpaulhudson Mar 19, 2022
e6eb87b
Changed imports back to original order
richardpaulhudson Mar 19, 2022
d6321c7
Changes in response to review comments
richardpaulhudson Apr 4, 2022
1329754
Update thinc/layers/dropout.py
richardpaulhudson Apr 6, 2022
67badb6
Update thinc/layers/embed.py
richardpaulhudson Apr 6, 2022
3efabfd
Changes responding to Github review
richardpaulhudson Apr 6, 2022
aa63aac
Reversed changes to init() return types
richardpaulhudson Apr 6, 2022
3e7b52e
Reversed changes to init() return types
richardpaulhudson Apr 6, 2022
91b5519
Corrected embed.py and hashembed.py
richardpaulhudson Apr 7, 2022
32335c8
Corrections based on Github review
richardpaulhudson Apr 7, 2022
2d77334
Fixed chain.py
richardpaulhudson Apr 7, 2022
0415230
Merge branch 'master' into feature/mypy-fixes
richardpaulhudson Apr 7, 2022
47d5f9b
Further correction to chain.py
richardpaulhudson Apr 7, 2022
0d9f9a8
Removed unnecessary cast
richardpaulhudson Apr 19, 2022
e7d59d5
Updated documentation
richardpaulhudson Apr 19, 2022
effe597
Changes based on review
richardpaulhudson Apr 19, 2022
c9f6c4e
Added @overload signatures in ops
richardpaulhudson Apr 19, 2022
4a4aba2
Added comment
richardpaulhudson Apr 19, 2022
76e4095
Merge branch 'explosion:master' into feature/mypy-fixes
richardpaulhudson May 2, 2022
75a7d35
Changes based on review comments
richardpaulhudson May 2, 2022
92a9b57
Final corrections
richardpaulhudson May 2, 2022
292ac5c
Bumped mypy version
richardpaulhudson May 2, 2022
0efb966
Changes based on review comments
richardpaulhudson May 3, 2022
dbc4ec9
Added space to trigger CI
richardpaulhudson May 3, 2022
6fde743
Merge branch 'develop' into feature/mypy-fixes
richardpaulhudson May 3, 2022
fa5f2cd
Corrected Pydantic version ranges
richardpaulhudson May 4, 2022
1757c9a
Fixed mypy version range
richardpaulhudson May 5, 2022
e09bf57
Merge remote-tracking branch 'expl/develop' into feature/mypy-fixes
richardpaulhudson May 9, 2022
8b46e2a
Correct documentation for clone
richardpaulhudson May 9, 2022
6 changes: 3 additions & 3 deletions examples/benchmarks/lstm_tagger.py
@@ -11,15 +11,15 @@

So PyTorch is 3x faster currently.
"""
- from typing import List
+ from typing import List, cast
import typer
import tqdm
import numpy.random
from timeit import default_timer as timer
from thinc.api import Model, Config, registry, chain, list2padded, with_array
from thinc.api import to_categorical, set_current_ops
from thinc.api import NumpyOps, CupyOps, fix_random_seed, require_gpu
- from thinc.types import Array2d, Padded
+ from thinc.types import Array2d, Padded, List2d

CONFIG = """
[data]
@@ -59,7 +59,7 @@ def build_tagger(
embed: Model[Array2d, Array2d],
encode: Model[Padded, Padded],
predict: Model[Array2d, Array2d],
- ) -> Model[List[Array2d], Padded]:
+ ) -> Model[List2d, List2d]:
model = chain(
list2padded(),
with_array(embed),
5 changes: 2 additions & 3 deletions requirements.txt
@@ -8,7 +8,7 @@ wasabi>=0.8.1,<1.1.0
catalogue>=2.0.4,<2.1.0
ml_datasets>=0.2.0,<0.3.0
# Third-party dependencies
- pydantic>=1.7.4,!=1.8,!=1.8.1,<1.9.0
+ pydantic>=1.7.4,!=1.8,!=1.8.1,<1.10.0
numpy>=1.15.0
# Backports of modern Python features
dataclasses>=0.6,<1.0; python_version < "3.7"
@@ -22,8 +22,7 @@ pytest-cov>=2.7.0,<2.8.0
coverage>=5.0.0,<6.0.0
mock>=2.0.0,<3.0.0
flake8>=3.5.0,<3.6.0
- # restricting mypy until faster 3.10 wheels are available
- mypy>=0.901,<0.920; python_version < "3.10"
+ mypy>=0.901,<=0.931
types-mock>=0.1.1
types-contextvars>=0.1.2; python_version < "3.7"
types-dataclasses>=0.1.3; python_version < "3.7"
2 changes: 1 addition & 1 deletion setup.cfg
@@ -48,7 +48,7 @@ install_requires =
# Third-party dependencies
setuptools
numpy>=1.15.0
- pydantic>=1.7.4,!=1.8,!=1.8.1,<1.9.0
+ pydantic>=1.7.4,!=1.8,!=1.8.1,<1.10.0
# Backports of modern Python features
dataclasses>=0.6,<1.0; python_version < "3.7"
typing_extensions>=3.7.4.1,<4.0.0.0; python_version < "3.8"
109 changes: 72 additions & 37 deletions thinc/backends/ops.py
@@ -5,15 +5,47 @@
import numpy
import itertools

from .. import registry
- from ..types import Xp, Shape, DTypes, DTypesInt, DTypesFloat, List2d, ArrayXd
- from ..types import Array3d, Floats1d, Floats2d, Floats3d, Floats4d
- from ..types import FloatsXd, Ints1d, Ints2d, Ints3d, Ints4d, IntsXd, _Floats
+ from ..types import (
+ Array1d,
+ Array2d,
+ Array3d,
+ Array4d,
+ ArrayXd,
+ )
+ from ..types import Floats1d, Floats2d, Floats3d, Floats4d, FloatsXd, _Floats
+ from ..types import Ints1d, Ints2d, Ints3d, Ints4d, IntsXd
+ from ..types import Xp, Shape, DTypes, DTypesInt, DTypesFloat
from ..types import DeviceTypes, Generator, Padded, Batchable, SizedGenerator
from ..util import get_array_module, is_xp_array, to_numpy


+ ArrayT2d = TypeVar("ArrayT2d", bound=Union[Floats2d, Ints2d, Array2d])
+ ArrayT2d_co = TypeVar(
+ "ArrayT2d_co", bound=Union[Floats2d, Ints2d, Array2d], covariant=True
+ )
ArrayT = TypeVar("ArrayT", bound=ArrayXd)
+ ArrayTXd = TypeVar(
+ "ArrayTXd",
+ bound=ArrayXd,
+ )
+ ArrayTXd_co = TypeVar(
+ "ArrayTXd_co",
+ bound=ArrayXd,
+ covariant=True,
+ )
+ ArrayTXNotMind = TypeVar(
+ "ArrayTXNotMind", bound=Union[Floats2d, Floats3d, Floats4d, Ints2d, Ints3d, Ints4d]
+ )
+ ArrayTXNotMaxd = TypeVar(
+ "ArrayTXNotMaxd", bound=Union[Floats1d, Floats2d, Floats3d, Ints1d, Ints2d, Ints3d]
+ )
+ ArrayTXNotMaxd_co = TypeVar(
+ "ArrayTXNotMaxd_co",
+ bound=Union[Floats1d, Floats2d, Floats3d, Ints1d, Ints2d, Ints3d],
+ covariant=True,
+ )


FloatsT = TypeVar("FloatsT", bound=_Floats)
FloatsType = TypeVar("FloatsType", bound=FloatsXd)
SQRT2PI = math.sqrt(2.0 / math.pi)
@@ -227,11 +259,11 @@ def affine(self, X: Floats2d, W: Floats2d, b: Floats1d) -> Floats2d:

def flatten(
self,
- X: Sequence[ArrayT],
+ X: List[ArrayTXd_co],
dtype: Optional[DTypes] = None,
pad: int = 0,
ndim_if_empty: int = 2,
- ) -> ArrayT:
+ ) -> ArrayTXd_co:
"""Flatten a list of arrays into one large array."""
if X is None or len(X) == 0:
return self.alloc((0,) * ndim_if_empty, dtype=dtype or "f")
@@ -252,7 +284,7 @@ def flatten(
result = xp.asarray(result, dtype=dtype)
return result

- def unflatten(self, X: Floats2d, lengths: Ints1d, pad: int = 0) -> List[Floats2d]:
+ def unflatten(self, X: ArrayTXd, lengths: Ints1d, pad: int = 0) -> List[ArrayTXd]:
"""The reverse/backward operation of the `flatten` function: unflatten
a large array into a list of arrays according to the given lengths.
"""
@@ -261,31 +293,20 @@ def unflatten(self, X: Floats2d, lengths: Ints1d, pad: int = 0) -> List[Floats2d
for length in lengths:
length = int(length)
if pad >= 1 and length != 0:
- X = X[pad:]
+ X = X[pad:] # type: ignore[assignment]
unflat.append(X[:length])
- X = X[length:]
+ X = X[length:] # type: ignore[assignment]
if pad >= 1:
- X = X[pad:]
+ X = X[pad:] # type: ignore[assignment]
assert len(X) == 0
assert len(unflat) == len(lengths)
- return unflat
-
- @overload
- def pad(self, seqs: List[Ints2d], round_to=1) -> Ints3d:
- ...
+ return cast(List[ArrayTXd], unflat)

- @overload # noqa: F811
- def pad(self, seqs: List[Floats2d], round_to=1) -> Floats3d:
- ...
-
- def pad( # noqa: F811
- self, seqs: Union[List[Ints2d], List[Floats2d]], round_to=1
- ) -> Array3d:
+ def pad(self, seqs: List[ArrayTXNotMaxd_co], round_to=1) -> ArrayTXNotMind:
"""Perform padding on a list of arrays so that they each have the same
length, by taking the maximum dimension across each axis. This only
works on non-empty sequences with the same `ndim` and `dtype`.
"""
- # TODO: This should be generalized to handle different ranks
if not seqs:
raise ValueError("Cannot pad empty sequence")
if len(set(seq.ndim for seq in seqs)) != 1:
@@ -300,29 +321,31 @@ def pad( # noqa: F811
# array sizes.
length = (length + (round_to - 1)) // round_to * round_to
final_shape = (len(seqs), length) + seqs[0].shape[1:]
- output: Array3d = self.alloc(final_shape, dtype=seqs[0].dtype)
+ output: ArrayTXNotMind = self.alloc(final_shape, dtype=seqs[0].dtype)
for i, arr in enumerate(seqs):
# It's difficult to convince this that the dtypes will match.
output[i, : arr.shape[0]] = arr # type: ignore
return output

- def unpad(self, padded: Array3d, lengths: List[int]) -> List2d:
+ def unpad(
+ self, padded: ArrayTXNotMind, lengths: List[int]
+ ) -> List[ArrayTXNotMaxd_co]:
"""The reverse/backward operation of the `pad` function: transform an
array back into a list of arrays, each with their original length.
"""
output = []
for i, length in enumerate(lengths):
output.append(padded[i, :length])
- return cast(List2d, output)
+ return cast(List[ArrayTXNotMaxd_co], output)

- def list2padded(self, seqs: List[Floats2d]) -> Padded:
+ def list2padded(self, seqs: List[ArrayT2d_co]) -> Padded:
"""Pack a sequence of 2d arrays into a Padded datatype."""
if not seqs:
return Padded(
self.alloc3f(0, 0, 0), self.alloc1i(0), self.alloc1i(0), self.alloc1i(0)
)
elif len(seqs) == 1:
- data = self.reshape3f(seqs[0], seqs[0].shape[0], 1, seqs[0].shape[1])
+ data = self.reshape3(seqs[0], seqs[0].shape[0], 1, seqs[0].shape[1])
size_at_t = self.asarray1i([1] * data.shape[0])
lengths = self.asarray1i([data.shape[0]])
indices = self.asarray1i([0])
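
The rounding step in `pad` above, `(length + (round_to - 1)) // round_to * round_to`, rounds the padded length up to the next multiple of `round_to`. The following is a minimal pure-Python sketch of that arithmetic and of the `pad`/`unpad` round trip, not the thinc implementation. It assumes `length` starts out as the longest sequence length (that line lies outside the hunk), it uses plain lists instead of arrays, and `padded_length` is a hypothetical helper name introduced only for illustration.

```python
from typing import List, Sequence


def padded_length(lengths: Sequence[int], round_to: int = 1) -> int:
    """Longest length, rounded up to the next multiple of `round_to`."""
    length = max(lengths)
    return (length + (round_to - 1)) // round_to * round_to


def pad(seqs: List[List[float]], round_to: int = 1) -> List[List[float]]:
    """Right-pad every sequence with zeros to a common, rounded length."""
    if not seqs:
        raise ValueError("Cannot pad empty sequence")
    length = padded_length([len(seq) for seq in seqs], round_to)
    return [seq + [0.0] * (length - len(seq)) for seq in seqs]


def unpad(padded: List[List[float]], lengths: List[int]) -> List[List[float]]:
    """Reverse of `pad`: cut each row back to its original length."""
    return [row[:length] for row, length in zip(padded, lengths)]
```

With `round_to=4`, sequences of lengths 3 and 5 are padded to length 8; `unpad` then needs the original lengths to recover the inputs, which is why the real `unpad` takes a `lengths` argument.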
@@ -338,7 +361,7 @@ def list2padded(self, seqs: List[Floats2d]) -> Padded:
# direction: you're swapping elements between their original and sorted
# position.
seqs = [seqs[i] for i in indices_]
- arr: Floats3d = self.pad(seqs)
+ arr: Array3d = cast(Array3d, self.pad(seqs))
assert arr.shape == (nB, nS, nO), (nB, nS, nO)
arr = self.as_contig(arr.transpose((1, 0, 2)))
assert arr.shape == (nS, nB, nO)
@@ -351,23 +374,23 @@ def list2padded(self, seqs: List[Floats2d]) -> Padded:
batch_size_at_t_[t] = current_size
assert sum(lengths_) == sum(batch_size_at_t_)
return Padded(
- cast(Floats3d, arr),
+ arr,
self.asarray1i(batch_size_at_t_),
self.asarray1i(lengths_),
self.asarray1i(indices_),
)

- def padded2list(self, padded: Padded) -> List2d:
+ def padded2list(self, padded: Padded) -> List[Array2d]:
"""Unpack a Padded datatype to a list of 2-dimensional arrays."""
data = padded.data
indices = to_numpy(padded.indices)
lengths = to_numpy(padded.lengths)
- unpadded: List[Optional[Floats2d]] = [None] * len(lengths)
+ unpadded: List[Optional[Array2d]] = [None] * len(lengths)
# Transpose from (length, batch, data) to (batch, length, data)
data = self.as_contig(data.transpose((1, 0, 2)))
for i in range(data.shape[0]):
unpadded[indices[i]] = data[i, : int(lengths[i])]
- return cast(List2d, unpadded)
+ return cast(List[Array2d], unpadded)

def get_dropout_mask(self, shape: Shape, drop: Optional[float]) -> FloatsXd:
"""Create a random mask for applying dropout, with a certain percent of
@@ -445,6 +468,18 @@ def alloc(self, shape: Shape, *, dtype: Optional[DTypes] = "float32") -> ArrayT:
shape = (shape,)
return self.xp.zeros(shape, dtype=dtype)

+ def reshape1(self, array: ArrayXd, d0: int) -> Array1d:
+ return cast(Array1d, self.reshape(array, (d0,)))
+
+ def reshape2(self, array: ArrayXd, d0: int, d1: int) -> Array2d:
+ return cast(Array2d, self.reshape(array, (d0, d1)))
+
+ def reshape3(self, array: ArrayXd, d0: int, d1: int, d2: int) -> Array3d:
+ return cast(Array3d, self.reshape(array, (d0, d1, d2)))
+
+ def reshape4(self, array: ArrayXd, d0: int, d1: int, d2: int, d3: int) -> Array4d:
+ return cast(Array4d, self.reshape(array, (d0, d1, d2, d3)))
+
def reshape1f(self, array: FloatsXd, d0: int) -> Floats1d:
return cast(Floats1d, self.reshape(array, (d0,)))
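
The `reshapeN` helpers in this hunk all follow one pattern: call a loosely typed `reshape` and wrap the result in `typing.cast` so callers get a rank-specific return type. The sketch below illustrates that pattern with plain lists. `Rows2d` and this `reshape` are hypothetical stand-ins, not thinc names, and the list-based reshape is an assumption made only to keep the example self-contained.

```python
from typing import List, Sequence, Tuple, cast

Rows2d = List[List[float]]  # hypothetical stand-in for thinc's Array2d


def reshape(values: Sequence[float], shape: Tuple[int, ...]) -> object:
    """Loosely typed reshape of a flat sequence (1d and 2d shapes only)."""
    if len(shape) == 1:
        assert len(values) == shape[0]
        return list(values)
    d0, d1 = shape
    assert len(values) == d0 * d1
    return [list(values[i * d1 : (i + 1) * d1]) for i in range(d0)]


def reshape2(values: Sequence[float], d0: int, d1: int) -> Rows2d:
    # cast() only informs the type checker; at runtime it returns its
    # argument unchanged, so no shape or dtype check happens here.
    return cast(Rows2d, reshape(values, (d0, d1)))
```

Because `cast` is a no-op at runtime, the wrapper is only as trustworthy as the underlying `reshape`; the benefit is purely that call sites no longer need their own casts.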

@@ -603,7 +638,7 @@ def dtanh(self, Y: FloatsT, *, inplace: bool = False) -> FloatsT:
Y += 1.0
return Y
else:
- return 1 - Y ** 2
+ return 1 - Y**2

def softmax(
self,
@@ -859,7 +894,7 @@ def gelu_approx(self, X: FloatsType, inplace: bool = False) -> FloatsType:
Y = self.xp.zeros_like(X)
Y += tmp
Y *= X
- return cast(FloatsType, Y)
+ return Y

def backprop_gelu_approx(
self, dY: FloatsType, X: FloatsType, inplace: bool = False
@@ -924,7 +959,7 @@ def backprop_mish(
delta = xp.exp(Xsub) + 1.0
delta *= delta
delta += 1.0
- dXsub = dYsub * ((xp.exp(Xsub) * omega) / (delta ** 2))
+ dXsub = dYsub * ((xp.exp(Xsub) * omega) / (delta**2))
# Gradient when above threshold will ignore softplus.
if inplace:
out = dY
@@ -1368,7 +1403,7 @@ def dsigmoid(Y: ArrayT) -> ArrayT:


def dtanh(Y: ArrayT) -> ArrayT:
- return 1 - Y ** 2
+ return 1 - Y**2


def gaussian_cdf(ops: Ops, X: FloatsType) -> FloatsType:
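
The `unflatten` change earlier in this file swaps the fixed `Floats2d` signature for a bound `TypeVar`, so the element type of the returned list follows the input type. The sketch below mirrors the loop shown in the diff on plain Python sequences. `SeqT` is a hypothetical stand-in for `ArrayTXd`, and using sequences instead of arrays is an assumption made to keep the example runnable without thinc.

```python
from typing import List, Sequence, TypeVar, cast

# Hypothetical stand-in for ArrayTXd: any sliceable sequence type.
SeqT = TypeVar("SeqT", bound=Sequence)


def unflatten(X: SeqT, lengths: List[int], pad: int = 0) -> List[SeqT]:
    """Split X into consecutive chunks of the given lengths, skipping `pad`
    items before each non-empty chunk and once more at the end."""
    unflat = []
    for length in lengths:
        if pad >= 1 and length != 0:
            # Slicing is typed as returning the bound, not SeqT itself,
            # so a checker such as mypy flags the reassignment.
            X = X[pad:]  # type: ignore[assignment]
        unflat.append(X[:length])
        X = X[length:]  # type: ignore[assignment]
    if pad >= 1:
        X = X[pad:]  # type: ignore[assignment]
    assert len(X) == 0
    assert len(unflat) == len(lengths)
    return cast(List[SeqT], unflat)
```

Calling it with a list yields lists and calling it with a string yields strings, which is the behaviour the generic signature is meant to describe to the type checker.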
11 changes: 6 additions & 5 deletions thinc/config.py
@@ -1,4 +1,4 @@
- from typing import Union, Dict, Any, Optional, List, Tuple, Callable, Type
+ from typing import Union, Dict, Any, Optional, List, Tuple, Callable, Type, Mapping
from typing import Iterable, Sequence, cast
from types import GeneratorType
from dataclasses import dataclass
@@ -550,7 +550,7 @@ def __init__(
self,
*,
config: Optional[Union[Config, Dict[str, Dict[str, Any]], str]] = None,
- errors: Iterable[Dict[str, Any]] = tuple(),
+ errors: Union[Sequence[Mapping[str, Any]], Iterable[Dict[str, Any]]] = tuple(),
title: Optional[str] = "Config validation error",
desc: Optional[str] = None,
parent: Optional[str] = None,
@@ -560,9 +560,10 @@ def __init__(

config (Union[Config, Dict[str, Dict[str, Any]], str]): The
config the validation error refers to.
- errors (Iterable[Dict[str, Any]]): A list of errors as dicts with keys
- "loc" (list of strings describing the path of the value), "msg"
- (validation message to show) and optional "type" (mostly internals).
+ errors (Union[Sequence[Mapping[str, Any]], Iterable[Dict[str, Any]]]):
+ A list of errors as dicts with keys "loc" (list of strings
+ describing the path of the value), "msg" (validation message
+ to show) and optional "type" (mostly internals).
Same format as produced by pydantic's validation error (e.errors()).
title (str): The error title.
desc (str): Optional error description, displayed below the title.
2 changes: 1 addition & 1 deletion thinc/layers/add.py
@@ -7,7 +7,7 @@


InT = TypeVar("InT", bound=Any)
- OutT = TypeVar("OutT", bound=ArrayXd)
+ OutT = TypeVar("OutT", bound=ArrayXd, covariant=True)


@registry.layers("add.v1")
5 changes: 3 additions & 2 deletions thinc/layers/array_getitem.py
@@ -1,13 +1,14 @@
- from typing import Union, Sequence, Tuple
+ from typing import Union, Sequence, Tuple, TypeVar
from ..types import ArrayXd, FloatsXd, IntsXd
from ..model import Model


AxisIndex = Union[int, slice, Sequence[int]]
Index = Union[AxisIndex, Tuple[AxisIndex, ...]]
+ ArrayXd_co = TypeVar("ArrayXd_co", bound=ArrayXd, covariant=True)


- def array_getitem(index: Index) -> Model[ArrayXd, ArrayXd]:
+ def array_getitem(index: Index) -> Model[ArrayXd_co, ArrayXd_co]:
"""Index into input arrays, and return the subarrays.

index:
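
Several hunks in this PR declare type variables with `covariant=True`. As general Python typing background (not thinc code), the sketch below shows what that flag permits: a generic container of a specific type is accepted where a container of its base type is expected. `Array`, `Floats`, `Source` and `describe` are all hypothetical names invented for this illustration.

```python
from typing import Generic, List, TypeVar


class Array:
    """Hypothetical base type, standing in for a general array type."""


class Floats(Array):
    """Hypothetical subtype, standing in for a more specific array type."""


T_co = TypeVar("T_co", bound=Array, covariant=True)


class Source(Generic[T_co]):
    """Read-only container: after construction it only returns T_co values,
    which is what makes treating it as covariant sound."""

    def __init__(self, items: List[T_co]) -> None:
        self._items = list(items)

    def first(self) -> T_co:
        return self._items[0]


def describe(source: Source[Array]) -> str:
    """Name the runtime class of the first item."""
    return type(source.first()).__name__


# Source[Floats] is accepted where Source[Array] is expected only because
# T_co is covariant; with an invariant TypeVar a type checker would reject it.
floats_source: Source[Floats] = Source([Floats()])
print(describe(floats_source))
```

Variance only affects static checking; at runtime the code behaves the same whether or not the `TypeVar` is declared covariant.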
2 changes: 1 addition & 1 deletion thinc/layers/cauchysimilarity.py
@@ -30,7 +30,7 @@ def forward(
X1, X2 = X1_X2
W = cast(Floats2d, model.get_param("W"))
diff = X1 - X2
- square_diff = diff ** 2
+ square_diff = diff**2
total = (W * square_diff).sum(axis=1) # type: ignore
sim, bp_sim = inverse(total)
