CBlas ABI changes #3
Merged
Remove use of torch.set_default_tensor_type (explosion/thinc#674)

This PR removes use of torch.set_default_tensor_type. There are various reasons why we should probably move away from using this function:
- Upstream plans to deprecate and remove torch.set_default_tensor_type: pytorch/pytorch#53124
- It cannot be used for device types other than CPU and CUDA, such as Metal Performance Shaders.
- It limits flexibility in placing models and tensors on different devices.
This PR makes PyTorchWrapper/PyTorchShim flexible in terms of the devices it can use. Both classes add a device argument to their constructors that takes a torch.device instance. The shim ensures that the model is on the given device. The wrapper ensures that input tensors are on the correct device, by calling xp2torch with the new device keyword argument.
Even though this approach offers more flexibility, as a default we want to use the cpu device when NumpyOps is used and cuda:N when CupyOps is used. In order to do so, this PR also adds a new function get_torch_default_device that returns the correct device for the currently active Ops. PyTorchWrapper/PyTorchShim/xp2torch use this function when None is given as the device, to fall back on this default, mimicking the behavior from before this PR.
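The fallback behavior described above can be sketched roughly as follows. The Ops class names follow Thinc, but the helper names and the stand-in logic here are illustrative assumptions, not the library's actual implementation:

```python
# Hedged sketch: when no device is given, fall back to a default
# chosen from the active Ops backend (cpu for NumpyOps, cuda:N for
# CupyOps). These helpers are hypothetical stand-ins.

def default_device_for_ops(ops_name, cuda_device=0):
    """Return the default Torch device string for an Ops backend."""
    if ops_name == "CupyOps":
        # N is the currently selected CUDA device.
        return f"cuda:{cuda_device}"
    return "cpu"

def resolve_device(device, ops_name):
    """Mimic passing device=None: an explicit device wins, otherwise
    use the default for the active Ops."""
    return device if device is not None else default_device_for_ops(ops_name)

print(resolve_device(None, "NumpyOps"))      # cpu
print(resolve_device(None, "CupyOps"))       # cuda:0
print(resolve_device("cuda:1", "NumpyOps"))  # cuda:1
```

An explicit device always takes precedence, so existing code that passes a device is unaffected by the default.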
Add some typing fixes
Remove spurious cupy import
Small fixes:
- Use torch.cuda.current_device() to get the current PyTorch CUDA device.
- Remove torch_set_default_tensor_type in set_active_gpu.
Add test_slow_gpu explosion-bot command
Auto-format code with black (explosion/thinc#682)
Co-authored-by: explosion-bot [email protected]
Azure: pin protobuf to fix Tensorflow
Extend typing_extensions to <4.2.0 (explosion/thinc#689)
Add support for PyTorch Metal Performance Shaders (explosion/thinc#685)
Add test_slow_gpu explosion-bot command
Auto-format code with black (explosion/thinc#682)
Co-authored-by: explosion-bot [email protected]
Nightly PyTorch versions add support for Metal Performance Shaders
(MPS). Metal is a low-level graphics API for Apple platforms that also
supports compute kernels (shaders). MPS is a framework of
highly-optimized compute and graphics kernels, including kernels for
neural networks. MPS is supported both on Apple Silicon, such as the M1 family of SoCs, and on a range of AMD GPUs used in Macs.
Since devices are handled in Thinc through a specific Ops implementation (e.g. CupyOps == CUDA GPUs), this change introduces the MPSOps class. This class is a subclass of NumpyOps or AppleOps (when available). MPSOps does not override any methods, but is used to signal to relevant code paths (e.g. xp2torch) that Torch tensors should be placed on the MPS device.
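The "marker subclass" pattern described above can be sketched like this; the class bodies are illustrative stand-ins, not Thinc's real NumpyOps/MPSOps:

```python
# Hedged sketch: MPSOps overrides nothing, but its type tells
# device-placement code (xp2torch-style) where Torch tensors belong.

class NumpyOps:
    """Stand-in for the CPU ops backend."""

class MPSOps(NumpyOps):
    """Adds no behavior; its type signals the MPS device."""

def tensor_device(ops):
    # Check the marker type first, since an MPSOps instance is also
    # an instance of NumpyOps.
    return "mps" if isinstance(ops, MPSOps) else "cpu"

print(tensor_device(MPSOps()))    # mps
print(tensor_device(NumpyOps()))  # cpu
```

Because MPSOps adds no methods, all numeric work still runs through the parent class; only tensor placement changes.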
The mapping in the previously introduced get_torch_default_device function is updated to:
- NumpyOps -> cpu
- CupyOps -> cuda:N, where N is the selected CUDA device.
- MPSOps -> mps
to ensure placement of Torch tensors on the mps device when MPSOps is active.
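The updated mapping can be sketched as a simple dispatch; this is a simplified stand-in for get_torch_default_device, not the actual Thinc implementation:

```python
# Hedged sketch of the Ops -> Torch device mapping listed above.
# Backend names follow Thinc; the function body is an assumption.

def get_torch_default_device(ops_name, cuda_device=0):
    """Return the default Torch device string for the active Ops."""
    if ops_name == "CupyOps":
        # N is the currently selected CUDA device.
        return f"cuda:{cuda_device}"
    if ops_name == "MPSOps":
        return "mps"
    # NumpyOps (and any unknown backend) falls back to the CPU.
    return "cpu"

print(get_torch_default_device("MPSOps"))      # mps
print(get_torch_default_device("CupyOps", 2))  # cuda:2
```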
Finally, the following booleans have been added to or changed in compat:
- has_torch_mps (new): PyTorch has MPS support.
- has_torch_mps_gpu (new): PyTorch has MPS support and an MPS-capable GPU is available.
- has_torch_cuda_gpu (new): PyTorch has CUDA support and a CUDA-capable GPU is available.
- has_torch_gpu (changed): PyTorch has a GPU available (CUDA or MPS).
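Flags like these could be derived roughly as follows. This is a hedged sketch, not Thinc's actual compat module; it uses only public torch APIs (torch.cuda.device_count, torch.backends.mps.is_available) and treats a missing torch install as "no support":

```python
# Hedged sketch of computing capability flags, degrading gracefully
# when torch is not installed or is too old to have MPS support.
try:
    import torch
    has_torch = True
except ImportError:
    torch = None
    has_torch = False

has_torch_cuda_gpu = has_torch and torch.cuda.device_count() > 0
has_torch_mps = has_torch and hasattr(torch.backends, "mps")
has_torch_mps_gpu = has_torch_mps and torch.backends.mps.is_available()
has_torch_gpu = has_torch_cuda_gpu or has_torch_mps_gpu

print(has_torch_gpu)
```

Note the distinction between "support compiled in" (has_torch_mps) and "a usable device is present" (has_torch_mps_gpu), which mirrors the flag list above.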
Test PyTorch wrapper with all xp ops
Azure: pin protobuf to fix Tensorflow
Extend typing_extensions to <4.2.0 (explosion/thinc#689)
Fix type checking error
Only back off to NumpyOps on import error
We do not want to hide other issues while importing thinc_apple_ops.
Remove unneeded has_torch_mps bool
Add has_gpu bool and use it in util
Replace another expression by has_gpu
Set has_torch_gpu to has_torch_cuda_gpu
We need to decide whether we want to make the potentially breaking change from has_torch_gpu to has_torch_cuda_gpu or has_torch_mps_gpu. But since the latter is not needed for this PR, remove the change.
Co-authored-by: Sofie Van Landeghem [email protected]
Co-authored-by: shademe [email protected]
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: explosion-bot [email protected]
Co-authored-by: Adriane Boyd [email protected]