
Conversation

@danieldk
Contributor

danieldk commented Mar 30, 2022

This PR makes most CPU kernels generic, so that they can take both float32 and float64 arrays (and hopefully float16 in the future). I experimented with kernels in Cython + fused types and with kernels as C++ templates, and found the C++ template route more promising (a rough sketch follows the list below):

  • It gives more compact and ergonomic implementations, with fewer compile-time conditionals.
  • It opens up the possibility of easily using SIMD intrinsics in the future.
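For illustration, here is a minimal Cython-side sketch of what the template route can look like; the header name, kernel name, and signature are hypothetical and not the actual code in this PR:

```cython
# distutils: language = c++
# Hypothetical sketch: a single C++ function template, declared to
# Cython once, covers both float32 and float64 buffers. The C++
# compiler instantiates one specialization per element type.
cdef extern from "cpu_kernels.hh":        # header name is illustrative
    void cpu_relu[T](T* X, int n) nogil   # hypothetical templated kernel


def relu_inplace(X):
    """Clip negative values in place, dispatching on the array's dtype.

    Assumes X is a non-empty, C-contiguous 1d float32 or float64 array.
    """
    cdef float[::1] xf
    cdef double[::1] xd
    if X.dtype == "float32":
        xf = X
        cpu_relu[float](&xf[0], <int>xf.shape[0])
    elif X.dtype == "float64":
        xd = X
        cpu_relu[double](&xd[0], <int>xd.shape[0])
    else:
        raise ValueError("unsupported dtype: %s" % X.dtype)
```

In this sketch the dtype dispatch stays on the Cython side, while the kernel body is written only once in C++.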

To allow genericity in the NumpyOps method arguments, we use two conventions (sketched after the list):

  • Fused types when we require a specific dimensionality;
  • np.ndarray otherwise.
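Roughly, the two conventions could look like the sketch below; the fused type and function names are hypothetical, not this PR's actual API:

```cython
# Hypothetical sketch of the two argument styles described above.
cimport numpy as np
import numpy

np.import_array()

ctypedef fused real_ft:   # generic element type: float32 or float64
    float
    double


def scale_rows(real_ft[:, ::1] X, double factor):
    # The dimensionality (2d, C-contiguous) is fixed in the signature;
    # Cython generates a float and a double specialization and picks
    # the right one from the buffer passed in at call time.
    cdef Py_ssize_t i, j
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            X[i, j] *= factor


def as_contig(np.ndarray X):
    # No specific dimensionality is required here, so a plain
    # np.ndarray argument is enough; the dtype is handled at runtime.
    if X.ndim == 0:
        raise ValueError("expected at least a 1d array")
    return numpy.ascontiguousarray(X)
```

Calling scale_rows with a float32 or float64 2d array selects the matching specialization automatically, while as_contig accepts arrays of any rank and dtype.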

Some of the kernels have not (yet) been made generic:

  • cpu_scatter_add: needs tests to verify that the op still works
    correctly.
  • cpu_position_encode: the position_encode op doesn't take float
    array(s).
  • lstm kernels: I need to look more deeply into them.

@svlandeg added the enhancement (Feature requests and improvements) and feat / ops (Backends and maths) labels on Mar 31, 2022
@danieldk marked this pull request as ready for review on March 31, 2022, 11:35
@danieldk force-pushed the numpy-generic-kernels branch from e9fb986 to f5feb1c on April 5, 2022, 18:38
@danieldk requested a review from svlandeg on April 6, 2022, 12:05
@svlandeg
Contributor

svlandeg commented Apr 6, 2022

Ugh - do we have a flaky test?

@danieldk
Contributor Author

danieldk commented Apr 6, 2022

Ugh - do we have a flaky test?

Oh no, it's the dreadful "Unreliable test timings" again! On an initial run, this test took XYZms, which exceeded the deadline of 200.00ms.

@svlandeg
Contributor

svlandeg commented Apr 6, 2022

Rerunning it 😐

@svlandeg merged commit 4e7b2bb into explosion:develop on Apr 7, 2022
@danieldk added the performance (Speed and memory use) label on Mar 29, 2024