diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_cuda_matrix.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_cuda_matrix.asciidoc
new file mode 100644
index 0000000000000..7dfc234a5c6cf
--- /dev/null
+++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_cuda_matrix.asciidoc
@@ -0,0 +1,95 @@
+# `sycl_ext_oneapi_matrix` extension constraints specific to the `ext_oneapi_cuda` backend.
+:source-highlighter: coderay
+:coderay-linenums-mode: table
+:dpcpp: pass:[DPC++]
+
+// This section needs to be after the document title.
+:doctype: book
+:toc2:
+:toc: left
+:encoding: utf-8
+:lang: en
+
+:blank: pass:[ +]
+
+// Set the default source code type in this document to C++,
+// for syntax highlighting purposes. This is needed because
+// docbook uses c++ and html5 uses cpp.
+:language: {basebackend@docbook:c++:cpp}
+
+
+== Notice
+
+Copyright (c) 2022-2022 Intel Corporation. All rights reserved.
+
+NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
+trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc.
+used by permission by Khronos.
+
+This extension is written against the SYCL 2020 revision 6 specification. All
+references below to the "core SYCL specification" or to section numbers in the
+SYCL specification refer to that revision.
+
+
+**_NOTE:_** This document describes the current design and API of the matrix extension features that are specific to the `ext_oneapi_cuda` backend
+of {dpcpp}. This is an initial experimental version to try out functionality
+and performance, and **future versions of this API may change in ways that are incompatible with this experimental version**.
+
+## Introduction
+The `ext_oneapi_cuda` backend supports `joint_matrix`, `joint_matrix_load`, `joint_matrix_store`, `joint_matrix_mad` and `joint_matrix_fill` as they are defined in the `sycl_ext_oneapi_matrix` extension.
The complete set of `joint_matrix` types and shapes that are valid in the `ext_oneapi_cuda` backend is listed in this document.
+This extension presents the constraints that apply specifically when using the `ext_oneapi_cuda` backend and that may not apply generally to the `sycl_ext_oneapi_matrix` extension.
+
+### Valid `joint_matrix` types and shapes
+
+The complete set of matrix data types and shapes supported by the `ext_oneapi_cuda` backend is given in the following table. Tm indicates the matrix element data type held by a "multiplicand" `joint_matrix`, i.e. one that requires `use::a` or `use::b`. Tc indicates the matrix element data type held by an "accumulator" `joint_matrix`, i.e. one that requires `use::accumulator`.
+--
+[.center]
+|======================
+|Tm (`use::a` or `use::b`) |Tc (`use::accumulator`) |M |N |K |Minimum Compute Capability
+.3+|half .3+|float
+|16 |16 |16 |sm_70
+|8 |32 |16 |sm_70
+|32 |8 |16 |sm_70
+.3+|half .3+|half
+|16 |16 |16 |sm_70
+|8 |32 |16 |sm_70
+|32 |8 |16 |sm_70
+.3+|int8_t .3+|int32_t
+|16 |16 |16 |sm_72
+|8 |32 |16 |sm_72
+|32 |8 |16 |sm_72
+.3+|uint8_t .3+|int32_t
+|16 |16 |16 |sm_72
+|8 |32 |16 |sm_72
+|32 |8 |16 |sm_72
+|precision::tf32 |float |16 |16 |8 |sm_80
+.3+|bfloat16 .3+|float
+|16 |16 |16 |sm_80
+|8 |32 |16 |sm_80
+|32 |8 |16 |sm_80
+|double |double |8 |8 |4 |sm_80
+|======================
+--
+
+Each M, N, K triple in the table above determines the shapes of the matrices that can be constructed:
+--
+[.center]
+|======================
+|use |NumRows |NumCols
+|a |M |K
+|b |K |N
+|accumulator |M |N
+|======================
+--
+
+### Additional constraints in the `ext_oneapi_cuda` backend
+
+IMPORTANT: The `stride` argument to `joint_matrix_load` and `joint_matrix_store` must be a multiple of 8 when `T` is `half`, and a multiple of 4 when `T` is `float`, where `T` is the type of the `joint_matrix` elements.
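
The table and the `stride` rule above can be encoded as a small lookup, which may help when checking a kernel configuration before it is compiled. The following is only a sketch in plain C++17: `CudaMatrixCombo`, `kCudaMatrixCombos`, `is_supported_combo` and `is_valid_stride` are hypothetical names that are not part of SYCL or {dpcpp}, element types are written as strings purely for illustration, and the compute capability is assumed to be comparable as an integer (70 for sm_70).

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical helper type (not a SYCL API): one row of the table above.
struct CudaMatrixCombo {
  std::string_view tm;  // element type of use::a / use::b matrices
  std::string_view tc;  // element type of the use::accumulator matrix
  int m, n, k;          // M, N, K triple
  int min_sm;           // minimum compute capability, e.g. 70 for sm_70
};

// The 17 rows of the table, copied as written.
inline constexpr std::array<CudaMatrixCombo, 17> kCudaMatrixCombos{{
    {"half", "float", 16, 16, 16, 70},     {"half", "float", 8, 32, 16, 70},
    {"half", "float", 32, 8, 16, 70},      {"half", "half", 16, 16, 16, 70},
    {"half", "half", 8, 32, 16, 70},       {"half", "half", 32, 8, 16, 70},
    {"int8_t", "int32_t", 16, 16, 16, 72}, {"int8_t", "int32_t", 8, 32, 16, 72},
    {"int8_t", "int32_t", 32, 8, 16, 72},  {"uint8_t", "int32_t", 16, 16, 16, 72},
    {"uint8_t", "int32_t", 8, 32, 16, 72}, {"uint8_t", "int32_t", 32, 8, 16, 72},
    {"tf32", "float", 16, 16, 8, 80},      {"bfloat16", "float", 16, 16, 16, 80},
    {"bfloat16", "float", 8, 32, 16, 80},  {"bfloat16", "float", 32, 8, 16, 80},
    {"double", "double", 8, 8, 4, 80},
}};

// True if the (Tm, Tc, M, N, K) combination appears in the table and the
// device capability `sm` is at least the listed minimum (assumption: a
// higher capability also supports the combination).
constexpr bool is_supported_combo(std::string_view tm, std::string_view tc,
                                  int m, int n, int k, int sm) {
  for (const auto& c : kCudaMatrixCombos) {
    if (c.tm == tm && c.tc == tc && c.m == m && c.n == n && c.k == k &&
        sm >= c.min_sm) {
      return true;
    }
  }
  return false;
}

// Stride rule for joint_matrix_load / joint_matrix_store. The text only
// states a rule for half and float; accepting any stride for other element
// types is an assumption of this sketch.
constexpr bool is_valid_stride(std::string_view t, std::size_t stride) {
  if (t == "half") return stride % 8 == 0;
  if (t == "float") return stride % 4 == 0;
  return true;
}

// Compile-time usage examples.
static_assert(is_supported_combo("half", "float", 16, 16, 16, 70));
static_assert(!is_supported_combo("bfloat16", "float", 16, 16, 16, 75));
static_assert(is_valid_stride("half", 16) && !is_valid_stride("half", 12));
```

Because both functions are `constexpr`, an unsupported combination can be rejected with a `static_assert` at compile time instead of surfacing as a backend error.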
+
+## Revision History
+
+[frame="none",options="header"]
+|======================
+|Rev |Date |Author |Changes
+|1 |2022-10-5 |Jack Kirk |Initial public working draft.
+|======================