From ed6584aa707d83d4c99655c026372fcce04176e4 Mon Sep 17 00:00:00 2001
From: JackAKirk
Date: Tue, 11 Apr 2023 13:03:44 +0100
Subject: [PATCH 1/4] Added Tensor Cores supported param combinations table.

Signed-off-by: JackAKirk
---
 .../sycl_ext_oneapi_matrix.asciidoc | 52 +++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc
index 4c7214ab56e7a..3c32302179092 100644
--- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc
+++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc
@@ -579,6 +579,58 @@ for (int i = 0; i < data.length; ++i) {
 }
 ```
 
+=== Appendix: Supported Parameter Combinations Per Hardware
+
+The tables below provide a list of the parameter combinations that
+`joint_matrix` implementations support on each supported vendors hardware type.
+
+==== Nvidia Tensor Cores Supported Combinations
+
+The complete set of matrix data types and shapes that are supported by the `ext_oneapi_cuda` backend are represented in the following table. Tm indicates the matrix element data type held by a "multiplicand" `joint_matrix`: i.e requiring `use::a` or `use::b`. Tc indicates the matrix element data type held by an "accumulator" `joint_matrix`: i.e requiring `use::accumulator`.
+IMPORTANT: When compiling for the `ext_oneapi_cuda` backend the target arch backend flag, `-Xsycl-target-backend --cuda-gpu-arch=sm_xx`, must be used, where `sm_xx` must be a Compute Capability that is equal to or greater than the appropriate Minimum Compute Capability. When an executable has been compiled for `sm_xx`, if the executable is run on a device with compute capability less than `sm_xx` then an error will be thrown.
The mapping to Minimum Compute Capability from each supported parameter combination is specified in the following table.
+
+--
+[.center]
+|======================
+|Tm (`use::a` or `use::b`) |Tc (`use::accumulator`) |M |N |K | Minimum Compute Capability
+.3+|half .3+|float
+|16 |16 |16| sm_70
+|8 |32 |16| sm_70
+|32 |8 |16| sm_70
+.3+|half .3+|half
+|16 |16 |16| sm_70
+|8 |32 |16| sm_70
+|32 |8 |16| sm_70
+.3+|int8_t .3+|int32_t
+|16 |16 |16| sm_72
+|8 |32 |16| sm_72
+|32 |8 |16| sm_72
+.3+|uint8_t .3+|int32_t
+|16 |16 |16| sm_72
+|8 |32 |16| sm_72
+|32 |8 |16| sm_72
+|precision::tf32 |float |16 |16 |8| sm_80
+.3+|bfloat16 .3+|float
+|16 |16 |16 |sm_80
+|8 |32 |16 |sm_80
+|32 |8 |16 |sm_80
+|double |double |8 |8 |4 |sm_80
+|======================
+--
+
+The M, N, K triple from the above table defines the complete set of matrix shapes constructible:
+--
+[.center]
+|======================
+|use |NumRows | NumCols
+|a |M |K
+|b |K |N
+|accumulator | M| N
+|======================
+--
+
+IMPORTANT: The `stride` argument to `joint_matrix_load` and `joint_matrix_store` must be a multiple of 8 when `T` is `half`, and a multiple of 4 when `T` is `float`; where `T` is the type of the `joint_matrix` elements.
+
 ## TODO List
 - Add WI data to joint matrix mapping coordinates information for piece-wise operations. This will be added as part of the query or new methods to the 'get_wi_data' class.
 - Add a more realistic and complete example that shows the value of the general query.

From fa311298e5736bae773a19f6019f84cc84428ba2 Mon Sep 17 00:00:00 2001
From: JackAKirk
Date: Tue, 11 Apr 2023 13:12:15 +0100
Subject: [PATCH 2/4] Format.

Signed-off-by: JackAKirk
---
 .../sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc
index 3c32302179092..d10a1dda6b14a 100644
--- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc
+++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc
@@ -587,6 +587,7 @@ The tables below provide a list of the parameter combinations that
 ==== Nvidia Tensor Cores Supported Combinations
 
 The complete set of matrix data types and shapes that are supported by the `ext_oneapi_cuda` backend are represented in the following table. Tm indicates the matrix element data type held by a "multiplicand" `joint_matrix`: i.e requiring `use::a` or `use::b`. Tc indicates the matrix element data type held by an "accumulator" `joint_matrix`: i.e requiring `use::accumulator`.
+
 IMPORTANT: When compiling for the `ext_oneapi_cuda` backend the target arch backend flag, `-Xsycl-target-backend --cuda-gpu-arch=sm_xx`, must be used, where `sm_xx` must be a Compute Capability that is equal to or greater than the appropriate Minimum Compute Capability. When an executable has been compiled for `sm_xx`, if the executable is run on a device with compute capability less than `sm_xx` then an error will be thrown. The mapping to Minimum Compute Capability from each supported parameter combination is specified in the following table.
 
 --

From 5ef09b229ca0284d96a37d9ace4ccb58193c9bbe Mon Sep 17 00:00:00 2001
From: JackAKirk
Date: Tue, 11 Apr 2023 16:49:45 +0100
Subject: [PATCH 3/4] Clarify stride restrictions.

Signed-off-by: JackAKirk
---
 .../sycl_ext_oneapi_matrix.asciidoc | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc
index d10a1dda6b14a..84f7393eb05e0 100644
--- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc
+++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc
@@ -595,13 +595,13 @@ IMPORTANT: When compiling for the `ext_oneapi_cuda` backend the target arch back
 |======================
 |Tm (`use::a` or `use::b`) |Tc (`use::accumulator`) |M |N |K | Minimum Compute Capability
 .3+|half .3+|float
-|16 |16 |16| sm_70
-|8 |32 |16| sm_70
-|32 |8 |16| sm_70
+|16 |16 |16 .6+| sm_70
+|8 |32 |16
+|32 |8 |16
 .3+|half .3+|half
-|16 |16 |16| sm_70
-|8 |32 |16| sm_70
-|32 |8 |16| sm_70
+|16 |16 |16
+|8 |32 |16
+|32 |8 |16
 .3+|int8_t .3+|int32_t
 |16 |16 |16| sm_72
 |8 |32 |16| sm_72
@@ -630,7 +630,7 @@ The M, N, K triple from the above table defines the complete set of matrix shape
 |======================
 --
 
-IMPORTANT: The `stride` argument to `joint_matrix_load` and `joint_matrix_store` must be a multiple of 8 when `T` is `half`, and a multiple of 4 when `T` is `float`; where `T` is the type of the `joint_matrix` elements.
+IMPORTANT: The `stride` argument to `joint_matrix_load` and `joint_matrix_store` must be a multiple of 8 when `T` is `half`, and a multiple of 4 when `T` is `float`; where `T` is the type of the `joint_matrix` elements. When `T` is not `half` or `float` there are no restrictions to `stride`.
 
 ## TODO List
 - Add WI data to joint matrix mapping coordinates information for piece-wise operations. This will be added as part of the query or new methods to the 'get_wi_data' class.

From afa0c14abf96fb186ee82366b307be24c1e9a3c7 Mon Sep 17 00:00:00 2001
From: JackAKirk
Date: Tue, 11 Apr 2023 16:53:44 +0100
Subject: [PATCH 4/4] Improved table row formatting.

Signed-off-by: JackAKirk
---
 .../sycl_ext_oneapi_matrix.asciidoc | 22 +++++++++----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc
index 84f7393eb05e0..cb430e7c794ef 100644
--- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc
+++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc
@@ -603,19 +603,19 @@ IMPORTANT: When compiling for the `ext_oneapi_cuda` backend the target arch back
 |8 |32 |16
 |32 |8 |16
 .3+|int8_t .3+|int32_t
-|16 |16 |16| sm_72
-|8 |32 |16| sm_72
-|32 |8 |16| sm_72
+|16 |16 |16 .6+| sm_72
+|8 |32 |16
+|32 |8 |16
 .3+|uint8_t .3+|int32_t
-|16 |16 |16| sm_72
-|8 |32 |16| sm_72
-|32 |8 |16| sm_72
-|precision::tf32 |float |16 |16 |8| sm_80
+|16 |16 |16
+|8 |32 |16
+|32 |8 |16
+|precision::tf32 |float |16 |16 |8 .5+| sm_80
 .3+|bfloat16 .3+|float
-|16 |16 |16 |sm_80
-|8 |32 |16 |sm_80
-|32 |8 |16 |sm_80
-|double |double |8 |8 |4 |sm_80
+|16 |16 |16
+|8 |32 |16
+|32 |8 |16
+|double |double |8 |8 |4
 |======================
 --
 
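
Reviewer note: the appendix added by this series may be easier to check when written out as data. The following is a minimal sketch in plain C++ (no SYCL headers) that encodes the final table as a lookup, assuming the rows as they stand after PATCH 4/4. Every name in it (`Elem`, `Use`, `Combo`, `kCombos`, `min_compute_capability`, `shape_for`, `stride_ok`) is hypothetical and is not part of the `sycl_ext_oneapi_matrix` extension. The stride check follows a literal reading of the `IMPORTANT` note, so element types other than `half` and `float` are treated as unrestricted.

```cpp
// Hypothetical helper types for illustration only; none of these names come
// from the sycl_ext_oneapi_matrix extension.
enum class Elem { half, bfloat16, tf32, int8, uint8, int32, f32, f64 };
enum class Use { a, b, accumulator };

struct Combo {
  Elem tm;     // multiplicand element type (use::a or use::b)
  Elem tc;     // accumulator element type (use::accumulator)
  int m, n, k; // shape triple from the table
  int min_sm;  // minimum compute capability, e.g. 70 stands for sm_70
};

// Rows of the Tensor Cores table as they read after PATCH 4/4.
constexpr Combo kCombos[] = {
    {Elem::half, Elem::f32, 16, 16, 16, 70},
    {Elem::half, Elem::f32, 8, 32, 16, 70},
    {Elem::half, Elem::f32, 32, 8, 16, 70},
    {Elem::half, Elem::half, 16, 16, 16, 70},
    {Elem::half, Elem::half, 8, 32, 16, 70},
    {Elem::half, Elem::half, 32, 8, 16, 70},
    {Elem::int8, Elem::int32, 16, 16, 16, 72},
    {Elem::int8, Elem::int32, 8, 32, 16, 72},
    {Elem::int8, Elem::int32, 32, 8, 16, 72},
    {Elem::uint8, Elem::int32, 16, 16, 16, 72},
    {Elem::uint8, Elem::int32, 8, 32, 16, 72},
    {Elem::uint8, Elem::int32, 32, 8, 16, 72},
    {Elem::tf32, Elem::f32, 16, 16, 8, 80},
    {Elem::bfloat16, Elem::f32, 16, 16, 16, 80},
    {Elem::bfloat16, Elem::f32, 8, 32, 16, 80},
    {Elem::bfloat16, Elem::f32, 32, 8, 16, 80},
    {Elem::f64, Elem::f64, 8, 8, 4, 80},
};

// Returns the minimum compute capability for a combination, or -1 if the
// combination does not appear in the table.
constexpr int min_compute_capability(Elem tm, Elem tc, int m, int n, int k) {
  for (const Combo &c : kCombos)
    if (c.tm == tm && c.tc == tc && c.m == m && c.n == n && c.k == k)
      return c.min_sm;
  return -1;
}

struct Shape {
  int rows, cols;
};

// Shape of a joint_matrix for a given use: a is MxK, b is KxN,
// accumulator is MxN.
constexpr Shape shape_for(Use use, int m, int n, int k) {
  return use == Use::a   ? Shape{m, k}
         : use == Use::b ? Shape{k, n}
                         : Shape{m, n};
}

// Literal reading of the stride note: multiple of 8 for half, multiple of 4
// for float, no restriction for any other element type.
constexpr bool stride_ok(Elem t, int stride) {
  return t == Elem::half  ? stride % 8 == 0
         : t == Elem::f32 ? stride % 4 == 0
                          : true;
}

// Compile-time spot checks against the table.
static_assert(min_compute_capability(Elem::half, Elem::f32, 16, 16, 16) == 70,
              "half/float 16x16x16 needs sm_70");
static_assert(min_compute_capability(Elem::f64, Elem::f64, 16, 16, 16) == -1,
              "double only supports 8x8x4");
static_assert(shape_for(Use::b, 16, 16, 8).rows == 8, "b has K rows");
static_assert(stride_ok(Elem::half, 16) && !stride_ok(Elem::half, 12),
              "half stride must be a multiple of 8");
```

Keeping the rows in a flat array mirrors the table one-to-one, which makes it straightforward to compare against the AsciiDoc source when the table changes.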