Merged
Changes from 9 commits
@@ -741,12 +741,12 @@ descriptors that can be queried using `get_info` API.
[frame="none",options="header"]
|======================
| Device descriptors | Return type| Description
-|`ext::oneapi::experimental::info::device::matrix::combinations` |
+|`ext::oneapi::experimental::info::device::matrix_combinations` |
`std::vector<combination>`| tells the set of supported matrix sizes
and types on this device
|======================

-The runtime query returns a vector of `combinations` of `combination`
+The runtime query returns a vector of `matrix_combinations` of `combination`
type. Each combination includes the sizes and the types for the
matrices A, B, C, and D. Note that for each matrix hardware,
the query returns `max_msize, max_nsize, max_ksize` or `msize, nsize,
@@ -790,7 +790,7 @@ struct combination {
} // namespace sycl::ext::oneapi::experimental::matrix
```

-Each combination of the `combinations` vector composes the types and
+Each combination of the `matrix_combinations` vector composes the types and
sizes of A, B, C, and D matrices supported by the device
implementation. The table below provides a description of each member
of the `combination` struct.
@@ -832,7 +832,7 @@ the `T` template parameter as follows: +
```c++
// Ta, Tb, Tc, and Td are the types used in applications
std::vector<combination> combinations =
-device.get_info<info::device::matrix::combinations>();
+device.get_info<info::device::matrix_combinations>();
for (int i = 0; sizeof(combinations); i++) {
if (Ta == combinations[i].atype &&
Tb == combinations[i].btype &&
@@ -849,7 +849,7 @@ for (int i = 0; sizeof(combinations); i++) {
The table below provides a list of the combinations that
`joint_matrix` implementations support on each of Intel AMX and Intel
XMX hardware. Note that these can be returned using
-`ext::oneapi::experimental::info::device::matrix::combinations`.
+`ext::oneapi::experimental::info::device::matrix_combinations`.
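
The lookup loop in the example above uses `sizeof(combinations)` as its condition, which is a nonzero constant and so never ends the iteration. A minimal, SYCL-free sketch of the same search with a bounded loop might look like the following. The `matrix_type` and `combination` definitions are local stand-ins limited to names that appear in this document (`ctype` and `dtype` are assumed by analogy with `atype` and `btype`), and `find_combination` and `example_usage` are hypothetical helpers, not part of the extension.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-ins so the sketch builds without a SYCL toolchain.
// Only names that appear in this document are modelled.
enum class matrix_type { bf16, fp16, fp32, int8, sint32, uint8 };

struct combination {
  std::size_t max_msize, max_nsize, max_ksize;
  std::size_t msize, nsize, ksize;
  matrix_type atype, btype, ctype, dtype; // ctype/dtype: assumed member names
};

// Hypothetical helper: returns the first entry whose four element types
// match, or nullptr when no reported combination matches.
inline const combination *find_combination(
    const std::vector<combination> &combinations, matrix_type Ta,
    matrix_type Tb, matrix_type Tc, matrix_type Td) {
  // Bound the loop by size(), not sizeof(), so it terminates.
  for (std::size_t i = 0; i < combinations.size(); ++i) {
    const combination &c = combinations[i];
    if (c.atype == Ta && c.btype == Tb && c.ctype == Tc && c.dtype == Td)
      return &c;
  }
  return nullptr;
}

// Example use with two hand-written entries; the sizes are placeholders.
inline void example_usage() {
  const std::vector<combination> combos = {
      {8, 0, 0, 0, 16, 32, matrix_type::uint8, matrix_type::uint8,
       matrix_type::sint32, matrix_type::sint32},
      {8, 0, 0, 0, 16, 16, matrix_type::bf16, matrix_type::bf16,
       matrix_type::fp32, matrix_type::fp32}};
  const combination *hit =
      find_combination(combos, matrix_type::bf16, matrix_type::bf16,
                       matrix_type::fp32, matrix_type::fp32);
  assert(hit != nullptr && hit->ksize == 16);
}
```

With a real device, the vector would instead come from `device.get_info<info::device::matrix_combinations>()`, as in the snippet above.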

==== Intel AMX Supported Combinations
This is currently available in devices with the architecture
@@ -875,32 +875,40 @@ table below.

==== Intel XMX Supported Combinations
This is currently available in devices with the architecture
-`architecture::intel_gpu_pvc` and `architecture::intel_gpu_dg2`. In
-these architectures' implementation, the type of the C matrix must be
-the same as the type of the D matrix. Therefore, that common type is
-shown in a single column in the table below.
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_dg2_g10`,
+`architecture::intel_gpu_dg2_g11`, and
+`architecture::intel_gpu_dg2_g12`. In these architectures'
+implementation, the type of the C matrix must be the same as the type
+of the D matrix. Therefore, that common type is shown in a single
+column in the table below.

[frame="none",options="header"]
|======================
| A type | B type | C and D type | M | N | K | device
-| `matrix_type::uint8` | `matrix_type::uint8` |
-`matrix_type::int32` | +<=+ 8 | 16 | 32 | architecture::intel_gpu_pvc
-| | | | |8||architecture::intel_gpu_dg2
-| `matrix_type::uint8` | `matrix_type::int8` |
-`matrix_type::int32` | +<=+ 8 | 16 | 32 | architecture::intel_gpu_pvc
-| | | | |8||architecture::intel_gpu_dg2
-| `matrix_type::int8` | `matrix_type::uint8` |
-`matrix_type::int32` | +<=+ 8 | 16 | 32 | architecture::intel_gpu_pvc
-| | | | |8||architecture::intel_gpu_dg2
-| `matrix_type::int8` | `matrix_type::int8` |
-`matrix_type::int32` | +<=+ 8 | 16 | 32 | architecture::intel_gpu_pvc
-| | | | |8||architecture::intel_gpu_dg2
-| `matrix_type::fp16` | `matrix_type::fp16` |
-`matrix_type::fp32` | +<=+ 8 | 16 | 16 | architecture::intel_gpu_pvc
-| | | | |8|| architecture::intel_gpu_dg2
-| `matrix_type::bf16` | `matrix_type::bf16` |
-`matrix_type::fp32` | +<=+ 8 | 16 | 16 | architecture::intel_gpu_pvc
-| | | | |8|| architecture::intel_gpu_dg2
+.2+| `matrix_type::uint8` .2+| `matrix_type::uint8` .2+|
+`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32
+|`architecture::intel_gpu_pvc`|8|`architecture::intel_gpu_dg2_g10,
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+.2+| `matrix_type::uint8` .2+| `matrix_type::int8` .2+|
+`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
+`architecture::intel_gpu_pvc`|8|`architecture::intel_gpu_dg2_g10,
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+.2+| `matrix_type::int8` .2+| `matrix_type::uint8` .2+|
+`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
+`architecture::intel_gpu_pvc`|8|`architecture::intel_gpu_dg2_g10,
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+.2+| `matrix_type::int8` .2+| `matrix_type::int8` .2+|
+`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
+`architecture::intel_gpu_pvc`|8|`architecture::intel_gpu_dg2_g10,
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+.2+|`matrix_type::fp16` .2+| `matrix_type::fp16` .2+|
+`matrix_type::fp32` .2+| +<=+ 8 | 16 .2+| 16 |
+`architecture::intel_gpu_pvc`|8| `architecture::intel_gpu_dg2_g10,
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+.2+| `matrix_type::bf16` .2+| `matrix_type::bf16` .2+|
+`matrix_type::fp32` .2+| +<=+ 8 | 16 .2+| 16 |
+`architecture::intel_gpu_pvc` |8| `architecture::intel_gpu_dg2_g10,
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
|======================
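
The M, N, and K columns above mix an upper bound (`<= 8`) with fixed values. One way to check a requested tile shape against such a row is sketched below. It assumes that a nonzero `max_*` member denotes an inclusive upper bound and that a zero `max_*` member means the matching `*size` member is the only accepted value. That convention, the `shape_limits` struct, and the helper functions are illustrative assumptions, not the extension's API.

```c++
#include <cassert>
#include <cstddef>

// Hypothetical record for the size columns of one table row.
// Assumption: max_* == 0 means "use the fixed *size value instead".
struct shape_limits {
  std::size_t max_msize, max_nsize, max_ksize;
  std::size_t msize, nsize, ksize;
};

// One dimension is accepted if it is within the bound (when a bound is
// given) or equal to the fixed size (when no bound is given).
constexpr bool dim_ok(std::size_t requested, std::size_t max_size,
                      std::size_t fixed_size) {
  return max_size != 0 ? (requested >= 1 && requested <= max_size)
                       : requested == fixed_size;
}

constexpr bool shape_supported(const shape_limits &s, std::size_t m,
                               std::size_t n, std::size_t k) {
  return dim_ok(m, s.max_msize, s.msize) &&
         dim_ok(n, s.max_nsize, s.nsize) &&
         dim_ok(k, s.max_ksize, s.ksize);
}

// uint8 x uint8 rows from the table: M <= 8, K = 32,
// N = 16 on intel_gpu_pvc and N = 8 on the intel_gpu_dg2_* devices.
constexpr shape_limits pvc_uint8{8, 0, 0, 0, 16, 32};
constexpr shape_limits dg2_uint8{8, 0, 0, 0, 8, 32};

inline void example_shape_checks() {
  assert(shape_supported(pvc_uint8, 8, 16, 32));  // at the M bound
  assert(!shape_supported(pvc_uint8, 16, 16, 32)); // M exceeds the bound
  assert(!shape_supported(dg2_uint8, 8, 16, 32)); // N must be 8 on dg2
}
```

Keeping the per-dimension rule in `dim_ok` means the same check applies to rows that fix all three sizes and to rows that bound only some of them.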

==== Nvidia Tensor Cores Supported Combinations
@@ -932,11 +940,11 @@ supported parameter combination is specified in the following table.
|16 |16 |16
|8 |32 |16
|32 |8 |16
-.3+| `matrix_type::int8` .3+| `matrix_type::int32`
+.3+| `matrix_type::int8` .3+| `matrix_type::sint32`
|16 |16 |16 .6+| sm_72
|8 |32 |16
|32 |8 |16
-.3+|`matrix_type::uint8` .3+|`matrix_type::int32`
+.3+|`matrix_type::uint8` .3+|`matrix_type::sint32`
|16 |16 |16
|8 |32 |16
|32 |8 |16
@@ -99,6 +99,7 @@ namespace sycl::ext::oneapi::experimental {

enum class architecture : /* unspecified */ {
x86_64,
+intel_cpu_spr,
intel_gpu_bdw,
intel_gpu_skl,
intel_gpu_kbl,
@@ -195,6 +196,10 @@ of these enumerators, and it provides a brief description of their meanings.
|-
|Any CPU device with the x86_64 instruction set.

+|`intel_cpu_spr`
+|-
+|Intel Xeon processor codenamed Sapphire Rapids.
+
|`intel_gpu_bdw`
|-
|Broadwell Intel graphics architecture.
@@ -588,6 +593,7 @@ feature, the application must be compiled in ahead-of-time (AOT) mode using
"special target values" listed in the link:../../UsersManual.md[users manual]
description of the `-fsycl-targets` option. These are the target names of the
form "intel_gpu_*", "nvidia_gpu_*", or "amd_gpu_*".
+The support of AOT compilation for `intel_cpu_spr` is currently unavailable.

== Future direction

@@ -14,6 +14,7 @@ namespace ext::oneapi::experimental {

enum class architecture {
x86_64,
+intel_cpu_spr,
intel_gpu_bdw,
intel_gpu_skl,
intel_gpu_kbl,