-
Notifications
You must be signed in to change notification settings - Fork 808
[SYCL] Move bfloat support from experimental to supported. #6524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
6014cef
bdd88e5
73ed541
0fe1884
feb9d5f
129f53f
2115f09
3c2eb80
74aa175
bd05711
f8e894c
2ad68f6
4b78c03
0fce16d
4bcb383
35308f8
fa045e2
3322d6a
b12fd94
87b0f09
f217eb4
a908b11
aab4c78
a2568ba
4d7a22b
38e5ad4
b9accad
ca7880a
dc3b2b5
c955d36
1aa6ad3
ff04ce1
802f502
8d7f46a
190f2a3
84c50f3
df058ba
fed4d1d
28259d0
c11115b
6b05a2a
a82d73a
3fc8885
1ec6838
105094b
432e775
c135643
4eca414
8876ac8
f0f2727
17673bf
1094b8c
8d40228
c5a85cf
cf8f6e0
5e50646
45d3e70
a7be718
cac1c18
208c09a
46f406d
6830857
46e5278
10fc9a3
6195545
437e34a
09dc4c5
386353e
0f93586
48f3cac
d33cb10
28992c2
ec28c8b
b958fc7
ec70b20
1b86012
3e1e681
8c633d3
1a59e03
b2fd6cc
fab2e54
35b8910
a05c872
ac5f603
6d45ed1
077d0fe
2ff6a9d
d7c80ee
20d13df
cd1d0a2
4bf60b9
45c32f7
5de1bf7
6ec2bb9
49e9cd1
2065060
e24e57b
41098ab
37b05f0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -39,39 +39,22 @@ https://github.com/intel/llvm/issues | |
|
|
||
| == Dependencies | ||
|
|
||
| This extension is written against the SYCL 2020 specification, Revision 4. | ||
| This extension is written against the SYCL 2020 specification, Revision 5. | ||
|
|
||
| == Status | ||
|
|
||
| This extension is implemented and fully supported by DPC++. | ||
| [NOTE] | ||
| ==== | ||
| This extension is currently implemented in {dpcpp} only for GPU devices that support bfloat16 natively. Attempting to use this extension in | ||
| This extension is currently implemented in `dpcpp` only for GPU devices that support `bfloat16` natively. Attempting to use this extension in | ||
| kernels that run on other devices may result in undefined behavior. | ||
| Be aware that the compiler is not able to issue a diagnostic to warn you if this happens. | ||
| ==== | ||
|
|
||
gmlueck marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| == Version | ||
|
|
||
| Revision: 5 | ||
|
|
||
| == Overview | ||
|
|
||
| This extension adds functionality to convert values of single-precision | ||
| floating-point type(`float`) to `bfloat16` type and vice versa. The extension | ||
| doesn't add support for `bfloat16` type as such, instead it uses 16-bit integer | ||
| type(`uint16_t`) as a storage for `bfloat16` values. | ||
| This extension adds support for a 16-bit floating point type `bfloat16`. This type occupies 16 bits of storage space as does the `sycl::half` type. However, `bfloat16` allots 8 bits to the exponent instead of the 5 bits used by `sycl::half` and 7 bits to the significand versus 10 bits used by `sycl::half`. Thus, `bfloat16` has the same dynamic range as a 32-bit `float` but with reduced precision. This type is useful when memory required to store the values must be reduced, and when the calculations require high dynamic range but can tolerate lower-precision. Some implementations may still perform operations on this type using 32-bit math. For example, they may convert the `bfloat16` value to `float`, and then perform the operation on the 32-bit `float`. | ||
|
||
|
|
||
| The purpose of conversion from float to bfloat16 is to reduce the amount of memory | ||
| required to store floating-point numbers. Computations are expected to be done with | ||
| 32-bit floating-point values. | ||
|
|
||
| This extension is an optional kernel feature as described in | ||
| https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:optional-kernel-features[section 5.7] | ||
| of the SYCL 2020 spec. Therefore, attempting to submit a kernel using this | ||
| feature to a device that does not support it should cause a synchronous | ||
| `errc::kernel_not_supported` exception to be thrown from the kernel invocation | ||
| command (e.g. from `parallel_for`). | ||
|
|
||
| == Specification | ||
|
|
||
|
|
@@ -91,7 +74,7 @@ the implementation supports this feature, or applications can test the macro’s | |
| |1 |Initial extension version. Base features are supported. | ||
| |=== | ||
|
|
||
| == Extension to `enum class aspect` | ||
| === Extension to `enum class aspect` | ||
|
|
||
| [source] | ||
| ---- | ||
|
|
@@ -106,16 +89,18 @@ enum class aspect { | |
| If a SYCL device has the `ext_oneapi_bfloat16` aspect, then it natively | ||
| supports conversion of values of `float` type to `bfloat16` and back. | ||
gmlueck marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| If the device doesn't have the aspect, objects of `bfloat16` class must not be | ||
| used in the device code. | ||
| This extension is an optional kernel feature as described in section 5.7 of the SYCL 2020 spec, with the associated aspect `ext_oneapi_bfloat16`. Applications can query whether the device has this aspect to determine if it supports kernels that use `bfloat16`. Attempting to submit a kernel using `bfloat16` to a device that does not support it causes a synchronous `errc::kernel_not_supported` exception to be thrown from the kernel invocation command (e.g. from `parallel_for`). | ||
|
|
||
| [NOTE] | ||
| ==== | ||
| . DPC++ does not currently implement the `errc::kernel_not_supported` exception in this case. Attempting to submit a kernel using `bfloat16` to a device that does not have the `ext_oneapi_bfloat16` aspect results in undefined behavior. | ||
| . The `bfloat16` class is currently supported only on Xe HP GPUs and Nvidia GPUs with Compute Capability >= SM80. | ||
| ==== | ||
|
|
||
| **NOTE**: The `bfloat16` class is currently supported only on Xe HP GPUs and Nvidia GPUs with Compute Capability >= SM80. | ||
|
|
||
| == New `bfloat16` class | ||
| === New `bfloat16` class | ||
|
|
||
gmlueck marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| The `bfloat16` class below provides the conversion functionality. Conversion | ||
| from `float` to `bfloat16` is done with round to nearest even(RTE) rounding | ||
| mode. | ||
| The `bfloat16` type represents a 16-bit floating point value. Conversions from `float` to `bfloat16` are done with round to nearest even (RTE) rounding mode. | ||
|
|
||
gmlueck marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| [source] | ||
| ---- | ||
|
|
@@ -124,8 +109,6 @@ namespace ext { | |
| namespace oneapi { | ||
|
|
||
| class bfloat16 { | ||
| using storage_t = uint16_t; | ||
| storage_t value; | ||
|
|
||
| public: | ||
| bfloat16() = default; | ||
|
|
@@ -138,6 +121,13 @@ public: | |
|
|
||
| // Convert bfloat16 to float | ||
| operator float() const; | ||
|
|
||
| // Convert from sycl::half to bfloat16 | ||
| bfloat16(const sycl::half &a); | ||
| bfloat16 &operator=(const sycl::half &a); | ||
|
|
||
| // Convert bfloat16 to sycl::half | ||
| operator sycl::half() const; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like this conversion to Do we also need conversion to / from
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This PR is intended to move the current bfloat16 support out of experimental space. Any changes to the level of bfloat16 support can be done in future PRs.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On Intel platforms the bfloat16 to/from float is done using the __spirv_ConvertBF16ToFINTELoperator. I suspect a double version of that does not exist.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The sycl::half class includes conversions to/from float. Those kick in when bfloat16 is used with sycl::half, so conversions between bfloat16 and sycl::half are not needed.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are you saying that we should remove this conversion from
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, its not needed.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This item was revisited and it turns out that sycl::half <-> bfloat16 conversions are needed. They have been added.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry for joining the discussion late. May be it's a nitpick, but should we tell, that conversion half <-> bfloat16 follows IEEE 754 float <-> half conversion? In other words, what happens, if bfloat16 value overflows half range? Also are we adding last 3 fraction bits stochastically or they are guarantied to be zero (or it's implementation detail)? |
||
|
|
||
| // Convert bfloat16 to bool type | ||
| explicit operator bool(); | ||
|
|
@@ -186,6 +176,15 @@ Table 1. Member functions of `bfloat16` class. | |
| | `operator float() const;` | ||
| | Return `bfloat16` value converted to `float`. | ||
|
|
||
| | `bfloat16(const sycl::half& a);` | ||
| | Construct `bfloat16` from `sycl::half`. Converts `sycl::half` to `bfloat16`. | ||
|
|
||
| | `bfloat16 &operator=(const sycl::half &a);` | ||
| | Replace the value with `a` converted to `bfloat16` | ||
|
|
||
| | `operator sycl::half() const;` | ||
| | Return `bfloat16` value converted to `sycl::half`. | ||
|
|
||
| | `explicit operator bool() { /* ... */ }` | ||
| | Convert `bfloat16` to `bool` type. Return `false` if the `value` equals to | ||
| zero, return `true` otherwise. | ||
|
|
@@ -279,7 +278,6 @@ float foo(float a, float b) { | |
| bfloat16 C = A + B; | ||
|
|
||
| // Return the result converted from bfloat16 to float. | ||
| // return sycl::ext::oneapi::float(C); | ||
| return C; | ||
| } | ||
|
|
||
|
|
@@ -292,8 +290,7 @@ int main(int argc, char *argv[]) { | |
| if (dev.has(aspect::ext_oneapi_bfloat16)) { | ||
| deviceQueue.submit([&](handler &cgh) { | ||
| accessor numbers{buf, cgh, read_write}; | ||
| cgh.single_task<class simple_kernel>( | ||
| [=]() { numbers[2] = foo(numbers[0], numbers[1]); }); | ||
| cgh.single_task([=]() { numbers[2] = foo(numbers[0], numbers[1]); }); | ||
| }); | ||
| } else { | ||
| std::cout << "No bfloat16 support\n"; | ||
|
|
@@ -307,11 +304,11 @@ int main(int argc, char *argv[]) { | |
|
|
||
| == New bfloat16 math functions | ||
gmlueck marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. These functions can be used as element wise operations on matrices, supplementing the `bfloat16` support in the sycl_ext_oneapi_matrix extension. | ||
| Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. These functions can be used as element wise operations on matrices, supplementing the `bfloat16` support in the `sycl_ext_oneapi_matrix` extension. | ||
|
|
||
| The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. | ||
|
|
||
| The following functions are only available when `T` is `bfloat16` or `sycl::marray<bfloat16, {N}>`, where `{N}` means any positive value of `size_t` type. | ||
|
|
||
|
|
||
| === fma | ||
gmlueck marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.