intel · pvchupin · Nov 28, 2022 · Aug 3, 2022 · Aug 3, 2022 · Aug 24, 2022
@@ -39,39 +39,22 @@ https://github.com/intel/llvm/issues
 
 == Dependencies
 
-This extension is written against the SYCL 2020 specification, Revision 4.
+This extension is written against the SYCL 2020 specification, Revision 5.
 
 == Status
 
 This extension is implemented and fully supported by DPC++.
 [NOTE]
 ====
-This extension is currently implemented in {dpcpp} only for GPU devices that support bfloat16 natively. Attempting to use this extension in
+This extension is currently implemented in `dpcpp` only for GPU devices that support `bfloat16` natively. Attempting to use this extension in
 kernels that run on other devices may result in undefined behavior.
 Be aware that the compiler is not able to issue a diagnostic to warn you if this happens.
 ====
 
-== Version
-
-Revision: 5
-
 == Overview
 
-This extension adds functionality to convert values of single-precision
-floating-point type(`float`) to `bfloat16` type and vice versa. The extension
-doesn't add support for `bfloat16` type as such, instead it uses 16-bit integer
-type(`uint16_t`) as a storage for `bfloat16` values.
+This extension adds support for a 16-bit floating point type `bfloat16`. Like `sycl::half`, this type occupies 16 bits of storage. However, `bfloat16` allots 8 bits to the exponent instead of the 5 bits used by `sycl::half`, and 7 bits to the significand instead of the 10 bits used by `sycl::half`. Thus, `bfloat16` has the same dynamic range as a 32-bit `float` but with reduced precision. This type is useful when the memory required to store the values must be reduced, and when the calculations require high dynamic range but can tolerate lower precision. Some implementations may still perform operations on this type using 32-bit math. For example, they may convert the `bfloat16` value to `float`, and then perform the operation on the 32-bit `float`.
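
The 1/8/7 bit split described above can be illustrated with a small host-only sketch. It assumes the common representation in which a `bfloat16` bit pattern equals the upper 16 bits of an IEEE-754 `float`. The helper names (`bf16_bits_to_float`, `bf16_sign`, `bf16_exponent`, `bf16_significand`) are hypothetical and are not part of this extension.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of the extension): widen a raw bfloat16 bit
// pattern to float by placing it in the upper 16 bits of a 32-bit word.
inline float bf16_bits_to_float(std::uint16_t bits) {
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &wide, sizeof(out));
  return out;
}

// Field extractors for the assumed layout: 1 sign bit, 8 exponent bits,
// 7 significand bits.
inline unsigned bf16_sign(std::uint16_t bits) { return bits >> 15; }
inline unsigned bf16_exponent(std::uint16_t bits) { return (bits >> 7) & 0xFFu; }
inline unsigned bf16_significand(std::uint16_t bits) { return bits & 0x7Fu; }

// Illustrative checks: 1.0 has the same biased exponent (127) as float, and
// the next representable value above 1.0 is 1 + 2^-7.
inline void bf16_layout_examples() {
  assert(bf16_exponent(0x3F80) == 127u);
  assert(bf16_bits_to_float(0x3F80) == 1.0f);
  assert(bf16_bits_to_float(0x3F81) == 1.0078125f);
}
```

Under this assumption the largest finite pattern, `0x7F7F`, widens to a value of roughly 3.39e38, which is close to the largest finite `float`. The spacing of 2^-7 just above 1.0 is the reduced precision the paragraph refers to.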
 
-The purpose of conversion from float to bfloat16 is to reduce the amount of memory
-required to store floating-point numbers. Computations are expected to be done with
-32-bit floating-point values.
-
-This extension is an optional kernel feature as described in
-https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:optional-kernel-features[section 5.7]
-of the SYCL 2020 spec. Therefore, attempting to submit a kernel using this
-feature to a device that does not support it should cause a synchronous
-`errc::kernel_not_supported` exception to be thrown from the kernel invocation
-command (e.g. from `parallel_for`).
 
 == Specification
 
@@ -91,7 +74,7 @@ the implementation supports this feature, or applications can test the macro’s
 |1     |Initial extension version. Base features are supported.
 |===
 
-== Extension to `enum class aspect`
+=== Extension to `enum class aspect`
 
 [source]
 ----
@@ -106,16 +89,18 @@ enum class aspect {
 If a SYCL device has the `ext_oneapi_bfloat16` aspect, then it natively
 supports conversion of values of `float` type to `bfloat16` and back.
 
-If the device doesn't have the aspect, objects of `bfloat16` class must not be
-used in the device code.
+This extension is an optional kernel feature as described in section 5.7 of the SYCL 2020 spec, with the associated aspect `ext_oneapi_bfloat16`. Applications can query whether the device has this aspect to determine if it supports kernels that use `bfloat16`. Attempting to submit a kernel using `bfloat16` to a device that does not support it causes a synchronous `errc::kernel_not_supported` exception to be thrown from the kernel invocation command (e.g. from `parallel_for`).
+
+[NOTE]
+====
+. DPC++ does not currently implement the `errc::kernel_not_supported` exception in this case. Attempting to submit a kernel using `bfloat16` to a device that does not have the `ext_oneapi_bfloat16` aspect results in undefined behavior.
+. The `bfloat16` class is currently supported only on Xe HP GPUs and Nvidia GPUs with Compute Capability >= SM80.
+====
 
-**NOTE**: The `bfloat16` class is currently supported only on Xe HP GPUs and Nvidia GPUs with Compute Capability >= SM80.
 
-== New `bfloat16` class
+=== New `bfloat16` class
 
-The `bfloat16` class below provides the conversion functionality. Conversion
-from `float` to `bfloat16` is done with round to nearest even(RTE) rounding
-mode.
+The `bfloat16` type represents a 16-bit floating point value. Conversions from `float` to `bfloat16` are done with round to nearest even (RTE) rounding mode.
 
 [source]
 ----
@@ -124,8 +109,6 @@ namespace ext {
 namespace oneapi {
 
 class bfloat16 {
-  using storage_t = uint16_t;
-  storage_t value;
 
 public:
   bfloat16() = default;
@@ -138,6 +121,13 @@ public:
 
   // Convert bfloat16 to float
   operator float() const;
+
+  // Convert from sycl::half to bfloat16
+  bfloat16(const sycl::half &a);
+  bfloat16 &operator=(const sycl::half &a);
+
+  // Convert bfloat16 to sycl::half
+  operator sycl::half() const;
 
   // Convert bfloat16 to bool type
   explicit operator bool();
@@ -186,6 +176,15 @@ Table 1. Member functions of `bfloat16` class.
 | `operator float() const;`
 |  Return `bfloat16` value converted to `float`.
 
+| `bfloat16(const sycl::half& a);`
+| Construct `bfloat16` from `sycl::half`. Converts `sycl::half` to `bfloat16`.
+
+| `bfloat16 &operator=(const sycl::half &a);`
+| Replace the value with `a` converted to `bfloat16`.
+
+| `operator sycl::half() const;`
+|  Return `bfloat16` value converted to `sycl::half`.
+
 | `explicit operator bool() { /* ... */ }`
 | Convert `bfloat16` to `bool` type. Return `false` if the `value` equals
   zero, return `true` otherwise.
@@ -279,7 +278,6 @@ float foo(float a, float b) {
   bfloat16 C = A + B;
 
   // Return the result converted from bfloat16 to float.
-  // return sycl::ext::oneapi::float(C);
   return C;
 }
 
@@ -292,8 +290,7 @@ int main(int argc, char *argv[]) {
   if (dev.has(aspect::ext_oneapi_bfloat16)) {
     deviceQueue.submit([&](handler &cgh) {
       accessor numbers{buf, cgh, read_write};
-      cgh.single_task<class simple_kernel>(
-          [=]() { numbers[2] = foo(numbers[0], numbers[1]); });
+      cgh.single_task([=]() { numbers[2] = foo(numbers[0], numbers[1]); });
     });
   } else {
     std::cout << "No bfloat16 support\n";
@@ -307,11 +304,11 @@ int main(int argc, char *argv[]) {
 
 == New bfloat16 math functions
 
-Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. These functions can be used as element wise operations on matrices, supplementing the `bfloat16` support in the sycl_ext_oneapi_matrix extension.
+Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. These functions can be used as element-wise operations on matrices, supplementing the `bfloat16` support in the `sycl_ext_oneapi_matrix` extension.
 
 The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions.
 
-The following functions are only available when `T` is `bfloat16` or `sycl::marray<bfloat16, {N}>`, where `{N}` means any positive value of `size_t` type.
+
 
 === fma
 

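The header hunk below shows only the tail of the host fallback for `from_float`: `(intStorage + roundingBias) >> 16`. As a reading aid, here is a minimal standalone sketch of a round-to-nearest-even (RTE) conversion built around that expression. The function name `float_to_bf16_rte`, the NaN handling, and the exact form of the rounding bias are assumptions for illustration and are not taken from the diff.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical free function: convert float to a raw bfloat16 bit pattern
// using round-to-nearest-even.
inline std::uint16_t float_to_bf16_rte(float f) {
  // Assumption: NaN maps to a fixed quiet-NaN pattern so that the rounding
  // addition below cannot turn it into another value.
  if (std::isnan(f))
    return 0x7FC0;

  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));

  // Adding 0x7FFF rounds an exact tie down. Adding the lowest kept bit on top
  // rounds the tie up only when that bit is odd, which yields ties-to-even.
  std::uint32_t rounding_bias = ((bits >> 16) & 1u) + 0x7FFFu;
  return static_cast<std::uint16_t>((bits + rounding_bias) >> 16);
}

// Illustrative tie cases: both inputs lie exactly halfway between two
// neighbouring bfloat16 values and resolve to the even bit pattern.
inline void rte_examples() {
  assert(float_to_bf16_rte(1.00390625f) == 0x3F80); // 1 + 2^-8 rounds down
  assert(float_to_bf16_rte(1.01171875f) == 0x3F82); // 1 + 3 * 2^-8 rounds up
}
```

On device, the header takes separate paths guarded by `__SYCL_DEVICE_ONLY__` and `__NVPTX__`. This sketch covers only a host-style fallback.
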
@@ -52,6 +52,7 @@ class bfloat16 {
     return static_cast<uint16_t>((intStorage + roundingBias) >> 16);
 #endif
   }
+
   static float to_float(const storage_t &a) {
 #if defined(__SYCL_DEVICE_ONLY__)
 #if defined(__NVPTX__)
@@ -70,13 +71,14 @@ class bfloat16 {
 #endif
   }
 
-public:
   static bfloat16 from_bits(const storage_t &a) {
     bfloat16 res;
     res.value = a;
     return res;
   }
 
+public:
+
   // Implicit conversion from float to bfloat16
   bfloat16(const float &a) { value = from_float(a); }
 
@@ -85,9 +87,20 @@ class bfloat16 {
     return *this;
   }
 
+  // Implicit conversion from sycl::half to bfloat16
+  bfloat16(const sycl::half &a) { value = from_float(a); }
+
+  bfloat16 &operator=(const sycl::half &rhs) {
+    value = from_float(rhs);
+    return *this;
+  }
+
   // Implicit conversion from bfloat16 to float
   operator float() const { return to_float(value); }
 
+  // Implicit conversion from bfloat16 to sycl::half
+  operator sycl::half() const { return to_float(value); }
+
   // Get raw bits representation of bfloat16
   storage_t raw() const { return value; }