Add Bfloat16 support on Ampere GPU with CUDA 11#32132
Xreki merged 16 commits into PaddlePaddle:develop from
Conversation
Update forked PaddlePaddle
Update my fork
update from PaddlePaddle
Update forked paddle repo
Update USERNAME/paddle
update Paddle USERNAME repo
update username repo
update local paddlepaddle
update paddlepaddle
Thanks for your contribution!
Xreki
left a comment
Please attach a screenshot in the PR description showing the unit test results on an Ampere-architecture GPU.
paddle/fluid/platform/bfloat16.h
Outdated
// #ifdef __HIPCC__
// #define PADDLE_CUDA_BF16
// #include <hip/hip_bf16.h>
// #endif
paddle/fluid/platform/bfloat16.h
Outdated
HOSTDEVICE inline explicit bfloat16(const T& val)
    : x(bfloat16(static_cast<float>(val)).x) {}

// Assignment operators
The assignment operators also need support for the __nv_bfloat16 type.
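For context, the constructors and operators discussed here all build on a float-to-bfloat16 conversion, which at its core keeps only the high 16 bits of the float's bit pattern. A minimal host-only sketch — the `bf16` name and the simple truncating rounding are illustrative, not Paddle's exact implementation (production code typically rounds to nearest-even):

```cpp
#include <cstdint>
#include <cstring>

// Illustrative stand-in for a bfloat16 type: keeps the high 16 bits of an
// IEEE-754 float (sign + 8 exponent bits + 7 mantissa bits).
struct bf16 {
  uint16_t x = 0;
  bf16() = default;
  explicit bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));   // safe type-punning
    x = static_cast<uint16_t>(bits >> 16);  // truncate low mantissa bits
  }
  explicit operator float() const {
    uint32_t bits = static_cast<uint32_t>(x) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
  }
};
```

Values whose mantissa fits in 7 bits (1.0f, -2.5f, powers of two) round-trip exactly through this conversion.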
paddle/fluid/platform/bfloat16.h
Outdated
// Arithmetic & Comparison operators on CUDA11 & Ampere-arch GPU
#if defined(__CUDACC__) && CUDA_VERSION >= 11000 && defined(__CUDA_ARCH__) && \
    __CUDA_ARCH__ >= 800
DEVICE inline __nv_bfloat16 operator+(const __nv_bfloat16& a,
CUDA 11 itself already defines these operators, in cuda_bf16.hpp. In float16.h these half operators were overloaded only for CUDA versions below 9; from CUDA 9 on, the toolchain provides them itself and they do not need to be defined in user code.
Removed the unit tests for the operators already defined in CUDA 11.
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
paddle/fluid/platform/CMakeLists.txt
Outdated
IF(WITH_ROCM)
  hip_test(float16_gpu_test SRCS float16_test.cu DEPS lod_tensor)
  hip_test(bfloat16_gpu_test SRCS bfloat16_test.cu DEPS lod_tensor)
  out[0] = in1[0] sign in2[0]; \
}

#define ARITHMETIC_KERNEL_LAUNCH(op_type) \
If expanded, this code becomes quite long, with a lot of repetition.
The duplication can be avoided by factoring out common functions and defining functors.
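The suggestion above can be sketched as follows: instead of expanding a macro body per operator, write one generic element-wise routine parameterized by a functor. Host code shown for illustration; on the GPU the same idea applies with a templated __global__ kernel. The names here are illustrative, not Paddle's actual API.

```cpp
#include <cstddef>

// One-line functor per operation replaces a macro-expanded kernel body.
template <typename T>
struct AddFunctor {
  T operator()(T a, T b) const { return a + b; }
};

// Single generic element-wise routine shared by all operations.
template <typename T, typename Op>
void ElementwiseApply(const T* in1, const T* in2, T* out, size_t n, Op op) {
  for (size_t i = 0; i < n; ++i) out[i] = op(in1[i], in2[i]);
}
```

Adding a new operation then means adding one small functor, not another macro expansion.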
Xreki
left a comment
Since follow-up work depends on this PR, it is being merged first. Please submit a follow-up PR to fix the issues raised in this review.
#if defined(__CUDACC__) && CUDA_VERSION >= 11000
#define PADDLE_CUDA_BF16
#include <cuda_bf16.h>
#endif
A reminder: some macro definitions were copied directly from float16.h. If any file includes both float16.h and bfloat16.h, macro-redefinition errors will likely occur. This was not introduced by this PR, so no need to handle it for now.
#include <iostream>
#include "paddle/fluid/framework/lod_tensor.h"

#if defined(PADDLE_CUDA_BF16)
Actually, this unit test should not run only when PADDLE_CUDA_BF16 is defined: the bfloat16.h and float16.h implementations are compatible with all CUDA versions and GPU models. On CUDA versions or GPUs without native float16/bfloat16 support, computation automatically falls back to float.
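The fallback behavior described in this comment can be sketched as a standalone operator that converts to float, computes, and converts back, so it works on any CUDA version, any GPU, or on the CPU. The `sw_bf16` type and helper names below are illustrative, not Paddle's actual code:

```cpp
#include <cstdint>
#include <cstring>

// Software bfloat16: just the stored high 16 bits of a float.
struct sw_bf16 {
  uint16_t x;
};

inline sw_bf16 FloatToBf16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return sw_bf16{static_cast<uint16_t>(bits >> 16)};
}

inline float Bf16ToFloat(sw_bf16 v) {
  uint32_t bits = static_cast<uint32_t>(v.x) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Fallback arithmetic: widen to float, compute, narrow back. This is why
// the tests can run without native bfloat16 hardware support.
inline sw_bf16 operator+(sw_bf16 a, sw_bf16 b) {
  return FloatToBf16(Bf16ToFloat(a) + Bf16ToFloat(b));
}
```

On hardware with native support, the same operator can instead be routed to an intrinsic under a compile-time guard, without changing callers.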
namespace paddle {
namespace platform {

TEST(bfloat16, convert_float32_to_bfloat16_on_gpu) {
Have all the CUDA kernels been deleted? Then these unit tests all run on the CPU, right? Please add the GPU unit tests back in the next PR.
framework::TensorCopy(src_tensor, gpu_place, gpu_ctx, &gpu_tensor);

// GPU LoDTensor to CPU LoDTensor
framework::TensorCopy(gpu_tensor, CPUPlace(), gpu_ctx, &dst_tensor);
This unit test copies from CPU to GPU, does nothing on the GPU, and copies back to CPU; it doesn't test anything meaningful.
PR types
Others
PR changes
Others
Describe
Add Bfloat16 support on Ampere GPUs with CUDA 11. Below is the test result on an RTX 3090 with CUDA 11.2:
All tests passed.