
Conversation

zkh2016 (Contributor) commented Aug 24, 2021

PR types

New features

PR changes

OPs

Describe

Fuse the elementwise_add, activation, and dropout operations into one operator.

//before fusion
out1 = elementwise_add(src, bias)
out2 = activation(out1)
out3 = dropout(out2)
//after fusion
out = fused_dropout_act_bias(src, bias, activation_functor)
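
For illustration, a minimal host-side sketch of the fused per-element semantics, assuming a ReLU activation, an already-generated 0/1 dropout mask, and upscale-in-train scaling; the function name and signature are illustrative, not the operator's actual API:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference (unfused-equivalent) computation: out = dropout(relu(src + bias)).
// `mask` holds 0/1 keep flags; kept elements are scaled by 1 / (1 - dropout_prob).
template <typename T>
void FusedDropoutActBiasRef(const std::vector<T>& src, const std::vector<T>& bias,
                            const std::vector<uint8_t>& mask, float dropout_prob,
                            std::vector<T>* out) {
  const size_t cols = bias.size();
  const size_t rows = src.size() / cols;
  const T factor = static_cast<T>(1.0f / (1.0f - dropout_prob));
  out->resize(src.size());
  for (size_t r = 0; r < rows; ++r) {
    for (size_t c = 0; c < cols; ++c) {
      const size_t idx = r * cols + c;
      const T added = src[idx] + bias[c];                       // elementwise_add
      const T activated = std::max(added, static_cast<T>(0));   // activation (ReLU here)
      (*out)[idx] = activated * static_cast<T>(mask[idx]) * factor;  // dropout
    }
  }
}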

paddle-bot-old commented:
Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

Contributor:

Are the functions defined in this file shared by several places?

Contributor Author (zkh2016):

Yes, it is shared. It was also submitted in another PR, so one of the two has to be merged first.

Contributor:

Is this file shared as well?

Contributor:

Please add some comments describing what these functions do.

zkh2016 force-pushed the fused_dropout_act_bias branch from d643509 to f3a365e on August 25, 2021
zkh2016 force-pushed the fused_dropout_act_bias branch from f3a365e to f8f4e07 on August 30, 2021
zkh2016 force-pushed the fused_dropout_act_bias branch from a21bc90 to 4dba815 on September 8, 2021
return static_cast<T>(casted_dout * (first + second));
}
};

Contributor:

The activation functions above, Relu and Gelu, both already exist under math. Can they be reused directly? The interfaces implemented under math are already quite unified, so if they are reused there should be no need to wrap them again here?

Contributor Author (zkh2016):

Done. The gelu implementation follows gelu_op and differs slightly from the one under math. The functor from math can be passed in directly.
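
As a rough illustration of why no extra wrapping is needed when the activation functor is passed in as a template parameter (the names and signatures below are hypothetical, not Paddle's actual ones):

#include <cstdint>

// Hypothetical stand-in for an activation functor such as the ones under math.
template <typename T>
struct ReluFunctor {
  T operator()(const T x) const { return x > static_cast<T>(0) ? x : static_cast<T>(0); }
};

// The fused element-wise step only needs to call act(x), so any functor with
// that interface can be plugged in without re-wrapping it here.
template <typename T, typename Functor>
T FusedAddActDropoutElement(T src, T bias, uint8_t keep, T factor, Functor act) {
  return act(src + bias) * static_cast<T>(keep) * factor;
}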

Contributor:

Where is the difference?

Contributor Author (zkh2016):

ref: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/gelu_op.h#L96
This mainly follows the gelu_op implementation, which provides two ways of computing GeLU:
an approximate form, the same as the one under math: gelu(x) = 0.5 * x * (1 + tanh(sqrt(2 / \pi) * (x + 0.044715 * x^{3})))
and an exact form: gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
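
For reference, a small standalone sketch contrasting the two formulations in plain double precision (the actual functors also handle e.g. half precision, so this is only illustrative):

#include <cmath>

// Exact form: gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
double GeluErf(double x) {
  return 0.5 * x * (1.0 + std::erf(x * 0.7071067811865476 /* 1/sqrt(2) */));
}

// Tanh approximation: gelu(x) = 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
double GeluTanh(double x) {
  const double kAlpha = 0.7978845608028654;  // sqrt(2 / pi)
  return 0.5 * x * (1.0 + std::tanh(kAlpha * (x + 0.044715 * x * x * x)));
}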

namespace paddle {
namespace operators {

typedef platform::float16 fp16;
Contributor:

This kind of abbreviation is not recommended in the code.

Contributor Author (zkh2016):

done

}
// store result to global
platform::Store<T, VecSize>(dest_vec, &dst[r * cols + i]);
platform::Store<MaskType, VecSize>(mask_vec, &mask[r * cols + i]);
Contributor:

It looks like a lot of this code is the same as in the previous PR. Consider wrapping it in functions so the code can be reused.

Contributor Author (zkh2016):

Changed; the fix is submitted in the next PR.

const platform::CUDADeviceContext &ctx) {
// dropout_prob == 1.0f
if (std::abs(dropout_prob - 1.0f) < 1e-5) {
PADDLE_ENFORCE_CUDA_SUCCESS(
Contributor:

Since this Memset is called so many times, it could also be wrapped in a helper function.

Contributor Author (zkh2016):

done
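
One possible shape for such a wrapper, sketched with the raw CUDA runtime API; the name SetZero and the error handling are illustrative, and the actual code would likely use PADDLE_ENFORCE_CUDA_SUCCESS instead:

#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>

// Zero a device buffer of `count` elements asynchronously on `stream`,
// instead of repeating the raw Memset call at every call site.
template <typename T>
void SetZero(T* dev_ptr, size_t count, cudaStream_t stream) {
  cudaError_t err = cudaMemsetAsync(dev_ptr, 0, count * sizeof(T), stream);
  if (err != cudaSuccess) {
    std::fprintf(stderr, "cudaMemsetAsync failed: %s\n", cudaGetErrorString(err));
  }
}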


StoreT dx_vec;
#pragma unroll
for (int ii = 0; ii < VecSize; ii++) {
Contributor:

Could the Kernel Primitives API functions be used here?

int bias_id = blockIdx.x * blockDim.x * VecSize + x;
if (y == 0 && x < VecSize * BlockSizeX && bias_id < cols) {
dbias[bias_id] = sum;
}
Contributor:

L279 - L307 are also the same as in the previous PR.

Contributor Author (zkh2016):

done


{
out.Resize({rows, cols});
out.mutable_data<T>(place);
Contributor:

You could call out.mutable_data<T>(dims, place); directly.

Contributor Author (zkh2016):

done

test.Run();
test.CheckOut(default_diff);
if (!is_fp16) {
// test fp16, For inference, check_grad is not required. ref:
Contributor:

FP16 training needs to be supported.

Contributor Author (zkh2016):

done

Xreki (Contributor) left a comment:

LGTM.


StoreT dx_vec;
#pragma unroll
for (int ii = 0; ii < VecSize; ii++) {
T args[2];
Contributor:

With the current approach there is no need to define T args[2];

Contributor Author (zkh2016):

Fixed in the next PR.

#pragma unroll
for (int i = 0; i < VecSize; i++) {
T val;
T args[2];
Contributor:

With the current approach there is no need to define T args[2];

Contributor Author (zkh2016):

Fixed in the next PR.

static void BaseTest(const bool is_fp16 = false) {
const int rows = 16;
std::vector<int> cols_list = {16, 17};
bool has_bias[2] = {true, false};
Contributor:

Lines L271 and L272 are redundant.

Contributor Author (zkh2016):

Fixed in the next PR.

paddle::operators::GeluGradFunctor<double>>();
}

// test fp16, For inference, check_grad is not required. ref: test_dropout_op.py
Contributor:

@skip_check_grad_ci(reason="For inference, check_grad is not required.")
class TestDropoutOp5(OpTest):
    def setUp(self):
        self.op_type = "dropout"
        self.inputs = {'X': np.random.random((32, 64, 3)).astype("float32")}
        self.attrs = {'dropout_prob': 0.75, 'is_test': True}
        self.outputs = {
            'Out': self.inputs['X'] * (1.0 - self.attrs['dropout_prob'])
        }

    def test_check_output(self):
        self.check_output()

The dropout unit tests also contain fp32 cases that do not check the gradient; it just means that test case is there to verify inference correctness.

Contributor Author (zkh2016):

A gradient unit test has been added in the next PR.

Xreki merged commit cee7043 into PaddlePaddle:develop on Sep 16, 2021
AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this pull request on Sep 29, 2021
zkh2016 deleted the fused_dropout_act_bias branch on August 19, 2022