Add LRN efficient GPU implement.#5894

Merged
gongweibao merged 8 commits into PaddlePaddle:develop from gongweibao:lrngpu
Dec 6, 2017
Conversation

@gongweibao (Contributor)

Fix #5066

@gongweibao gongweibao changed the title from "Add effient GPU implement" to "Add LRN efficient GPU implement." Nov 24, 2017
@gongweibao gongweibao requested a review from hedaoyuan November 24, 2017 09:22
template <typename T>
struct LRNFunctor<platform::CPUPlace, T> {
void operator()(const framework::ExecutionContext& ctx,
const framework::Tensor* input, framework::Tensor* out,
Contributor:

For input arguments: const framework::Tensor&

https://google.github.io/styleguide/cppguide.html#Reference_Arguments

Contributor Author:

Done!
Thanks!

const int end = start + n;

auto e_mid = framework::EigenTensor<T, 4>::From(*mid);
e_mid.device(ctx.GetEigenDevice<platform::CPUPlace>()) = e_mid.constant(k);
Contributor:

For the CPU implementation of Eigen, there is no need to use .device().

e_mid.setConstant(k);

Contributor Author:

Done!
Thanks!

Eigen::array<int, 4>({{1, 1, H, W}}));

s.device(ctx.GetEigenDevice<platform::CPUPlace>()) +=
alpha * r.square();
Contributor:

The same as above:

s += alpha * r.square();

Contributor Author:

Done!
Thanks!


auto out_e = framework::EigenVector<T>::Flatten(*out);
out_e.device(ctx.GetEigenDevice<platform::CPUPlace>()) =
x_v * e_mid.reshape(Eigen::DSizes<int, 1>(e_mid.size())).pow(-beta);
Contributor:

The same as above.

Contributor Author:

Done!
Thanks!

void operator()(const framework::ExecutionContext& ctx,
const framework::Tensor* x, const framework::Tensor* out,
const framework::Tensor* mid, framework::Tensor* x_g,
const framework::Tensor* out_g, int N, int C, int H, int W,
Contributor:

For the input arguments, the same as above comments.

Contributor Author:

Done!
Thanks!

T alpha, T beta) {
int img_size = N * H * W;
int block_size = 1024;
int grid_size = (img_size + 1024 - 1) / 1024;
Contributor:

Use block_size instead of the 1024 in line 69:

int grid_size = (img_size + block_size - 1) / block_size;

Contributor Author:

Done!
Thanks!


int input_size = N * H * W * C;
block_size = 1024;
grid_size = (input_size + 1024 - 1) / 1024;
Contributor:

Same as above: use block_size instead of the 1024 in line 79.

Contributor Author:

Done!
Thanks!

}
if (index >= size) {
accum -= in[(index - size) * step] * in[(index - size) * step];
}
Contributor:

In line 41 and line 44, a register can be used to hold the value loaded from global memory first, which avoids accessing global memory multiple times:

       if (index < C) {
         T val = in[index * step];
         accum += val * val;
       }
       if (index >= size) {
         T val = in[(index - size) * step];
         accum -= val * val;
       }


const auto& stream =
reinterpret_cast<const platform::CUDADeviceContext&>(ctx.device_context())
.stream();
Contributor Author:

Done!
Thanks!

int img_size = N * H * W;

int block_size = 1024;
int grid_size = (img_size + 1024 - 1) / 1024;
Contributor:

Same as above.

Contributor Author:

Done!
Thanks!

Contributor @qingqing01 left a comment:

LGTM.

@gongweibao gongweibao merged commit c7e739f into PaddlePaddle:develop Dec 6, 2017
@gongweibao gongweibao deleted the lrngpu branch December 6, 2017 08:38
