Add efficient LRN GPU implementation. #5894
Merged
gongweibao merged 8 commits into PaddlePaddle:develop on Dec 6, 2017
Conversation
qingqing01
reviewed
Dec 5, 2017
paddle/operators/lrn_op.cc
Outdated
template <typename T>
struct LRNFunctor<platform::CPUPlace, T> {
  void operator()(const framework::ExecutionContext& ctx,
                  const framework::Tensor* input, framework::Tensor* out,
Contributor
For input arguments, use const framework::Tensor& rather than a pointer:
https://google.github.io/styleguide/cppguide.html#Reference_Arguments
paddle/operators/lrn_op.cc
Outdated
const int end = start + n;

auto e_mid = framework::EigenTensor<T, 4>::From(*mid);
e_mid.device(ctx.GetEigenDevice<platform::CPUPlace>()) = e_mid.constant(k);
Contributor
For the CPU implementation with Eigen, there is no need to use .device():
e_mid.setConstant(k);
paddle/operators/lrn_op.cc
Outdated
                           Eigen::array<int, 4>({{1, 1, H, W}}));

s.device(ctx.GetEigenDevice<platform::CPUPlace>()) +=
    alpha * r.square();
Contributor
The same as above:
s += alpha * r.square();
paddle/operators/lrn_op.cc
Outdated
auto out_e = framework::EigenVector<T>::Flatten(*out);
out_e.device(ctx.GetEigenDevice<platform::CPUPlace>()) =
    x_v * e_mid.reshape(Eigen::DSizes<int, 1>(e_mid.size())).pow(-beta);
paddle/operators/lrn_op.cc
Outdated
void operator()(const framework::ExecutionContext& ctx,
                const framework::Tensor* x, const framework::Tensor* out,
                const framework::Tensor* mid, framework::Tensor* x_g,
                const framework::Tensor* out_g, int N, int C, int H, int W,
Contributor
For the input arguments, the same as the comments above.
paddle/operators/lrn_op.cu
Outdated
T alpha, T beta) {
int img_size = N * H * W;
int block_size = 1024;
int grid_size = (img_size + 1024 - 1) / 1024;
Contributor
Use block_size in place of the literal 1024 on line 69:
int grid_size = (img_size + block_size - 1) / block_size;
paddle/operators/lrn_op.cu
Outdated
int input_size = N * H * W * C;
block_size = 1024;
grid_size = (input_size + 1024 - 1) / 1024;
Contributor
Same as above: use block_size in place of the literal 1024 on line 79.
}
if (index >= size) {
  accum -= in[(index - size) * step] * in[(index - size) * step];
}
Contributor
On lines 41 and 44, you can first load the value from global memory into a register, which avoids reading global memory twice:
if (index < C) {
  T val = in[index * step];
  accum += val * val;
}
if (index >= size) {
  T val = in[(index - size) * step];
  accum -= val * val;
}
paddle/operators/lrn_op.cu
Outdated
const auto& stream =
    reinterpret_cast<const platform::CUDADeviceContext&>(ctx.device_context())
        .stream();
Contributor
paddle/operators/lrn_op.cu
Outdated
int img_size = N * H * W;

int block_size = 1024;
int grid_size = (img_size + 1024 - 1) / 1024;
Fix #5066