-
Notifications
You must be signed in to change notification settings - Fork 5.9k
MKLDNN conv2d kernel added #8451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
721cf36 to
6216666
Compare
paddle/fluid/framework/op_registry.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should not add ; at the end, we want user use a macro like
SOME_MACRO();There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will pybind USE_OP_DEVICE_KERNEL(XXX, CUDNN) automatically in #8590, in order to make operators/CMakelists.txt much cleaner.
Then, only one sentence op_library(pool_op DEPS pooling) will pybind CPU/CUDA/CUDNN/MKLDNN all device kernel.
Thus:
- how about use
conv_mkldnn_op.cclikesconv_cudnn_op.cu.ccinstead ofconv_op.mn.cc? - refine these codes after pybind USE_OP_DEVICE_KERNEL(XXX, CUDNN) automatically #8590 finished?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If #8590 doesn't be finished before your small PRs, you can merge your changes at first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#8590 is finished and merged now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, as synced with @pzelazko-intel, we will break this PR into some smaller ones.
As for current code, we also had a discussion.
The most important information is that the current implementation may not be the best efficient one, since the format is fixed as nchw and the transform functions is still under developing.
If anything is missing, @pzelazko-intel please point out.
paddle/fluid/operators/conv_op.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we only make MKLDNN library enabled.
As synced with @pzelazko-intel, we would enable MKLDNN layout next time.
Then we would considerate transform function as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add "TODO" in codes for reminding? Including:
- enable MKLDNN layout
- enable groups
- something more.
Besides, could you not cover the previous commit next time, since we could not find the difference after your updating?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'm going to add TODOs.
I've covered previous commits, because I wanted commit history to be clear.
Would I have opportunity to squash commits when merging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can squash commits when merging your PR.
paddle/fluid/operators/conv_op.mn.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will enable groups later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
paddle/fluid/operators/conv_op.mn.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format is fixed as nchw, should support more. We will come back later.
paddle/fluid/operators/conv_op.mn.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Compute function is too long. We can think about breaking the code to smaller functions in mkldnn_herlper like cudnn_helper did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please look at the answer below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
paddle/fluid/operators/conv_op.mn.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only conv_pd is saved to context, I think we can save more, like engine, primitives, stream, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to refactor it as soon as we know how to handle data transferring between forward and backward in parallel mode (ParallelDo OP).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactoring is done
paddle/fluid/operators/conv_op.mn.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not so sure if this output name is unique, especially under the scope filed.
I had a discussion with @QiJune before and did not get a formal conclusion at that time.
Maybe Baidu friends can give a better answer. This is just my concern and reminder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that conv_cudnn_op.cu.cc also use Output.
Paddle/paddle/fluid/operators/conv_cudnn_op.cu.cc
Lines 35 to 42 in fc37482
| class CUDNNConvOpKernel : public framework::OpKernel<T> { | |
| public: | |
| void Compute(const framework::ExecutionContext& ctx) const override { | |
| PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), | |
| "It must use CUDAPlace."); | |
| auto* input = ctx.Input<Tensor>("Input"); | |
| auto* filter = ctx.Input<Tensor>("Filter"); | |
| auto* output = ctx.Output<Tensor>("Output"); |
Thus, why
conv_mkldnn_op.cc should not use the same output name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my perspective, it's not appropriate we remove this TODO.
I suggest we remain it and let the owner to fix it, since we may miss some background information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not remove this line, it's been moved up.
7882699 to
edcf89a
Compare
|
Now in this PR I'm introducing only conv2d OP MKLDNN kernel. |
edcf89a to
0cd6662
Compare
|
LGTM on following files:
@jacquesqiao Can you help review following files:
@QiJune Can you help review following files:
@tensor-tang Can you help review following files:
|
9a3ecb8 to
a4ab82d
Compare
tensor-tang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My ARs for conv_mkldnn_op.cc and conv_op.cc
Just a reminder, as discussed before, the Compute functions are too large. Please do not forget it, since it's pretty important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dilation is also supported with MKLDNN conv, you can add one more TODO later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MKLDNN doesn't support dilation in convolution yet
I think this error message is not clear enough, as we know MKL-DNN itself supports groups.
This message would misleading Paddle team and users.
It's that we did not enable it on paddle yet.
As well as the below dilations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
a4ab82d to
8090479
Compare
8090479 to
0e3f110
Compare
paddle/fluid/framework/operator.cc
Outdated
| return static_cast<proto::VarType::Type>(data_type); | ||
| } | ||
|
|
||
| bool OperatorWithKernel::CanCUDNNBeUsed(const ExecutionContext& ctx) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to make CanCUDNNBeUsed and CanMKLDNNBeUsed two global functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jacquesqiao where would you propose to place these functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pzelazko-intel according to this document https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/mkl/mkldnn_fluid.md#mkldnn_helper, we can put the interface there and add a mkldnn_helper.cc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
fb95caa to
dad7a02
Compare
|
Please answer reviewers’ every comment. If you are to follow the comment, please write “Done”; please give a reason otherwise. See code-review. |
|
@luotao1 refactoring has been completed. |
|
@luotao1 DeviceContext part looks good to me. |
|
framework part looks good to me, thanks! @pzelazko-intel |
| device_contexts_.emplace(places[i], | ||
| new platform::CPUDeviceContext( | ||
| boost::get<platform::CPUPlace>(places[i]))); | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I have a little question here.
When PADDLE_WITH_MKLDNN is enabled, we will not have platform::CPUDeviceContext anyway .
Is that fine with you @jacquesqiao @QiJune ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tensor-tang It seems that if PADDLE_WITH_GPU is enabled, there will be both CPUDeviceContext and CUDADeviceContext.
So, I think MKLDNNDeviceContext and CPUDeviceContext should be coexist.
Could you provide an example that has a two FC in the network, which one FC is CPU, and the other FC is MKLDNN? Just mnist demo will be fine.
We can see if "MKLDNN" is compatible with "CPU".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Firstly, MKLDNNDeviceContext is inherited from CPUDeviceContext now
https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/platform/device_context.h#L113. So functionally, it can pass the CI on this version.
But from my perspective, I thought CPUDeviceContext should be always available no matter which third-party is added, not only MKLDNN here. Since we can not guarantee all the ops paddle supported are also supported by the third-party library. If this third-party context is not inherited from cpu context, it would be a problem.
So I just want to hear you voice. I am not sure is that proper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think so.
Since MKLDNNDeviceContext is inherited from CPUDeviceContext, there will be no problem.
I have not think out a better choice, we can move on first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, Thx
tensor-tang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for MKLDNN part
luotao1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for @pzelazko-intel work, and thanks @jacquesqiao @QiJune @tensor-tang review.
MKLDNN conv2d and pool2d OP kernels can be enabled with
use_mkldnnOP flag - just like currenly presentuse_cudnnflag. It's set toTrueby default.use_cudnnflag has figher priority.Beside unit tests, we validated these kernels by running training and interference on MNIST dataset and comparing results with caffe library.