improve performance of DepthwiseConv(NHWC) #31677
zhangting2020 merged 11 commits into PaddlePaddle:develop
Test the above cases on V100:
The original code here seems to cause an error when input_channels is not equal to output_channels. We will add a case for this to the unit tests.
Yes, it should be input_channels here.
Could you describe why this change was made?
To improve gld_efficiency (global memory load efficiency), this PR transposes filter_data from CHW to HWC. The weight at (h_f, w_f, c_out) is therefore addressed as const T* weight = filter_data + weight_offset * output_channels + c_out, where weight_offset = h_f * filter_width + w_f.
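The index arithmetic described in the comment above can be sketched on the host side as follows. This is a minimal illustration, not the kernel code from this PR: the helper names HwcWeightIndex, ChwWeightIndex and TransposeChwToHwc are hypothetical, and the sketch assumes the original filter is stored densely as [output_channels, filter_height, filter_width].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: offset of the weight at (h_f, w_f, c_out) when the
// filter is laid out as HWC, i.e. [filter_height, filter_width, output_channels].
inline std::size_t HwcWeightIndex(std::size_t h_f, std::size_t w_f,
                                  std::size_t c_out, std::size_t filter_width,
                                  std::size_t output_channels) {
  assert(w_f < filter_width && c_out < output_channels);
  const std::size_t weight_offset = h_f * filter_width + w_f;
  return weight_offset * output_channels + c_out;
}

// Hypothetical helper: offset of the same weight in the assumed original
// CHW layout, i.e. [output_channels, filter_height, filter_width].
inline std::size_t ChwWeightIndex(std::size_t h_f, std::size_t w_f,
                                  std::size_t c_out, std::size_t filter_height,
                                  std::size_t filter_width) {
  assert(h_f < filter_height && w_f < filter_width);
  return (c_out * filter_height + h_f) * filter_width + w_f;
}

// Hypothetical helper: copy a CHW filter into HWC order so that weights of
// consecutive channels at one filter tap sit next to each other in memory.
template <typename T>
std::vector<T> TransposeChwToHwc(const std::vector<T>& chw,
                                 std::size_t channels,
                                 std::size_t filter_height,
                                 std::size_t filter_width) {
  assert(chw.size() == channels * filter_height * filter_width);
  std::vector<T> hwc(chw.size());
  for (std::size_t c = 0; c < channels; ++c) {
    for (std::size_t h = 0; h < filter_height; ++h) {
      for (std::size_t w = 0; w < filter_width; ++w) {
        hwc[HwcWeightIndex(h, w, c, filter_width, channels)] =
            chw[ChwWeightIndex(h, w, c, filter_height, filter_width)];
      }
    }
  }
  return hwc;
}
```

In the HWC layout, the offsets for c_out and c_out + 1 at the same (h_f, w_f) differ by exactly 1. If neighbouring threads handle neighbouring channels, as is natural for NHWC data, their filter loads touch adjacent addresses, which is the kind of access pattern the gld_efficiency metric rewards.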


PR types
Performance optimization
PR changes
OPs
Describe
improve performance of DepthwiseConv(NHWC)
Forward of DepthwiseConv(NHWC)
Backward of DepthwiseConv(NHWC)