Improve mobilenetv2 INT8 performance by using INT8 relu as post-op #17570
Conversation
      }
    } else if (!force_fp32_output) {
-     if (fuse_relu) {
+     if (fuse_relu || fuse_brelu) {
How about using a variable to compute fuse_relu || fuse_brelu at first?
bool xxx = fuse_relu || fuse_brelu
Yes, I agree with you; I will change it. It is about using uint8_t or int8_t for the output.
        output_shift_scale, sum_scale, is_test);
    conv_pd = ConvFwdPrimitiveDesc(
        src_md, weights_md, dst_md, strides, paddings, mkldnn_engine,
        fuse_relu || fuse_brelu, fuse_residual_conn, false, 0.0,
false, 0.0, please see 'It will be removed once the int8 is enabled.' #17130 (comment)
Hi @luotao1, I made the change, but please do not merge or continue the review yet. The previous PR worked, but after merging some recent updates there seems to be an error. Let me test locally first.
I just found that CreatePostOps caused the problem. As in https://github.com/lidanqing-intel/Paddle/commits/develop-mobilnetv2, only the last commit drops the accuracy to 0; all previous commits give good accuracy. I will fix it during the day and will also talk with @wojtuss.
Force-pushed from a19b7b1 to c02b869
test=develop change the "fuse_relu||fuse_brelu" to "unsigned_output" test=develop
Force-pushed from 3a606f4 to 50e9491
…nabled in mkldnn v0.18 test=develop
-       mkldnn_engine, fuse_relu, fuse_residual_conn, false /*fuse_brelu*/,
-       0.0 /*fuse_brelu_threshold*/, output_shift_scale, sum_scale,
-       is_test);
+       mkldnn_engine, fuse_relu || fuse_brelu /*fuse_relu*/,
Line 470: should fuse_relu || fuse_brelu be renamed to unsigned_output? And why the /*fuse_relu*/ note?
Hi @luotao1, as noted in the comment at line 460: once mkldnn v0.20 is enabled, the INT8 brelu post-op will be supported and I will substitute in the code noted in /**/.
I think we should not substitute fuse_relu || fuse_brelu with unsigned_output; in fact, we should use fuse_relu only at this position. We use fuse_relu || fuse_brelu now only because, in INT8 inference, both relu and brelu produce unsigned output, so accuracy stays good. But the correct code is the one noted in /**/.
Got it. When can we update to mkldnn v0.20?
The mkldnn v0.20 code freeze is the 7th of June, but they need time for testing, so the release date is around the 28th of June.
        output_shift_scale, sum_scale, is_test);
    conv_pd = ConvFwdPrimitiveDesc(
        src_md, weights_md, dst_md, strides, paddings, mkldnn_engine,
        fuse_relu || fuse_brelu /*fuse_relu*/, fuse_residual_conn,
Line 477: should fuse_relu || fuse_brelu be renamed to unsigned_output? And why the /*fuse_relu*/ note?
    conv_pd = ConvFwdPrimitiveDesc(
        src_md, weights_md, dst_md, strides, paddings, mkldnn_engine,
        fuse_relu || fuse_brelu /*fuse_relu*/, fuse_residual_conn,
        false /*fuse_brelu*/, fuse_brelu_threshold, output_shift_scale,
do you need false /*fuse_brelu*/ again since you have fuse_relu || fuse_brelu before?
-       0.0 /*fuse_brelu_threshold*/, output_shift_scale, sum_scale,
-       is_test);
+       mkldnn_engine, fuse_relu || fuse_brelu /*fuse_relu*/,
+       fuse_residual_conn, false /*fuse_brelu*/, fuse_brelu_threshold,
do you need false /*fuse_brelu*/ again since you have fuse_relu || fuse_brelu before?
        src_md, weights_md, dst_md, strides, paddings, mkldnn_engine,
        fuse_relu || fuse_brelu /*fuse_relu*/, fuse_residual_conn,
        false /*fuse_brelu*/, fuse_brelu_threshold, output_shift_scale,
        sum_scale, is_test);
Could you combine the two ConvFwdPrimitiveDesc functions? If there is no bias, pass bias_md = nullptr into it.
Ok, I will do like this.
I will merge your PR first, and you can refine it in the next PR!
Thank you! I will start refining now and will check the refined code through tests. One update: it could be mkldnn v1.0 that Baidu will use. Both versions support this op, and both will be released after Paddle release 1.5.
Got it. But could the refined-code PR be created before Paddle release 1.5?
Yes, will be before 1.5
This PR improves mobilenetv2 INT8 performance with good accuracy.
We have to use relu instead of brelu as the post-op in the INT8 conv2d kernel, because the INT8 brelu post-op is not enabled in mkldnn v0.18. I added TODO comments describing what will change once v0.20 is enabled.
The performance of mobilenetv2 with this PR is as follows:
test machine: Intel(R) Core(TM) i9-7940X CPU @ 3.10GHz, 14 Cores
paddle_num_threads 14
Performance of all the int8 models on CLX and SKX will be delivered by this Friday.