Improve mobilenetv2 INT8 performance by using INT8 relu as post-op#17570

Merged
luotao1 merged 5 commits into PaddlePaddle:develop from lidanqing-vv:int8-mobilenetv2-updated-17130
May 28, 2019
Conversation

@lidanqing-vv (Contributor) commented May 22, 2019

This PR improves mobilenetv2 INT8 performance while keeping good accuracy.

We have to use relu instead of brelu as the post-op in the INT8 conv2d kernel, because INT8 brelu as a post-op is not enabled in mkldnn v0.18. I added TODO comments describing what will change once v0.20 is enabled.

The performance of mobilenetv2 with this PR is as follows:

         Top1 Accuracy   Performance
FP32     71.90%          X
INT8     71.43%          1.92 X

test machine: Intel(R) Core(TM) i9-7940X CPU @ 3.10GHz, 14 cores
paddle_num_threads: 14

Performance of all the int8 models on CLX and SKX will be delivered by this Friday.

@lidanqing-vv (Contributor, Author)

I wanted to remove the conflicted commits from PR17546 and did something like a rename; somehow the previous PR was closed automatically. I am referring to the reviews on PR17546 and committing new changes according to them.

}
} else if (!force_fp32_output) {
if (fuse_relu) {
if (fuse_relu || fuse_brelu) {
Contributor

How about using a variable to compute fuse_relu || fuse_brelu at first?
bool xxx = fuse_relu || fuse_brelu

Contributor Author

How about using a variable to compute fuse_relu || fuse_brelu at first?
bool xxx = fuse_relu || fuse_brelu

Yes, I agree with you; I will change it. It is about choosing between uint8_t and int8_t.

output_shift_scale, sum_scale, is_test);
conv_pd = ConvFwdPrimitiveDesc(
src_md, weights_md, dst_md, strides, paddings, mkldnn_engine,
fuse_relu || fuse_brelu, fuse_residual_conn, false, 0.0,
Contributor

false, 0.0, please see 'It will be removed once the int8 is enabled.' #17130 (comment)

Contributor Author (@lidanqing-vv, May 22, 2019)

false, 0.0, please see 'It will be removed once the int8 is enabled.' #17130 (comment)

Hi @luotao1, I made the change, but please do not merge or review further yet. The previous PR worked, but after merging some updates there seems to be an error. Let me test locally first.

Contributor Author

I just found that CreatePostOps was causing the problems. As in https://github.com/lidanqing-intel/Paddle/commits/develop-mobilnetv2, only the last commit makes the accuracy 0; all previous commits give good accuracy. I will fix it during the day and will also talk with @wojtuss.

@lidanqing-vv lidanqing-vv changed the title Add INT8 fuse+relu6 support and mobilenetv2 INT8 test [WIP] Add INT8 fuse+relu6 support and mobilenetv2 INT8 test May 22, 2019
@lidanqing-vv lidanqing-vv force-pushed the int8-mobilenetv2-updated-17130 branch from a19b7b1 to c02b869 Compare May 22, 2019 13:36
test=develop

change the "fuse_relu||fuse_brelu" to "unsigned_output"
test=develop
@lidanqing-vv lidanqing-vv force-pushed the int8-mobilenetv2-updated-17130 branch from 3a606f4 to 50e9491 Compare May 22, 2019 21:34
@lidanqing-vv lidanqing-vv changed the title [WIP] Add INT8 fuse+relu6 support and mobilenetv2 INT8 test Improve mobilenetv2 INT8 performance by using INT8 relu as post-op May 27, 2019
@wojtuss wojtuss added this to the v1.5 for Intel milestone May 28, 2019
mkldnn_engine, fuse_relu, fuse_residual_conn, false /*fuse_brelu*/,
0.0 /*fuse_brelu_threshold*/, output_shift_scale, sum_scale,
is_test);
mkldnn_engine, fuse_relu || fuse_brelu /*fuse_relu*/,
Contributor

470: fuse_relu || fuse_brelu → unsigned_output? And why the /*fuse_relu*/ notes?

Contributor Author

@luotao1 Hi, as in the comments on line 460: when mkldnn v0.20 is enabled, INT8 brelu post-ops will be supported, and I will substitute in the code noted in /**/.

I think we should not substitute fuse_relu || fuse_brelu with unsigned_output; in fact, we should use fuse_relu only at this position. We use fuse_relu || fuse_brelu now, and still achieve good accuracy in INT8 inference, only because relu and brelu both give unsigned output in inference. The correct code is the one noted in /**/.

Contributor

Got it. When can we update to mkldnn v0.20?

Contributor Author

Got it. When can we update to mkldnn v0.20?

mkldnn v0.20's code freeze is June 7th, but they need time for testing; the release date is by June 28th.

output_shift_scale, sum_scale, is_test);
conv_pd = ConvFwdPrimitiveDesc(
src_md, weights_md, dst_md, strides, paddings, mkldnn_engine,
fuse_relu || fuse_brelu /*fuse_relu*/, fuse_residual_conn,
Contributor

477: fuse_relu || fuse_brelu → unsigned_output? And why the /*fuse_relu*/ notes?

conv_pd = ConvFwdPrimitiveDesc(
src_md, weights_md, dst_md, strides, paddings, mkldnn_engine,
fuse_relu || fuse_brelu /*fuse_relu*/, fuse_residual_conn,
false /*fuse_brelu*/, fuse_brelu_threshold, output_shift_scale,
Contributor

Do you need false /*fuse_brelu*/ again, since you pass fuse_relu || fuse_brelu before it?

0.0 /*fuse_brelu_threshold*/, output_shift_scale, sum_scale,
is_test);
mkldnn_engine, fuse_relu || fuse_brelu /*fuse_relu*/,
fuse_residual_conn, false /*fuse_brelu*/, fuse_brelu_threshold,
Contributor

Do you need false /*fuse_brelu*/ again, since you pass fuse_relu || fuse_brelu before it?

src_md, weights_md, dst_md, strides, paddings, mkldnn_engine,
fuse_relu || fuse_brelu /*fuse_relu*/, fuse_residual_conn,
false /*fuse_brelu*/, fuse_brelu_threshold, output_shift_scale,
sum_scale, is_test);
Contributor

Could you combine the two ConvFwdPrimitiveDesc functions? When bias is absent, pass bias_md = nullptr into it.

Contributor Author

Could you combine the two ConvFwdPrimitiveDesc functions? When bias is absent, pass bias_md = nullptr into it.

OK, I will do it like this.

Contributor

I will merge your PR first, and you can refine it in the next PR!

Contributor Author

I will merge your PR first, and you can refine it in the next PR!

Thank you! I will start refining now and will verify the refined code through tests. One update: it could be mkldnn v1.0 that Baidu ends up using. Both versions support this op, and both will be released after the Paddle 1.5 release.

Contributor

Got it. But could the refined-code PR be created before the Paddle 1.5 release?

Contributor Author

Got it. But could the refined-code PR be created before the Paddle 1.5 release?

Yes, it will be before 1.5.
