
Enable the convolution/relu6 (bounded_relu) fusion for FP32 on Intel platform #17130

Merged: luotao1 merged 8 commits into PaddlePaddle:develop from guomingz:conv_relu6_fusion_fp32, May 22, 2019

Conversation

@guomingz
Contributor

Relu6 is the bottleneck op for MobileNet-v2. As MKL-DNN supports the conv/relu6 fusion, we implement the fusion via a graph pass. Since INT8 support for this fusion will only arrive in MKL-DNN v0.20, this PR focuses on the FP32 optimization.

The table below shows the benchmark (FPS) measured on SKX-8180 (28 cores):

Batch size | w/ fusion | w/o fusion
-- | -- | --
1 | 214.7 | 53.4
50 | 1219.727 | 137.280
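For readers unfamiliar with the op: relu6 is bounded ReLU with threshold 6, and fusing it means clamping each element as the conv kernel writes its output instead of running a second op over the tensor. A minimal sketch of the semantics (illustrative helper names; the real work happens inside the MKL-DNN primitive):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// relu6(x) = min(max(x, 0), 6), i.e. bounded ReLU with threshold 6.
inline float relu6(float x) { return std::min(std::max(x, 0.0f), 6.0f); }

// Running a separate relu6 op over the conv output...
std::vector<float> separate_ops(const std::vector<float>& conv_out) {
  std::vector<float> out;
  for (float v : conv_out) out.push_back(relu6(v));
  return out;
}

// ...is equivalent to clamping each element as the conv kernel produces it,
// which saves a full extra pass over the tensor (the speedup measured above).
std::vector<float> fused(const std::vector<float>& conv_out) {
  std::vector<float> out;
  for (float v : conv_out) out.push_back(std::min(std::max(v, 0.0f), 6.0f));
  return out;
}
```

Because the two forms are element-wise identical, the fusion is a pure performance optimization with no accuracy impact.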

test=develop

@luotao1 luotao1 added the Intel label Apr 28, 2019
@guomingz
Contributor Author

@sfraczek @wojtuss @luotao1 please help review this PR. Thank you in advance.

@guomingz
Contributor Author

It's weird that PR_CI (Paddle) ran several times for this PR; even when the previous run was successful, it reran again.

@guomingz
Contributor Author

Hi @luotao1, it seems the Intel folks may be on vacation or too busy to review this PR. Is it possible to ask the Baidu folks to review it? Thanks.

@guomingz
Contributor Author

Hi @luotao1, the Intel internal review is done. Please review when you get a chance.

@guomingz
Contributor Author

guomingz commented May 7, 2019

Hi @luotao1, the review is done. Shall we invite the Baidu folks to review it? Thank you in advance.

@sfraczek sfraczek left a comment

Please add the pass to paddle_pass_builder.cc.

@guomingz
Contributor Author

guomingz commented May 8, 2019

Please add the pass to paddle_pass_builder.cc.

Thank you for the reminder. I've just registered this pass for the MKLDNN engine.

Please have a review:)

sfraczek
sfraczek previously approved these changes May 8, 2019
@sfraczek sfraczek left a comment

ok LGTM.
Just one suggestion: It would be good to add // at the end of the previous lines so the new conv_brelu pass is in a separate line.

@guomingz
Contributor Author

guomingz commented May 8, 2019

ok LGTM.
Just one suggestion: It would be good to add // at the end of the previous lines so the new conv_brelu pass is in a separate line.

Thanks for the hint. I was confused by the clang-format output before your hint; it's a subtle way to adjust the formatting. Let me update the code for better style.
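The trailing-comment trick discussed above can be sketched like this (pass names other than the new conv_brelu_mkldnn_fuse_pass are illustrative):

```cpp
// Without trailing comments, clang-format is free to repack the list:
//   "conv_bn_fuse_pass", "conv_eltwiseadd_bn_fuse_pass", "conv_brelu_mkldnn_fuse_pass",
// An empty // comment at the end of each line forces one entry per line:
"conv_bn_fuse_pass",            //
"conv_eltwiseadd_bn_fuse_pass", //
"conv_brelu_mkldnn_fuse_pass",  //
```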

sfraczek
sfraczek previously approved these changes May 9, 2019
@sfraczek sfraczek left a comment

LGTM


PADDLE_ENFORCE(is_conv3d != true, "int8 does not support conv3d currently");
PADDLE_ENFORCE(fuse_brelu != true,
               "int8 does not support conv/relu6 fusion currently");
Contributor

Why doesn't int8 support conv/relu6 fusion? It is to be done or not doable ?

Contributor Author

Why doesn't int8 support conv/relu6 fusion? It is to be done or not doable ?

mkldnn v0.20 will support it.


@guomingz
Vadim Pirogov from MKL-DNN team has confirmed that INT8 support for conv2d + relu6 post-op is already present in 0.18.

Contributor

Hi! @guomingz We just checked INT8 conv2d + relu6 fuse also works for MobileNetV2 and brings good accuracy and performance improvement. Thank you very much for this PR!

Contributor Author

Hi! @guomingz We just checked INT8 conv2d + relu6 fuse also works for MobileNetV2 and brings good accuracy and performance improvement. Thank you very much for this PR!

Hi @lidanqing-intel, @wojtuss, thanks for your comments. Actually, we have already enabled the conv/relu6 INT8 part in our local repo, but during pre-CI we found a potential issue that blocks it. See https://jira.devtools.intel.com/projects/MFDNN/issues/MFDNN-1552 for details. That's the reason we created the FP32-only PR.


@guomingz , maybe the INT8 conv+relu6 does not work with grouped convolutions? In MobileNetV2 there are no grouped convolutions and it works fine with INT8. If that is the case, you could exclude only grouped convolutions until 0.20 supports it.

Contributor Author

@guomingz , maybe the INT8 conv+relu6 does not work with grouped convolutions? In MobileNetV2 there are no grouped convolutions and it works fine with INT8. If that is the case, you could exclude only grouped convolutions until 0.20 supports it.

The PaddlePaddle pre-CI includes grouped-convolution tests that would block the remaining checks, so we don't enable it at the current stage.

If you or your team consider this a must-have feature, we could create another PR for INT8 only. What do you think? Besides that, we need to allocate resources for it. @hshen14 will explain the details.


@guomingz ,
Yes, this fuse is very important to us. Could this feature be turned off only for grouped convolutions? Could the pre-ci tests be updated to handle it in a different/both way?
@hshen14 , @luotao1 , what do you think about it?

@sfraczek

sfraczek commented May 13, 2019

I found one more thing. You haven't added a test for the fuse for completeness.

cc_test(test_conv_relu_mkldnn_fuse_pass SRCS mkldnn/conv_relu_mkldnn_fuse_pass_tester.cc DEPS conv_relu_mkldnn_fuse_pass)

@wojtuss

wojtuss commented May 14, 2019

@guomingz, @luotao1, we would be grateful for a quick merge of this PR (after adding the test).
Thank you!

@luotao1 luotao1 requested a review from a user May 14, 2019 10:43
@guomingz
Contributor Author

I found one more thing. You haven't added a test for the fuse for completeness.

cc_test(test_conv_relu_mkldnn_fuse_pass SRCS mkldnn/conv_relu_mkldnn_fuse_pass_tester.cc DEPS conv_relu_mkldnn_fuse_pass)

Hello @sfraczek, I just added the test_conv_brelu_mkldnn_fuse_pass case per your comments. Please review it. Thanks.

sfraczek
sfraczek previously approved these changes May 14, 2019
@ghost ghost added this to the v1.5 for Intel milestone May 14, 2019
AddAttr<bool>("fuse_brelu",
              "(bool, default false) Only used in mkldnn kernel")
    .SetDefault(false);
AddAttr<float>("fuse_brelu_threshold",
Contributor

Why is fuse_brelu_threshold needed?

Contributor Author

Why is fuse_brelu_threshold needed?

It's a parameter of bounded ReLU. 6 is the typical value, but it may vary.


@luotao1, relu6 is a case of the bounded_relu op in which 6 is the default value of the threshold parameter.
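To make the role of the attribute concrete, here is a sketch (not the PR's actual code) of how the fused op would consume it; hardwiring 6 instead of carrying the attribute would mis-fuse any topology that uses a different bound:

```cpp
#include <algorithm>
#include <cassert>

// fuse_brelu / fuse_brelu_threshold mirror the PR's attributes; the function
// itself is a stand-in for the MKL-DNN bounded_relu post-op.
float fused_activation(float conv_out, bool fuse_brelu,
                       float fuse_brelu_threshold) {
  if (!fuse_brelu) return conv_out;  // no post-op attached
  // Bounded ReLU with a topology-dependent bound; relu6 is threshold == 6.
  return std::min(std::max(conv_out, 0.0f), fuse_brelu_threshold);
}
```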

Contributor

Hi, when will this PR be merged ? :)

Contributor

@qingqing01 What do you think about this PR adding the two attributes fuse_brelu and fuse_brelu_threshold?

@wojtuss

wojtuss commented May 20, 2019

@guomingz, @luotao1, the license/cla status shows that the CLA has not been signed yet by a contributor. It was already green before. Why is it pending?

@luotao1
Contributor

luotao1 commented May 20, 2019

CLA had some problems several hours ago; you can close and reopen this PR, and then the CLA check will be triggered successfully.

@wojtuss

wojtuss commented May 20, 2019

@guomingz , could you please reopen this PR so the license/cla check could pass?

@luotao1 , are there any other requirements for this PR to be merged?

@luotao1
Contributor

luotao1 commented May 20, 2019

Does the attr fuse_brelu_threshold change anywhere? I don't see how to change it. Do you use a pass to change it?

@guomingz
Contributor Author

Does the attr fuse_brelu_threshold change anywhere? I don't see how to change it. Do you use a pass to change it?

It's up to the topology itself. If someday a topology uses another threshold for bounded relu, the attribute will change accordingly. What I mean is that this field is not modified by users; it's just an attribute.

@guomingz guomingz closed this May 20, 2019
@guomingz guomingz reopened this May 20, 2019
@guomingz
Contributor Author

@guomingz , could you please reopen this PR so the license/cla check could pass?

@luotao1 , are there any other requirements for this PR to be merged?

Hi @wojtuss, I've reopened this PR.

@wojtuss

wojtuss commented May 21, 2019

@luotao1, the threshold attribute is important to the MKL-DNN bounded_relu op as well as to the PaddlePaddle relu6 op (relu6 has it too). If we dropped the attribute after the fuse, the fuse might not be done properly for some other value of the threshold.


conv_pd = ConvFwdPrimitiveDesc(
src_md, weights_md, bias_md, dst_md, strides, paddings,
mkldnn_engine, fuse_relu, fuse_residual_conn, false, 0.0,
Contributor

• false means fuse_brelu?
• 0.0 means fuse_brelu_threshold?

It's hard to understand.

Contributor Author

They're placeholders, since the INT8 fusion is not yet enabled (see the previous conversation). It will be removed once the int8 is enabled.

Contributor

You can write

false /*xxx */, 0.0 /*xxx*/

Contributor Author

You can write

false /*xxx */, 0.0 /*xxx*/

Updated the code!
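For illustration, the argument-comment style suggested above makes the call read like this (argument names taken from the surrounding diff; not the exact committed code):

```cpp
conv_pd = ConvFwdPrimitiveDesc(src_md, weights_md, bias_md, dst_md, strides,
                               paddings, mkldnn_engine, fuse_relu,
                               fuse_residual_conn, false /*fuse_brelu*/,
                               0.0f /*fuse_brelu_threshold*/);
```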

Contributor

Got it. But the default of fuse_brelu_threshold is 6.0, so why is it 0.0 here?

Contributor

It will be removed once the int8 is enabled.

@guomingz @lidanqing-intel I do not see it removed in c9becf4.

Contributor Author

@guomingz guomingz May 22, 2019

Got it. But the default of fuse_brelu_threshold is 6.0, so why is it 0.0 here?

0.0 implies the INT8 brelu fusion is not enabled, as the brelu flag is set to false; 6.0 could be misleading about the brelu fusion status.

platform::ConvMKLDNNHandler::AppendKey(
&key, src_tz, weights_tz, strides, paddings, dilations, groups, src_dt,
input->format(), fuse_relu, fuse_residual_conn,
input->format(), fuse_relu, fuse_residual_conn, false,
Contributor

What's the meaning of false? It's hard to understand.

output_shift_scale, sum_scale, is_test);
conv_pd = ConvFwdPrimitiveDesc(src_md, weights_md, dst_md, strides,
paddings, mkldnn_engine, fuse_relu,
fuse_residual_conn, false, 0.0,

ditto

conv_transpose_pd = handler.AcquireConvolutionPrimitiveDescriptor(
src_md, weights_md, bias_md, dst_md, strides, paddings, mkldnn_engine,
fuse_relu, false, fwd_prop_kind);
fuse_relu, false, false, 0.0, fwd_prop_kind);

ditto

conv_transpose_pd = handler.AcquireConvolutionPrimitiveDescriptor(
src_md, weights_md, boost::none, dst_md, strides, paddings,
mkldnn_engine, fuse_relu, false, fwd_prop_kind);
mkldnn_engine, fuse_relu, false, false, 0.0, fwd_prop_kind);

ditto

@luotao1
Contributor

luotao1 commented May 21, 2019

Please add the MobileNet-v2 UT mentioned in #17468 (comment).

@guomingz
Contributor Author

Please add the MobileNet-v2 UT mentioned in #17468 (comment).

We've already added the feature-level UT. How about asking the Poland team to raise the MobileNet-v2 UT in another PR? Let this PR focus on the feature itself.

@wojtuss

wojtuss commented May 21, 2019

@luotao1, @guomingz, could you please also reply to #17130 (comment)?
Without enabling this fuse for INT8, it won't help us with MobileNetV2. The fuse works for INT8 conv2d; potentially it does not work with grouped convolutions only.

Let the parameter definitions be embedded into the code. That will make the code easier to understand.

@luotao1
Contributor

luotao1 commented May 22, 2019

@guomingz @hshen14 please answer #17130 (comment)

@hshen14
Contributor

hshen14 commented May 22, 2019

This PR supports the Conv and BoundedReLU fusion in FP32, with the required pass-based graph optimization and an op-level unit test. If you encounter an issue when applying the fusion in INT8, I suggest creating another PR per your need and adding comments to explain the limitation.

Contributor

@luotao1 luotao1 left a comment

LGTM
