
Enable the convolution/relu6 (bounded_relu) fusion for FP32 on Intel platform #17130

Merged: luotao1 merged 8 commits into PaddlePaddle:develop from guomingz:conv_relu6_fusion_fp32, May 22, 2019

Conversation

@guomingz
Contributor

Relu6 is the bottleneck op for MobileNet-v2. As MKL-DNN supports the conv/relu6 fusion, we implement the fusion via a graph pass. Since INT8 support for this fusion will only arrive in MKL-DNN v0.20, this PR focuses on the FP32 optimization.

The table below shows the benchmark (FPS) measured on SKX-8180 (28 cores):

Batch size | w/ fusion | w/o fusion
-- | -- | --
1 | 214.7 | 53.4
50 | 1219.727 | 137.280
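For readers unfamiliar with the op: relu6 is bounded ReLU with threshold 6, and fusing it means clamping each element as the conv kernel writes its output instead of running a second op over the tensor. A minimal sketch of the semantics (illustrative helper names; the real work happens inside the MKL-DNN primitive):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// relu6(x) = min(max(x, 0), 6), i.e. bounded ReLU with threshold 6.
inline float relu6(float x) { return std::min(std::max(x, 0.0f), 6.0f); }

// Running a separate relu6 op over the conv output...
std::vector<float> separate_ops(const std::vector<float>& conv_out) {
  std::vector<float> out;
  for (float v : conv_out) out.push_back(relu6(v));
  return out;
}

// ...is equivalent to clamping each element as the conv kernel produces it,
// which saves a full extra pass over the tensor (the speedup measured above).
std::vector<float> fused(const std::vector<float>& conv_out) {
  std::vector<float> out;
  for (float v : conv_out) out.push_back(std::min(std::max(v, 0.0f), 6.0f));
  return out;
}
```

Because the two forms are element-wise identical, the fusion is a pure performance optimization with no accuracy impact.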

test=develop

@luotao1 luotao1 added the Intel label Apr 28, 2019
@guomingz
Contributor Author

@sfraczek @wojtuss @luotao1 please help review this PR. Thank you in advance.

@guomingz
Contributor Author

It's weird that PR_CI (Paddle) ran several times for this PR; even when the previous run was successful, it reran again.

@guomingz
Contributor Author

Hi @luotao1, it seems the Intel folks may be on vacation or too busy to review this PR. Is it possible to ask the Baidu folks to review it? Thanks.

@guomingz
Contributor Author

Hi @luotao1, the Intel internal review is done. Please review when you get a chance.

@guomingz
Contributor Author

guomingz commented May 7, 2019

Hi @luotao1, the review is done. Shall we invite the Baidu folks to review it? Thank you in advance.

@sfraczek sfraczek left a comment

Please add the pass to paddle_pass_builder.cc.

@guomingz
Contributor Author

guomingz commented May 8, 2019

Please add the pass to paddle_pass_builder.cc.

Thank you for the reminder. I've just registered this pass for the MKLDNN engine.

Please have a review:)

sfraczek
sfraczek previously approved these changes May 8, 2019
@sfraczek sfraczek left a comment

ok LGTM.
Just one suggestion: It would be good to add // at the end of the previous lines so the new conv_brelu pass is in a separate line.

@guomingz
Contributor Author

guomingz commented May 8, 2019

ok LGTM.
Just one suggestion: It would be good to add // at the end of the previous lines so the new conv_brelu pass is in a separate line.

Thanks for the hint. I was confused by the clang-format output before your hint; it's a subtle way to adjust the formatting. Let me update the code for better style.
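The trailing-comment trick discussed above can be sketched like this (pass names other than the new conv_brelu_mkldnn_fuse_pass are illustrative):

```cpp
// Without trailing comments, clang-format is free to repack the list:
//   "conv_bn_fuse_pass", "conv_eltwiseadd_bn_fuse_pass", "conv_brelu_mkldnn_fuse_pass",
// An empty // comment at the end of each line forces one entry per line:
"conv_bn_fuse_pass",            //
"conv_eltwiseadd_bn_fuse_pass", //
"conv_brelu_mkldnn_fuse_pass",  //
```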

sfraczek
sfraczek previously approved these changes May 9, 2019
@sfraczek sfraczek left a comment

LGTM


PADDLE_ENFORCE(is_conv3d != true, "int8 does not support conv3d currently");
PADDLE_ENFORCE(fuse_brelu != true,
               "int8 does not support conv/relu6 fusion currently");
Contributor

Why doesn't int8 support conv/relu6 fusion? It is to be done or not doable ?

Contributor Author

Why doesn't int8 support conv/relu6 fusion? It is to be done or not doable ?

mkldnn v0.20 will support it.


@guomingz
Vadim Pirogov from MKL-DNN team has confirmed that INT8 support for conv2d + relu6 post-op is already present in 0.18.

Contributor

Hi! @guomingz We just checked INT8 conv2d + relu6 fuse also works for MobileNetV2 and brings good accuracy and performance improvement. Thank you very much for this PR!

Contributor Author

Hi! @guomingz We just checked INT8 conv2d + relu6 fuse also works for MobileNetV2 and brings good accuracy and performance improvement. Thank you very much for this PR!

Hi @lidanqing-intel, @wojtuss, thanks for your comments. Actually, we have already enabled the conv/relu6 INT8 part in our local repo, but during pre-CI we found a potential issue that blocks it. See https://jira.devtools.intel.com/projects/MFDNN/issues/MFDNN-1552 for details. That's the reason we created the FP32-only PR.


@guomingz , maybe the INT8 conv+relu6 does not work with grouped convolutions? In MobileNetV2 there are no grouped convolutions and it works fine with INT8. If that is the case, you could exclude only grouped convolutions until 0.20 supports it.

Contributor Author

@guomingz , maybe the INT8 conv+relu6 does not work with grouped convolutions? In MobileNetV2 there are no grouped convolutions and it works fine with INT8. If that is the case, you could exclude only grouped convolutions until 0.20 supports it.

The PaddlePaddle pre-CI includes grouped-convolution tests that would block the remaining checks, so we don't enable it at the current stage.

If you or your team consider this a must-have feature, we could create another PR for INT8 only. What do you think? Besides that, we need to allocate resources for it. @hshen14 will explain the details.


@guomingz ,
Yes, this fuse is very important to us. Could this feature be turned off only for grouped convolutions? Could the pre-ci tests be updated to handle it in a different/both way?
@hshen14 , @luotao1 , what do you think about it?

@sfraczek

sfraczek commented May 13, 2019

I found one more thing. You haven't added a test for the fuse for completeness.

cc_test(test_conv_relu_mkldnn_fuse_pass SRCS mkldnn/conv_relu_mkldnn_fuse_pass_tester.cc DEPS conv_relu_mkldnn_fuse_pass)

@wojtuss

wojtuss commented May 14, 2019

@guomingz, @luotao1, we would be grateful for a quick merge of this PR (after adding the test).
Thank you!

@luotao1 luotao1 requested a review from a user May 14, 2019 10:43
@guomingz
Contributor Author

I found one more thing. You haven't added a test for the fuse for completeness.

cc_test(test_conv_relu_mkldnn_fuse_pass SRCS mkldnn/conv_relu_mkldnn_fuse_pass_tester.cc DEPS conv_relu_mkldnn_fuse_pass)

Hello @sfraczek, I just added the test_conv_brelu_mkldnn_fuse_pass case per your comments. Please review it. Thanks.

sfraczek
sfraczek previously approved these changes May 14, 2019
@ghost ghost added this to the v1.5 for Intel milestone May 14, 2019
AddAttr<bool>("fuse_brelu",
              "(bool, default false) Only used in mkldnn kernel")
    .SetDefault(false);
AddAttr<float>("fuse_brelu_threshold",
Contributor

Why is fuse_brelu_threshold needed?

Contributor Author

Why is fuse_brelu_threshold needed?

It's a parameter of bounded ReLU. 6 is the typical value, but it may vary.


@luotao1, relu6 is a case of the bounded_relu op in which 6 is the default value of the threshold parameter.
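To make the role of the attribute concrete, here is a sketch (not the PR's actual code) of how the fused op would consume it; hardwiring 6 instead of carrying the attribute would mis-fuse any topology that uses a different bound:

```cpp
#include <algorithm>
#include <cassert>

// fuse_brelu / fuse_brelu_threshold mirror the PR's attributes; the function
// itself is a stand-in for the MKL-DNN bounded_relu post-op.
float fused_activation(float conv_out, bool fuse_brelu,
                       float fuse_brelu_threshold) {
  if (!fuse_brelu) return conv_out;  // no post-op attached
  // Bounded ReLU with a topology-dependent bound; relu6 is threshold == 6.
  return std::min(std::max(conv_out, 0.0f), fuse_brelu_threshold);
}
```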

Contributor

Hi, when will this PR be merged ? :)

Contributor

@qingqing01 What do you think about this PR adding the two attributes fuse_brelu and fuse_brelu_threshold?

@wojtuss

wojtuss commented May 20, 2019

@guomingz, @luotao1, the license/cla status shows that the CLA has not been signed yet by a contributor. It was already green before. Why is it pending?

@luotao1
Contributor

luotao1 commented May 20, 2019

CLA had some problems several hours ago; you can close and reopen this PR, and then the CLA check will be triggered successfully.

@wojtuss

wojtuss commented May 20, 2019

@guomingz , could you please reopen this PR so the license/cla check could pass?

@luotao1 , are there any other requirements for this PR to be merged?

@luotao1
Contributor

luotao1 commented May 20, 2019

Does the attr fuse_brelu_threshold change anywhere? I don't see how to change it. Do you use a pass to change it?

@guomingz
Contributor Author

Does the attr fuse_brelu_threshold change anywhere? I don't see how to change it. Do you use a pass to change it?

It's up to the topology itself. If someday a topology uses another threshold for bounded relu, the attribute will change accordingly. What I mean is that this field is not modified by users; it's just an attribute.

@guomingz guomingz closed this May 20, 2019
@guomingz guomingz reopened this May 20, 2019
@guomingz
Contributor Author

@guomingz , could you please reopen this PR so the license/cla check could pass?

@luotao1 , are there any other requirements for this PR to be merged?

Hi @wojtuss, I've reopened this PR.

@wojtuss

wojtuss commented May 21, 2019

@luotao1, the threshold attribute is important to the MKL-DNN bounded_relu op as well as to the PaddlePaddle relu6 op (relu6 has it too). If we dropped the attribute after the fuse, the fuse might not be done properly for some other value of the threshold.


conv_pd = ConvFwdPrimitiveDesc(
src_md, weights_md, bias_md, dst_md, strides, paddings,
mkldnn_engine, fuse_relu, fuse_residual_conn, false, 0.0,
Contributor

• false means fuse_brelu?
• 0.0 means fuse_brelu_threshold?

It's hard to understand.

Contributor Author

They're placeholders, since the INT8 fusion is not yet enabled (see the previous conversation). It will be removed once the int8 is enabled.

Contributor

You can write

false /*xxx */, 0.0 /*xxx*/

Contributor Author

You can write

false /*xxx */, 0.0 /*xxx*/

Updated the code!
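For illustration, the argument-comment style suggested above makes the call read like this (argument names taken from the surrounding diff; not the exact committed code):

```cpp
conv_pd = ConvFwdPrimitiveDesc(src_md, weights_md, bias_md, dst_md, strides,
                               paddings, mkldnn_engine, fuse_relu,
                               fuse_residual_conn, false /*fuse_brelu*/,
                               0.0f /*fuse_brelu_threshold*/);
```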

Contributor

Got it. But the default of fuse_brelu_threshold is 6.0, so why is it 0.0 here?

Contributor

It will be removed once the int8 is enabled.

@guomingz @lidanqing-intel I do not see it removed in c9becf4.

Contributor Author

@guomingz guomingz May 22, 2019

Got it. But the default of fuse_brelu_threshold is 6.0, so why is it 0.0 here?

0.0 implies the INT8 brelu fusion is not enabled, as the brelu flag is set to false; 6.0 could be misleading about the brelu fusion status.

platform::ConvMKLDNNHandler::AppendKey(
&key, src_tz, weights_tz, strides, paddings, dilations, groups, src_dt,
input->format(), fuse_relu, fuse_residual_conn,
input->format(), fuse_relu, fuse_residual_conn, false,
Contributor

What's the meaning of false? It's hard to understand.

output_shift_scale, sum_scale, is_test);
conv_pd = ConvFwdPrimitiveDesc(src_md, weights_md, dst_md, strides,
paddings, mkldnn_engine, fuse_relu,
fuse_residual_conn, false, 0.0,

ditto

conv_transpose_pd = handler.AcquireConvolutionPrimitiveDescriptor(
src_md, weights_md, bias_md, dst_md, strides, paddings, mkldnn_engine,
fuse_relu, false, fwd_prop_kind);
fuse_relu, false, false, 0.0, fwd_prop_kind);

ditto

conv_transpose_pd = handler.AcquireConvolutionPrimitiveDescriptor(
src_md, weights_md, boost::none, dst_md, strides, paddings,
mkldnn_engine, fuse_relu, false, fwd_prop_kind);
mkldnn_engine, fuse_relu, false, false, 0.0, fwd_prop_kind);

ditto

@luotao1
Contributor

luotao1 commented May 21, 2019

Please add the MobileNet-v2 UT mentioned in #17468 (comment).

@guomingz
Contributor Author

Please add the MobileNet-v2 UT mentioned in #17468 (comment).

We've already added the feature-level UT. How about asking the Poland team to raise the MobileNet-v2 UT in another PR? Let this PR focus on the feature itself.

@wojtuss

wojtuss commented May 21, 2019

@luotao1, @guomingz, could you please also reply to #17130 (comment)?
Without enabling this fuse for INT8, it won't help us with MobileNetV2. The fuse works for INT8 conv2d; potentially it does not work with grouped convolutions only.

Let the parameter definitions be embedded into the code. That will make the code easier to understand.

@luotao1
Contributor

luotao1 commented May 22, 2019

@guomingz @hshen14 please answer #17130 (comment)

@hshen14
Contributor

hshen14 commented May 22, 2019

This PR supports the Conv and BoundedReLU fusion in FP32, with the required pass-based graph optimization and an op-level unit test. If you encounter an issue when applying the fusion in INT8, I suggest creating another PR per your need and adding comments to explain the limitation.

Contributor

@luotao1 luotao1 left a comment

LGTM
