
fix repeating passes#16606

Merged
luotao1 merged 5 commits into PaddlePaddle:develop from wojtuss:wojtuss/fix-repeating-passes
Apr 2, 2019

Conversation

@wojtuss

@wojtuss wojtuss commented Apr 1, 2019

This patch fixes passes being redundantly added multiple times.

test=develop

Wojciech Uss added 2 commits April 1, 2019 09:08
test=develop
test=develop
@wojtuss wojtuss added the Intel label Apr 1, 2019
@wojtuss wojtuss requested a review from luotao1 April 1, 2019 14:10
: PassStrategy(other.AllPasses()) {}
: PassStrategy(other.AllPasses()) {
use_gpu_ = other.use_gpu_;
use_mkldnn_ = other.use_mkldnn_;
Contributor

Doesn't it also need use_mkldnn_quantizer_ = other.use_mkldnn_quantizer_;?

Author

Ah, true, I have missed that one.
Done.

@wojtuss
Author

wojtuss commented Apr 1, 2019

@luotao1 , I have reported the issue: #16609
The runtime_context_cache_pass pass breaks FP32 inference accuracy when there is more than one iteration. I have commented out the pass here and modified the test in #16559 so it can detect the failure with the pass enabled.

Wojciech Uss added 2 commits April 2, 2019 02:27
test=develop
"conv_relu_mkldnn_fuse_pass", //
"conv_elementwise_add_mkldnn_fuse_pass"})) {
"conv_elementwise_add_mkldnn_fuse_pass",
"conv_relu_mkldnn_fuse_pass"})) {
Contributor

Why do you change the order here?

Author

Because it was in the wrong order; it was overlooked in the previous fix of the mkldnn passes.

Contributor

Could you please illustrate an example? Thanks.

Contributor

How about moving the implementation of EnableMKLDNN and EnableMkldnnQuantizer to paddle_pass_builder.cc?

Author

Sure. With ResNet50, after conv_bias_mkldnn_pass the picture is as follows:
[image]
Then the conv_elementwise_add_mkldnn pass has to be applied:
[image]
and then conv_relu_mkldnn_fuse_pass:
[image]
In the opposite order we would be left with remaining relu operators:
[image]

It was unnoticeable until now, as some passes (conv_relu_mkldnn_fuse_pass among them) were unintentionally repeated. With the arbitrary repetitions eliminated, the problem becomes visible, and this patch fixes it.

Contributor

Clear now, thanks @wojtuss

Author

@wojtuss wojtuss Apr 2, 2019

How about moving the implementation of EnableMKLDNN and EnableMkldnnQuantizer to paddle_pass_builder.cc?

I do not think it can be done in an elegant and user-friendly way. A user has to have an option to enable MKL-DNN and the MkldnnQuantizer, so AnalysisConfig is the best place for these methods. The implementation of the methods is minimal.

Contributor

I mean, could you move lines 122-134 from paddle_pass_builder.h to paddle_pass_builder.cc?

Author

Done.

<< "EnableMKLDNN() only works when IR optimization is enabled.";
} else {
pass_builder()->EnableMKLDNN();
}
Contributor

Could we remove lines 244-247? I see that LOG(ERROR) << "Please compile with MKLDNN first to use MKLDNN"; use_mkldnn_ = false; is already in pass_builder()->EnableMKLDNN().
Could we remove lines 258-261 for the same reason?

Author

Right. Done.

@luotao1
Contributor

luotao1 commented Apr 2, 2019

The reason is that runtime_context_pass must be at the end of all the passes. I did an experiment like the following:
[image]
It runs OK:

128: I0402 07:09:54.537349  5517 analysis_predictor.cc:393] MKLDNN is enabled
128: I0402 07:09:54.537361  5517 analysis_predictor.cc:399] Quantization is enabled
128: --- Running analysis [ir_graph_build_pass]
128: --- Running analysis [ir_analysis_pass]
128: --- Running IR pass [mkldnn_placement_pass]
128: --- Running IR pass [infer_clean_graph_pass]
128: --- Running IR pass [attention_lstm_fuse_pass]
128: --- Running IR pass [seqpool_concat_fuse_pass]
128: --- Running IR pass [seqconv_eltadd_relu_fuse_pass]
128: --- Running IR pass [fc_lstm_fuse_pass]
128: --- Running IR pass [mul_lstm_fuse_pass]
128: --- Running IR pass [fc_gru_fuse_pass]
128: --- Running IR pass [mul_gru_fuse_pass]
128: --- Running IR pass [seq_concat_fc_fuse_pass]
128: --- Running IR pass [fc_fuse_pass]
128: ---  detected 1 subgraphs
128: --- Running IR pass [repeated_fc_relu_fuse_pass]
128: --- Running IR pass [squared_mat_sub_fuse_pass]
128: --- Running IR pass [conv_bn_fuse_pass]
128: ---  detected 14 subgraphs
128: --- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
128: --- Running IR pass [is_test_pass]
128: --- Running IR pass [identity_scale_op_clean_pass]
128: --- Running IR pass [depthwise_conv_mkldnn_pass]
128: ---  detected 13 subgraphs
128: --- Running IR pass [conv_bn_fuse_pass]
128: ---  detected 13 subgraphs
128: --- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
128: --- Running IR pass [conv_bias_mkldnn_fuse_pass]
128: --- Running IR pass [conv3d_bias_mkldnn_fuse_pass]
128: --- Running IR pass [conv_elementwise_add_mkldnn_fuse_pass]
128: Fused graph 0
128: --- Running IR pass [conv_relu_mkldnn_fuse_pass]
128: ---  detected 27 subgraphs
128: --- Running IR pass [cpu_quantize_placement_pass]
128: --- Running IR pass [runtime_context_cache_pass]
128: --- Running analysis [ir_params_sync_among_devices_pass]
128: --- Running analysis [ir_graph_to_program_pass]
128: I0402 07:09:56.815405  5517 analysis_predictor.cc:429] == optimize end ==
128: --- Running warmup iteration for quantization
128: --- Calculating scales for quantization
128: --- Running analysis [ir_analysis_pass]
128: --- Running IR pass [infer_clean_graph_pass]
128: --- Running IR pass [cpu_quantize_pass]
128: ---  detected 27 subgraphs
128: ---    quantized 27 conv2d ops
128: ---    quantized 0 conv2d ops with residual connection
128: ---  detected 1 subgraphs
128: ---    quantized 1 pool2d ops
128: --- Running IR pass [cpu_quantize_squash_pass]
128: ---  detected 28 subgraphs
128: ---  detected 27 subgraphs
128: ---    squashed 27 dequantize-quantize pairs
128: --- Running analysis [memory_optimize_pass]
128: --- Running analysis [ir_graph_to_program_pass]
128: I0402 07:10:05.435830  5517 mkldnn_quantizer.cc:393] == optimize 2 end ==
128: I0402 07:10:05.436307  5517 tester_helper.h:295] Running thread 0, warm up run...
128: I0402 07:10:18.795514  5517 helper.h:273] ====== threads: 1, thread id: 0 ======
128: I0402 07:10:18.795536  5517 helper.h:275] ====== batch_size: 50, iterations: 1, repetitions: 1 ======
128: I0402 07:10:18.795539  5517 helper.h:277] ====== batch latency: 13359.2ms, number of samples: 50, sample latency: 267.184ms, fps: 3.74274 ======
128: I0402 07:10:18.795552  5517 tester_helper.h:323] Thread 0, number of threads 1, run 1 times...
128: I0402 07:10:45.517607  5517 helper.h:273] ====== threads: 1, thread id: 0 ======
128: I0402 07:10:45.517630  5517 helper.h:275] ====== batch_size: 50, iterations: 2, repetitions: 1 ======
128: I0402 07:10:45.517633  5517 helper.h:277] ====== batch latency: 13361ms, number of samples: 100, sample latency: 267.22ms, fps: 3.74223 ======
128: I0402 07:10:45.520999  5517 tester_helper.h:508] --- comparing outputs ---
128: I0402 07:10:45.521011  5517 tester_helper.h:449] Avg top1 INT8 accuracy: 0.7100
128: I0402 07:10:45.521016  5517 tester_helper.h:451] Avg top1 FP32 accuracy: 0.7100
128: I0402 07:10:45.521019  5517 tester_helper.h:453] Accepted accuracy drop threshold: 0.01
128: [       OK ] Analyzer_int8_resnet50.quantization (57285 ms)
128: [----------] 1 test from Analyzer_int8_resnet50 (57285 ms total)
128:
128: [----------] Global test environment tear-down
128: [==========] 1 test from 1 test case ran. (57285 ms total)
128: [  PASSED  ] 1 test.
1/1 Test #128: test_analyzer_int8_mobilenet .....   Passed   57.77 sec

I am working on moving runtime_context_pass to the end of all the passes.


"identity_scale_op_clean_pass", //
"runtime_context_cache_pass", //
// TODO(?): fix the pass below as it breaks accuracy
// "runtime_context_cache_pass",
Contributor

Could you keep runtime_context_cache_pass here rather than deleting it? I will try to fix this problem in another PR, and your PR can be merged first.

Author

Ah, OK.
Done, both.

Contributor

@luotao1 luotao1 left a comment

LGTM. Please cherry-pick #16659, #16606 (after this PR is merged), and #16608 (#16625 does not trigger windows_ci successfully) in one PR to release/1.4. Note: add test=release/1.4 to the commit message.

@wojtuss
Author

wojtuss commented Apr 2, 2019

Do you want us to squash and merge this PR when it is all green?

@luotao1 luotao1 merged commit ec2750b into PaddlePaddle:develop Apr 2, 2019
@luotao1
Contributor

luotao1 commented Apr 2, 2019

I squashed and merged the PR; you can do the cherry-pick now.
