
fix repeating passes#16606

Merged
luotao1 merged 5 commits into PaddlePaddle:develop from wojtuss:wojtuss/fix-repeating-passes
Apr 2, 2019

Conversation

@wojtuss

@wojtuss wojtuss commented Apr 1, 2019

This patch fixes passes being redundantly added multiple times.

test=develop

Wojciech Uss added 2 commits April 1, 2019 09:08
test=develop
test=develop
@wojtuss wojtuss added the Intel label Apr 1, 2019
@wojtuss wojtuss requested a review from luotao1 April 1, 2019 14:10
: PassStrategy(other.AllPasses()) {}
: PassStrategy(other.AllPasses()) {
use_gpu_ = other.use_gpu_;
use_mkldnn_ = other.use_mkldnn_;
Contributor

Doesn't it also need use_mkldnn_quantizer_ = other.use_mkldnn_quantizer_;?

Author

Ah, true, I have missed that one.
Done.

@wojtuss
Author

wojtuss commented Apr 1, 2019

@luotao1 , I have reported the issue: #16609
The runtime_context_cache_pass pass breaks FP32 inference accuracy when there is more than one iteration. I have commented out the pass here and modified the test in #16559 so it can detect the failure with the pass enabled.

Wojciech Uss added 2 commits April 2, 2019 02:27
test=develop
"conv_relu_mkldnn_fuse_pass", //
"conv_elementwise_add_mkldnn_fuse_pass"})) {
"conv_elementwise_add_mkldnn_fuse_pass",
"conv_relu_mkldnn_fuse_pass"})) {
Contributor

Why do you change the order here?

Author

Because it was in the wrong order; it was overlooked in the previous fix of the mkldnn passes.

Contributor

Could you please illustrate an example? Thanks.

Contributor

How about moving the implementation of EnableMKLDNN and EnableMkldnnQuantizer to paddle_pass_builder.cc?

Author

Sure. With ResNet50, after conv_bias_mkldnn_pass the picture is as follows:
[image]
Then the conv_elementwise_add_mkldnn pass has to be applied:
[image]
and then conv_relu_mkldnn_fuse_pass:
[image]
In the opposite order we would be left with remaining relu operators:
[image]

It was unnoticeable until now, as some passes (conv_relu_mkldnn_fuse_pass among them) were unintentionally repeated. With the arbitrary repetitions eliminated, the problem becomes visible, and this patch fixes it.

Contributor

Clear now, thanks @wojtuss

Author

@wojtuss wojtuss Apr 2, 2019

How about moving the implementation of EnableMKLDNN and EnableMkldnnQuantizer to paddle_pass_builder.cc?

I do not think it can be done in an elegant and user-friendly way. A user has to have an option to enable MKL-DNN and the MkldnnQuantizer, so AnalysisConfig is the best place for these methods. The implementation of the methods is minimal.

Contributor

I mean, could you move lines 122-134 from paddle_pass_builder.h to paddle_pass_builder.cc?

Author

Done.

<< "EnableMKLDNN() only works when IR optimization is enabled.";
} else {
pass_builder()->EnableMKLDNN();
}
Contributor

Could we remove lines 244-247? I see that LOG(ERROR) << "Please compile with MKLDNN first to use MKLDNN"; use_mkldnn_ = false; is already in pass_builder()->EnableMKLDNN().
Could we remove lines 258-261 for the same reason?

Author

Right. Done.

@luotao1
Contributor

luotao1 commented Apr 2, 2019

The reason is that runtime_context_pass must be at the end of all the passes. I did an experiment like the following:
[image]
It runs OK:

128: I0402 07:09:54.537349  5517 analysis_predictor.cc:393] MKLDNN is enabled
128: I0402 07:09:54.537361  5517 analysis_predictor.cc:399] Quantization is enabled
128: --- Running analysis [ir_graph_build_pass]
128: --- Running analysis [ir_analysis_pass]
128: --- Running IR pass [mkldnn_placement_pass]
128: --- Running IR pass [infer_clean_graph_pass]
128: --- Running IR pass [attention_lstm_fuse_pass]
128: --- Running IR pass [seqpool_concat_fuse_pass]
128: --- Running IR pass [seqconv_eltadd_relu_fuse_pass]
128: --- Running IR pass [fc_lstm_fuse_pass]
128: --- Running IR pass [mul_lstm_fuse_pass]
128: --- Running IR pass [fc_gru_fuse_pass]
128: --- Running IR pass [mul_gru_fuse_pass]
128: --- Running IR pass [seq_concat_fc_fuse_pass]
128: --- Running IR pass [fc_fuse_pass]
128: ---  detected 1 subgraphs
128: --- Running IR pass [repeated_fc_relu_fuse_pass]
128: --- Running IR pass [squared_mat_sub_fuse_pass]
128: --- Running IR pass [conv_bn_fuse_pass]
128: ---  detected 14 subgraphs
128: --- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
128: --- Running IR pass [is_test_pass]
128: --- Running IR pass [identity_scale_op_clean_pass]
128: --- Running IR pass [depthwise_conv_mkldnn_pass]
128: ---  detected 13 subgraphs
128: --- Running IR pass [conv_bn_fuse_pass]
128: ---  detected 13 subgraphs
128: --- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
128: --- Running IR pass [conv_bias_mkldnn_fuse_pass]
128: --- Running IR pass [conv3d_bias_mkldnn_fuse_pass]
128: --- Running IR pass [conv_elementwise_add_mkldnn_fuse_pass]
128: Fused graph 0
128: --- Running IR pass [conv_relu_mkldnn_fuse_pass]
128: ---  detected 27 subgraphs
128: --- Running IR pass [cpu_quantize_placement_pass]
128: --- Running IR pass [runtime_context_cache_pass]
128: --- Running analysis [ir_params_sync_among_devices_pass]
128: --- Running analysis [ir_graph_to_program_pass]
128: I0402 07:09:56.815405  5517 analysis_predictor.cc:429] == optimize end ==
128: --- Running warmup iteration for quantization
128: --- Calculating scales for quantization
128: --- Running analysis [ir_analysis_pass]
128: --- Running IR pass [infer_clean_graph_pass]
128: --- Running IR pass [cpu_quantize_pass]
128: ---  detected 27 subgraphs
128: ---    quantized 27 conv2d ops
128: ---    quantized 0 conv2d ops with residual connection
128: ---  detected 1 subgraphs
128: ---    quantized 1 pool2d ops
128: --- Running IR pass [cpu_quantize_squash_pass]
128: ---  detected 28 subgraphs
128: ---  detected 27 subgraphs
128: ---    squashed 27 dequantize-quantize pairs
128: --- Running analysis [memory_optimize_pass]
128: --- Running analysis [ir_graph_to_program_pass]
128: I0402 07:10:05.435830  5517 mkldnn_quantizer.cc:393] == optimize 2 end ==
128: I0402 07:10:05.436307  5517 tester_helper.h:295] Running thread 0, warm up run...
128: I0402 07:10:18.795514  5517 helper.h:273] ====== threads: 1, thread id: 0 ======
128: I0402 07:10:18.795536  5517 helper.h:275] ====== batch_size: 50, iterations: 1, repetitions: 1 ======
128: I0402 07:10:18.795539  5517 helper.h:277] ====== batch latency: 13359.2ms, number of samples: 50, sample latency: 267.184ms, fps: 3.74274 ======
128: I0402 07:10:18.795552  5517 tester_helper.h:323] Thread 0, number of threads 1, run 1 times...
128: I0402 07:10:45.517607  5517 helper.h:273] ====== threads: 1, thread id: 0 ======
128: I0402 07:10:45.517630  5517 helper.h:275] ====== batch_size: 50, iterations: 2, repetitions: 1 ======
128: I0402 07:10:45.517633  5517 helper.h:277] ====== batch latency: 13361ms, number of samples: 100, sample latency: 267.22ms, fps: 3.74223 ======
128: I0402 07:10:45.520999  5517 tester_helper.h:508] --- comparing outputs ---
128: I0402 07:10:45.521011  5517 tester_helper.h:449] Avg top1 INT8 accuracy: 0.7100
128: I0402 07:10:45.521016  5517 tester_helper.h:451] Avg top1 FP32 accuracy: 0.7100
128: I0402 07:10:45.521019  5517 tester_helper.h:453] Accepted accuracy drop threshold: 0.01
128: [       OK ] Analyzer_int8_resnet50.quantization (57285 ms)
128: [----------] 1 test from Analyzer_int8_resnet50 (57285 ms total)
128:
128: [----------] Global test environment tear-down
128: [==========] 1 test from 1 test case ran. (57285 ms total)
128: [  PASSED  ] 1 test.
1/1 Test #128: test_analyzer_int8_mobilenet .....   Passed   57.77 sec

I am working on moving runtime_context_pass to the end of all the passes.


"identity_scale_op_clean_pass", //
"runtime_context_cache_pass", //
// TODO(?): fix the pass below as it breaks accuracy
// "runtime_context_cache_pass",
Contributor

Could you keep runtime_context_cache_pass here rather than deleting it? I will try to fix this problem in another PR, and your PR can be merged first.

Author

Ah, OK.
Done, both.

Contributor

@luotao1 luotao1 left a comment

LGTM. Please cherry-pick #16659, #16606 (after this PR is merged), and #16608 (#16625 does not trigger windows_ci successfully) in one PR to release/1.4. Note: add test=release/1.4 to the commit message.

@wojtuss
Author

wojtuss commented Apr 2, 2019

Do you want us to squash and merge this PR when it is all green?

@luotao1 luotao1 merged commit ec2750b into PaddlePaddle:develop Apr 2, 2019
@luotao1
Contributor

luotao1 commented Apr 2, 2019

I squashed and merged the PR; you can do the cherry-pick now.
