Add conv dequant squash #18905

Merged
luotao1 merged 8 commits into PaddlePaddle:develop from wozna:conv_dequant_squashe
Aug 27, 2019

@wozna

@wozna wozna commented Jul 30, 2019

This squash is based on PR #18754.
It replaces a chain of a conv op followed by a dequantize op with a single conv op whose output is forced to float.
This avoids an additional conversion to int8.
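
The rewrite described above can be pictured on a toy op list. The following is a minimal sketch, not the pass implemented in this PR: the `Op` class, the `squash_conv_dequant` function, the variable names, and the attribute name `force_fp32_output` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical stand-in for a graph op: a type, variable names, attributes."""
    type: str
    inputs: list
    outputs: list
    attrs: dict = field(default_factory=dict)


def squash_conv_dequant(ops):
    """Fold each conv2d -> dequantize pair into one conv2d with float output.

    A pair is folded only when the dequantize op is the sole consumer of the
    conv output; otherwise another op still needs the int8 tensor.
    Matched conv ops are modified in place.
    """
    # Count how many ops read each variable.
    consumers = {}
    for op in ops:
        for name in op.inputs:
            consumers[name] = consumers.get(name, 0) + 1
    # Map each variable to the op that writes it.
    producer = {name: op for op in ops for name in op.outputs}

    removed = set()
    for op in ops:
        if op.type != "dequantize":
            continue
        src = op.inputs[0]
        conv = producer.get(src)
        if conv is None or conv.type != "conv2d" or consumers[src] != 1:
            continue
        # The conv now writes the float variable directly.
        conv.outputs = list(op.outputs)
        conv.attrs["force_fp32_output"] = True  # illustrative attribute name
        removed.add(id(op))
    return [op for op in ops if id(op) not in removed]


graph = [
    Op("quantize", ["x"], ["x_int8"]),
    Op("conv2d", ["x_int8"], ["y_int8"]),
    Op("dequantize", ["y_int8"], ["y"]),
    Op("softmax", ["y"], ["out"]),
]
squashed = squash_conv_dequant(graph)
print([op.type for op in squashed])
```

The single-consumer check is a design choice of this sketch: if a second op also read the int8 conv output, removing the dequantize op would leave that op without its input.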

MobileNet-SSD contains 12 patterns like this, and applying the squash improves INT8 accuracy
(on a CLX machine):
- Before squash: acc_fp32 = 0.7733, acc_int8 = 0.77
- After squash: acc_fp32 = 0.7733, acc_int8 = 0.7719

Unfortunately, in a few tests using MKL-DNN and INT8 on a CLX machine, fps decreased:
- Before squash: avg fps = 151.634
- After squash: avg fps = 146.8749
That is a drop of about 3%.
The pass could be made optional, for cases where accuracy is more important than performance.
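
The size of the fps drop quoted above can be checked with a short calculation. This is only an arithmetic sketch using the two fps values from this comment; the helper name `relative_change` is made up for the example.

```python
def relative_change(before, after):
    """Return the change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before


# avg fps values reported in this comment
fps_change = relative_change(151.634, 146.8749)
print(f"fps change after squash: {fps_change:.2%}")
```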

wozna added 3 commits July 23, 2019 03:56
@wozna
Author

wozna commented Jul 30, 2019

@luotao1 What do you think about the results and about adding an option to enable the pass?

@bingyanghuang
Contributor

Tested on Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz

  • Test command line
    ./paddle/fluid/inference/tests/api/test_analyzer_int8_object_detection "ARGS" --infer_model=third_party/inference_demo/int8v2/mobilenet-ssd/model --infer_data=/home/bingyang/.cache/paddle/dataset/pascalvoc/pascalvoc_full.bin --warmup_batch_size=10 --batch_size=1 --paddle_num_threads=1

  • Before

I0821 06:37:47.878208 109578 tester_helper.h:468] --- Performance summary ---
I0821 06:37:47.878234 109578 tester_helper.h:469] FP32: avg fps: 15.5156, avg latency: 64.4513 ms
I0821 06:37:47.878245 109578 tester_helper.h:472] INT8: avg fps: 20.4021, avg latency: 49.0146 ms
I0821 06:37:47.880376 109578 tester_helper.h:452] --- Accuracy summary ---
I0821 06:37:47.880390 109578 tester_helper.h:453] Accepted mAP drop threshold: 0.01. (condition: (FP32_mAP  - INT8_mAP ) <= threshold)
I0821 06:37:47.880398 109578 tester_helper.h:457] FP32: avg mAP 0.7392
I0821 06:37:47.880403 109578 tester_helper.h:459] INT8: avg mAP 0.7287
  • After
I0821 05:35:45.883280 75832 tester_helper.h:468] --- Performance summary ---
I0821 05:35:45.883306 75832 tester_helper.h:469] FP32: avg fps: 15.4869, avg latency: 64.5709 ms
I0821 05:35:45.883317 75832 tester_helper.h:472] INT8: avg fps: 21.2940, avg latency: 46.9617 ms
I0821 05:35:45.885444 75832 tester_helper.h:452] --- Accuracy summary ---
I0821 05:35:45.885458 75832 tester_helper.h:453] Accepted mAP drop threshold: 0.01. (condition: (FP32_mAP  - INT8_mAP ) <= threshold)
I0821 05:35:45.885465 75832 tester_helper.h:457] FP32: avg mAP 0.7392
I0821 05:35:45.885469 75832 tester_helper.h:459] INT8: avg mAP 0.7315
  • Benchmark conclusion

| Compare | INT8 FPS | INT8 Accuracy (mAP) |
|---------|----------|---------------------|
| Before  | 20.4021  | 0.7287              |
| After   | 21.2940  | 0.7315              |
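
The accuracy lines in the logs state the acceptance condition `(FP32_mAP - INT8_mAP) <= threshold` with a threshold of 0.01. As a small sketch, that condition can be applied to the mAP values reported above; the function name `within_threshold` is an assumption for this example and is not taken from the test helper.

```python
THRESHOLD = 0.01  # "Accepted mAP drop threshold" printed in the logs


def within_threshold(fp32_map, int8_map, threshold=THRESHOLD):
    """Apply the logged condition: (FP32_mAP - INT8_mAP) <= threshold."""
    return (fp32_map - int8_map) <= threshold


# mAP values from the Before and After runs above
before_ok = within_threshold(0.7392, 0.7287)
after_ok = within_threshold(0.7392, 0.7315)
print("before squash within threshold:", before_ok)
print("after squash within threshold:", after_ok)
```

With these numbers the mAP gap is roughly 0.0105 before the squash and roughly 0.0077 after it, so by this check only the run after the squash stays within the threshold.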

@luotao1
Contributor

luotao1 commented Aug 21, 2019

@juncaipeng @wzzju Please double-check the #18905 (comment)

@wozna wozna force-pushed the conv_dequant_squashe branch from 5efd9a1 to 6d4ce01 Compare August 21, 2019 12:15
test=develop
@wozna wozna force-pushed the conv_dequant_squashe branch from 6d4ce01 to e412296 Compare August 21, 2019 12:32
test=develop
@wojtuss wojtuss left a comment

LGTM

@wozna
Author

wozna commented Aug 23, 2019

@luotao1 If everything is fine, this PR is an improvement and is ready to merge.

@juncaipeng
Contributor

juncaipeng commented Aug 27, 2019

Tested on Intel(R) Xeon(R) Gold 6271 CPU @ 2.60GHz

  • Test command
./test_analyzer_int8_object_detection "ARGS" --infer_model=mobilenet-ssd/model/ --infer_data=/dev/shm/pascalvoc_full.bin --warmup_batch_size=10 --batch_size=1 --paddle_num_threads=1
  • Before
I0827 13:02:15.814708 370741 tester_helper.h:471] --- Performance summary ---
I0827 13:02:15.814764 370741 tester_helper.h:472] FP32: avg fps: 12.3974, avg latency: 80.6618 ms
I0827 13:02:15.814790 370741 tester_helper.h:475] INT8: avg fps: 15.9749, avg latency: 62.5980 ms
I0827 13:02:15.817037 370741 tester_helper.h:455] --- Accuracy summary ---
I0827 13:02:15.817065 370741 tester_helper.h:456] Accepted mAP drop threshold: 0.01. (condition: (FP32_mAP  - INT8_mAP ) <= threshold)
I0827 13:02:15.817085 370741 tester_helper.h:460] FP32: avg mAP 0.7392
I0827 13:02:15.817091 370741 tester_helper.h:462] INT8: avg mAP 0.7287
  • After
I0827 11:10:15.486403 161780 tester_helper.h:468] --- Performance summary ---
I0827 11:10:15.486446 161780 tester_helper.h:469] FP32: avg fps: 12.4536, avg latency: 80.2981 ms
I0827 11:10:15.486469 161780 tester_helper.h:472] INT8: avg fps: 16.6916, avg latency: 59.9104 ms
I0827 11:10:15.488651 161780 tester_helper.h:452] --- Accuracy summary ---
I0827 11:10:15.488679 161780 tester_helper.h:453] Accepted mAP drop threshold: 0.01. (condition: (FP32_mAP  - INT8_mAP ) <= threshold)
I0827 11:10:15.488698 161780 tester_helper.h:457] FP32: avg mAP 0.7392
I0827 11:10:15.488703 161780 tester_helper.h:459] INT8: avg mAP 0.7315
  • Benchmark conclusion

| Compare | INT8 FPS | INT8 Accuracy (mAP) |
|---------|----------|---------------------|
| Before  | 15.9749  | 0.7287              |
| After   | 16.6916  | 0.7315              |
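
The figures in this conclusion can be pulled out of the performance summary lines automatically. The following is a rough sketch that assumes the log format shown in this comment; the regular expression and the `parse_perf` helper are written for this example and are not part of the test binary.

```python
import re

# Matches lines such as "INT8: avg fps: 15.9749, avg latency: 62.5980 ms".
PERF_LINE = re.compile(r"(FP32|INT8): avg fps: ([\d.]+), avg latency: ([\d.]+) ms")


def parse_perf(log_text):
    """Return {"FP32": {...}, "INT8": {...}} with fps and latency per precision."""
    return {
        m.group(1): {"fps": float(m.group(2)), "latency_ms": float(m.group(3))}
        for m in PERF_LINE.finditer(log_text)
    }


# Performance lines copied from the Before and After runs in this comment.
before_log = """\
I0827 13:02:15.814764 370741 tester_helper.h:472] FP32: avg fps: 12.3974, avg latency: 80.6618 ms
I0827 13:02:15.814790 370741 tester_helper.h:475] INT8: avg fps: 15.9749, avg latency: 62.5980 ms
"""
after_log = """\
I0827 11:10:15.486446 161780 tester_helper.h:469] FP32: avg fps: 12.4536, avg latency: 80.2981 ms
I0827 11:10:15.486469 161780 tester_helper.h:472] INT8: avg fps: 16.6916, avg latency: 59.9104 ms
"""

before = parse_perf(before_log)
after = parse_perf(after_log)
int8_gain = after["INT8"]["fps"] / before["INT8"]["fps"] - 1
print(f"INT8 fps change after squash: {int8_gain:.2%}")
```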
