
Add conv requantize squash #18754

Merged
luotao1 merged 3 commits into PaddlePaddle:develop from wozna:squash_wozna
Aug 13, 2019

Conversation

@wozna

@wozna wozna commented Jul 23, 2019

This squash improves the accuracy of inference on the GoogLeNet model on ImageNet data.

FP32: avg top1 accuracy: 0.7050

Using INT8 and MKL-DNN, accuracy increases from
INT8: avg top1 accuracy: 0.7017
to
INT8: avg top1 accuracy: 0.7022

This is a reopen of #18676.

test=develop

test=develop
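The idea behind the squash can be illustrated with a small numpy sketch. This is not PaddlePaddle's actual pass (which rewrites the inference graph in C++); the function names and scale values below are made up for illustration. A requantize op that rescales the INT8 output of a conv from scale s1 to s2 can be folded into the conv by having the conv quantize directly with s2, which also removes one rounding step:

```python
import numpy as np

def quantize(x, scale):
    # quantize FP32 values to INT8 with a given scale
    return np.clip(np.round(x * scale), -128, 127).astype(np.int8)

def requantize(x_int8, scale_in, scale_out):
    # rescale INT8 data from one quantization scale to another
    return np.clip(np.round(x_int8.astype(np.float32) * scale_out / scale_in),
                   -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
conv_out_fp32 = rng.uniform(-1, 1, size=8).astype(np.float32)

s1, s2 = 100.0, 50.0
# unfused: conv emits INT8 at scale s1, a separate requantize op rescales to s2
unfused = requantize(quantize(conv_out_fp32, s1), s1, s2)
# squashed: conv emits INT8 directly at scale s2; no requantize op remains
squashed = quantize(conv_out_fp32, s2)
```

The squashed path rounds once instead of twice, which is consistent with the small top1 accuracy improvement reported in the description.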
@bingyanghuang
Contributor

@wozna Please add test=develop to your latest commit to trigger the CI.

test=develop
@wozna wozna force-pushed the squash_wozna branch 2 times, most recently from fc53c15 to c486c21 on July 29, 2019 07:16
@bingyanghuang
Contributor

@wojtuss Please help review this PR.

wojtuss previously approved these changes Jul 30, 2019

@wojtuss wojtuss left a comment


LGTM

@wozna wozna mentioned this pull request Jul 30, 2019
@luotao1
Contributor

luotao1 commented Jul 30, 2019

I will double-check it ASAP.

Contributor

@Sand3r- Sand3r- left a comment


@wozna Thank you for the thorough test coverage and for the changes themselves. Please consider the suggestions provided below.

@luotao1
Contributor

luotao1 commented Jul 30, 2019

before:

I0730 18:03:55.057801 45052 tester_helper.h:462] --- Performance summary ---
I0730 18:03:55.057843 45052 tester_helper.h:463] FP32: avg fps: 33.5400, avg latency: 29.8152 ms
I0730 18:03:55.057854 45052 tester_helper.h:466] INT8: avg fps: 68.2523, avg latency: 14.6515 ms
I0730 18:03:55.067595 45052 tester_helper.h:447] --- Accuracy summary ---
I0730 18:03:55.067620 45052 tester_helper.h:448] Accepted top1 accuracy drop threshold: 0.01. (condition: (FP32_top1_acc - INT8_top1_acc) <= threshold)
I0730 18:03:55.067639 45052 tester_helper.h:451] FP32: avg top1 accuracy: 0.7050
I0730 18:03:55.067646 45052 tester_helper.h:453] INT8: avg top1 accuracy: 0.7008

after:

I0730 20:00:23.235113 294204 tester_helper.h:462] --- Performance summary ---
I0730 20:00:23.235152 294204 tester_helper.h:463] FP32: avg fps: 34.1651, avg latency: 29.2696 ms
I0730 20:00:23.235173 294204 tester_helper.h:466] INT8: avg fps: 82.6219, avg latency: 12.1033 ms
I0730 20:00:23.244776 294204 tester_helper.h:447] --- Accuracy summary ---
I0730 20:00:23.244801 294204 tester_helper.h:448] Accepted top1 accuracy drop threshold: 0.01. (condition: (FP32_top1_acc - INT8_top1_acc) <= threshold)
I0730 20:00:23.244819 294204 tester_helper.h:451] FP32: avg top1 accuracy: 0.7050
I0730 20:00:23.244824 294204 tester_helper.h:453] INT8: avg top1 accuracy: 0.7003

from 70.08 -> 70.03; machine is Intel(R) Xeon(R) Gold 6271 CPU @ 2.60GHz
develop commit: cfcb96d
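The acceptance condition printed by tester_helper.h can be checked by hand against the numbers above (a standalone sketch; the variable names are mine, not the harness's):

```python
# acceptance condition from the log:
# (FP32_top1_acc - INT8_top1_acc) <= threshold
fp32_top1 = 0.7050   # FP32 avg top1 accuracy from the "after" run
int8_top1 = 0.7003   # INT8 avg top1 accuracy from the "after" run
threshold = 0.01     # accepted top1 accuracy drop threshold

accepted = (fp32_top1 - int8_top1) <= threshold
print(accepted)  # True: the 0.0047 drop stays within the 0.01 threshold
```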

@bingyanghuang
Contributor

Based on LuoTao's benchmark, performance increases from 68.2523 to 82.6219 fps, and accuracy only drops from 0.7008 to 0.7003. I think this PR is good to merge. What do you think, @wojtuss?

@wozna
Author

wozna commented Jul 31, 2019

I checked it on Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
and I got the same results.

before

I0731 11:36:49.323259 42280 tester_helper.h:462] --- Performance summary ---
I0731 11:36:49.323282 42280 tester_helper.h:463] FP32: avg fps: 582.5014, avg latency: 1.7167 ms
I0731 11:36:49.323288 42280 tester_helper.h:466] INT8: avg fps: 1010.2657, avg latency: 0.9898 ms
I0731 11:36:49.323566 42280 tester_helper.h:447] --- Accuracy summary ---
I0731 11:36:49.323572 42280 tester_helper.h:448] Accepted top1 accuracy drop threshold: 0.01 (condition: (FP32_top1_acc - INT8_top1_acc) <= threshold)
I0731 11:36:49.323578 42280 tester_helper.h:451] FP32: avg top1 accuracy: 0.7050
I0731 11:36:49.323582 42280 tester_helper.h:453] INT8: avg top1 accuracy: 0.7017

after

I0731 11:27:40.606056 39099 tester_helper.h:462] --- Performance summary ---
I0731 11:27:40.606074 39099 tester_helper.h:463] FP32: avg fps: 600.3001, avg latency: 1.6658 ms
I0731 11:27:40.606081 39099 tester_helper.h:466] INT8: avg fps: 1146.9660, avg latency: 0.8719 ms
I0731 11:27:40.606254 39099 tester_helper.h:447] --- Accuracy summary ---
I0731 11:27:40.606259 39099 tester_helper.h:448] Accepted top1 accuracy drop threshold: 0.01. (condition: (FP32_top1_acc - INT8_top1_acc) <= threshold)
I0731 11:27:40.606263 39099 tester_helper.h:451] FP32: avg top1 accuracy: 0.7050
I0731 11:27:40.606267 39099 tester_helper.h:453] INT8: avg top1 accuracy: 0.7022

@luotao1
Contributor

luotao1 commented Jul 31, 2019

Could you provide:

  • commit_id
  • cmake command

Besides, I run with OMP_NUM_THREADS=1.

@wozna
Author

wozna commented Jul 31, 2019

  • cmake command
    /Paddle/build/paddle/fluid/inference/tests/api/test_analyzer_int8_image_classification "ARGS" "--infer_model=/Paddle/build/third_party/inference_demo/int8v2/googlenet/model" "--infer_data=/data/PaddlePaddle/1G/imagenet/val.bin" "--warmup_batch_size=100" "--batch_size=50" "--paddle_num_threads=28" "--iterations=1000"

Besides, I run with OMP_NUM_THREADS=1.
I will try with this config.
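For anyone reproducing the single-thread setup luotao describes, the run can presumably be launched as below. This is a command fragment, not verified output: the binary path and flags are taken from the commands quoted in this thread, and the data path (written here with $HOME) is build- and machine-specific.

```shell
# single-threaded INT8 GoogLeNet accuracy test, as in luotao1's runs;
# binary and flags follow the commands quoted earlier in this thread
OMP_NUM_THREADS=1 ./paddle/fluid/inference/tests/api/test_analyzer_int8_image_classification \
    --infer_model=third_party/inference_demo/int8v2/googlenet/model \
    --infer_data=$HOME/.cache/paddle/dataset/int8/download/int8_full_val.bin \
    --batch_size=1 \
    --paddle_num_threads=1
```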

@bingyanghuang
Contributor

I did the test on our CLX 6248 with commit 233746d; the command line is:

./paddle/fluid/inference/tests/api/test_analyzer_int8_image_classification --infer_model=third_party/inference_demo/int8v2/googlenet/model --infer_data=/~/.cache/paddle/dataset/int8/download/int8_full_val.bin --batch_size=1 --paddle_num_threads=1

Got the following results.

  • Before
I0801 08:46:15.091809 262332 tester_helper.h:462] --- Performance summary ---
I0801 06:41:59.026803 249389 tester_helper.h:463] FP32: avg fps: 43.5602, avg latency: 22.9567 ms
I0801 06:41:59.026811 249389 tester_helper.h:466] INT8: avg fps: 93.9150, avg latency: 10.6479 ms
I0801 06:41:59.038902 249389 tester_helper.h:447] --- Accuracy summary ---
I0801 06:41:59.038915 249389 tester_helper.h:448] Accepted top1 accuracy drop threshold: 0.01. (condition: (FP32_top1_acc - INT8_top1_acc) <= threshold)
I0801 06:41:59.038921 249389 tester_helper.h:451] FP32: avg top1 accuracy: 0.7050
I0801 06:41:59.038925 249389 tester_helper.h:453] INT8: avg top1 accuracy: 0.7008
  • After
I0801 08:46:15.091809 262332 tester_helper.h:462] --- Performance summary ---
I0801 08:46:15.091835 262332 tester_helper.h:463] FP32: avg fps: 44.4318, avg latency: 22.5064 ms
I0801 08:46:15.091846 262332 tester_helper.h:466] INT8: avg fps: 117.5707, avg latency: 8.5055 ms
I0801 08:46:15.101172 262332 tester_helper.h:447] --- Accuracy summary ---
I0801 08:46:15.101187 262332 tester_helper.h:448] Accepted top1 accuracy drop threshold: 0.01. (condition: (FP32_top1_acc - INT8_top1_acc) <= threshold)
I0801 08:46:15.101194 262332 tester_helper.h:451] FP32: avg top1 accuracy: 0.7050
I0801 08:46:15.101199 262332 tester_helper.h:453] INT8: avg top1 accuracy: 0.7003

Same conclusion as luotao. @wozna, wojtek is planning to investigate the server configuration issue and figure out why we got different results.

Contributor

@Sand3r- Sand3r- left a comment


The code looks good to me. 👍 Good job @wozna. That's nearly a 25% speedup for this topology.

@bingyanghuang
Contributor

@luotao1 Please start a review.

@bingyanghuang
Contributor

Based on luotao's benchmark on 6271:

INT8 performance gets about a 21% speedup, while INT8 accuracy drops by 0.0005. Since 0.0005 is minor compared with the large performance gain, we decided to merge this PR. @Sand3r- @wojtuss @luotao1

@luotao1
Contributor

luotao1 commented Aug 13, 2019

Got it.

@luotao1 luotao1 merged commit 492a00f into PaddlePaddle:develop Aug 13, 2019
@wozna wozna deleted the squash_wozna branch February 24, 2023 15:42