[MKL-DNN] Extended LRN with reusing via Acquire API #18675

Merged
luotao1 merged 1 commit into PaddlePaddle:develop from jczaja:prv-lrn-acquire
Jul 23, 2019

jczaja commented Jul 17, 2019

This PR reimplements the LRN MKL-DNN ops on top of the common Acquire API.

Benefits:

  • Code that is easier to modify and maintain; for example, further changes for multi-threading and MKL-DNN 1.0 will be simpler.
  • Better performance from the reuse this introduces: GoogleNet v1 (a model with LRN) inference via CAPI with bs=1 is ~9% faster on an AVX512 platform, e.g. SKX (8180).
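
The reuse idea can be illustrated with a minimal, language-neutral sketch written in Python. It assumes that "reusing" means caching an expensive-to-create object under a string key and returning the cached instance on later requests. The names `PrimitiveCache`, `acquire` and `make_key` are hypothetical and are not the PaddlePaddle or MKL-DNN API; the real implementation is C++.

```python
from typing import Any, Callable, Dict, Tuple


class PrimitiveCache:
    """Hypothetical key -> object store standing in for a per-device cache."""

    def __init__(self) -> None:
        self._blobs: Dict[str, Any] = {}
        self.creations = 0  # counts how often the expensive path ran

    def acquire(self, key: str, create: Callable[[], Any]) -> Any:
        """Return the cached object for `key`, creating it only on a miss."""
        if key not in self._blobs:
            self._blobs[key] = create()
            self.creations += 1
        return self._blobs[key]


def make_key(op_name: str, dims: Tuple[int, ...], suffix: str) -> str:
    """Build a cache key from the op name, input shape and a role suffix.

    The key layout is an assumption made for this sketch only.
    """
    return f"{op_name}-{'x'.join(str(d) for d in dims)}{suffix}"


cache = PrimitiveCache()
key = make_key("lrn", (1, 64, 56, 56), "@fwd_pd")

# The first call creates the object; the second call reuses it.
first = cache.acquire(key, lambda: {"kind": "lrn_fwd_pd", "local_size": 5})
second = cache.acquire(key, lambda: {"kind": "lrn_fwd_pd", "local_size": 5})
print(first is second, cache.creations)
```

With a fixed input shape, as in bs=1 inference, every iteration after the first would hit the cache, which is one plausible reason the per-call cost of the op drops.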

test=develop

- compilation fix

- Yet another compilation fix

- Even yet another compilation fix

- Surprise! Again compilation fix

- lint fixes

test=develop

- Fix to workspace acquire of LRN

test=develop

- Fix to hash of BWD LRN

test=develop

- fix to lrn BWD PD acquire

test=develop

- Fixing LRN PD creation

test=develop

- cosmetic fix in comment

test=develop

- Fixes after review

test=develop
@jczaja jczaja force-pushed the prv-lrn-acquire branch from a2e69f1 to 071d6ec Compare July 22, 2019 10:24

@Sand3r- Sand3r- left a comment

@jczaja Thank you so much for the contribution and the speedup it brings on GoogleNet v1!

jczaja commented Jul 23, 2019

@luotao1 Could you please review this PR? Your approval is needed again for CI to pass.

@luotao1 luotao1 left a comment

LGTM

luotao1 commented Jul 23, 2019

I tested GoogleNet v1 (a model with LRN) inference via CAPI with bs=1 on an AVX2 platform (E5-2650 v4):

./test_analyzer_image_classification --gtest_filter=Analyzer_resnet50.profile_mkldnn --batch_size=1 --warmup --repeat=100 --paddle_num_threads=4 --infer_model=googlenetfluid/ --profile

lrn total time drops from about 242 ms to 208 ms, roughly 14% less time.

  • before
Event              Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::conv2d    5700        1881.6      0.085055    3.01909     0.330106    0.787164
**thread0::lrn       200         241.991     0.716039    2.08498     1.20995     0.101236**
thread0::pool2d    1400        178.921     0.0452      0.266267    0.127801    0.0748511
thread0::concat    900         81.7893     0.04769     0.496502    0.090877    0.0342163
thread0::fc        100         4.40745     0.042422    0.046222    0.0440745   0.00184385
thread0::fetch     100         1.21941     0.010564    0.013756    0.0121941   0.000510137
thread0::feed      100         0.427123    0.003667    0.005537    0.00427123  0.000178686
  • after
Event              Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::conv2d    5700        1872.48     0.086678    3.06727     0.328505    0.799478
**thread0::lrn       200         208.1       0.540581    1.58649     1.0405      0.0888512**
thread0::pool2d    1400        178.047     0.046527    0.557823    0.127176    0.0760194
thread0::concat    900         77.3396     0.047206    0.470094    0.0859329   0.0330212
thread0::fc        100         4.48958     0.042211    0.063319    0.0448958   0.00191689
thread0::fetch     100         1.24835     0.011442    0.016381    0.0124835   0.000532998
thread0::feed      100         0.421286    0.003702    0.006086    0.00421286  0.000179874
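
The op-level figure can be recomputed from the two `thread0::lrn` rows above. The sketch below is only a convenience check: the helper names are made up for illustration, and the numbers are copied from the Total and Calls columns of the profiles.

```python
# thread0::lrn rows copied from the "before" and "after" profiles above.
LRN_BEFORE = {"calls": 200, "total": 241.991}
LRN_AFTER = {"calls": 200, "total": 208.1}


def time_reduction_pct(before: float, after: float) -> float:
    """Percentage of the original time that was saved."""
    return (before - after) / before * 100.0


def speedup_factor(before: float, after: float) -> float:
    """How many times faster the new version is (before / after)."""
    return before / after


reduction = time_reduction_pct(LRN_BEFORE["total"], LRN_AFTER["total"])
factor = speedup_factor(LRN_BEFORE["total"], LRN_AFTER["total"])

# Average per call, which should agree with the "Ave." column.
avg_before = LRN_BEFORE["total"] / LRN_BEFORE["calls"]
avg_after = LRN_AFTER["total"] / LRN_AFTER["calls"]

print(f"time reduction: {reduction:.1f}%")
print(f"speedup factor: {factor:.3f}x")
print(f"average per call: {avg_before:.5f} -> {avg_after:.5f}")
```

Note that "percent less time" and "times faster" are different measures of the same change, so it helps to state which one a quoted percentage refers to.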

luotao1 commented Jul 23, 2019

~9% faster

Is this an op-level or a model-level speedup? @jczaja

jczaja commented Jul 23, 2019

@luotao1 The reported speedup is at the model level (SKX 8180, AVX512). I kept only a partial log:

Before:
sample latency: 26.7129 fps: 37.43

...
thread0::lrn: 2524.55
...

After:
sample latency: 24.4794 fps: 40.8506

....
thread0::lrn: 1194.76
......
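
As a rough cross-check of the partial log, the sketch below recomputes the model-level and lrn-level gains. It assumes the sample latency is in milliseconds, which is consistent with the logged fps values (1000 / latency reproduces them); the helper name is hypothetical.

```python
# Values copied from the partial log above.
LATENCY_BEFORE, FPS_BEFORE, LRN_BEFORE = 26.7129, 37.43, 2524.55
LATENCY_AFTER, FPS_AFTER, LRN_AFTER = 24.4794, 40.8506, 1194.76


def fps_from_latency_ms(latency_ms: float) -> float:
    """Throughput for bs=1, assuming latency is given in milliseconds."""
    return 1000.0 / latency_ms


# Consistency check: the logged fps should match 1000 / latency.
fps_check_before = fps_from_latency_ms(LATENCY_BEFORE)
fps_check_after = fps_from_latency_ms(LATENCY_AFTER)

# Model-level gain, expressed two ways.
throughput_gain_pct = (FPS_AFTER / FPS_BEFORE - 1.0) * 100.0
latency_reduction_pct = (LATENCY_BEFORE - LATENCY_AFTER) / LATENCY_BEFORE * 100.0

# Op-level gain for lrn alone.
lrn_speedup = LRN_BEFORE / LRN_AFTER

print(f"fps from latency: {fps_check_before:.2f}, {fps_check_after:.2f}")
print(f"model throughput gain: {throughput_gain_pct:.1f}%")
print(f"model latency reduction: {latency_reduction_pct:.1f}%")
print(f"lrn speedup: {lrn_speedup:.2f}x")
```

The model-level gain is much smaller than the lrn-level gain because lrn accounts for only part of the total inference time.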

@luotao1 luotao1 merged commit 95c1816 into PaddlePaddle:develop Jul 23, 2019