[MKL-DNN] Extended LRN with reusing via Acquire API by jczaja · Pull Request #18675 · PaddlePaddle/Paddle

jczaja · 2019-07-17T13:13:16Z

This PR reimplements the LRN MKL-DNN ops on top of the common Acquire API.

Benefits:

- The code is easier to modify and maintain; for example, further changes for multi-threading and MKL-DNN 1.0 will be easier.
- Performance improves because of the reuse this introduces: Google Net v1 (a model with LRN) inference via CAPI with bs=1 is ~9% faster on an AVX512 platform, e.g. SKX (8180).
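The reuse idea behind an Acquire-style API can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in `mkldnn_reuse.h`: `BlobCache`, `LrnPrimitiveStub`, and `MakeLrnKey` are hypothetical names, and the stub struct stands in for whatever expensive object (for example a primitive descriptor) would be cached. The point is only that an object is built on the first request for a given key and returned from the cache afterwards.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an object that is expensive to build.
// It is not an MKL-DNN type.
struct LrnPrimitiveStub {
  int local_size;
  float alpha;
  float beta;
  float k;
};

// Hypothetical cache keyed by a string derived from the op configuration.
class BlobCache {
 public:
  using Factory = std::function<std::shared_ptr<LrnPrimitiveStub>()>;

  // Returns the cached object for `key`; calls `create` only on a miss.
  std::shared_ptr<LrnPrimitiveStub> Acquire(const std::string& key,
                                            const Factory& create) {
    auto it = blobs_.find(key);
    if (it != blobs_.end()) return it->second;  // reuse path
    auto created = create();
    assert(created != nullptr);
    blobs_.emplace(key, created);
    ++creations_;
    return created;
  }

  // Number of objects actually constructed so far.
  int creations() const { return creations_; }

 private:
  std::unordered_map<std::string, std::shared_ptr<LrnPrimitiveStub>> blobs_;
  int creations_ = 0;
};

// Builds a key from the parameters that determine the cached object.
// Forward and backward objects would need distinct suffixes.
std::string MakeLrnKey(int local_size, float alpha, float beta, float k,
                       const std::string& suffix) {
  std::ostringstream key;
  key << local_size << '-' << alpha << '-' << beta << '-' << k << suffix;
  return key.str();
}
```

With a cache like this, repeated inference calls with the same configuration skip construction entirely, which is the kind of saving the second benefit refers to. A key that omits a distinguishing parameter would silently return the wrong object, so the key has to cover everything the cached object depends on.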

paddle/fluid/platform/mkldnn_reuse.h

paddle/fluid/operators/mkldnn/lrn_mkldnn_op.cc

test=develop
- compilation fix
- Yet another compilation fix
- Even yet another compilation fix
- Surprise! Again compilation fix
- lint fixes test=develop
- Fix to workspace acquire of LRN test=develop
- Fix to hash of BWD LRN test=develop
- fix to lrn BWD PD acquire test=develop
- Fixing LRN PD creation test=develop
- cosmetic fix in comment test=develop
- Fixes after review test=develop

Sand3r-

@jczaja Thank you so much for the contribution and the speedup it brings on GoogleNet v1!

jczaja · 2019-07-23T10:49:38Z

@luotao1 Could you please review this PR? Your approval is needed again for CI to pass.

luotao1

LGTM

luotao1 · 2019-07-23T11:40:05Z

I tested Google Net v1 (a model with LRN) inference via CAPI with bs=1 on an AVX2 platform (E5-2650 v4).

./test_analyzer_image_classification --gtest_filter=Analyzer_resnet50.profile_mkldnn --batch_size=1 --warmup --repeat=100 --paddle_num_threads=4 --infer_model=googlenetfluid/ --profile

lrn total time drops from 241 ms to 208 ms, about 14% less time

before

Event              Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::conv2d    5700        1881.6      0.085055    3.01909     0.330106    0.787164
**thread0::lrn       200         241.991     0.716039    2.08498     1.20995     0.101236**
thread0::pool2d    1400        178.921     0.0452      0.266267    0.127801    0.0748511
thread0::concat    900         81.7893     0.04769     0.496502    0.090877    0.0342163
thread0::fc        100         4.40745     0.042422    0.046222    0.0440745   0.00184385
thread0::fetch     100         1.21941     0.010564    0.013756    0.0121941   0.000510137
thread0::feed      100         0.427123    0.003667    0.005537    0.00427123  0.000178686

after

Event              Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::conv2d    5700        1872.48     0.086678    3.06727     0.328505    0.799478
**thread0::lrn       200         208.1       0.540581    1.58649     1.0405      0.0888512**
thread0::pool2d    1400        178.047     0.046527    0.557823    0.127176    0.0760194
thread0::concat    900         77.3396     0.047206    0.470094    0.0859329   0.0330212
thread0::fc        100         4.48958     0.042211    0.063319    0.0448958   0.00191689
thread0::fetch     100         1.24835     0.011442    0.016381    0.0124835   0.000532998
thread0::feed      100         0.421286    0.003702    0.006086    0.00421286  0.000179874
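As a small cross-check of the two profiles, the arithmetic can be written out as follows. This is only a sketch: the helper names are made up for illustration, and it assumes the Total, Min., Max., and Ave. columns are in milliseconds, with Ave. equal to Total divided by Calls. Note that "percent faster" can mean either the fraction of time saved or the before/after ratio, and the two give different numbers for the same pair of totals.

```cpp
#include <cassert>

// Average time per call, assuming Ave. = Total / Calls.
double AveragePerCall(double total_ms, int calls) {
  assert(calls > 0);
  return total_ms / calls;
}

// Fraction of time saved: (before - after) / before.
double TimeReduction(double before_ms, double after_ms) {
  assert(before_ms > 0.0);
  return (before_ms - after_ms) / before_ms;
}

// Speedup factor: before / after.
double SpeedupFactor(double before_ms, double after_ms) {
  assert(after_ms > 0.0);
  return before_ms / after_ms;
}
```

For the lrn rows, `TimeReduction(241.991, 208.1)` is about 0.14 and `SpeedupFactor(241.991, 208.1)` is about 1.16, while `AveragePerCall(241.991, 200)` reproduces the 1.20995 in the Ave. column.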

luotao1 · 2019-07-23T11:40:54Z

"~9% faster"

Is this an op-level or a model-level speedup? @jczaja

jczaja · 2019-07-23T11:52:34Z

@luotao1 The reported speedup is at the model level (SKX 8180, AVX512). I kept only a partial log:

before:
sample latency: 26.7129 fps: 37.43

...
thread0::lrn: 2524.55
...

After:
sample latency: 24.4794 fps: 40.8506

....
thread0::lrn: 1194.76
......
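The model-level and op-level figures in this partial log can be separated with a short calculation. The sketch below uses hypothetical helper names and assumes the sample latency is reported in milliseconds per sample, which is consistent with the logged fps values (1000 / 26.7129 is about 37.43).

```cpp
#include <cassert>

// Frames per second for a given per-sample latency in milliseconds.
double FpsFromLatencyMs(double latency_ms) {
  assert(latency_ms > 0.0);
  return 1000.0 / latency_ms;
}

// Relative gain of `after` over `before`, e.g. 0.09 for 9% higher.
double RelativeGain(double before, double after) {
  assert(before > 0.0);
  return (after - before) / before;
}

// Ratio of two accumulated op times: before / after.
double OpTimeRatio(double before, double after) {
  assert(after > 0.0);
  return before / after;
}
```

With the logged values, `RelativeGain(37.43, 40.8506)` is about 0.091, which matches the ~9% model-level figure, while `OpTimeRatio(2524.55, 1194.76)` is about 2.1, so the lrn op itself takes less than half the time it did before.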

jczaja added the performance tuning, Intel, and Code Cleanup labels on Jul 17, 2019

jczaja force-pushed the prv-lrn-acquire branch from 48bd59f to 6915d64 on July 18, 2019 13:56

jczaja assigned luotao1 Jul 18, 2019

jczaja requested a review from kbinias July 18, 2019 14:05

jczaja force-pushed the prv-lrn-acquire branch from 6915d64 to a2e69f1 on July 18, 2019 14:10

jczaja requested a review from Sand3r- July 22, 2019 09:18

Sand3r- reviewed Jul 22, 2019

paddle/fluid/platform/mkldnn_reuse.h (outdated)

paddle/fluid/operators/mkldnn/lrn_mkldnn_op.cc (outdated)

jczaja force-pushed the prv-lrn-acquire branch from a2e69f1 to 071d6ec on July 22, 2019 10:24

Sand3r- approved these changes Jul 23, 2019

luotao1 approved these changes Jul 23, 2019

luotao1 merged commit 95c1816 into PaddlePaddle:develop Jul 23, 2019

luotao1 merged 1 commit into PaddlePaddle:develop from jczaja:prv-lrn-acquire
