
Conversation

@AnnaTrainingG (Contributor) commented May 13, 2021

PR types

Performance optimization

PR changes

OPs

Describe

Modified reduce_sum_op for higher performance.

Speedup by reduced dimension, compared with PyTorch:

| axis | speedup | vs. PyTorch | reduced dimension |
|---|---|---|---|
| 0 | 1.4 ~ 22.7 | matches or beats PyTorch | highest dimension |
| -1 | 1.0 ~ 1.3 | matches or beats PyTorch; 2 of 17 cases are worse, at about 2x PyTorch's time | lowest dimension |
| 1 | 2.44 ~ 24.88 | matches or beats PyTorch; 1 of 17 cases is worse, at about 2x PyTorch's time | one middle dimension |
| [] | 1.0 ~ 1.03 | matches or beats PyTorch; 1 of 17 cases is worse, at about 2x PyTorch's time | all dimensions |
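The four axis configurations above correspond to the following reduction patterns. A minimal sketch with numpy standing in for the Paddle/PyTorch tensors (the shapes here are illustrative, not from the benchmark):

```python
import numpy as np

# A small 3-D tensor standing in for the benchmark inputs.
x = np.ones((2, 4, 8), dtype=np.float32)

# axis=0: reduce along the highest (outermost) dimension.
print(x.sum(axis=0).shape)    # (4, 8)

# axis=1: reduce along a middle dimension.
print(x.sum(axis=1).shape)    # (2, 8)

# axis=-1: reduce along the lowest (innermost) dimension.
print(x.sum(axis=-1).shape)   # (2, 4)

# axis=[] in the tables means reduce over all dimensions to a scalar.
print(x.sum())                # 64.0
```

The three non-scalar cases stress different memory-access patterns in the CUDA kernel, which is why the PR reports them separately.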

Performance data for reducing along the highest dimension:

| axis | case | pytorch (us) | paddle_old (us) | paddle_new (us) | speedup old/new | speedup pytorch/paddle_new |
|---|---|---|---|---|---|---|
| 0 | [512, 2048] | 12.442 | 28.272 | 10.821 | 2.61 | 1.15 |
| 0 | [128, 1024] | 5.595 | 5.181 | 3.711 | 1.40 | 1.51 |
| 0 | [30522, 1024] | 162.77 | 1767.3 | 152.229 | 11.61 | 1.07 |
| 0 | [1024, 16] | 4.703 | 2.471 | 3.509 | 0.70 | 1.34 |
| 0 | [256, 12800] | 18.756 | 81.647 | 17.734 | 4.60 | 1.06 |
| 0 | [256, 10240] | 15.742 | 59.888 | 15.379 | 3.89 | 1.02 |
| 0 | [1024, 1280] | 11.625 | 33.204 | 8.399 | 3.95 | 1.38 |
| 0 | [32768, 1280] | 205.95 | 3504.7 | 198.15 | 17.69 | 1.04 |
| 0 | [30522, 10240] | 1414.6 | 32643 | 1437.523 | 22.71 | 0.98 |
| 0 | [16, 16, 1, 1] | 2.884 | 1.332 | 1.44 | 0.93 | 2.00 |

Performance data for reducing along a middle dimension:

Final Paddle performance. reduce_sum and reduce_mean use the same CUDA kernel, so the performance data here was collected primarily with reduce_sum.

| axis | case | pytorch (us) | paddle_old (us) | paddle_new (us) | speedup old/new | speedup pytorch/paddle_new |
|---|---|---|---|---|---|---|
| 1 | [2, 512, 2048] | 20.069 | 58.785 | 18.84 | 3.12 | 1.07 |
| 1 | [2, 128, 1024] | 5.864 | 9.161 | 3.76 | 2.44 | 1.56 |
| 1 | [2, 30522, 1024] | 296.54 | 4734 | 297.31 | 15.92 | 1.00 |
| 1 | [2, 1024, 16] | 5.163 | 2.678 | 3.37 | 0.80 | 1.53 |
| 1 | [2, 256, 12800] | 33.255 | 173.95 | 32.43 | 5.36 | 1.03 |
| 1 | [2, 256, 10240] | 27.616 | 120.5 | 26.23 | 4.59 | 1.05 |
| 1 | [2, 1024, 1280] | 18.325 | 66.735 | 18.69 | 3.57 | 0.98 |
| 1 | [2, 32768, 1280] | 390.39 | 8531.5 | 390.52 | 21.85 | 1.00 |
| 1 | [2, 30522, 10240] | 1420.7 | 71629 | 2878.59 | 24.88 | 0.49 |
| 1 | [16, 8, 128] | 3.95 | 4.253 | 1.31 | 3.24 | 3.01 |

Performance data for reducing along the lowest dimension:

| axis | case | pytorch (us) | paddle_old (us) | paddle_new (us) | speedup old/new | speedup pytorch/paddle_new |
|---|---|---|---|---|---|---|
| -1 | [2, 512, 2048] | 13.39 | 12.303 | 11.952 | 1.03 | 1.12 |
| -1 | [2, 128, 1024] | 4.711 | 2.891 | 2.359 | 1.23 | 2.00 |
| -1 | [2, 30522, 1024] | 288.06 | 370.61 | 284.69 | 1.30 | 1.01 |
| -1 | [2, 1024, 16] | 2.446 | 4.15 | 4.152 | 1.00 | 0.59 |
| -1 | [2, 256, 12800] | 34.85 | 32.708 | 32.933 | 0.99 | 1.06 |
| -1 | [2, 256, 10240] | 27.327 | 26.282 | 26.274 | 1.00 | 1.04 |
| -1 | [2, 1024, 1280] | 15.162 | 16.674 | 14.456 | 1.15 | 1.05 |
| -1 | [2, 32768, 1280] | 381.14 | 431.74 | 378.42 | 1.14 | 1.01 |
| -1 | [2, 30522, 10240] | 1398.5 | 2791.8 | 2799.1 | 1.00 | 0.50 |
| -1 | [16, 8, 128] | 2.599 | 1.5 | 1.487 | 1.01 | 1.75 |
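Per-case latencies like those above can be collected with a warm-up-then-average loop. A hedged CPU sketch using numpy and `time.perf_counter` (the PR's numbers were presumably measured on GPU with device synchronization, which this sketch omits; the `bench` helper is hypothetical, not from the PR):

```python
import time
import numpy as np

def bench(shape, axis, warmup=3, iters=20):
    """Return the mean latency in microseconds of summing over `axis`."""
    x = np.random.rand(*shape).astype(np.float32)
    for _ in range(warmup):          # warm up caches and the allocator
        x.sum(axis=axis)
    t0 = time.perf_counter()
    for _ in range(iters):
        x.sum(axis=axis)
    t1 = time.perf_counter()
    return (t1 - t0) / iters * 1e6   # seconds -> microseconds

# One of the shapes from the lowest-dimension table above.
print(f"[2, 512, 2048] axis=-1: {bench((2, 512, 2048), -1):.1f} us")
```

For GPU kernels the timed region must also include a device synchronize before reading the clock, otherwise only the launch overhead is measured.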

fp16 performance before and after the change:

| fp16 old (us) | fp16 new (us) | speedup old/new |
|---|---|---|
| 193.87 | 126.78 | 1.53 |
| 7.296 | 2.4 | 3.04 |
| 2.656 | 4.25 | 0.62 |
| 123.48 | 123.48 | 1.00 |

reduce_mean benchmark tests:

| case | axis | float32 new (us) | old (us) | speedup old/new |
|---|---|---|---|---|
| [16, 2048, 33, 33] | [2, 3] | 179 | 199 | 1.11 |
| [16, 8, 128] | [1] | 1.39 | 4.312 | 3.10 |
| [16, 16, 1, 1] | [0] | 1.8 | 1.5650 | 0.86 |
| [30522, 1024] | [] | 144.742 | 144.738 | 1.00 |

Case 2 shows a performance regression; the case is small, so timing fluctuations dominate.
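Since reduce_mean shares the reduce_sum CUDA kernel, a mean is just the sum scaled by the number of reduced elements, which is why one kernel can serve both ops. A numpy illustration (an analogy, not the Paddle kernel itself):

```python
import numpy as np

x = np.random.rand(16, 8, 128).astype(np.float32)
axis = 1

# reduce_mean == reduce_sum divided by the size of the reduced axis,
# so both reductions can share a single summation kernel.
mean_via_sum = x.sum(axis=axis) / x.shape[axis]
assert np.allclose(mean_via_sum, x.mean(axis=axis))
```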

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@CLAassistant

CLAassistant commented May 14, 2021

CLA assistant check
All committers have signed the CLA.

@paddle-bot-old

Sorry to inform you that a27a62c's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@xingfeng01
Contributor

LGTM

1 similar comment
@ZzSean
Contributor

ZzSean commented Aug 3, 2021

LGTM

@thisjiang
Contributor

LGTM

2 similar comments
@xingfeng01
Contributor

LGTM

@ZzSean
Contributor

ZzSean commented Aug 9, 2021

LGTM

Contributor

@Xreki Xreki left a comment


LGTM

@Xreki Xreki merged commit 6a9fac1 into PaddlePaddle:develop Aug 11, 2021
