
Grid_sampler optimization #39751

Merged
ZzSean merged 33 commits into PaddlePaddle:develop from AshburnLee:grid_sampler_fw_bilinear
Feb 28, 2022

@AshburnLee AshburnLee commented Feb 20, 2022

PR types

Performance optimization

PR changes

OPs

Describe

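Functionality

  • After development and testing, the 3D kernel as a whole does not outperform the pre-optimization 1D kernel.
  • Analysis showed that the OP implementation contained a redundant operation, which caused an EigenMetaKernel to be launched every time the op ran. That kernel's share of the total time is not negligible, so the redundant operation was removed.
  • Further analysis showed that with a block size of 512, the grid size computed from the output img is far smaller than the number of SMs (a V100 has 80 SMs). For the same case, the competitor uses a block size of 256, which gives a grid size of 74, close to the SM count. (Setting the block size to 256 in Paddle made overall performance worse than the competitor, so 512 is kept.) The code therefore adds a check on the grid size when the block size is 512 and resets it when needed; LaunchConfig1D contains similar handling. The results are shown below.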
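The grid-size check described in the last bullet can be sketched on the host side as follows. This is only an illustration under stated assumptions, not the code merged in this PR: `LaunchConfig`, `PickBlock`, and `PickLaunchConfig` are hypothetical names, one thread per output element is assumed, and the fallback policy (shrinking the block to a warp multiple so the grid lands near the SM count) is one possible reading of "check and reset".

```cpp
// Host-side sketch only; every name here is hypothetical, not a Paddle API.
struct LaunchConfig {
  int block;  // threads per block
  int grid;   // number of blocks
};

constexpr int kWarpSize = 32;

// Ceiling division for non-negative a and positive b.
constexpr int DivUp(int a, int b) { return (a + b - 1) / b; }

// Round a thread count up to a whole number of warps.
constexpr int RoundUpToWarp(int threads) {
  return DivUp(threads, kWarpSize) * kWarpSize;
}

constexpr int Clamp(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

// Keep max_block when it already yields at least one block per SM.
// Otherwise shrink the block so the work spreads over roughly sm_count blocks.
constexpr int PickBlock(int count, int sm_count, int max_block) {
  return DivUp(count, max_block) >= sm_count
             ? max_block
             : Clamp(RoundUpToWarp(DivUp(count, sm_count)), kWarpSize,
                     max_block);
}

// Assumes count > 0 and one thread per output element.
constexpr LaunchConfig PickLaunchConfig(int count, int sm_count,
                                        int max_block = 512) {
  return LaunchConfig{PickBlock(count, sm_count, max_block),
                      DivUp(count, PickBlock(count, sm_count, max_block))};
}
```

For an illustrative element count of 18900 (a made-up number, not taken from the PR's cases) on an 80-SM device, a block of 512 gives a grid of only 37, while `PickLaunchConfig(18900, 80)` yields a block of 256 and a grid of 74. Large inputs such as 1000000 elements keep the default block of 512.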

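Results on the 20 model cases

Forward
Screenshot 2022-02-28 14 48 14

Backward
Screenshot 2022-02-23 20 08 31

Conclusion

  • Forward: with the SM count taken into account, the model cases perform better than develop. Except for case #7 (whose gap narrowed from 10.79% slower than the competitor to 8.38% slower), none of the cases is worse than the competitor. For the 5 cases from the previous optimization round whose output img is 300*4, the gap to the competitor shrank substantially (9.11%->2.14%, 10.79%->8.38%, 12.94%->3.32%, 13.93%->4.49%, and 10.24%->1.53% respectively).
  • Backward: the model cases perform better than develop. However, the backward logic contains atomic operations, which mask the gains from the handling above (the forward and backward passes of the same case process the same problem size, yet their run times differ greatly; the atomic operations are the bottleneck).
  • The op benchmark cases improve noticeably over the pre-optimization version (see CI-op-benchmark).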
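To illustrate why atomic operations can dominate the backward pass, here is a simplified 1-D model of the bilinear scatter. It is a sketch under assumptions, not the kernel in this PR: `MaxWritersPerCell` is a hypothetical helper, the coordinate mapping assumes align_corners=true, and it only counts how many output samples accumulate into the same input cell. Each such write would need an atomic add on the GPU, and the more writers a cell has, the more those adds serialize.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical 1-D model: each output sample at normalized coordinate x in
// [-1, 1] accumulates gradient into its two neighboring input cells.
// Returns the largest number of samples that write to any single cell.
int MaxWritersPerCell(const std::vector<float>& grid_x, int in_w) {
  std::vector<int> writers(in_w, 0);
  for (float x : grid_x) {
    // Map [-1, 1] onto [0, in_w - 1] (align_corners=true is assumed).
    float pos = (x + 1.0f) * 0.5f * static_cast<float>(in_w - 1);
    int left = static_cast<int>(std::floor(pos));
    int right = left + 1;
    // Out-of-range neighbors are skipped, like a bounds-checked add.
    if (left >= 0 && left < in_w) ++writers[left];
    if (right >= 0 && right < in_w) ++writers[right];
  }
  return writers.empty() ? 0 : *std::max_element(writers.begin(), writers.end());
}
```

When several samples point at the same location, every one of them targets the same cells, so their updates must be applied one after another. The forward pass only reads from those cells and has no such conflict.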

update Paddle USERNAME repo
@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@AshburnLee AshburnLee changed the title from "Grid sampler fw bilinear" to "Grid_sampler optimization" Feb 25, 2022

@ZzSean ZzSean left a comment

LGTM

@ZzSean ZzSean merged commit 2c66775 into PaddlePaddle:develop Feb 28, 2022
@AshburnLee AshburnLee deleted the grid_sampler_fw_bilinear branch February 28, 2022 07:52
2 participants