
Optimize emb_eltwise_layernorm_plugin and support fp16 #27128

Merged

Shixiaowei02 merged 2 commits into PaddlePaddle:develop from cryoco:optimize-emb_eltwise_plugin on Sep 18, 2020


Conversation


@cryoco cryoco commented Sep 7, 2020

PR types

Function optimization

PR changes

Others

Describe

This commit addresses issue #25014 and is based on PR #25003. It also restores fp16 support in emb_eltwise_layernorm_plugin.

The patch for issue #25014 removes unnecessary device memory allocation and memory copies in enqueue. Experiments show this improves the end-to-end performance of ERNIE on an NVIDIA Tesla T4 GPU by 8%.

@paddle-bot-old

paddle-bot-old bot commented Sep 7, 2020

Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

@cryoco cryoco force-pushed the optimize-emb_eltwise_plugin branch 2 times, most recently from 04a19e7 to 94471e0 on September 8, 2020 02:01
@cryoco cryoco force-pushed the optimize-emb_eltwise_plugin branch from 94471e0 to 8243d20 on September 8, 2020 02:49
@cryoco cryoco force-pushed the optimize-emb_eltwise_plugin branch 6 times, most recently from 5d22b05 to e2c376a on September 18, 2020 03:08
@cryoco cryoco force-pushed the optimize-emb_eltwise_plugin branch from e2c376a to 037bbe6 on September 18, 2020 03:11
Member

@shangzhizhou shangzhizhou left a comment

LGTM

@Shixiaowei02 Shixiaowei02 merged commit a5ef246 into PaddlePaddle:develop Sep 18, 2020
@cryoco cryoco deleted the optimize-emb_eltwise_plugin branch September 18, 2020 10:14


4 participants