
Refine clip_by_global_norm #38209

Merged
zhangbo9674 merged 6 commits into PaddlePaddle:develop from zhangbo9674:dev/clip_by_global_norm
Dec 27, 2021

Conversation

@zhangbo9674
Contributor

PR types

Performance optimization

PR changes

APIs

Describe

Optimize ClipByGlobalNorm performance.
Taking a 10*10 paddle.nn.Linear as an example and running 100 rounds of optimization, the time spent in calls to clip_by_global_norm breaks down as follows:
(1) Before optimization:
[profiling screenshot]
(2) After optimization:
[profiling screenshot]
The before/after time ratio is 0.77/0.54 = 1.43, a 29.9% speedup.
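For context, clip-by-global-norm rescales all gradients together by max_norm / global_norm whenever their combined L2 norm exceeds the threshold. A minimal NumPy sketch of the algorithm follows; this is an illustrative stand-in, not Paddle's implementation, and the function name is hypothetical:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together when their combined L2 norm
    exceeds max_norm; otherwise return them unchanged."""
    # Global norm: sqrt of the sum of squared L2 norms of every gradient.
    global_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if global_norm <= max_norm:
        return grads, global_norm
    # One shared scale factor keeps the relative directions of the
    # gradients intact while shrinking the global norm to max_norm.
    scale = max_norm / global_norm
    return [g * scale for g in grads], global_norm
```

For example, two gradients [3] and [4] have global norm 5; with max_norm=1 both are scaled by 0.2, giving [0.6] and [0.8].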

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

zhiqiu
zhiqiu previously approved these changes Dec 27, 2021
Contributor

@zhiqiu zhiqiu left a comment


LGTM

Comment thread python/paddle/fluid/clip.py Outdated
if g.dtype == core.VarDesc.VarType.FP16 else clip_var)
new_grad = layers.elementwise_mul(x=g, y=clip_input)
params_and_grads.append((p, new_grad))
if global_norm_var > max_global_norm:
Contributor


Suggestion: compute global_norm_var > max_global_norm once, above the loop, as a bool flag, so the compare OP does not have to run repeatedly inside the loop.

Contributor Author


Thanks, done!

@zhangbo9674 zhangbo9674 merged commit 65f7fa0 into PaddlePaddle:develop Dec 27, 2021
@zhangbo9674 zhangbo9674 deleted the dev/clip_by_global_norm branch March 2, 2023 02:58
