
Conversation

@wangxicoding
Contributor

@wangxicoding wangxicoding commented Sep 15, 2021

PR types

Performance optimization

PR changes

Others

Describe

Remove the scale op that insert_scale_loss_grad_ops inserts; instead, directly modify the value attribute of the loss_grad_op (i.e., the fill_constant op). This reduces the number of inserted scale ops and should, in theory, give a tiny performance improvement.

  • develop loss scale
    (screenshot)
  • PR loss scale: 0.0078125 = 0.5 * 0.015625
    (screenshot)
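The idea can be illustrated with a small pure-Python sketch. The class and function names below are illustrative stand-ins, not the actual Paddle pass: rather than appending a separate scale op after the loss-grad fill_constant, the pass folds the scale factor directly into that op's value attribute.

```python
class MockOp:
    """Stand-in for a framework OpDesc; names are illustrative only."""
    def __init__(self, op_type, attrs):
        self.type = op_type
        self.attrs = attrs

def fold_loss_scale(block_ops, scale):
    """Fold the loss scale into the loss-grad fill_constant in place,
    instead of inserting a separate scale op (the approach of this PR)."""
    for op in block_ops:
        if op.type == "fill_constant":
            op.attrs["value"] = float(op.attrs["value"]) * scale
            return op
    raise ValueError("loss_grad_op (fill_constant) not found")

ops = [MockOp("fill_constant", {"value": 0.015625})]
folded = fold_loss_scale(ops, 0.5)
print(folded.attrs["value"])  # 0.0078125, matching the numbers above
```

Both factors here are powers of two, so the product is exact in binary floating point.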

Test

Ernie 3.0, base model

  • Speed: basically unchanged, a tiny improvement
    develop (tokens/s)  PR (tokens/s)
    44924               44951
  • Accuracy: aligned
    (screenshot)

Follow-up optimization TODO

In pipeline parallelism, loss_grad_op could be moved into the LR schedule so that it runs once per step rather than in every micro-step. Or, more aggressively, it could be made persistable and initialized in startup_program, so it runs only once per training run.
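A toy model of this TODO (no Paddle APIs; the function names are hypothetical) just counts how often the loss-grad fill_constant would execute under each scheme:

```python
def fills_per_step_current(num_micro_steps):
    # Today: fill_constant runs inside every micro-step of a pipeline step.
    return sum(1 for _ in range(num_micro_steps))

def fills_per_step_hoisted(num_micro_steps):
    # Proposed: hoist it next to the LR schedule, so once per step.
    return 1

def fills_per_run_persistable(num_steps, num_micro_steps):
    # More aggressive: persistable, initialized once in startup_program.
    return 1

print(fills_per_step_current(8))           # 8
print(fills_per_step_hoisted(8))           # 1
print(fills_per_run_persistable(1000, 8))  # 1
```

With 8 micro-steps per pipeline step, the hoisted variant saves 7 fill executions per step; the persistable variant saves all but one per run.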

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.


@sandyhouse sandyhouse left a comment


LGTM for the code modification, but I think after this change the program description may be confusing for others.

Contributor

@JZ-LIANG JZ-LIANG left a comment


LGTM. Though this change is equivalent to the original, it breaks the strong assumption in the framework that gradient back-propagation starts from a constant ONE.

@JZ-LIANG
Contributor

LGTM. Though this change is equivalent to the original, it changes the strong assumption in the framework that gradient back-propagation starts from a constant ONE.

Might need a comment to notify later maintainers that the starting point of the gradient backward pass will change according to the DataParallel and ShardingParallel degree.

@wangxicoding
Contributor Author

> LGTM. Though this change is equivalent to the original, it changes the strong assumption in the framework that gradient back-propagation starts from a constant ONE.
>
> Might need a comment to notify later maintainers that the starting point of the gradient backward pass will change according to the DataParallel and ShardingParallel degree.

OK, will add in the next PR.

@wangxicoding wangxicoding merged commit 02b0be0 into PaddlePaddle:develop Sep 16, 2021
"loss_grad_op must be fill_constant op, " \
"but this op is {}".format(op.type)
assert op.has_attr('value')
loss_scale = float(op.attr('value'))
Contributor


Mind the potential precision loss here: the fill_constant op casts the value to fp32 and then saves it as a string in its OpDesc. Reloading and resetting this value may cause precision loss when the denominator is odd (3, 7, 11, etc.).
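This concern can be demonstrated directly: round-tripping a scale through float32 (here simulated with the standard struct module) is exact for power-of-two loss scales but lossy for odd denominators such as 1/3.

```python
import struct

def roundtrip_fp32(x):
    """Cast a Python float (fp64) to fp32 and back, as storing the
    value in an fp32 attribute field would."""
    return struct.unpack("f", struct.pack("f", x))[0]

print(roundtrip_fp32(1 / 64) == 1 / 64)  # True: 1/64 is exact in fp32
print(roundtrip_fp32(1 / 3) == 1 / 3)    # False: 1/3 is rounded in fp32
```

Since data-parallel and sharding degrees are usually powers of two, the common scales (1/2, 1/64, ...) survive the round trip exactly.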

Contributor Author


(screenshots)

loss_grad_op (that is, fill_constant) uses value rather than str_value, and the AttrType of value is float, so it is saved as float32 in protobuf. If we encounter a precision problem, it must be caused by the float AttrType; a double AttrType would be better, but the framework does not provide one.

Contributor Author


(screenshot)

The type of op.attr('value') is already float64 in Python; the float(op.attr('value')) cast is only for explicitness.

@wangxicoding wangxicoding deleted the hybrid_remove_loss_scale_op branch September 16, 2021 03:09
AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this pull request Sep 29, 2021