fuse L2Decay and momentum when param.regularizer is set #32845

Merged

zhangting2020 merged 5 commits into PaddlePaddle:develop on Jun 10, 2021
Conversation
Xreki reviewed on Jun 4, 2021

python/paddle/optimizer/momentum.py (Outdated)

Contributor

Could L270 - L297 be written as a direct call to the base class's _create_regularization_of_grad function?
zhangting2020 added a commit to zhangting2020/Paddle that referenced this pull request on Jun 10, 2021
PR types
Performance optimization

PR changes
Others

Describe
fuse L2Decay and momentum when param.regularizer is set

before
Currently, Paddle supports fusing momentum with L2Decay:

Paddle/python/paddle/optimizer/momentum.py
Lines 108 to 115 in 1ef2327

In _append_optimize_op, the following attributes of the momentum op are set so that both the weight_decay and the momentum computation are carried out inside the momentum op, achieving the fusion:

Paddle/python/paddle/optimizer/momentum.py
Lines 209 to 210 in 1ef2327
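The fused update can be illustrated with a minimal numeric sketch (plain Python with hypothetical helper names; the real computation happens inside the momentum op's kernels):

```python
def fused_momentum_step(param, grad, velocity, lr, mu, l2_coeff):
    # Hypothetical sketch of the fused path: the momentum kernel folds
    # the L2Decay term into the gradient before the velocity update,
    # so no separate scale/sum ops appear in the graph.
    grad = grad + l2_coeff * param
    velocity = mu * velocity + grad
    param = param - lr * velocity
    return param, velocity

def unfused_step(param, grad, velocity, lr, mu, l2_coeff):
    # Unfused reference: a standalone regularization op adds the
    # L2 term, then plain momentum runs with no decay of its own.
    grad = grad + l2_coeff * param
    return fused_momentum_step(param, grad, velocity, lr, mu, l2_coeff=0.0)

p1, v1 = fused_momentum_step(1.0, 0.5, 0.0, lr=0.1, mu=0.9, l2_coeff=0.01)
p2, v2 = unfused_step(1.0, 0.5, 0.0, lr=0.1, mu=0.9, l2_coeff=0.01)
assert (p1, v1) == (p2, v2)  # both paths compute the same update
```

Both paths perform identical arithmetic; the fusion only removes the extra ops from the graph.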
However, if a global regularizer=L2Decay is set through momentum's weight_decay parameter while some layers also set their own specific regularizer through paddle.ParamAttr, the following happens:

1. In _append_optimize_op, the momentum op's attributes are set to implement the fusion.
2. append_regularization_ops(params_grads, self.regularization) and self._create_optimization_pass(params_grads) are executed; _create_regularization_of_grad performs the weight_decay and, as the code below shows, runs the param's own regularizer:

Paddle/python/paddle/fluid/regularizer.py
Lines 25 to 40 in 1ef2327

3. In _append_optimize_op, because self._regularization_method and self._regularization_coeff were set in (1), the momentum op performs the weight_decay a second time.

after
Since append_regularization_ops(params_grads, self.regularization) iterates over all parameters and applies each parameter's regularization, when momentum is used we need to check during that iteration whether a parameter's regularizer is L2Decay and, if so, skip the regularization there and instead set the momentum op's regularization_method attribute in _append_optimize_op. This PR therefore makes the following changes:

1. append_regularization_ops and _create_regularization_of_grad are removed from regularizer.py and moved into optimizer.py as instance methods of the Optimizer class, so other optimizers are not affected:

Paddle/python/paddle/fluid/regularizer.py
Lines 25 to 108 in 5fa44c3

2. Momentum overrides the _create_regularization_of_grad method; the only difference from the parent class's version is that when a param has L2Decay set, its regularization is skipped directly. See the changes to momentum.py in this PR for details.

In summary, whenever the regularizer specified on a parameter is L2Decay, that parameter's regularizer replaces the global setting, which avoids performing regularization twice while still achieving the fusion.
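The skip logic can be sketched in simplified standalone form (illustrative only; the names loosely mirror the PR, but this is not the actual Paddle code):

```python
class L2Decay:
    """Stand-in for paddle.regularizer.L2Decay in this sketch."""
    def __init__(self, coeff):
        self.coeff = coeff

def create_regularization_of_grad(grad, param, regularizer,
                                  can_fuse_l2=True):
    # Hypothetical version of Momentum's overridden
    # _create_regularization_of_grad: when the parameter's regularizer
    # is L2Decay and the optimizer can fuse it, return the grad
    # untouched; _append_optimize_op then sets the momentum op's
    # regularization_method / regularization_coeff attributes instead.
    if regularizer is None:
        return grad
    if can_fuse_l2 and isinstance(regularizer, L2Decay):
        return grad
    return grad + regularizer.coeff * param

# L2Decay is skipped (it will be applied inside the fused momentum op) ...
assert create_regularization_of_grad(0.5, 1.0, L2Decay(0.1)) == 0.5
# ... but is applied eagerly when fusion is not available.
assert abs(create_regularization_of_grad(0.5, 1.0, L2Decay(0.1),
                                         can_fuse_l2=False) - 0.6) < 1e-12
```

Keeping the override on Momentum (rather than changing the base class) matches the PR's goal of leaving other optimizers untouched.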
performance
Using TSM as a test case (this model sets its own regularizer=L2Decay for some parameters): before the fix, some parameters underwent regularization twice, and the profile report shows multiple extra scale and sum calls.
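The double regularization can be reproduced numerically with a toy sketch (hypothetical standalone functions, not Paddle's API):

```python
def momentum_with_fused_l2(param, grad, velocity, lr, mu, l2_coeff):
    # Sketch of the momentum op with L2Decay fused in.
    grad = grad + l2_coeff * param
    velocity = mu * velocity + grad
    return param - lr * velocity, velocity

param, grad, l2 = 1.0, 0.0, 0.1

# Correct behaviour: L2Decay applied once, inside the fused op.
p_once, _ = momentum_with_fused_l2(param, grad, 0.0,
                                   lr=1.0, mu=0.9, l2_coeff=l2)

# Buggy behaviour before this PR: the regularization pass has already
# added l2 * param to the gradient (the extra scale/sum ops seen in the
# profile), and the fused op then adds the L2 term a second time.
g_pre = grad + l2 * param
p_twice, _ = momentum_with_fused_l2(param, g_pre, 0.0,
                                    lr=1.0, mu=0.9, l2_coeff=l2)

assert abs(p_once - 0.9) < 1e-12   # decayed once
assert abs(p_twice - 0.8) < 1e-12  # decayed twice: over-regularized
```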
The bug may also have affected convergence speed and accuracy. Comparing the training logs from before the fix: