update optimizer for 2.0 #26288
Conversation
Hi, it's a test PR; it will not trigger CI. If you want to trigger CI, please remove …
beta1=0.9,
beta2=0.999,
epsilon=1e-8,
parameters=None,
Could the `parameters` argument be moved forward in the signature? Dynamic graph mode depends heavily on it.
To stay consistent with the other optimizers, we will leave this argument where it is for now.
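For context, a minimal dynamic-graph sketch (the `linear` layer and the 0.001 learning rate are illustrative, not from this PR) of why imperative mode depends on `parameters`:

```python
import paddle

paddle.disable_static()  # ensure dynamic graph mode

# In dynamic graph mode there is no global program to scan, so the
# optimizer must be told explicitly which parameters it updates.
linear = paddle.nn.Linear(10, 1)
adam = paddle.optimizer.Adam(
    learning_rate=0.001,
    beta1=0.9,
    beta2=0.999,
    epsilon=1e-8,
    parameters=linear.parameters())
```

Passing `parameters` as a keyword argument sidesteps its position in the signature, which is presumably why leaving it in place is acceptable for now.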
| outputs={"ParamOut": param_and_grad[0]}) | ||
| return new_param_grads, (table_param, table_grad), sgd_op | ||
|
|
||
| def _append_dgc_ops(self, param_and_grad): |
This is overridden and used by the DGCMomentum optimizer; it is defined here mainly to prevent errors during backward.
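A sketch of the pattern described here, assuming the base-class hook is a no-op (the method body is inferred from the comment, not quoted from the PR):

```python
def _append_dgc_ops(self, param_and_grads):
    # No-op in the base Optimizer: DGCMomentumOptimizer overrides this
    # hook, and defining it here keeps backward() from raising when a
    # non-DGC optimizer is in use.
    pass
```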
XiaoguangHu01 left a comment
A few minor issues noted; this can be merged first and fixed afterwards.
python/paddle/fluid/tests/unittests/test_fleet_graph_execution_meta_optimizer.py (outdated; resolved)
Related paper: `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_

Args:
    learning_rate (float|LearningRateDecay, optional): The learning rate used to update ``Parameter``.
The type of ``learning_rate`` is float|LearningRateDecay in the English docs but float|Variable in the Chinese docs; please keep them consistent. Also, Variable -> Tensor.
    The default value is 0.999.
epsilon (float, optional): A small float value for numerical stability.
    The default value is 1e-08.
parameters (list, optional): List of ``Tensor`` names to update to minimize ``loss``. \
Please keep the position of the ``parameters`` argument consistent between the Chinese and English docs.
indicate program pruning. If so, the program will be pruned by ``feed`` and
``fetch_list`` before run, see details in ``Executor``.

Examples:
it is added here for numerical stability to prevent the division by 0 error.

Args:
    learning_rate (float|LearningRateDecay, optional): The learning rate used to update ``Parameter``.
Should this be float|LearningRateDecay or float|Tensor?
float|LearningRateDecay; the Chinese docs will be updated later.
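As an aside on the ``epsilon`` line quoted above: a minimal NumPy sketch of one Adamax step (variable names are illustrative; the update follows the formula in the Adamax docstring), showing where the division-by-zero guard enters:

```python
import numpy as np

def adamax_step(param, grad, moment, inf_norm, t,
                lr=0.002, beta1=0.9, beta2=0.999, epsilon=1e-8):
    # Exponential moving averages of the gradient and its infinity norm.
    moment = beta1 * moment + (1 - beta1) * grad
    inf_norm = np.maximum(beta2 * inf_norm, np.abs(grad))
    # epsilon guards the division when inf_norm is (near) zero; the
    # original paper omits it, hence the docstring note.
    param = param - (lr / (1 - beta1 ** t)) * moment / (inf_norm + epsilon)
    return param, moment, inf_norm
```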
PR types
New features
PR changes
OPs
Describe
Improve the Adam, Adamax, Optimizer, and RMSProp ops
Add a new AdamW op
Optimizer class
The parameter_list argument is renamed to parameters
The regularization argument is renamed to weight_decay; a float value is used as the coefficient of L2Decay
The set_dict API is renamed to set_state_dict
A new step API is added in dynamic graph mode, replacing minimize
The current_step_lr API is renamed to get_lr
clear_gradients is renamed to clear_grad; the old API is kept as an alias of clear_grad
AdamOptimizer is renamed to Adam, AdamaxOptimizer to Adamax, and RMSPropOptimizer to RMSProp; their remaining changes match the base Optimizer class (see the sketch below)
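A minimal dynamic-graph sketch of the renamed APIs (the model, data, and hyperparameter values are illustrative):

```python
import paddle

paddle.disable_static()  # ensure dynamic graph mode

linear = paddle.nn.Linear(4, 4)
adam = paddle.optimizer.Adam(
    learning_rate=0.1,
    parameters=linear.parameters(),  # was parameter_list
    weight_decay=0.01)               # was regularization; float -> L2Decay coefficient

x = paddle.ones([2, 4], dtype='float32')
loss = paddle.mean(linear(x))
loss.backward()
adam.step()                          # new in dygraph, replaces minimize()
print(adam.get_lr())                 # was current_step_lr()
adam.clear_grad()                    # was clear_gradients(), kept as an alias
adam.set_state_dict(adam.state_dict())  # was set_dict()
```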
New AdamW class
Inherits from DecoupledWeightDecay and Adam (usage sketch below)
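A hedged usage sketch of the new class (the layer and the 0.01 decay coefficient are illustrative):

```python
import paddle

paddle.disable_static()  # ensure dynamic graph mode

linear = paddle.nn.Linear(4, 4)
# AdamW applies decoupled weight decay: the decay acts directly on the
# parameters instead of being folded into the gradients/regularizer.
adamw = paddle.optimizer.AdamW(
    learning_rate=0.001,
    parameters=linear.parameters(),
    weight_decay=0.01)
```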
Chinese documentation link: PaddlePaddle/docs#2424