add custom init grad for backward function#31540
MingMingShangTian merged 23 commits into PaddlePaddle:develop from
Conversation
Thanks for your contribution!
    @framework.dygraph_only
-   def backward(self, retain_graph=False):
+   def backward(self, retain_graph=False, grad_tensor=None):
Wouldn't it be more reasonable to put grad_tensor before retain_graph, since it will likely be used more often? Moving it forward may introduce some compatibility risk, but it is more reasonable in the long term. Check how many tests in the framework currently use retain_graph.
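The compatibility risk raised above can be illustrated with a paddle-free sketch (function names here are hypothetical, not Paddle's actual signatures): any caller that passes retain_graph positionally silently changes meaning if a new parameter is inserted before it.

```python
# Sketch of the positional-argument hazard: inserting grad_tensor before
# retain_graph rebinds existing positional calls.
def backward_old(retain_graph=False):
    return retain_graph

def backward_new(grad_tensor=None, retain_graph=False):
    # grad_tensor now occupies the first positional slot
    return retain_graph

print(backward_old(True))  # True: retain_graph receives the value
print(backward_new(True))  # False: True now binds to grad_tensor instead
```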
    :code:`retain_graph` to True, then the grads will be retained. Thus, setting it to False is much more memory-efficient.
    Defaults to False.

    grad_tensor(Tensor, optional): initial gradient values of `outputs` . If `grad_tensor` is None,
Since this is now a Tensor API, wouldn't changing "initial gradient values of outputs" to "initial gradient values of current Tensor" be easier to understand?
    Defaults to False.

    grad_tensor(Tensor, optional): initial gradient values of `outputs` . If `grad_tensor` is None,
    the initial gradient values of `outputs` would be Tensor filled with 1;
|
|
    grad_tensor(Tensor, optional): initial gradient values of `outputs` . If `grad_tensor` is None,
    the initial gradient values of `outputs` would be Tensor filled with 1;
    if `grad_tensor` is not None, it must have the same length as `outputs`.

    grad_tensor=paddle.to_tensor(2.)
    for i in range(5):
        y = paddle.pow(x, 4.0)
        y.backward(grad_tensor=grad_tensor)
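What the quoted example computes can be sketched without Paddle: for y = x**4 the gradient is dy/dx = 4*x**3, and by the chain rule a non-default grad_tensor simply scales that gradient (it plays the role of dL/dy).

```python
import numpy as np

# Paddle-free sketch: a custom starting gradient of 2.0 doubles the
# analytic gradient of y = x**4 that the default ones would produce.
x = np.array(3.0)
grad_tensor = 2.0
analytic_grad = 4.0 * x ** 3        # gradient with the default ones
scaled_grad = grad_tensor * analytic_grad
print(scaled_grad)  # 216.0
```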
        else:
            self._run_backward(framework._dygraph_tracer(), retain_graph)
        if grad_tensor is not None:
            assert grad_tensor.shape == self.shape, "Variable Shape not match, Variable of grad_tensor [ {} ] with shape {} mismatch Variable [ {} ] with shape {}".format(
Variable -> Tensor; "not match" -> "does not match". Please reorganize the sentence, the grammar is awkward, and use "Tensor" consistently instead of "Variable".
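A minimal sketch of the wording the reviewer asks for (the helper name is hypothetical): capitalized, grammatical, and saying "Tensor" consistently.

```python
def check_grad_tensor_shape(grad_shape, tensor_shape):
    # Hypothetical validation helper illustrating the suggested message
    # style: say what mismatched, in plain grammatical English.
    if grad_shape != tensor_shape:
        raise ValueError(
            "The shape of grad_tensor {} does not match the shape of the "
            "Tensor {}.".format(grad_shape, tensor_shape))

check_grad_tensor_shape([2, 3], [2, 3])  # matching shapes pass silently
```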
namespace imperative {

-void BasicEngine::Init(VarBase* var, bool retain_graph) {
+void BasicEngine::Init(VarBase* var, bool retain_graph, VarBase* grad_tensor) {
grad_tensor can be given a default value of nullptr.
Give the parameter a default of nullptr at the declaration.
paddle/fluid/pybind/imperative.cc
Outdated
      .def("_run_backward",
           [](imperative::VarBase &self, const imperative::Tracer &tracer,
-             bool retain_graph) {
+             bool retain_graph, imperative::VarBase &grad_tensor) {
The default argument needs handling here; append py::arg("grad_tensor") = nullptr.
        grad_var->Resize(fwd_var.dims());
        grad_var->mutable_data(fwd_var.place(), fwd_var.type());
        operators::math::set_constant(*dev_ctx, grad_var, 1.0);
        paddle::framework::TensorCopy(grad_tensor->Var().Get<framework::LoDTensor>(),
When grad_tensor is nullptr, just set it to 1.
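The fallback the reviewer describes can be sketched in Python (names hypothetical; ones/copy stand in for set_constant and TensorCopy): with no grad_tensor, the initial gradient is filled with ones, otherwise the user-provided values are copied.

```python
import numpy as np

def init_grad(fwd_var, grad_tensor=None):
    # Sketch of the engine's init logic: None -> gradient of ones,
    # otherwise a copy of the supplied starting gradient.
    if grad_tensor is None:
        return np.ones_like(fwd_var)
    return np.array(grad_tensor, dtype=fwd_var.dtype, copy=True)

print(init_grad(np.zeros(3)))             # [1. 1. 1.]
print(init_grad(np.zeros(3), [2, 2, 2]))  # [2. 2. 2.]
```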
        else:
            self._run_backward(framework._dygraph_tracer(), retain_graph)
        if grad_tensor is None:
            grad_tensor = paddle.ones_like(self)
When grad_tensor is None, pass two arguments to _run_backward; when it is not None, pass three. That way the compatibility upgrade does not affect previous models at all.
After changing this to std::shared_ptr, a Python None converts to nullptr, so the call can be unified to always pass three arguments.
        if grad_tensor is not None:
            assert isinstance(
                grad_tensor, core.
                VarBase), "The type of grad_tensot must be paddle.VarBase"
paddle.VarBase -> paddle.Tensor?
        assert isinstance(
            grad_tensor, core.
            VarBase), "The type of grad_tensot must be paddle.VarBase"
        assert grad_tensor.shape == self.shape, "Variable shape not match, Variable of grad_tensor [ {} ] with shape {} mismatch Variable [ {} ] with shape {}".format(
Variable shape -> Tensor shape; use "Tensor" consistently in descriptions instead of "Variable".
            grad_var->mutable_data(fwd_var.place(), fwd_var.type());
            operators::math::set_constant(*dev_ctx, grad_var, 1.0);
        } else {
            paddle::framework::TensorCopy(
Do we need a check here that grad_tensor's dimensions match var's dimensions?
|
|
    @framework.dygraph_only
-   def backward(self, retain_graph=False):
+   def backward(self, grad_tensor=None, retain_graph=False):
This can only handle one tensor, right? What if multiple grad tensors need to be handled?
Can they be handled in a loop? The standalone backward interface is used very infrequently.
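The loop the reviewer suggests can be sketched without Paddle (function name hypothetical): pair each output tensor with its starting gradient, defaulting missing entries to 1.0, which stands in for a ones tensor.

```python
def pair_grad_tensors(tensors, grad_tensors=None):
    # Sketch of handling several outputs by looping: each tensor gets
    # its own starting gradient, or 1.0 when the entry is None.
    if grad_tensors is None:
        grad_tensors = [None] * len(tensors)
    assert len(tensors) == len(grad_tensors), (
        "The size of tensors must equal the size of grad_tensors.")
    return [(t, 1.0 if g is None else g)
            for t, g in zip(tensors, grad_tensors)]

print(pair_grad_tensors(["y0", "y1"], [None, 2.0]))
# [('y0', 1.0), ('y1', 2.0)]
```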
chenwhql
left a comment
Suggest polishing the error messages further: capitalize the first letter and end with a period. Avoid "must be" / "should be" phrasing; instead, tell users directly what is wrong and how to fix it.
        PADDLE_ENFORCE_EQ(
            tensors.size(), grad_tensors.size(),
            platform::errors::Unavailable(
                "the size of tensors must equal the size of grad_tensors, but"
Recommend capitalizing the first letter: the -> The.
|
|
        PADDLE_ENFORCE_EQ(
            var->HasGradVar(), true,
            platform::errors::NotFound("Grad variable not exist for variable %s",
The message is a bit strange; maybe tell users directly that Tensor %s has no gradient.
        auto var = tensors[i];
        auto grad_tensor = grad_tensors[i];

        auto init_node_ = var->GradVarBase()->GradNode();
A temp var doesn't need a trailing _; use init_node as the name directly.
    std::vector<GradientAccumulator*> leaf_accumulators_;

    bool retain_graph_;
    bool create_graph_;
Where is create_graph_ used?
Not used, removed.
        py::call_guard<py::gil_scoped_release>());

    m.def(
        "dygraph_run_backward",
This method does not need to be shown to users; use _ at the beginning of the method name. Maybe keeping _run_backward is better.
dygraph_run_backward -> _run_backward. Names without a leading underscore are public APIs, and this API should not be exposed to users.
paddle/fluid/pybind/imperative.cc
Outdated
      .def("_run_backward",
-          [](imperative::VarBase &self, const imperative::Tracer &tracer,
-             bool retain_graph) {
+          [](std::shared_ptr<imperative::VarBase> &self,
Remove this method and call core._run_backward directly in Python.
Removed this method, but core._run_backward is private and cannot be found in Python; used core.dygraph_run_backward instead.
    from . import backward_mode
    from .backward_mode import backward

    __all__ = ['grad']
backward also needs to be in __all__ here.
The next line, __all__ += backward_mode.__all__, adds backward.
    tensors = check_tensors(tensors, "tensors")

    assert len(tensors) == len(set(
        tensors)), "the arg tensors should not contains same element"
Use complete words: "The argument 'tensors' of paddle.autograd.backward contains duplicate paddle.Tensor object."
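The suggested message can be paired with an identity-based duplicate check, sketched here without Paddle (helper name hypothetical; real Tensors are compared by object identity rather than by value):

```python
def check_no_duplicates(tensors):
    # Sketch using the message wording suggested above.
    seen = set()
    for t in tensors:
        if id(t) in seen:
            raise ValueError(
                "The argument 'tensors' of paddle.autograd.backward "
                "contains duplicate paddle.Tensor object.")
        seen.add(id(t))

check_no_duplicates([object(), object()])  # distinct objects pass
```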
        if each_tensor is not None:
            assert isinstance(
                each_tensor, paddle.Tensor
            ), "grad_tensors must be None, Tensor or list containing None or Tensor"
Confusing; suggest: "The argument 'grad_tensors' of paddle.autograd.backward is invalid, it can be 'None', 'paddle.Tensor' or 'list[None/paddle.Tensor]'."
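A paddle-free sketch of that validation with the clearer message (helper name hypothetical; hasattr(x, "shape") stands in for isinstance(x, paddle.Tensor)):

```python
def check_grad_tensors(grad_tensors):
    # Accepts None, a single tensor-like value, or a list mixing None
    # and tensor-like values; anything else raises the proposed message.
    if grad_tensors is None:
        return
    if not isinstance(grad_tensors, (list, tuple)):
        grad_tensors = [grad_tensors]
    for each in grad_tensors:
        if each is not None and not hasattr(each, "shape"):
            raise TypeError(
                "The argument 'grad_tensors' of paddle.autograd.backward "
                "is invalid, it can be 'None', 'paddle.Tensor' or "
                "'list[None/paddle.Tensor]'.")

check_grad_tensors(None)    # valid: falls back to ones
check_grad_tensors([None])  # valid: per-entry fallback
```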
|
Agreed to waive PR-CI-ROCM-Compile; the code is unrelated to ROCM.

PR types
New features
PR changes
APIs
Describe
When computing a tensor's backward pass, the initial grad tensor is by default set to paddle.ones. This PR adds a new kwarg
`grad_tensor` that lets users define the starting grad themselves. If `grad_tensor` is not set, the default value paddle.ones is used.
Doc preview
