Support memory eager deletion on recurrent OP #17710
zhhsplendid merged 27 commits into PaddlePaddle:develop from
Conversation
namespace operators {

const framework::VariableNameMap& OpVariant::Inputs() const {
  return *boost::apply_visitor(InputsVisitor(), op_);
How about moving InputsVisitor, OutputsVisitor, and AttributeMapVisitor to this source file as well?
You can also move RawPointerVisitor to this source file.
include(operators)
register_operators(DEPS naive_executor)
cc_library(while_op_helper SRCS while_op_helper.cc DEPS operator)
cc_library(op_variant SRCS op_variant.cc DEPS operator)
Should the target op_variant depend on proto_desc as well? op_desc.cc is compiled in the proto_desc target.
Do you mean program_desc.cc? I use program_desc.h, not op_desc.h. But there is no difference, because they are both compiled in proto_desc.
namespace paddle {
namespace operators {

using paddle::framework::OperatorBase;
It is not good to expose OperatorBase without a namespace inside a header file.
auto &attrs = const_cast<framework::AttributeMap &>(op.Attrs());
VLOG(2) << "Prepare to skip " << attr.size()
        << " var(s): " << GetDebugString(attr);
        << " var(s): " << paddle::string::join_strings(attr, ' ');
You can simplify this as string::join_strings(attr, ' ').
executor.Prepare(*program, block->ID(),
                 Attr<std::vector<std::string>>(
                     kSkipEagerDeletionVars) /*skip_ref_cnt_vars*/,
                 false /*force_disable_gc*/);
Remove the last parameter false.
using paddle::operators::OpAndGradOpPair;

// Pass class set skip eager deletion vars for recurrent ops
class RecurrentOpEagerDeletionPass : public ir::Pass {
RecurrentOpEagerDeletionPass can be placed inside recurrent_op_eager_deletion_pass.cc; therefore, this header file is unnecessary.
Can I keep it? In general I prefer that every .cc file have an associated .h file, except for some special cases.
for (const std::string &name : output_vars) {
  fwd_skip_vars.insert(name);
}
SetSkipVars(fwd_op, fwd_skip_vars);
Should kInitialStates be skipped too? See here.
Discussed offline; maybe it doesn't have to be skipped.
}
PADDLE_ENFORCE_NOT_NULL(matched_fwd_op, "Cannot find matched forward op");
SetRecurrentOpAndRecurrentGradOpSkipVarAttr(*matched_fwd_op, bwd_op);
recurrent_ops.erase(*matched_fwd_op);
What if there are remaining forward recurrent ops that have no gradient? You should also set skip vars in those ops.
PADDLE_ENFORCE_EQ(
    fwd_input.size(), in_grads.size(),
    "Backward input gradient number does not match forward input number.");
for (size_t i = 0; i < in_grads.size(); ++i) {
This seems wrong. You should review the code of recurrent_op.cc to find out which variables should be skipped.
Discussed offline; it may not be wrong.
@@ -0,0 +1,460 @@
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
As commented above, please add corresponding unit tests:
- Run using ParallelExecutor.
- There are duplicate recurrent ops in the graph, even when one is nested inside another recurrent op. See the nested while op tests inside here.
- There are recurrent ops with gradient and without gradient.
- Add unit tests of the ptb model. See here.
- Any other corner cases that should be covered.
// See the License for the specific language governing permissions and
// limitations under the License.

#pragma once
Recommend removing this header file.
I will reply to you offline.
namespace paddle {
namespace operators {
/*
constexpr char RecurrentBase::kInputs[];
Remove unused code instead of commenting it out.
Sorry, this is something I forgot to remove.
… rnn_op
It aims to handle the CI error that it cannot find a reference to constexpr char[].
Tested PaddingRNN on a V100 GPU device.
Test configuration: large model, padding mode (the mode that uses recurrentOp), one GPU.
GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR)
Speed (steps/s): 10.28 (this PR) vs 9.89 (without this PR)
* Support memory eager deletion on recurrent OP (#17710)
  Tested PaddingRNN on a V100 GPU device. Test configuration: large model, padding mode (the mode that uses recurrentOp), one GPU.
  GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR)
  Speed (steps/s): 10.28 (this PR) vs 9.89 (without this PR)
* Fix random test_recurrent_op failure (#18718)
  The change includes 3 things:
  1. Set CPU_NUM to 1 in the tests, because otherwise the ParallelExecutor prints a warning that CPU_NUM is not set and uses the default of 1.
  2. The old tests compare two RNNs, a hand-written simple RNN and the same RNN built by Paddle, but initialized the RNN weights with numpy random and Paddle random separately. Fixed by setting explicit weight and bias values.
  3. Also set the numpy random seed in the tests. Now the diff between the two RNNs can be smaller (rtol from 0.1 and 0.2 down to 0.01) in the tests.