
Support memory eager deletion on recurrent OP#17710

Merged
zhhsplendid merged 27 commits into PaddlePaddle:develop from zhhsplendid:rnn_op
Jul 19, 2019

Conversation

@zhhsplendid (Member) commented May 29, 2019:

Test PaddingRNN on V100 GPU device.

Test configuration: large model, padding mode (the mode that uses RecurrentOp), one GPU.

GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR)
Speed (steps/s): 10.28 (this PR) vs 9.89 (without this PR)

namespace operators {

const framework::VariableNameMap& OpVariant::Inputs() const {
return *boost::apply_visitor(InputsVisitor(), op_);

Collaborator:

How about moving InputsVisitor, OutputsVisitor, and AttributeMapVisitor to this source file as well?

Collaborator:

You can also move RawPointerVisitor to this source file.

@zhhsplendid (Member, Author):

Done
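The `OpVariant::Inputs()` snippet above dispatches through a visitor. As a rough illustration of the pattern (the real code wraps `framework::OpDesc*`/`OperatorBase*` in a `boost::variant` and uses `boost::apply_visitor`; this sketch uses the `std::variant` analogue with hypothetical stand-in types `FakeOpDesc` and `FakeOpBase`):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the two op representations the variant can hold.
using VariableNameMap = std::map<std::string, std::vector<std::string>>;

struct FakeOpDesc { VariableNameMap inputs; };
struct FakeOpBase { VariableNameMap inputs; };

// Visitor that returns the input map no matter which concrete
// representation the variant currently holds.
struct InputsVisitor {
  const VariableNameMap* operator()(const FakeOpDesc* op) const {
    return &op->inputs;
  }
  const VariableNameMap* operator()(const FakeOpBase* op) const {
    return &op->inputs;
  }
};

using FakeOpVariant = std::variant<const FakeOpDesc*, const FakeOpBase*>;

// Analogue of OpVariant::Inputs(): dispatch on the held alternative.
const VariableNameMap& Inputs(const FakeOpVariant& op) {
  return *std::visit(InputsVisitor{}, op);
}
```

Defining the visitors in the .cc file, as the reviewer suggests, keeps these helper types out of the public header.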

include(operators)
register_operators(DEPS naive_executor)
cc_library(while_op_helper SRCS while_op_helper.cc DEPS operator)
cc_library(op_variant SRCS op_variant.cc DEPS operator)

Collaborator:

Should target op_variant depend on proto_desc as well, since op_desc.cc is compiled into the proto_desc target?

@zhhsplendid (Member, Author):

Do you mean program_desc.cc? I use program_desc.h, not op_desc.h, but it makes no difference since both are compiled into the proto_desc target.

namespace paddle {
namespace operators {

using paddle::framework::OperatorBase;

Collaborator:

It is not good to expose OperatorBase without its namespace inside a header file.

@zhhsplendid (Member, Author):

Done

auto &attrs = const_cast<framework::AttributeMap &>(op.Attrs());
VLOG(2) << "Prepare to skip " << attrs.size()
-       << " var(s): " << GetDebugString(attrs);
+       << " var(s): " << paddle::string::join_strings(attrs, ' ');

Collaborator:

You can simplify as string::join_strings(attr, ' ').

@zhhsplendid (Member, Author):

Done
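For reference, a join_strings-style helper does nothing more than concatenate elements with a separator. A minimal sketch (this is not Paddle's actual `paddle::string::join_strings` implementation, just an illustration of the behavior):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Concatenate the elements of any container, separated by `sep`.
// Illustrative sketch only; Paddle provides its own helper.
template <typename Container>
std::string JoinStrings(const Container& items, char sep) {
  std::ostringstream os;
  bool first = true;
  for (const auto& item : items) {
    if (!first) os << sep;
    os << item;
    first = false;
  }
  return os.str();
}
```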

executor.Prepare(*program, block->ID(),
Attr<std::vector<std::string>>(
kSkipEagerDeletionVars) /*skip_ref_cnt_vars*/,
false /*force_disable_gc*/);

Collaborator:

Remove the last parameter false.

@zhhsplendid (Member, Author):

Done
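The `skip_ref_cnt_vars` argument passed to `Prepare` above carries the variables that eager deletion must leave alone. The underlying idea can be sketched as follows (names here are illustrative, not Paddle's API): when garbage-collecting a sub-block's variables, anything on the skip list must survive because the recurrent op still reads it across time steps.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Return the block variables that are safe to delete eagerly, i.e.
// everything not named in the skip list. Hypothetical helper for
// illustration; the real executor tracks reference counts per variable.
std::vector<std::string> VarsToDelete(
    const std::vector<std::string>& block_vars,
    const std::set<std::string>& skip_vars) {
  std::vector<std::string> to_delete;
  for (const auto& name : block_vars) {
    if (skip_vars.count(name) == 0) to_delete.push_back(name);
  }
  return to_delete;
}
```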

using paddle::operators::OpAndGradOpPair;

// Pass that sets the skip-eager-deletion vars of recurrent ops
class RecurrentOpEagerDeletionPass : public ir::Pass {

Collaborator:

RecurrentOpEagerDeletionPass can be placed inside recurrent_op_eager_deletion_pass.cc. Therefore, this header file is unnecessary.

@zhhsplendid (Member, Author):

Can I keep it? In general I prefer every .cc file to have an associated .h file, except for some special cases.

for (const std::string &name : output_vars) {
fwd_skip_vars.insert(name);
}
SetSkipVars(fwd_op, fwd_skip_vars);

@sneaxiy (Collaborator) commented Jul 9, 2019:

kInitialStates should be skipped too? See here.

@zhhsplendid (Member, Author):

Done

@zhhsplendid (Member, Author):

Discussed offline: maybe it doesn't have to be skipped.

}
PADDLE_ENFORCE_NOT_NULL(matched_fwd_op, "Cannot find matched forward op");
SetRecurrentOpAndRecurrentGradOpSkipVarAttr(*matched_fwd_op, bwd_op);
recurrent_ops.erase(*matched_fwd_op);

Collaborator:

What about the remaining forward recurrent ops that have no gradient? You should set skip vars in those ops as well.

@zhhsplendid (Member, Author):

Done.
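The matching step discussed above can be sketched like this (a hypothetical simplified model using name maps; the real pass works on OpVariant sets): every backward recurrent op is paired with its forward op and the pair is erased from the pending set, and any forward op left over, i.e. one with no gradient, still gets its skip vars recorded.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using SkipVars = std::set<std::string>;

// Assign skip vars to every recurrent op, matched or not.
// fwd_ops:    forward op name -> its skip vars (taken by value; mutated)
// bwd_to_fwd: backward op name -> matching forward op name
std::map<std::string, SkipVars> SetSkipVarsForAll(
    std::map<std::string, SkipVars> fwd_ops,
    const std::map<std::string, std::string>& bwd_to_fwd) {
  std::map<std::string, SkipVars> result;
  // First pass: matched forward/backward pairs share the skip vars,
  // and the matched forward op is removed from the pending set.
  for (const auto& kv : bwd_to_fwd) {
    auto it = fwd_ops.find(kv.second);
    if (it == fwd_ops.end()) continue;
    result[kv.first] = it->second;   // backward op gets the skip vars
    result[it->first] = it->second;  // forward op keeps its skip vars
    fwd_ops.erase(it);
  }
  // Second pass: forward recurrent ops without a gradient would
  // otherwise be missed; record their skip vars too.
  for (const auto& kv : fwd_ops) {
    result[kv.first] = kv.second;
  }
  return result;
}
```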

PADDLE_ENFORCE_EQ(
fwd_input.size(), in_grads.size(),
"Backward input gradient number does not match forward input number.");
for (size_t i = 0; i < in_grads.size(); ++i) {

@sneaxiy (Collaborator) commented Jul 9, 2019:

Seems wrong. You should review the code of recurrent_op.cc to find out which variables should be skipped.

@zhhsplendid (Member, Author):

Discussed offline, it may not be wrong.

@@ -0,0 +1,460 @@
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.

@sneaxiy (Collaborator) commented Jul 9, 2019:

As commented above, please add corresponding unittests:

  1. Run using ParallelExecutor.
  2. There are duplicate recurrent ops in the graph, even when nested inside another recurrent op. See the nested while op tests here.
  3. There are recurrent ops with gradient and without gradient.
  4. Add unittests of the ptb model. See here.
  5. Any other corner cases that should be considered...

@zhhsplendid (Member, Author):

Done

// See the License for the specific language governing permissions and
// limitations under the License.

#pragma once

Collaborator:

I recommend removing this header file.

@zhhsplendid (Member, Author):

I'll reply to you offline.

namespace paddle {
namespace operators {
/*
constexpr char RecurrentBase::kInputs[];

Collaborator:

Remove unused code instead of commenting it out.

@zhhsplendid (Member, Author):

Sorry, this is code I forgot to remove.

@sneaxiy (Collaborator) left a review:

LGTM.

@zhhsplendid zhhsplendid merged commit 89bc3fd into PaddlePaddle:develop Jul 19, 2019
zhhsplendid added a commit to zhhsplendid/Paddle that referenced this pull request Aug 26, 2019
Test PaddingRNN on V100 GPU device.

Test configuration: large model, padding mode (which is the mode using recurrentOp), one GPU.

GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR)
Speed (steps/s):  10.28 (this PR) vs 9.89 (without this PR)
zhhsplendid added a commit that referenced this pull request Aug 30, 2019
* Support memory eager deletion on recurrent OP (#17710)

Test PaddingRNN on V100 GPU device.

Test configuration: large model, padding mode (which is the mode using recurrentOp), one GPU.

GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR)
Speed (steps/s):  10.28 (this PR) vs 9.89 (without this PR)

* Fix random test_recurrent_op failure (#18718)

The change includes 3 things:

1. Set CPU_NUM to 1 in the tests, because otherwise ParallelExecutor prints a warning that CPU_NUM is not set and defaults it to 1.

2. The old tests compared two RNNs, a hand-written simple RNN and the same RNN built with Paddle, but initialized the RNN weights separately with numpy random and Paddle random. Fixed by setting explicit weight and bias values.

3. Also set the numpy random seed in the tests. The diff between the two RNNs can now be smaller (rtol tightened from 0.1 and 0.2 to 0.01).