Enhance gc to support deleting tensor buffer in advance#16409

Merged
sneaxiy merged 7 commits into PaddlePaddle:develop from sneaxiy:feature/advance_gc
Mar 27, 2019
Conversation

Collaborator

@sneaxiy sneaxiy commented Mar 23, 2019

For example, gc can collect the data buffer of input Y of elementwise_add_grad op before this op runs. Meanwhile, the shape and lod of Y are kept alive when elementwise_add_grad op runs.

Op developers can use DECLARE_NO_NEED_BUFFER_VARS_INFERENCE to declare a class that indicates which input tensors' buffers are not needed.

For example, the data buffers of concat op inputs x0, x1, x2 can be deleted before concat_grad op runs.
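The core idea — free a tensor's data buffer early while keeping its shape/lod metadata alive so the grad op can still infer dims — can be sketched as a standalone C++ miniature (illustrative types only, not the actual framework code):

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Illustrative miniature (not framework code): a tensor whose data
// buffer can be collected early while shape metadata stays alive.
struct MiniTensor {
  std::vector<int64_t> dims;        // shape/lod metadata, kept alive
  std::unique_ptr<float[]> buffer;  // data buffer, may be freed early

  int64_t numel() const {
    int64_t n = 1;
    for (int64_t d : dims) n *= d;
    return n;
  }
};

// gc frees only the data buffer; metadata survives.
inline void CollectBuffer(MiniTensor* t) { t->buffer.reset(); }

// A grad op like elementwise_add_grad only needs Y's shape to size
// the gradient, so it still works after Y's buffer was collected.
inline std::vector<int64_t> InferGradDims(const MiniTensor& y) {
  return y.dims;
}
```

This is why the PR can delete no-need-buffer inputs before the grad op runs without breaking shape inference.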

@sneaxiy sneaxiy force-pushed the feature/advance_gc branch 5 times, most recently from dbb80d6 to 134e9a3 Compare March 24, 2019 03:22
refine gc code
test=develop
@sneaxiy sneaxiy force-pushed the feature/advance_gc branch from 134e9a3 to a93a9ee Compare March 24, 2019 08:15
test=develop
@sneaxiy sneaxiy force-pushed the feature/advance_gc branch 4 times, most recently from 7b675aa to 493cd06 Compare March 25, 2019 07:56
test=develop
@sneaxiy sneaxiy force-pushed the feature/advance_gc branch 4 times, most recently from 06b1fd6 to adf5e09 Compare March 25, 2019 12:12
@sneaxiy sneaxiy requested review from chengduoZH and panyx0718 March 25, 2019 12:35
sneaxiy added 2 commits March 26, 2019 04:05
fix ctest eager deletion disable bug
test=develop
@sneaxiy sneaxiy force-pushed the feature/advance_gc branch from adf5e09 to 7b72c11 Compare March 26, 2019 06:04
@sneaxiy sneaxiy requested a review from liupluswei March 26, 2019 06:09
@sneaxiy sneaxiy force-pushed the feature/advance_gc branch from 7b72c11 to 796cc2c Compare March 26, 2019 07:17
@sneaxiy sneaxiy force-pushed the feature/advance_gc branch from 796cc2c to 78fb3a6 Compare March 26, 2019 09:30
template <typename T>
struct OpInfoFiller<T, kNoNeedBufferVarsInference> {
  void operator()(const char* op_type, OpInfo* info) const {
    info->infer_no_need_buffer_vars_ = [](const VariableNameMap& inputs,
Contributor

Curious about when these three parameters will be used to get the NoNeedBufferVars; it seems that for now we just return the parameters specified in the macro as an unordered_set?

Collaborator Author
@sneaxiy sneaxiy Mar 27, 2019

I reserved these parameters for future use. Some ops may not need some forward inputs or outputs when a certain attribute is true/false. For example, batch_norm_grad_op does not need Bias when use_mkldnn is false.
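Such an attribute-dependent inference could look like the following standalone sketch (the function name and signature are hypothetical, not the actual framework interface):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_set>

// Hypothetical sketch: an attribute-dependent no-need-buffer
// inference that marks Bias as not needed only when use_mkldnn
// is false, matching the batch_norm_grad example above.
using AttrMap = std::map<std::string, bool>;

inline std::unordered_set<std::string> BatchNormGradNoNeedBufferVars(
    const AttrMap& attrs) {
  std::unordered_set<std::string> result;
  auto it = attrs.find("use_mkldnn");
  const bool use_mkldnn = (it != attrs.end()) && it->second;
  if (!use_mkldnn) {
    result.insert("Bias");  // Bias buffer not needed on this code path
  }
  return result;
}
```

The reserved inputs/outputs parameters would let similar logic also inspect which variables are actually connected to the op.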

static constexpr OpInfoFillType kFillType = kType;
};

using OpRegistryClasses = std::tuple< // NOLINT
Collaborator Author
@sneaxiy sneaxiy Mar 27, 2019

Ugly but scalable code here. I rewrote the OpInfoFillTypeID::ID() method because the line length limit is 80 characters.

if (ctx->HasOutput(framework::GradVarName("X"))) {
  auto out_dims = ctx->GetInputDim(framework::GradVarName("Out"));
  ctx->SetOutputDim(framework::GradVarName("X"), out_dims);
}
Contributor

Maybe the above code is confusing: if ctx->HasOutput(framework::GradVarName("X")) is false, there is no need to call AddPositionEncodingOpGrad at all.

Collaborator Author

I am confused by this code too. I just followed the original logic.

chengduoZH
chengduoZH previously approved these changes Mar 27, 2019
++iter) {
bool ok;
auto result =
ExtractComputationOpFromLastLivedVar(*iter, i, shrink_func, &ok);
Contributor

Actually, I am a little bit confused about the ExtractComputationOpFromLastLivedVar function. I know it wants to get the last op which uses this variable, but in the reference-count part, do we need to care about which op generates this variable? What is the impact if we only care about the ops that use this variable (i.e., where the variable appears as an input)?

Collaborator Author
@sneaxiy sneaxiy Mar 27, 2019

ExtractComputationOpFromLastLivedVar returns (1) the last ops which read this variable, if any reading op exists, or (2) the last op which writes this variable, if no reading op exists. See HERE for details.
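As a standalone miniature of this rule (illustrative names only; the real pass may keep several "last" readers at the same dependency level):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative miniature of the rule described above: a variable's
// buffer may be freed after its last reader, or, if nothing reads
// it, after its last writer.
struct VarUsage {
  std::vector<std::string> readers;  // ops reading the var, in order
  std::vector<std::string> writers;  // ops writing the var, in order
};

// Returns the op(s) after which the variable's buffer may be freed.
inline std::vector<std::string> LastLivedOps(const VarUsage& u) {
  if (!u.readers.empty()) return {u.readers.back()};
  if (!u.writers.empty()) return {u.writers.back()};
  return {};
}
```

The writer case matters for variables that are produced but never read, which would otherwise never be collected.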

continue;
}

for (auto &out_pair : op_base->Outputs()) {
Contributor

Do we need to care about outputs here?

Collaborator Author

Yes, we should care about outputs because of in-place operations.
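The in-place concern can be shown with a small standalone sketch (hypothetical op, not framework code): an in-place op makes its output alias the input's buffer, so a reference count built only from inputs would free the buffer while the output still needs it.

```cpp
#include <cassert>
#include <memory>

// Illustrative miniature: an in-place op's output shares the
// input's storage, so scanning op_base->Outputs() is needed to
// count this alias in the reference count.
struct Var {
  std::shared_ptr<int> buffer;
};

// Hypothetical in-place op: the output aliases the input's buffer.
inline Var RunInplaceOp(const Var& in) { return Var{in.buffer}; }
```

Counting only the input occurrence would drop the buffer's count to zero once the input dies, even though the aliased output is still live.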

@sneaxiy sneaxiy merged commit c7c6eeb into PaddlePaddle:develop Mar 27, 2019
3 participants