Skip to content

PaddlePaddle memory leak test=develop#18382

Closed
pawelpiotrowicz wants to merge 1 commit intoPaddlePaddle:developfrom
pawelpiotrowicz:pawepiot/paddlepaddle_memory_leak
Closed

PaddlePaddle memory leak test=develop#18382
pawelpiotrowicz wants to merge 1 commit intoPaddlePaddle:developfrom
pawelpiotrowicz:pawepiot/paddlepaddle_memory_leak

Conversation

@pawelpiotrowicz
Copy link
Contributor

@kbinias kbinias added the Intel label Jun 27, 2019
@kbinias kbinias requested review from kbinias and luotao1 June 27, 2019 13:01
@kbinias kbinias added NGraph and removed NGraph labels Jun 27, 2019
@luotao1
Copy link
Contributor

luotao1 commented Jun 27, 2019

What's the backgroud of this PR? Could you give the memory leak case?

@pawelpiotrowicz
Copy link
Contributor Author

@luotao1 In short, fix solves the issue below:

#11 0x48c329b in paddle::framework::details::OpInfoFiller<paddle::operators::Conv2DOpMaker, (paddle::framework::details::OpInfoFillType)1>::operator()(char const*, paddle::framework::OpInfo*) const /home/pawepiot/workspace/multi_instance_public/paddle-public/paddle/fluid/framework/details/op_registry.h:167
#12 0x48beb41 in paddle::framework::details::OperatorRegistrarRecursive<1ul, false, paddle::operators::ConvOp, paddle::operators::Conv2DOpMaker, paddle::operators::ConvOpInferVarType, paddle::operators::Conv2DGradMaker>::OperatorRegistrarRecursive(char const*, paddle::framework::OpInfo*) /home/pawepiot/workspace/multi_instance_public/paddle-public/paddle/fluid/framework/details/op_registry.h:136
#13 0x48bdb46 in paddle::framework::details::OperatorRegistrarRecursive<0ul, false, paddle::operators::ConvOp, paddle::operators::Conv2DOpMaker, paddle::operators::ConvOpInferVarType, paddle::operators::Conv2DGradMaker>::OperatorRegistrarRecursive(char const*, paddle::framework::OpInfo*) /home/pawepiot/workspace/multi_instance_public/paddle-public/paddle/fluid/framework/details/op_registry.h:139
#14 0x48bc225 in paddle::framework::OperatorRegistrar<paddle::operators::ConvOp, paddle::operators::Conv2DOpMaker, paddle::operators::ConvOpInferVarType, paddle::operators::Conv2DGradMaker>::OperatorRegistrar(char const*) /home/pawepiot/workspace/multi_instance_public/paddle-public/paddle/fluid/framework/op_registry.h:61
#15 0x48b8bd1 in __static_initialization_and_destruction_0 /home/pawepiot/workspace/multi_instance_public/paddle-public/paddle/fluid/operators/conv_op.cc:618
#16 0x48b8cf2 in _GLOBAL__sub_I_conv_op.cc /home/pawepiot/workspace/multi_instance_public/paddle-public/paddle/fluid/operators/conv_op.cc:652
#17 0x5bae6bc in __libc_csu_init (/home/pawepiot/workspace/multi_instance_public/paddle-public/build/paddle/fluid/inference/tests/api/test_analyzer_image_classification+0x5bae6bc)

SUMMARY: AddressSanitizer: 209193 byte(s) leaked in 8511 allocation(s).

@pawelpiotrowicz
Copy link
Contributor Author

FYI : @jczaja @LeoZhao-Intel

Copy link
Contributor

@kbinias kbinias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please add only more details about reproduction of the problem.

@pawelpiotrowicz
Copy link
Contributor Author

The issue comes from.

${workspaceFolder}/build/paddle/fluid/inference/tests/api/test_analyzer_image_classification --infer_model=${workspaceFolder}/build/third_party/inference_demo/googlenet/model --gtest_filter=Analyzer_resnet50.profile --use_ngraph --use_analysis=false --repeat=100 --paddle_num_threads=4 --num_threads=2

  • APPLY SANATIZER

@luotao1
Copy link
Contributor

luotao1 commented Jul 2, 2019

test_analyzer_image_classification do not have use_ngraph in develop branch. Do you change the related unit-test?

@luotao1
Copy link
Contributor

luotao1 commented Jul 2, 2019

Could you add more description of this memory-leak?

  • for multi-instance only or for both multi-instance and one-instance?
  • for use_ngraph=true only or for both --use_ngraph=false and --use_ngraph=true?
  • for --use_analysis=false only or for both --use_analysis=false and --use_analysis=true?

#18382 (comment), if you change the unit-tests, please paste the patch here. We will have a double check.

@LeoZhao-Intel
Copy link
Contributor

LeoZhao-Intel commented Jul 5, 2019

@pawelpiotrowicz my understanding is you find this memory leak with applying SANATIZER, can you give more details or general description on why/what this leak happens, it will help us to understand what's real issue.

For op_info, there is no such code to delete pointer ? so you use shared_ptr to keep it released at appropriate time for all those places, right?

@kbinias
Copy link
Contributor

kbinias commented Jul 12, 2019

Answers

for multi-instance only or for both multi-instance and one-instance?

only for one instance

for use_ngraph=true only or for both --use_ngraph=false and --use_ngraph=true?

both, --with use_ngraph=false and --use_ngraph=true

for --use_analysis=false only or for both --use_analysis=false and --use_analysis=true?

both, with --use_analysis=false and --use_analysis=true

For op_info, there is no such code to delete pointer ? so you use shared_ptr to keep it released at appropriate time for all those places, right?

Yes, destructor doesn't exist.

To reproduce these memory leak

  1. Merge the latest develop branch with PR [PROPOSAL] Add support for dynamic code analysis (Sanitizers) #18303 e.g. git merge kbinias/add-sanitizers
  2. Build PaddlePaddle with AddressSanitizer: cmake .. -DCMAKE_BUILD_TYPE=Debug -DWITH_TESTING=ON -DWITH_INFERENCE_API_TEST=ON -DON_INFER=ON -DWITH_PYTHON=ON -DWITH_NGRAPH=ON -DWITH_GPU=OFF -DSANITIZER_TYPE="Address"
  3. Run cmd: ./paddle/fluid/inference/tests/api/test_analyzer_image_classification --infer_model=./third_party/inference_demo/googlenet/model --gtest_filter=Analyzer_resnet50.profile --use_analysis=false --repeat=100 --paddle_num_threads=4 --num_threads=1

Important after applying the current PR, there will still be 2 errors that you can eliminate with PR #18622

@bingyanghuang
Copy link
Contributor

Answers

for multi-instance only or for both multi-instance and one-instance?

only for one instance

for use_ngraph=true only or for both --use_ngraph=false and --use_ngraph=true?

both, --with use_ngraph=false and --use_ngraph=true

for --use_analysis=false only or for both --use_analysis=false and --use_analysis=true?

both, with --use_analysis=false and --use_analysis=true

For op_info, there is no such code to delete pointer ? so you use shared_ptr to keep it released at appropriate time for all those places, right?

Yes, destructor doesn't exist.

To reproduce these memory leak

  1. Merge the latest develop branch with PR [PROPOSAL] Add support for dynamic code analysis (Sanitizers) #18303 e.g. git merge kbinias/add-sanitizers
  2. Build PaddlePaddle with AddressSanitizer: cmake .. -DCMAKE_BUILD_TYPE=Debug -DWITH_TESTING=ON -DWITH_INFERENCE_API_TEST=ON -DON_INFER=ON -DWITH_PYTHON=ON -DWITH_NGRAPH=ON -DWITH_GPU=OFF -DSANITIZER_TYPE="Address"
  3. Run cmd: ./paddle/fluid/inference/tests/api/test_analyzer_image_classification --infer_model=./third_party/inference_demo/googlenet/model --gtest_filter=Analyzer_resnet50.profile --use_analysis=false --repeat=100 --paddle_num_threads=4 --num_threads=1

Important after applying the current PR, there will still be 2 errors that you can eliminate with PR #18622

@luotao1 Here are the reproducible test case for sanitizers.

@LeoZhao-Intel
Copy link
Contributor

LGTM

@luotao1
Copy link
Contributor

luotao1 commented Aug 1, 2019

@zhupengyang Could you help reproduce the problem and verify whether PR fix the error?

@zhupengyang
Copy link
Contributor

zhupengyang commented Aug 1, 2019

@zhupengyang Could you help reproduce the problem and verify whether PR fix the error?

  • reproduce the problem:

1

2

@luotao1

@luotao1
Copy link
Contributor

luotao1 commented Aug 1, 2019

Thanks for @zhupengyang double-check work!
It seems the error reported in this PR is fixed, but still have some other errors. @kbinias @LeoZhao-Intel @pawelpiotrowicz

@luotao1 luotao1 requested review from Xreki and chengduoZH August 1, 2019 04:33
@chengduoZH
Copy link
Contributor

chengduoZH commented Aug 1, 2019

@pawelpiotrowicz Thanks for your work.
I am curious that does this really leed to memory leak? Because OpInfoFiller is called only when registering the operators, and those actions only happen only once for each operator during the training or testing.

So that I don't think we should use smart point here.

@pawelpiotrowicz
Copy link
Contributor Author

@chengduoZH @luotao1 In my opinion issue definitely must be solved, in future if we got advanced issue we will play with the noises from this issue, so the perspective to find real reason of error might be more complex for developer. If we can fix any warning/issue we should do this asap to make process pure. Smart pointer should be used here to prevent a way when use case will be changed.

@chengduoZH
Copy link
Contributor

@pawelpiotrowicz Thanks for your reply, if you want to fix this warning, I prefer to add destructive function in OpInfo but not use smart point here.

@pawelpiotrowicz
Copy link
Contributor Author

@chengduoZH

I prefer to add destructive function in OpInfo

Why? Do we have any reason to do this?
I don't see any advantages to applying this approach, moreover, the solution provides a way for making mistake in future. If a smart pointer can be used should be used - good practice.

@LeoZhao-Intel
Copy link
Contributor

@chengduoZH

I prefer to add destructive function in OpInfo

Why? Do we have any reason to do this?
I don't see any advantages to applying this approach, moreover, the solution provides a way for making mistake in future. If a smart pointer can be used should be used - good practice.

Since paddlepaddle is cross-platform framework, there was a concern that some compilers on specific platform may not support smart pointer well. That's one of the reason to use deconstruction I guess.

@luotao1
Copy link
Contributor

luotao1 commented Aug 28, 2019

there was a concern that some compilers on specific platform may not support smart pointer well.

Yes, it is. We may support some other language interface later.

@bingyanghuang
Copy link
Contributor

@pawelpiotrowicz Hi, pawel Do you still want to merge this PR? If yes, please update your PR by following Baidu's comments. If not, please close this PR. @kbinias

@pawelpiotrowicz
Copy link
Contributor Author

close

@paddle-bot-old
Copy link

paddle-bot-old bot commented Apr 6, 2021

Since you haven't replied for more than a year, we have closed this issue/pr.
If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up.
由于您超过一年未回复,我们将关闭这个issue/pr。
若问题未解决或有后续问题,请随时重新打开,我们会继续跟进。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants