Skip to content

use static variable to do cache instead of thread local in thread frequent switching case#18428

Merged
luotao1 merged 10 commits intoPaddlePaddle:developfrom
LeoZhao-Intel:mt_memoryleak
Jul 8, 2019
Merged

use static variable to do cache instead of thread local in thread frequent switching case#18428
luotao1 merged 10 commits intoPaddlePaddle:developfrom
LeoZhao-Intel:mt_memoryleak

Conversation

@LeoZhao-Intel
Copy link
Contributor

@LeoZhao-Intel LeoZhao-Intel commented Jul 1, 2019

This case is special, NativeExecutor Predictor.run is called in a thread frequent switching case, for mkldnn, TransferScope cache creation is unavoidable. While current cache mechanism is using thread local variable to store cache, if execution thread is changed every iteration, it will always create new memory which makes big memory leak.

The solution is to use a global static variable to handle this case by explicitly indicating from user. Here I reuse SetMkldnnThreadid.

fix #18372 (comment)

…uent switching case

to avoid memory leak

test=develop
@LeoZhao-Intel
Copy link
Contributor Author

Using static variable solution and API is to make it simple, otherwise if we use single globe structure, it makes more complicated, it need thread sync etc, and efficiency will drop.

@luotao1 luotao1 added the Intel label Jul 1, 2019
@luotao1 luotao1 requested a review from Superjomn July 1, 2019 06:04
@luotao1
Copy link
Contributor

luotao1 commented Jul 1, 2019

@LeoZhao-Intel, Discussed with @Superjomn :

  1. Do you have some comments for "-1", or do you have some unit-test to ensure this "-1"?
  2. Do you need add lock for static_transfer_scope_cache? When multi-instance use "-1" mode, it will fail on this.

@LeoZhao-Intel
Copy link
Contributor Author

@LeoZhao-Intel, Discussed with @Superjomn :

  1. Do you have some comments for "-1", or do you have some unit-test to ensure this "-1"?

-1 is just specific for mkldnn single instance but with thread frequent switching, so far we don't have unit-test.

  1. Do you need add lock for static_transfer_scope_cache? When multi-instance use "-1" mode, it will fail on this.

In mkldnn single instance but with thread frequent switching, it only supports one instance, yes, it will fail if with multi-instances. We do have plan to refine this solution in the future.

@jczaja
Copy link
Contributor

jczaja commented Jul 1, 2019

@luotao1 , @Superjomn Why GPU execution needs to store TransferScope in TLS, why each thread has to keep its own copy of scopes and data caches eg. what is wrong with having only one global data and scope cache?

@luotao1
Copy link
Contributor

luotao1 commented Jul 2, 2019

Why GPU execution needs to store TransferScope in TLS? why each thread has to keep its own copy of scopes and data caches eg. what is wrong with having only one global data and scope cache?

@jczaja CUDADeviceContext is thread_local, i.e. one GPU-card have one CUDADeviceContext.

@LeoZhao-Intel
Copy link
Contributor Author

I guess, it is similar with mkldnn cache, to support multi-instance, thread_local is used to simply the implementation.

luotao1
luotao1 previously approved these changes Jul 3, 2019
Copy link
Contributor

@luotao1 luotao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@luotao1
Copy link
Contributor

luotao1 commented Jul 3, 2019

[14:29:15]	/home/teamcity/work/e84e6e698a3f913d/paddle/fluid/inference/tests/api/analyzer_bert_tester.cc:256:32: error: no member named 'set_cur_mkldnn_session_id' in namespace 'paddle::platform'
[14:29:15]	      if (is_static) platform::set_cur_mkldnn_session_id(1);
[14:29:15]	                     ~~~~~~~~~~^

In mac_ci

LeoZhao-Intel and others added 2 commits July 4, 2019 06:48
Since Mac don't support MKL and MKLDNN now, we should skip `TEST(Analyzer_bert, static_transfer_scope_cache)` ON Mac

test=develop
Copy link
Contributor

@luotao1 luotao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@luotao1 luotao1 merged commit ce38bb5 into PaddlePaddle:develop Jul 8, 2019
@luotao1
Copy link
Contributor

luotao1 commented Jul 29, 2019

@LeoZhao-Intel Could you revert this PR since #18578 merge and there is no random fail CI on MKLDNN?

@LeoZhao-Intel
Copy link
Contributor Author

@LeoZhao-Intel Could you revert this PR since #18578 merge and there is no random fail CI on MKLDNN?

ok, will revert it soon.

LeoZhao-Intel added a commit to LeoZhao-Intel/Paddle that referenced this pull request Jul 29, 2019
…read frequent switching case (PaddlePaddle#18428)"

This reverts commit ce38bb5.

test=develop
@LeoZhao-Intel
Copy link
Contributor Author

see new PR #18879 for revert,

luotao1 pushed a commit that referenced this pull request Jul 30, 2019
…read frequent switching case (#18428)" (#18879)

This reverts commit ce38bb5.

test=develop
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants