Use a static variable for the cache instead of thread-local storage in the frequent thread-switching case #18428
…uent switching case to avoid memory leak test=develop
|
Using a static variable together with the existing API keeps the solution simple. A single global structure would be more complicated: it would need thread synchronization, and efficiency would drop. |
|
@LeoZhao-Intel, I discussed this with @Superjomn:
|
-1 is specific to the mkldnn single-instance case with frequent thread switching; so far we don't have a unit test for it.
In the mkldnn single-instance case with frequent thread switching, only one instance is supported; yes, it will fail with multiple instances. We plan to refine this solution in the future. |
|
@luotao1, @Superjomn Why does GPU execution need to store TransferScope in TLS? Why does each thread have to keep its own copy of the scopes and data caches? For example, what is wrong with having only one global data and scope cache? |
@jczaja CUDADeviceContext is thread_local, i.e. one GPU card has one CUDADeviceContext. |
test=develop
|
I guess it is similar to the mkldnn cache: to support multi-instance, thread_local is used to simplify the implementation. |
test=develop
In mac_ci |
test=develop
Since Mac doesn't support MKL and MKLDNN now, we should skip `TEST(Analyzer_bert, static_transfer_scope_cache)` on Mac test=develop
|
@LeoZhao-Intel Could you revert this PR, since #18578 has been merged and there are no more random CI failures on MKLDNN? |
ok, will revert it soon. |
…read frequent switching case (PaddlePaddle#18428)" This reverts commit ce38bb5. test=develop
|
See new PR #18879 for the revert. |
This case is special: NativeExecutor Predictor.run is called while the executing thread switches frequently, and for mkldnn, creating a TransferScope cache is unavoidable. The current cache mechanism stores the cache in a thread-local variable, so if the execution thread changes on every iteration, a new cache is allocated each time, which causes a large memory leak.
The solution is to use a global static variable for this case, which the user must request explicitly. Here I reuse SetMkldnnThreadid.
fix #18372 (comment)
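The difference between the two cache strategies described above can be illustrated with a minimal, self-contained C++ sketch. Everything in it is hypothetical and is not Paddle's actual code: `ScopeCache`, `GetCache`, `g_use_static_cache`, and `RunOnFreshThreads` are stand-ins invented for illustration, and the boolean flag only plays the role of the explicit user opt-in mentioned in the description. The sketch counts how many cache objects get constructed when every "run" happens on a new thread; it does not reproduce the leak itself, since these thread-local objects are destroyed when their thread exits.

```cpp
#include <atomic>
#include <thread>
#include <unordered_map>

// Hypothetical stand-in for a transfer-scope cache; not Paddle's real type.
struct ScopeCache {
  std::unordered_map<int, int> entries;
};

// Counts how many ScopeCache objects have been constructed since the last reset.
std::atomic<int> g_caches_created{0};

// Hypothetical switch standing in for the user's explicit opt-in.
bool g_use_static_cache = false;

ScopeCache& GetCache() {
  if (g_use_static_cache) {
    // One cache for the whole process, initialized once (thread-safe in C++11).
    static ScopeCache cache = [] {
      ++g_caches_created;
      return ScopeCache{};
    }();
    return cache;
  }
  // One cache per thread: every new thread constructs its own copy.
  thread_local ScopeCache cache = [] {
    ++g_caches_created;
    return ScopeCache{};
  }();
  return cache;
}

// Simulates `iterations` predictor runs, each on a fresh thread, and
// returns how many caches were constructed during those runs.
int RunOnFreshThreads(bool use_static, int iterations) {
  g_use_static_cache = use_static;
  g_caches_created = 0;
  for (int i = 0; i < iterations; ++i) {
    std::thread worker([i] { GetCache().entries[i] = i; });
    worker.join();  // runs are sequential, so the shared cache is never raced
  }
  return g_caches_created.load();
}
```

With the thread-local path, each fresh thread constructs a new cache, so the count grows with the number of runs; with the static path, a single cache is constructed and reused. Note that the static path is only safe here because the runs never overlap, which matches the single-instance restriction discussed in the conversation.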