
[DO NOT MERGE] detect model test2 for dynamic shape#18372

Closed
luotao1 wants to merge 36 commits into PaddlePaddle:develop from luotao1:detect_model_test

Conversation


@luotao1 luotao1 commented Jun 27, 2019

combine #18331 and #18285
@jianhang-liu @LeoZhao-Intel


luotao1 commented Jun 27, 2019

634d8c6 is used to speed up lookup, since a map's find is faster than a linear search over a vector.

} // namespace inference
} // namespace paddle

// following lines are used for pprof

You can remove the following main function if you do not want to use pprof.


luotao1 commented Jun 29, 2019

6d5a841 supports pprof to find memory leaks.

  • install
  • Heap-checking: HEAPCHECK=local ./paddle/fluid/inference/tests/api/test_analyzer_detect
  • Heap-profiling: pprof --pdf ./paddle/fluid/inference/tests/api/test_analyzer_detect /tmp/test_analyzer_detect.11261.test_foo-end.heap > test_sample10.pdf


luotao1 commented Jun 29, 2019

pprof_sample100.pdf
pprof_sample1000.pdf
Are global_transfer_data_cache and global_transfer_scope_cache leaking memory?
image

Same problem in #15032 (comment)


LeoZhao-Intel commented Jun 29, 2019

Analysis of the mkldnn multi-threaded memory leak (each iteration is executed on a new thread):

  1. This usage scenario is special: the user keeps creating new threads to execute predictor.run, and executes only that function.
  2. It is only triggered when data transform is involved, because that creates a transfer scope. mkldnn is one such case; mkl does not use it, so mkl is fine. See OperatorWithKernel::PrepareData().
  3. The leak is inside the framework, in global_transfer_data_cache() in transfer_scope_cache.cc. The cause is that this function allocates memory through a thread_local pointer; because execution keeps switching to new threads, the memory never gets a chance to be released after a thread exits. The intent of the function is good: cache and reuse the transfer scope to improve performance. But the multi-threaded case in this unit test was probably not considered when this code was designed.
std::unordered_map<size_t, Scope*>& global_transfer_data_cache() {
  thread_local auto* x = new std::unordered_map<size_t, Scope*>;
  return *x;
}

Discussion of solutions:
This framework issue is very similar to the issue with mkldnn's own cache mechanism: both reuse earlier results as much as possible for performance, while also having to support single instance vs. multiple instances, sequential vs. parallel runs, and a fixed single thread vs. switching across threads, so the cases are complex. Solving this at the bottom layer with a single solution, transparently to the user, would be very hard and very complex. We suggest considering a dedicated API that lets the user declare the running mode; this both simplifies the design and allows clearer optimization for specific scenarios.


jczaja commented Jul 4, 2019

@luotao1 , I'm writing here because I have a problem reproducing the potential memory leak in single-threaded execution that, as @LeoZhao-Intel told me, you are seeing, e.g.:
Test_analyzer_detect for CAP=50: for samples=1000 max memory consumption is 2.6GB, and for samples=5000 it is around 4GB.

I tried to reproduce this problem, but had difficulty observing it. So perhaps you can advise on how to make it manifest.

I tested this PR (the most recent update and earlier ones), and for CAP=50 (cfg.EnableMKLDNN(50)):

  • samples=1000 , maximal memory consumption is ~2.1 GB
  • samples=5000 , maximal memory consumption is ~2.2 GB

I have disabled the in-loop thread start for prediction by commenting out thread.emplace_back..., i.e. test_analyzer_detect is single-threaded (predictor execution starts from the same thread as the parent thread).

  1. How do you check max memory usage?

On my side I check maximal memory consumption via the Max RSS value reported by the time program, e.g. /usr/bin/time -v <cmd line>

typical output:

Command being timed: "./paddle/fluid/inference/tests/api/test_analyzer_detect --infer_model=/home/jczaja/DETECT/fluid/ --infer_data=/home/jczaja/DETECT/detect_input.txt --infer_shape=/home/jczaja/DETECT/shape.txt --gtest_filter=Analyzer_vis.profile_mkldnn --paddle_num_threads=1 --repeat=2 --batch_size=1 --sample=5000"
User time (seconds): 1019.35
System time (seconds): 156.12
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 19:55.67
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2313368 # 2.3 GB
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 102711363
Voluntary context switches: 10037
Involuntary context switches: 1763
Swaps: 0
File system inputs: 0
File system outputs: 6312
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

  2. When observing the problem, have you disabled pprof and sanitizers (they tend to increase memory consumption)? If fully disabled (no linking to tcmalloc and no sanitizer build), do you still see the increased memory consumption?

  3. What platform did you use when observing the memory leak?

  4. Any other advice on how to observe this memory leak?


luotao1 commented Jul 4, 2019

@jczaja Thanks for trying to reproduce it.

Test_analyzer_detect for CAP=50 e.g. for samples=1000 max memory consumption is 2.6GB and for samples=5000 it is around 4GB.

This is a result from @jianhang-liu. My test result today, with config.EnableMKLDNN(10) and CAP=10:

  • samples=1000 (from 1k dataset) , maximal memory consumption is ~1.5 GB
  • samples=1000 (from the first 1k of 5k dataset) , maximal memory consumption is ~1.9 GB
  • samples=5000 (from 5k dataset) , maximal memory consumption is ~2.4 GB

I have disabled in-loop threads starting to do prediction, by commenting out thread.emplace_back

I didn't disable this, but with #18428, I think the multi-instance memory leak is fixed now.

--paddle_num_threads=1

I use --paddle_num_threads=4. @jianhang-liu finds that OMP_NUM_THREADS may affect it, but I haven't tested different paddle_num_threads values.

How to you check for max memory usage

I use the top command to observe it directly.

If fully disabled (no linking to tcmalloc and building for sanitizers) do you still see this increased memory consumption

Yes, I do.

What platform you used for observing memory leak

I use an E5-2620 v3.

Any other advice on how to observe this memory leak

Maybe --paddle_num_threads=4 is the key point?

@luotao1 luotao1 reopened this Jul 8, 2019

luotao1 commented Oct 17, 2019

command:

./paddle/fluid/inference/tests/api/test_analyzer_detect --gtest_filter=Analyzer_vis.profile_mkldnn --batch_size=1 --warmup --repeat=1 --paddle_num_threads=4 --infer_model=third_party/inference_demo/face_model/densebox --infer_data=third_party/inference_demo/face_model/detect_input.txt --infer_shape=third_party/inference_demo/face_model/shape.txt  --sample=1000

E5-2620 v3:

| Threads | %CPU | Max %Mem | sample latency (ms) | date | commit |
| --- | --- | --- | --- | --- | --- |
| 4 | ~320% | 1.1% | 27.8061 | 10/17 | 1d92544 |

@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Apr 2, 2020
@PaddlePaddle PaddlePaddle unlocked this conversation Apr 2, 2020

luotao1 commented May 8, 2020

Closed due to #24336.

