
add mkldnn shapeblob cache clear strategy#18513

Merged
luotao1 merged 4 commits into PaddlePaddle:develop from luotao1:shape_blob_clear_cache
Jul 8, 2019
Conversation

@luotao1 (Contributor) commented Jul 5, 2019

This PR:

  1. adds an MKLDNN shapeblob cache clear strategy: when shapeblob.size() == mkldnn_input_shape_cache_size, erase the first entry of shapeblob.
  2. adds a unit test TestMkldnnCacheClear to verify this cache clear strategy:
    • Without the cache clear strategy, shapeblob.size() is always 1.
    • With the cache clear strategy, shapeblob.size() equals mkldnn_input_shape_cache_size.
  3. removes the unused platform::get_cur_input_shape_str function, and adds size_t MKLDNNDeviceContext::GetShapeBlobSize(int mkldnn_session_id) to get the ShapeBlob size for a given mkldnn_session_id.

TODO

  1. Explicit settings such as platform::set_cur_input_shape_str(ss.str()); and platform::set_cur_mkldnn_session_id(platform::kMKLDNNSessionID_CacheClearing); will be deprecated after the config.EnableMKLDNN() interface is enhanced.
  2. Global functions such as platform::set_cur_input_shape_str(ss.str()); and platform::set_cur_mkldnn_session_id(platform::kMKLDNNSessionID_CacheClearing); will become member functions of MKLDNNDeviceContext.

@luotao1 luotao1 requested review from a user and jczaja July 5, 2019 04:16
@luotao1 (Contributor, Author) commented Jul 5, 2019

@LeoZhao-Intel Please review!

thread_local std::string cur_input_shape_str = "";
// the cache size of different input shapes for MKLDNN.
// Default 1 means fixed input shape, not dynamic shape.
thread_local int cur_input_shape_cache_size = 1;
Contributor:

How about cur_input_shape_cache_capacity instead of size? "size" here sounds like the actual data size, while in my understanding it means the maximum cache size.

Contributor Author:

Got it. I will change.

cur_input_shape_str = input_shape_str;
}
std::string get_cur_input_shape_str(void) { return cur_input_shape_str; }
void set_cur_input_shape_cache_size(int input_shape_cache_size) {
Contributor:

set_cur_input_shape_cache_capacity?

Contributor Author:

Got it. I will change.


void MKLDNNDeviceContext::ResetBlobMap() const { p_blobmap_->clear(); }

size_t MKLDNNDeviceContext::GetShapeBlobSize(int mkldnn_session_id) const {
Contributor:

Why do we need an explicit input parameter mkldnn_session_id? Can we just get it from thread_local? There seems to be no case where a user would query the shape blob size for a session id different from the one stored in thread_local.

Contributor Author:

can we just get it from thread_local

Could you give more details on how to get it from thread_local?

why need a explicit input parameter mkldnn_session_id?

If we don't have the mkldnn_session_id parameter, the pMap->find(mkldnn_session_id) at line 428 could not find it. Or do you mean there is only one session_id in pMap?

Contributor:

mkldnn_session_id is already stored in cur_mkldnn_session_id, which is a thread_local variable, so if the user calls it in the same thread as predictor.run, it can easily get the id from cur_mkldnn_session_id.

@LeoZhao-Intel (Contributor) commented Jul 5, 2019:

If the user calls GetShapeBlobSize in the same thread as predictor.run, then the code would look like:

  BlobMap* pMap = p_blobmap_.get();
  auto map_it = pMap->find(cur_mkldnn_session_id);
  if (map_it == pMap->end()) {
    LOG(FATAL) << "MKLDNNDeviceContext don't find mkldnn_session_id : "
               << cur_mkldnn_session_id;
  }
  return map_it->second->size();
}

If we still want the user to call this function from other threads, then this parameter is necessary.

Contributor Author:

Got it. I will change it.

if (key_it == sBlob->end()) {
// In cache clearing mode, cur_input_shape_cache_size defines max pblob
// capacity
if ((tid == kMKLDNNSessionID_CacheClearing) &&
Contributor:

Better to rename tid to sid (session id) to align with kMKLDNNSessionID_xxx.

Contributor Author:

Got it. I will change.


// default mkldnn session id
- constexpr size_t kMKLDNNSessionID_Default = 0;
+ constexpr int kMKLDNNSessionID_Default = 0;
Contributor:

Why change it to int? size_t is used to align with std::hash(this_thread::gettid()); otherwise it may need a static_cast.

Contributor Author:

Got it. I will change.

(sBlob->size() == static_cast<size_t>(cur_input_shape_cache_size))) {
VLOG(2) << "tid=" << tid
<< ", remove all head blob of shape: " << sBlob->begin()->first;
sBlob->erase(sBlob->begin()->first);
Contributor:

Why remove sBlob->begin()->first here? It seems wrong; it should be erase(sBlob->begin()), since sBlob->begin()->first is a string.
BTW, it doesn't really mean removing the head or first one, because sBlob is a std::unordered_map, whose ordering is not the same as a vector or queue.

Contributor Author:

A map can erase by key, see: https://www.geeksforgeeks.org/map-erase-function-in-c-stl/

it doesn't really mean removing the head or first one because sBlob is

Got it. I will change the LOG message. As for the code, since we can erase any entry of sBlob, it runs correctly.

Contributor:

You are right, there are 3 erase overloads.

@luotao1 (Contributor, Author) commented Jul 5, 2019

All done, please review again @LeoZhao-Intel

// In cache clearing mode, cur_input_shape_cache_capacity defines
// max pblob capacity
if ((sid == kMKLDNNSessionID_CacheClearing) &&
(sBlob->size() ==
Contributor:

Do you think ">=" is better here?

Contributor Author:

When the size reaches ==, it cleans, so > can never happen.
In what case would > occur?

Contributor:

So far I don't see a case for >, but at the code level it looks safer to use >=.

Contributor Author:

Got it.

@LeoZhao-Intel (Contributor):

LGTM

@luotao1 (Contributor, Author) commented Jul 5, 2019

@jczaja Please review!


size_t MKLDNNDeviceContext::GetShapeBlobSize() const {
BlobMap* pMap = p_blobmap_.get();
auto map_it = pMap->find(cur_mkldnn_session_id);
@jczaja (Contributor) commented Jul 5, 2019:

Currently it is only for UT purposes? Perhaps it would be safer to guard the whole function with a critical section, using the same mutex as SetBlob and GetBlob, in particular if there is a plan to use GetShapeBlobSize with the parallel executor. Apart from that, LGTM.

Contributor Author:

Got it. You are right.

@luotao1 (Contributor, Author) commented Jul 6, 2019

@LeoZhao-Intel @jianhang-liu analyzer_mm_dnn hangs more often in this PR after the unit test was added here.
http://ci.paddlepaddle.org/viewType.html?buildTypeId=Paddle_PrCi&branch_Paddle=pull%2F18513&tab=buildTypeStatusDiv
Is #18513 (comment) the reason?

@luotao1 (Contributor, Author) commented Jul 6, 2019

@jczaja @LeoZhao-Intel I added the mutex, and analyzer_mm_dnn doesn't hang when I repeat the test 3 times.
http://ci.paddlepaddle.org/viewType.html?buildTypeId=Paddle_PrCi&branch_Paddle=pull%2F18513&tab=buildTypeStatusDiv
Please review again!

@LeoZhao-Intel (Contributor):

LGTM

@ghost left a comment:

Please consider whether we really need VLOG(2), considering those logs are just detailed information about cache usage.

@luotao1 luotao1 merged commit fe32879 into PaddlePaddle:develop Jul 8, 2019
@luotao1 luotao1 deleted the shape_blob_clear_cache branch July 8, 2019 01:55
@luotao1 (Contributor, Author) commented Jul 8, 2019

Please consider whether we really need VLOG(2) considering those logs are just detail information of cache usage.

Since those logs are only used by MKLDNN, what's your opinion? @LeoZhao-Intel @jczaja

@LeoZhao-Intel (Contributor):

I am fine with VLOG(2), but I'm not sure there is a clear rule for log level definitions.

