enhance config.EnableMKLDNN api for mkldnn cache clear strategy#18549
luotao1 wants to merge 6 commits into PaddlePaddle:develop from luotao1:enable_mkldnn_enhance
Conversation
Compared with #18372, the reason we don't use MkldnnPostRun is: if we reset mkldnn_session_id to 0, the unit test's dev_ctx->GetShapeBlobSize() cannot get the correct shape_blob size.
There may be a corner case when threads are reused via a pool: in a previous execution, instance X sets config.mkldnn_input_shape_cache_capacity_ > 0, so thread A sets the thread-local cache capacity, and this variable is not cleared after execution. When thread A is later reused by another instance B with config.mkldnn_input_shape_cache_capacity_ = 0, it will hit the wrong branch.
So I suggest doing the following in MkldnnPreRun:

```cpp
if (config_.mkldnn_input_shape_cache_capacity_ > 0) {
  VLOG(2) << "In mkldnn cache clear mode.";
  platform::set_cur_mkldnn_session_id(
      platform::kMKLDNNSessionID_CacheClearing);
  platform::set_cur_input_shape_cache_capacity(
      config_.mkldnn_input_shape_cache_capacity_);
}
// Set the current input shape string.
std::stringstream ss;
for (size_t i = 0; i < inputs.size(); ++i) {
  for (size_t j = 0; j < inputs[i].shape.size(); ++j) {
    ss << inputs[i].shape[j] << "-";
  }
}
VLOG(2) << "Set input shape=" << ss.str();
platform::set_cur_input_shape_str(ss.str());
```
Enhance it for cur_input_shape_cache_capacity=1 and sBlob.size()==0
Merge (#18532): Fix Mask RCNN predictor: 1. refine the memory optim algorithm to support models with the block op; 2. fix the output diff by modifying the affine channel fuse; 3. add the condition_block_infer op and an interface for setting the TRT calib table dir.
@LeoZhao-Intel @jczaja Please review!
```cpp
     py::arg("mkldnn_input_shape_cache_capacity") = 0)
.def("mkldnn_enabled", &AnalysisConfig::mkldnn_enabled)
.def("set_cpu_math_library_num_threads",
     &AnalysisConfig::SetCpuMathLibraryNumThreads)
```
There may be another failure in CI; see my PR #18081: https://github.com/PaddlePaddle/Paddle/pull/18081/files?file-filters%5B%5D=.py#diff-876ea1bc109973488c161a657f79812fR74 . But it may be fixed in your PR.
This PR adds EnableMKLDNN(int mkldnn_input_shape_cache_capacity = 0) and TEST(Analyzer_MM_DNN, mkldnn_cache_clear) using the enhanced API, with an output comparison between the no-cache strategy and the cache strategy.