cache runtime_context#16002

Merged
luotao1 merged 15 commits intoPaddlePaddle:developfrom
luotao1:runtime_context
Mar 19, 2019
Conversation

@luotao1
Contributor

@luotao1 luotao1 commented Mar 1, 2019

RuntimeContext relates the input/output names of an Operator to the corresponding variables in the Scope. Since an Operator's input/output names do not change during execution, the RuntimeContext can be created only at the first iteration and reused afterwards, saving the lookup time.

In the inference of PyramidDNN (a small model):

|         | 2450 v2 (ms per sample) | 2620 v3 (ms per sample) |
|---------|-------------------------|-------------------------|
| before  | 0.305032                | 0.228627                |
| after   | 0.239653                | 0.190147                |
| speedup | 21%                     | 16%                     |

@luotao1 luotao1 requested a review from chengduoZH March 5, 2019 02:01
@chengduoZH chengduoZH requested a review from panyx0718 March 5, 2019 02:06

/// Find whether a variable is in the current scope.
/// Return false if cannot find.
bool HasLocalVar(const std::string& name) const;
Contributor

where is this used?

Contributor Author

@luotao1 luotao1 Mar 5, 2019

I want to use scope.parent()->HasLocalVar(kLocalExecScopeName) to replace scope->FindVar(kLocalExecScopeName) in this PR, but the speedup is not noticeable. I will remove it later.

Contributor Author

Removed the unused HasLocalVar function.

// in the execution, RuntimeContext could be created only at the first
// iteration of the execution to save the elapsed time.
// Note that the Scope should not be the local scope, since local scope
// would be cleaned regularly.
Contributor

Making this the default is dangerous, because there is no restriction preventing the global scope from changing.

Contributor Author

@luotao1 luotao1 Mar 13, 2019

ca34c90 adds runtime_context_cache_pass to do it, and makes this default to false in the inference analysis_config. @panyx0718 @Superjomn

Contributor Author

enable_ir_optim was printed twice before; see lines 74 and 75.

@luotao1 luotao1 requested a review from Superjomn March 13, 2019 09:59
@luotao1
Contributor Author

luotao1 commented Mar 15, 2019

This fails on TensorRT because two extra runtime_context_cache_pass runs are appended at the end, for the same reason as #16175. Thus, this PR sets enable_runtime_context_cache_ to false by default; after #16175 is fixed, enable_runtime_context_cache_ will default to true.
http://ci.paddlepaddle.org/viewLog.html?buildId=70592&tab=buildLog&buildTypeId=Paddle_PrCi&logTab=tree&filter=all&_focus=18755

[06:53:00]	I0315 06:52:48.986371 71768 analysis_predictor.cc:367] TensorRT subgraph engine is enabled
[06:53:00]	--- Running analysis [ir_graph_build_pass]
[06:53:00]	--- Running analysis [ir_analysis_pass]
[06:53:00]	--- Running IR pass [infer_clean_graph_pass]
[06:53:00]	--- Running IR pass [identity_scale_op_clean_pass]
[06:53:00]	--- Running IR pass [tensorrt_subgraph_pass]
[06:53:00]	---  detect a sub-graph with 305 nodes
[06:53:00]	--- Running IR pass [conv_affine_channel_fuse_pass]
[06:53:00]	--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
[06:53:00]	--- Running IR pass [conv_elementwise_add_act_fuse_pass]
[06:53:00]	--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
[06:53:00]	--- Running IR pass [conv_elementwise_add_fuse_pass]
[06:53:00]	--- Running IR pass [transpose_flatten6_concat_fuse_pass]
[06:53:00]	--- Running IR pass [transpose_flatten5_concat_fuse_pass]
[06:53:00]	--- Running IR pass [transpose_flatten4_concat_fuse_pass]
[06:53:00]	--- Running IR pass [transpose_flatten3_concat_fuse_pass]
[06:53:00]	--- Running IR pass [runtime_context_cache_pass]
[06:53:00]	--- Running IR pass [runtime_context_cache_pass]
[06:53:00]	--- Running analysis [ir_params_sync_among_devices_pass]
[06:53:00]	I0315 06:52:49.501626 71768 ir_params_sync_among_devices_pass.cc:41] Sync params from CPU to GPU

@luotao1
Contributor Author

luotao1 commented Mar 15, 2019

[12:10:22]	[Step 1/1] + APPROVALS=FALSE
[12:10:22]	[Step 1/1] + echo 'current pr 16002 got approvals: FALSE'
[12:10:22]	[Step 1/1] + '[' FALSE == FALSE ']'
[12:10:22]	[Step 1/1] + '[' paddle/fluid/framework/operator.h == paddle/fluid/API.spec ']'
[12:10:22]	[Step 1/1] + echo 'You must have panyx0718 approval for the api change! paddle/fluid/framework/operator.h'
[12:10:22]	[Step 1/1] + exit 1
[12:10:22]	[Step 1/1] current pr 16002 got approvals: FALSE
[12:10:22]	[Step 1/1] You must have panyx0718 approval for the api change! paddle/fluid/framework/operator.h

@luotao1
Contributor Author

luotao1 commented Mar 15, 2019

@panyx0718 Could you start the review again? The latest commit passes all CI checks except the API-change check.

VLOG(3) << "Applies Runtime Context Cache strategy.";
for (const Node* n : graph->Nodes()) {
  if (n->IsOp()) {
    n->Op()->SetAttr(kEnableRuntimeContext, true);
Contributor

kEnableCacheRuntimeContext?

Contributor Author

Done


/// RuntimeContext is used to relate input/output names of Operator with
/// the corresponding variables in Scope.
/// If an Op has attribute kEnableRuntimeContext, it means that in a same Scope,
Contributor

in a name scope?

Contributor Author

Done

@Superjomn
Contributor

Superjomn commented Mar 18, 2019

Making the contexts members of the operator is much simpler; there is no need to make them attributes, or the operators will have too many attributes, with the runtime attributes mixed with the algorithms'.

Currently, the operator interface is over-implemented/designed. For inference it should be much simpler; use a MACRO to wrap them as members?

Do we really need to change the runtime context or infer-shape context and suffer that complexity?

@luotao1
Contributor Author

luotao1 commented Mar 18, 2019

Making the contexts members of the operator is much simpler

I tried this at first, but it fails on the ParallelExecutor unit tests and the distributed async unit tests.

  • In ParallelExecutor, since it creates local scopes and the local scopes are cleaned regularly, we should use scope.FindVar(details::kLocalExecScopeName) to detect whether a scope is a local scope.
    784826a
  • After fixing the ParallelExecutor unit tests, I tried to fix the distributed async unit tests. However, if the context is a member of the operator, the distributed async unit tests fail randomly. As discussed with @typhoonzero, the distributed async unit tests need to create the runtime_context in a thread.

for inference, it should be much simpler, use MACRO to wrap them with the members

Do you mean using #ifdef ON_INFER with the context members?
I tried this as well, but it fails on the TensorRT unit tests. The error is "Attribute 'subgraph' is required! at [/paddle/paddle/fluid/framework/attribute.h:276]".
Do you mean using #ifdef ON_INFER and #ifndef PADDLE_WITH_TENSORRT? But then there is no speedup in an inference library built with TensorRT.

Do we really need to change the runtime context or infer shape context? suffer that complexity?

The reason for changing the runtime context is that caching it gives an obvious speedup for small inference models. Besides, we will cache the kernel choice in #16004.

@panyx0718 @Superjomn What do you think about using runtime_context_cache_pass versus the MACRO approach (#ifdef ON_INFER and #ifndef PADDLE_WITH_TENSORRT)?

@luotao1 luotao1 merged commit dbb92ee into PaddlePaddle:develop Mar 19, 2019
@luotao1 luotao1 deleted the runtime_context branch March 19, 2019 01:31
@luotao1 luotao1 restored the runtime_context branch March 19, 2019 14:01
@luotao1 luotao1 deleted the runtime_context branch May 10, 2019 09:28