Add Fuse AllReduce Pass And Adam Pass #15497
chengduoZH wants to merge 40 commits into PaddlePaddle:develop
Conversation
```cpp
// Add op fusion.
if (strategy.fuse_relu_depthwise_conv_) {
  VLOG(10) << "Add fuse_relu_depthwise_conv_pass";
```
I'm wondering whether we could put the build strategy on the Python side, so that we can define strategies for different scenarios.
Can you put them in compiler.py?
Maybe all the fuse_xx passes can be placed on the Python side.
```cpp
auto iter = vars.find(p_g.second);
PADDLE_ENFORCE(iter != vars.end());

// Set Persistable to prevent this var becoming reusable.
```
Since grad vars are persistable, the allocation of memory space could be done at startup; no additional op would then be needed in the main program.
But we still need to know the memory pieces in order to do the allreduce.
I put the alloc_continuous_space_for_grad_op in RunOnlyOnceProgram, which is run in multi_device_pass; maybe that is better.
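The alloc-continuous-space idea discussed above can be sketched roughly as follows. All names and signatures here are illustrative, not Paddle's actual API: one contiguous buffer is allocated once (e.g. at startup) and each gradient receives a slice of it, so a single allreduce over the whole buffer covers every gradient at once.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// A gradient viewed as a slice of the fused buffer (illustrative type).
struct GradSlice {
  float* data;
  std::size_t numel;
};

// Allocate one contiguous buffer for all gradients and return a slice per
// gradient. A single allreduce on fused_buffer then covers all of them.
std::vector<GradSlice> FuseGradientSpace(
    std::vector<float>* fused_buffer, const std::vector<std::size_t>& sizes) {
  std::size_t total =
      std::accumulate(sizes.begin(), sizes.end(), std::size_t{0});
  fused_buffer->assign(total, 0.0f);  // one allocation, done only once
  std::vector<GradSlice> slices;
  std::size_t offset = 0;
  for (std::size_t n : sizes) {
    slices.push_back({fused_buffer->data() + offset, n});
    offset += n;
  }
  return slices;
}
```

Because the slices are laid out back to back, knowing the offsets is exactly the "memory pieces" information the allreduce needs.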
```cpp
// Add automatically inplace.
if (strategy_.enable_inplace_) {
  VLOG(10) << "Add inplace_pass";
```
We can remove all these VLOGs; the pass names can be obtained and printed from the Python side.
```cpp
std::vector<ir::Node *> opt_ops;
for (ir::Node *node : result.Nodes()) {
  if (node->IsOp()) {
    GetSpecifiedOpsAndVars(fuse_op_type, aux_var_names, node, &opt_ops,
```
We need to check whether all the optimizers are the same, and only fuse gradient variables for identical optimizers.
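The check suggested above could look something like this minimal sketch (the helper name is made up; the strings stand in for `ir::Node::Op()->Type()`): before fusing, verify that every candidate optimizer op has the same type, since mixed optimizers (e.g. adam together with sgd) must not be fused into one group.

```cpp
#include <string>
#include <vector>

// Returns true only if every optimizer op in the list has the same type.
// An empty list is trivially uniform.
bool AllOptimizersSameType(const std::vector<std::string>& op_types) {
  if (op_types.empty()) return true;
  for (const auto& t : op_types) {
    if (t != op_types.front()) return false;
  }
  return true;
}
```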
```cpp
for (auto &op_desc : program.Block(0).AllOps()) {
  auto op = paddle::framework::OpRegistry::CreateOp(*op_desc);
  VLOG(4) << op->DebugStringEx(local_scopes_[i]);
  op->Run(*local_scopes_[i], places_[i]);
```
It's better to distinguish compile time from runtime, and not run anything inside passes.
```cpp
    static_cast<int>(OpRole::kLoss)) &&
    !loss_var_name_.empty();  // If loss_var is empty. This is test mode
    static_cast<int>(
        OpRole::kLoss));  // If loss_var is empty. This is test mode
```
The comment here can be removed too.
```cpp
  }
}

void FuseAllReduceSSAGraphBuilder::CheckGraph(const ir::Graph &graph) const {
```
Why do we need to add a new FuseAllReduceSSAGraphBuilder? Maybe putting the fuse passes after the multi-device graph pass could reduce the number of builders here?
```cpp
}

// for single card training, fuse_all_reduce_ops is unnecessary.
if (strategy.fuse_all_reduce_ops_) {
```
Does this really need build_strategy? Could it be calculated automatically?
I think we should keep strategy.fuse_all_reduce_ops_ for now, and its default value can be set to True.
```cpp
  fuse_gradients = true;
}

if (strategy.fuse_all_optimizer_ops_) {
```
Same here. This flag should be calculated automatically.
```cpp
}

if (strategy.fuse_all_optimizer_ops_) {
  if (!fuse_gradients) {
}
// NOTE: fuse_all_xx_ops will count the number of xx operator first,
// if the number is zero, fuse_all_reduce_ops will do nothing.
// Currently, only one type of optimization algorithm can be fused.
```
Which ones can't be fused? Should that be stated in the doc?
In theory, all the optimizer ops that update dense parameters can be fused, but I have only implemented adam and sgd so far.
```cpp
bool fuse_elewise_add_act_ops_{false};

bool fuse_all_reduce_ops_{false};
```
These three flags are not needed.
We need those flags for now. Once we have tested those passes on more models and made sure they speed programs up, we can set the default values to True.
```cpp
#include "paddle/fluid/framework/details/variable_visitor.h"
#include "paddle/fluid/platform/profiler.h"

DEFINE_bool(skip_fused_all_reduce_check, false, "");
```
```cpp
SortParamsAndGrads(vars, &params_grads);
SetGroupGradsAndParams(vars, params_grads, &group_params_grads);

// Set Gradients as Persistable to prevent this var becoming reusable.
```
Can you put your pass after memory optimize so you don't need to make them persistable?
Maybe, but if I make them persistable, I don't need to care about the position of memory_opt_pass.
```cpp
SetGroupGradsAndParams(vars, params_grads, &group_params_grads);

// Set Gradients as Persistable to prevent this var becoming reusable.
auto dtype = static_cast<proto::VarType::Type>(0);
```
0 is framework::proto::VarType::Type::VarType_Type_BOOL, and the dtype of the input can never be BOOL.
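The sentinel issue raised above can be illustrated with a small sketch. The enum values below are stand-ins, not Paddle's real proto values: `proto::VarType::Type(0)` is BOOL, so 0 only works as an "unset" marker because a gradient can never actually be BOOL.

```cpp
#include <cassert>
#include <vector>

// Illustrative dtype enum; value 0 plays the role of VarType_Type_BOOL.
enum class VarDType { BOOL = 0, FP32 = 5, FP64 = 6 };

// Returns the common dtype of all gradients, checking that none is the
// BOOL sentinel and that they all agree.
VarDType CommonDType(const std::vector<VarDType>& dtypes) {
  VarDType dtype = VarDType::BOOL;  // sentinel meaning "not yet set"
  for (VarDType d : dtypes) {
    assert(d != VarDType::BOOL);  // gradients cannot be BOOL
    if (dtype == VarDType::BOOL) dtype = d;
    assert(d == dtype);  // all fused gradients must share one dtype
  }
  return dtype;
}
```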
```cpp
}

// Create the fused variable name.
const std::string prefix(kFusedVarNamePrefix);
```
I think when we analyze a var's name, we can recognize the fused var by this prefix. It also prevents name conflicts with existing names.
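The naming convention described above amounts to a starts-with check. The prefix value below is illustrative (the PR defines the real one as `kFusedVarNamePrefix`): fused variables carry a reserved prefix so later passes can recognize them by name, and ordinary user variables cannot collide with them.

```cpp
#include <string>

// Illustrative prefix; the actual value lives in the PR's header.
constexpr char kFusedVarNamePrefix[] = "@FUSEDVAR@";

// Build a fused variable's name from an underlying variable name.
std::string FusedVarName(const std::string& var) {
  return std::string(kFusedVarNamePrefix) + var;
}

// Recognize a fused variable purely from its name.
bool IsFusedVar(const std::string& name) {
  return name.rfind(kFusedVarNamePrefix, 0) == 0;  // starts-with check
}
```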
```cpp
}
result.Get<RunOnlyOnceProgram>(kRunOnlyOnceProgram).emplace_back();
auto& program_desc =
    result.Get<RunOnlyOnceProgram>(kRunOnlyOnceProgram).back();
```
Why not just run the operations in this pass? There seems to be no need to save them for later.
```cpp
void ResetAttribute(const std::string& attr_name, ir::Graph* graph) const {
  if (graph->Has(attr_name)) {
    VLOG(10) << attr_name << " is reset.";
    graph->Erase(attr_name);
```
Should this throw an error instead? It's the job of build_strategy to handle this.
I think that if this pass is run many times, the result should be the same.
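The idempotency argument above can be made concrete with a toy stand-in for `ir::Graph` attributes (this is not Paddle's API, just a sketch): if the pass erases any previously set attribute before writing it again, running the pass twice yields exactly the same graph state as running it once.

```cpp
#include <map>
#include <string>

// Toy graph with a string-keyed attribute map (illustrative only).
struct ToyGraph {
  std::map<std::string, int> attrs;
  bool Has(const std::string& k) const { return attrs.count(k) > 0; }
  void Erase(const std::string& k) { attrs.erase(k); }
  void Set(const std::string& k, int v) { attrs[k] = v; }
};

// A pass that resets its own attribute first, making repeated runs safe.
void RunPass(ToyGraph* g) {
  if (g->Has("fused_grads")) g->Erase("fused_grads");  // drop stale result
  g->Set("fused_grads", 42);  // recompute the attribute from scratch
}
```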
```cpp
    OpProtoAndCheckerMaker::OpRoleAttrName())));
}

// NOTE: fused_var only exists in the scope, so the graph doesn't have fused_var
// of all the optimizer ops separately.
// And alloc_continuous_space ops are placed in RunOnlyOnceProgram,
// which is executed before running the model with ParallelExecutor.
if (!result.Has(kRunOnlyOnceProgram)) {
```
```cpp
  }
}

if (need_collection_ops_) {
```
```cpp
*/
AddOutputToLeafOps(&result);

/*
```
The following doesn't seem to be related to this pass.
```cpp
constexpr char kPlaces[] = "places";
constexpr char kLocalScopes[] = "local_scopes";
constexpr char kStrategy[] = "strategy";
constexpr char kNRanks[] = "nranks";
```
Try to avoid exposing so many of these global names.