Add Fuse AllReduce Pass And Adam Pass #15497

Closed
chengduoZH wants to merge 40 commits into PaddlePaddle:develop from chengduoZH:fuse_gradient_space

Conversation

@chengduoZH (Contributor):

No description provided.

@chengduoZH chengduoZH changed the title [WIP] Add Fuse Gradient Pass Add Fuse Gradient Pass Jan 29, 2019
@chengduoZH chengduoZH changed the title Add Fuse Gradient Pass Add Fuse AllReduce Pass Jan 29, 2019

// Add op fusion.
if (strategy.fuse_relu_depthwise_conv_) {
VLOG(10) << "Add fuse_relu_depthwise_conv_pass";
Contributor:

I'm thinking about putting the build strategy on the Python side, so that we can define strategies for different scenarios.

Contributor:

Can you put them in compiler.py?

Contributor Author:

Maybe all the fuse_xx_pass passes can be placed on the Python side.

auto iter = vars.find(p_g.second);
PADDLE_ENFORCE(iter != vars.end());

// Set Persistable to prevent this var become reusable.
Contributor:

Since grad vars are persistable, the memory allocation could be done at startup, so no additional op would be needed in the main program.

But we would still need to know the memory pieces in order to do the allreduce.

Contributor Author:

I put alloc_continuous_space_for_grad_op in RunOnlyOnceProgram, which is run in multi_device_pass; maybe that is better.
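The idea discussed above (one contiguous buffer holding every gradient, each at a fixed offset) can be sketched as follows. This is a minimal illustration, not Paddle's actual alloc_continuous_space implementation; the function name and alignment handling are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: given each gradient's size in elements, compute the
// offset of every gradient inside one contiguous fused buffer, rounding each
// chunk up so the next gradient starts on an `alignment`-element boundary.
std::vector<std::size_t> ComputeFusedOffsets(
    const std::vector<std::size_t>& sizes, std::size_t alignment) {
  std::vector<std::size_t> offsets;
  std::size_t cursor = 0;
  for (std::size_t s : sizes) {
    offsets.push_back(cursor);
    // Round the chunk size up to a multiple of `alignment`.
    cursor += (s + alignment - 1) / alignment * alignment;
  }
  return offsets;
}
```

With the offsets known at startup, a single allreduce over the fused buffer replaces one allreduce per gradient.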

chengduozh added 3 commits February 13, 2019 17:24
test=develop
test=develop
test=develop

// Add automatically inplace.
if (strategy_.enable_inplace_) {
VLOG(10) << "Add inplace_pass";
Contributor:

We can remove all these VLOGs; the pass names can be obtained and printed from the Python side.

std::vector<ir::Node *> opt_ops;
for (ir::Node *node : result.Nodes()) {
if (node->IsOp()) {
GetSpecifiedOpsAndVars(fuse_op_type, aux_var_names, node, &opt_ops,
Contributor:

We need to check whether all optimizers are the same, and only fuse gradient variables that belong to identical optimizers.

Contributor Author:

Done, thanks.
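The check requested above can be sketched as below. Plain strings stand in for the op types a real pass would read from `ir::Node` (e.g. `node->Op()->Type()`); the function name is illustrative.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: gradients are only fused when every optimizer op in
// the graph has the same type; a mix (e.g. adam and sgd) disables fusion.
bool AllOptimizersSameType(const std::vector<std::string>& opt_op_types) {
  if (opt_op_types.empty()) return false;
  for (const std::string& type : opt_op_types) {
    if (type != opt_op_types.front()) return false;
  }
  return true;
}
```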

for (auto &op_desc : program.Block(0).AllOps()) {
auto op = paddle::framework::OpRegistry::CreateOp(*op_desc);
VLOG(4) << op->DebugStringEx(local_scopes_[i]);
op->Run(*local_scopes_[i], places_[i]);
Contributor:

It's better to distinguish compile time from runtime and not run anything inside passes.

static_cast<int>(OpRole::kLoss)) &&
!loss_var_name_.empty(); // If loss_var is empty. This is test mode
static_cast<int>(
OpRole::kLoss)); // If loss_var is empty. This is test mode
Contributor:

We can remove the comment here too.

Contributor Author:

Done

}
}

void FuseAllReduceSSAGraphBuilder::CheckGraph(const ir::Graph &graph) const {
Contributor:

Why do we need to add a new FuseAllReduceSSAGraphBuilder? Maybe putting the fuse passes after the multi-device graph pass could reduce the number of builders here?

chengduozh added 3 commits February 17, 2019 13:52
test=develop
test=develop
@chengduoZH chengduoZH force-pushed the fuse_gradient_space branch 2 times, most recently from 79ad035 to 7b77ef9 Compare February 20, 2019 12:57
}

// for single card training, fuse_all_reduce_ops is unnecessary.
if (strategy.fuse_all_reduce_ops_) {
Contributor:

Does this really need a build_strategy flag? Can it be calculated automatically?

Contributor Author:

I think we should keep strategy.fuse_all_reduce_ops_ for now; the default value can be set to True later.

fuse_gradients = true;
}

if (strategy.fuse_all_optimizer_ops_) {
Contributor:

Same here. This flag should be calculated automatically.

}

if (strategy.fuse_all_optimizer_ops_) {
if (!fuse_gradients) {
Contributor:

This feels strange.

}
// NOTE: fuse_all_xx_ops will count the number of xx operator first,
// if the number is zero, fuse_all_reduce_ops will do nothing.
// Currently, only one type of optimization algorithm can be fused.
Contributor:

Which ones can't be fused? Could you say so in the doc?

Contributor Author:

In theory, all the optimizer ops that update dense parameters can be fused, but I have only implemented adam and sgd so far.
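The restriction described above (only adam and sgd have fused implementations in this PR) can be sketched as a simple whitelist check. The function name and the set's location are illustrative, not the PR's actual code.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical sketch: only optimizer op types that have a fused
// implementation qualify for fusion; per the discussion above, that is
// currently adam and sgd.
bool CanFuseOptimizer(const std::string& op_type) {
  static const std::set<std::string> kFusable = {"adam", "sgd"};
  return kFusable.count(op_type) > 0;
}
```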


bool fuse_elewise_add_act_ops_{false};

bool fuse_all_reduce_ops_{false};
Contributor:

These 3 flags are not needed.

Contributor Author:

We need those flags currently. Once we have tested those passes on more models and confirmed they make the program faster, we can set the default value to True.

#include "paddle/fluid/framework/details/variable_visitor.h"
#include "paddle/fluid/platform/profiler.h"

DEFINE_bool(skip_fused_all_reduce_check, false, "");
Contributor:

Could you add documentation for this flag?

SortParamsAndGrads(vars, &params_grads);
SetGroupGradsAndParams(vars, params_grads, &group_params_grads);

// Set Gradients as Persistable to prevent this var becoming reusable.
Contributor:

Can you put your pass after memory optimize so you don't need to make them persistable?

Contributor Author:

Maybe, but if I make them persistable, I don't need to care about the position of memory_opt_pass.

SetGroupGradsAndParams(vars, params_grads, &group_params_grads);

// Set Gradients as Persistable to prevent this var becoming reusable.
auto dtype = static_cast<proto::VarType::Type>(0);
Contributor:

Why the cast?

Contributor:

What is 0?

Contributor Author:

0 is framework::proto::VarType::Type::VarType_Type_BOOL; since the dtype of the input can never be BOOL, it is a safe initial value.
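The sentinel trick explained above can be sketched as below. Plain ints stand in for proto::VarType::Type values, and the function name is hypothetical; the point is that 0 (BOOL, impossible for a gradient) marks "no dtype seen yet", and all gradients must then agree on one dtype.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: verify all gradients share one dtype, using 0 (BOOL,
// never a valid gradient dtype) as the "not yet seen" sentinel.
// Returns the common dtype, or -1 if the dtypes are mixed.
int CheckSameDtype(const std::vector<int>& dtypes) {
  int seen = 0;  // sentinel: no dtype observed yet
  for (int d : dtypes) {
    if (seen == 0) seen = d;        // first gradient fixes the dtype
    if (d != seen) return -1;       // mixed dtypes: cannot fuse
  }
  return seen;
}
```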

}

// Create the fused variable name.
const std::string prefix(kFusedVarNamePrefix);
Contributor:

Is the prefix needed?

Contributor Author:

I think when we analyze a var's name, we can recognize the fused_var by its prefix. It also prevents name conflicts with existing names.
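The naming scheme described above can be sketched as follows. The prefix value and helper names here are illustrative; Paddle defines its own kFusedVarNamePrefix.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: build the fused variable's name from a reserved
// prefix, and recognize fused variables by that prefix later. The reserved
// characters make collisions with user-defined names unlikely.
const char kFusedVarNamePrefix[] = "@FUSEDVAR@";

std::string FusedVarName(const std::string& grad_var_name) {
  return std::string(kFusedVarNamePrefix) + grad_var_name;
}

bool IsFusedVar(const std::string& name) {
  // rfind with pos 0 only matches at the start of the string.
  return name.rfind(kFusedVarNamePrefix, 0) == 0;
}
```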

}
result.Get<RunOnlyOnceProgram>(kRunOnlyOnceProgram).emplace_back();
auto& program_desc =
result.Get<RunOnlyOnceProgram>(kRunOnlyOnceProgram).back();
Contributor:

Just run the operations in this pass; there's no need to save them for later, right?

Contributor Author:

Done

void ResetAttribute(const std::string& attr_name, ir::Graph* graph) const {
if (graph->Has(attr_name)) {
VLOG(10) << attr_name << " is reset.";
graph->Erase(attr_name);
Contributor:

Should this throw an error instead? It's the job of build_strategy to do this.

Contributor Author:

I think if this pass is run many times, the result should be the same.

OpProtoAndCheckerMaker::OpRoleAttrName())));
}

// NOTE: fused_var only exists in the scope, so the graph doesn't have a fused_var node
Contributor:

why?

// of all the optimizer ops separately.
// And alloc_continuous_space ops are placed in RunOnlyOnceProgram,
// which is executed before running the model with ParallelExecutor.
if (!result.Has(kRunOnlyOnceProgram)) {
Contributor:

Run it in your own pass.

}
}

if (need_collection_ops_) {
Contributor:

Don't use a class member here.

*/
AddOutputToLeafOps(&result);

/*
Contributor:

The following doesn't seem to be related to this pass.

constexpr char kPlaces[] = "places";
constexpr char kLocalScopes[] = "local_scopes";
constexpr char kStrategy[] = "strategy";
constexpr char kNRanks[] = "nranks";
Contributor:

Try to avoid exposing so many global names.


3 participants