ParallelDyGraph with GPU collective mode #16827

Yancey0623 merged 11 commits into PaddlePaddle:develop from
Conversation
Cool job~
  }

- void OpBase::RegisterBackwardHooks(const py::object& callable) {
+ void OpBase::RegisterBackwardHooks(const py::object& callable, bool front) {
trace.py appends the release_op as the first grad hook, so that we can still get the op object in the other hooks; maybe we could store the hooks in a stack instead of a vector?
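To make the suggestion concrete, here is a minimal Python model of the two orderings (purely illustrative; the real hook container lives in the C++ OpBase above):

# Illustrative model only: contrasts a vector with a `front` flag
# against a stack (LIFO) for backward hooks.
hooks = []

def register_backward_hook(hook, front=False):
    if front:
        hooks.insert(0, hook)   # vector + `front` flag: hook runs first
    else:
        hooks.append(hook)

def invoke_as_vector():
    for h in hooks:             # registration order, front hooks first
        h()

def invoke_as_stack():
    for h in reversed(hooks):   # LIFO: release_op, registered first,
        h()                     # would run last, so later hooks still
                                # see the op object before it is freed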
paddle/fluid/pybind/pybind.cc (outdated)
| .def("_grad_ivar", | ||
| [](const imperative::VarBase &self) { return self.grads_; }, | ||
| py::return_value_policy::reference) | ||
| .def("_set_grad_ivar", [](imperative::VarBase &self, |
Will do that, thx.
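As a hedged aside on what py::return_value_policy::reference means for the Python side (this snippet is illustrative, not from the PR; `param` is any dygraph parameter):

def is_same_grad_object(param):
    # Illustrative: because of the reference policy, no gradient tensor
    # is copied, and both calls return the very same wrapper object.
    return param._ivar._grad_ivar() is param._ivar._grad_ivar()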
- def prepare_context(parallel_strategy, place):
+ def prepare_context(parallel_strategy, place=None):
Not necessary so far, can remove it.
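For reference, a hedged sketch of what the place=None default would let the body do; the CUDAPlace fallback and the FLAGS_selected_gpus lookup are assumptions for illustration, not from the diff:

import os
from paddle.fluid import core

def prepare_context(parallel_strategy, place=None):
    # Hypothetical fallback (assumption): pick this trainer's GPU when
    # the caller omits `place`.
    if place is None:
        device_id = int(os.getenv("FLAGS_selected_gpus", "0"))
        place = core.CUDAPlace(device_id)
    # ... initialize the collective (NCCL) context on `place` (elided) ...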
collective._allreduce(g_var, g_var, sync_mode=True)
outs = self._layers(*inputs, **kwargs)
for _, op in six.iteritems(_dygraph_tracer()._ops):
Should all of the ops' outputs go through the collective all-reduce?
Maybe not? Only the trainable parameters should be the inputs?
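The direction this thread points to, as a hedged sketch (the variable names follow the diff just below; the layers/block plumbing is assumed): all-reduce only the gradients of trainable parameters, not every op's output.

from paddle.fluid import framework
from paddle.fluid.layers import collective

def _allreduce_trainable_grads(layers, block):
    # Only the trainable parameters' gradients are synchronized across
    # trainers; ordinary op outputs never leave the local device.
    for param in layers.parameters():
        g = param._ivar._grad_ivar()
        if not param.trainable or g is None:
            continue  # frozen or gradient-less parameter: nothing to sync
        g_var = framework.Variable(
            block, name=param._ivar._grad_name(),
            stop_gradient=True, ivar=g)
        collective._allreduce(g_var, g_var, sync_mode=True)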
    name=ivar._grad_name(),
    stop_gradient=True,
    ivar=g)
collective._allreduce(g_var, g_var, sync_mode=True)
Why is the output g_var as well?
It will all-reduce the grad var in place.
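That is, passing the same variable as both input and output makes the collective write its result back into the gradient tensor (a hedged reading of the call; the out-of-place variant is an assumption shown only for contrast):

# In-place: g_var's tensor is overwritten with the cross-trainer sum,
# so no extra gradient buffer has to be allocated.
collective._allreduce(g_var, g_var, sync_mode=True)

# Out-of-place (assumed variant, for contrast): the result would land
# in a separate output variable, leaving g_var untouched.
# summed = collective._allreduce(g_var, sync_mode=True)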
velconia left a comment:

LGTM with minor comments
Added to the TODO list.
This is a combination of 14 commits:

1. remove ut test_dist_word2vec in mac ci, will fix it in private (PaddlePaddle#17066)
2. Fleet unify distributed training (PaddlePaddle#16791): implement distributed transpiler with fleet
3. ParallelDyGraph with GPU collective mode (PaddlePaddle#16827): implement dygraph.parallel.DataParallel to hook reduce op
4. Init mixed precision training interface (PaddlePaddle#16856): add fp16 test script; all initializers support float16; code cleanup and more code annotations; update API spec; add usage example in doc
5. fix reference_count_pass (PaddlePaddle#17060)
6. Speedup roi_perspective_transform op by caching the information of linear interpolation in forward and using it in backward; fix cuda kernel (PaddlePaddle#17090)
7. remove unnecessary prepare_data (PaddlePaddle#17080)
8. fix interpolate cu (PaddlePaddle#17101)
9. double backward leaky_relu, i.e. backward of backward for leaky_relu (PaddlePaddle#17067)
10. fix fuse optimizer ops (PaddlePaddle#17102)
11. truncated_gaussian_random supported in distributed training (PaddlePaddle#17091)
12. Detailed coordinate description for yolov3 loss (PaddlePaddle#17007): modified loss name, polished description, modified api.spec
13. fix test_weight_decay (PaddlePaddle#17109)
14. Path flag (PaddlePaddle#17105): fix python/paddle/fluid/__init__.py detecting problems
Implement dygraph.parallel.DataParallel to hook collective ops into the backward process.

TODO:
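A hedged end-to-end sketch of the intended usage. The ParallelStrategy constructor, the FC layer, and the _backward spelling reflect the dygraph API of that era and are assumptions where the PR text doesn't show them:

import numpy as np
import paddle.fluid as fluid
from paddle.fluid import dygraph

strategy = dygraph.parallel.ParallelStrategy()   # assumed constructor
place = fluid.CUDAPlace(0)
with dygraph.guard(place):
    dygraph.parallel.prepare_context(strategy, place)
    layer = dygraph.FC("fc", size=10)            # any dygraph Layer works
    model = dygraph.parallel.DataParallel(layer, strategy)

    x = dygraph.to_variable(np.random.rand(4, 8).astype("float32"))
    loss = fluid.layers.reduce_mean(model(x))
    loss._backward()   # DataParallel's hooks all-reduce the grads here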