
Add Stream for fetch op handle #16600

Merged

chengduoZH merged 2 commits into PaddlePaddle:develop from chengduoZH:add_delay_ops_for_threaded_executor on Apr 2, 2019

Conversation

@chengduoZH (Contributor)

No description provided.

@chengduoZH force-pushed the add_delay_ops_for_threaded_executor branch from 7e0a41c to f2bed8f on April 1, 2019 15:07
@chengduoZH force-pushed the add_delay_ops_for_threaded_executor branch from f2bed8f to 90b3e94 on April 1, 2019 15:32
test=develop
@chengduoZH force-pushed the add_delay_ops_for_threaded_executor branch from 747322e to 1804b19 on April 2, 2019 01:36
 #ifdef PADDLE_WITH_CUDA
-    TensorCopySync(t, cpu, &tensors_[i]);
+    TensorCopy(t, cpu, *dev_ctxes_.at(t.place()), &tensors_[i]);
+    dev_ctxes_.at(t.place())->Wait();
@chengduoZH (Contributor, Author) commented on the diff:
Don't use the default stream here; it may slow the program down. https://devblogs.nvidia.com/gpu-pro-tip-cuda-7-streams-simplify-concurrency/
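The point of the diff above, sketched in standalone CUDA C++ (this is not Paddle's code; names and sizes are illustrative): the legacy default stream implicitly synchronizes with other blocking streams, so a fetch copy issued on it can stall kernels that are still running, whereas a copy issued on the op's own stream only needs to wait on that stream.

```cuda
#include <cuda_runtime.h>

int main() {
  const size_t n = 1 << 20;
  float *dev = nullptr, *host = nullptr;
  cudaMalloc(&dev, n * sizeof(float));
  cudaMallocHost(&host, n * sizeof(float));  // pinned memory, required for async copy

  // Slower pattern: a copy on the default stream (stream 0) synchronizes
  // with all other blocking streams on the device, serializing work.
  cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);

  // Pattern this PR adopts: issue the copy on a dedicated stream and wait
  // only on that stream, analogous to TensorCopy(...) followed by
  // dev_ctxes_.at(t.place())->Wait() in the diff.
  cudaStream_t s;
  cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking);
  cudaMemcpyAsync(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost, s);
  cudaStreamSynchronize(s);  // the equivalent of DeviceContext::Wait()

  cudaStreamDestroy(s);
  cudaFreeHost(host);
  cudaFree(dev);
  return 0;
}
```

This requires a CUDA-capable GPU to run; it is only meant to show why the fetch copy was moved off the default stream.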

 void ReduceSSAGraphBuilder::InsertPostprocessOps(ir::Graph *result) const {
   if (UseGPU()) {
-    if (strategy_.fuse_broadcast_op_) {
+    if (strategy_.fuse_broadcast_ops_) {
@Yancey0623 (Contributor), Apr 2, 2019:
Maybe we need to let users know when to turn this flag on? If it always works better with the Reduce strategy, please keep it on. In addition, we can update the best practices for GPU distributed training.

@chengduoZH (Contributor, Author):
Maybe we need to let users know when to turn this flag on?

I have added the doc for fuse_broadcast_ops in pybind.cc.

In addition, we can update the best practices for GPU distributed training.

I quite agree with you that we should have a best-practices doc.
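For context, a hedged sketch of how a user would turn the renamed flag on from the Python side, assuming the fluid 1.x `BuildStrategy` API that the pybind.cc doc in this PR targets (the `ParallelExecutor` wiring is illustrative, not taken from this PR):

```python
import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
# fuse_broadcast_ops pairs with the Reduce strategy, where each device
# broadcasts the parameters it owns after updating them; fusing those
# broadcasts reduces launch overhead.
build_strategy.reduce_strategy = fluid.BuildStrategy.ReduceStrategy.Reduce
build_strategy.fuse_broadcast_ops = True

# parallel_exe = fluid.ParallelExecutor(
#     use_cuda=True, loss_name=loss.name, build_strategy=build_strategy)
```

This is the kind of usage the requested best-practices doc would spell out.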

@Yancey0623 (Contributor) left a review:
lgtm

@chengduoZH chengduoZH merged commit b75a69b into PaddlePaddle:develop Apr 2, 2019

2 participants