
Add fuse momentum ops #16745

Merged
chengduoZH merged 7 commits into PaddlePaddle:develop from chengduoZH:add_fuse_momentum on Apr 23, 2019

@chengduoZH (Contributor) commented on Apr 9, 2019:

Test script:

Env:

  • V100-SXM2, CUDA driver 418.39
  • CUDA 9.0, cudnn 7.1
|  | 1 card throughput (img/s) | 4 card throughput (img/s) |
|---|---|---|
| before fusing optimizer ops | 242.7921093 | 1555.285541 |
| after fusing optimizer ops | 246.7232074 | 1720.430108 |
| speedup (relative gain) | 0.01619121 | 0.106182796 |

On 1 card, throughput increases because the momentum ops are fused.
On 4 cards, fusing the momentum ops also delays their execution, and this increases throughput.

test=develop
test=develop
chengduozh added 2 commits April 12, 2019 20:16
@chengduoZH requested review from Xreki and tensor-tang on April 12, 2019, 12:34
```cpp
// NOTE: fused_var only exists in the scope, so the graph doesn't have a
// fused_var node.

VLOG(7) << "Insert adam to graph ";
```
Contributor comment:

Maybe use 5 (i.e. VLOG(5))?

```diff
 }

-VLOG(10) << "Find " << fuse_op_type << " operators: " << opt_ops.size();
+VLOG(6) << "Find " << fuse_op_type << " operators: " << opt_ops.size();
```
Contributor comment:

Sometimes 5, sometimes 6? Any reason?

@tensor-tang (Contributor) left a comment:

LGTM

@chengduoZH merged commit a2be4b4 into PaddlePaddle:develop on Apr 23, 2019
ceci3 pushed a commit to ceci3/Paddle that referenced this pull request Apr 24, 2019

Labels: None yet
Projects: None yet


2 participants