[hybrid performance] all reduce fusion for sharding by FeixLiu · Pull Request #34480 · PaddlePaddle/Paddle

FeixLiu · 2021-07-29T06:34:19Z

PR types

New features

PR changes

Others

Describe

Add allreduce fusion support to sharding_optimizer, so that gradient allreduce ops are fused into fewer, larger ones.
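The description does not show how the fusion works. Below is a minimal, framework-free sketch of the general technique, assuming size-bounded gradient bucketing with one allreduce per bucket. `bucket_grads`, `allreduce_sum` and `fused_allreduce` are hypothetical helpers, not Paddle APIs. The allreduce is simulated in-process, `max_bucket_bytes` stands in for the fuse size, and dtype grouping and communication streams are ignored.

```python
from typing import Dict, List, Tuple

MB = 1024 * 1024


def bucket_grads(grads: List[Tuple[str, int]], max_bucket_bytes: int) -> List[List[str]]:
    """Greedily pack (name, nbytes) gradients, in order, into buckets of at most
    max_bucket_bytes. A gradient larger than the limit gets a bucket of its own."""
    buckets: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, nbytes in grads:
        # Close the current bucket if adding this gradient would overflow it.
        if current and current_bytes + nbytes > max_bucket_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets


def allreduce_sum(per_rank: List[List[float]]) -> List[float]:
    """Simulated allreduce: element-wise sum of one equally sized buffer per rank."""
    return [sum(vals) for vals in zip(*per_rank)]


def fused_allreduce(rank_grads: List[Dict[str, List[float]]],
                    buckets: List[List[str]]) -> Tuple[Dict[str, List[float]], int]:
    """Issue one allreduce per bucket on a flat buffer, then split the result
    back into per-gradient slices. Returns (reduced grads, allreduce call count)."""
    reduced: Dict[str, List[float]] = {}
    calls = 0
    for bucket in buckets:
        # Flatten this bucket's gradients into one contiguous buffer per rank.
        flat = [[x for name in bucket for x in grads[name]] for grads in rank_grads]
        summed = allreduce_sum(flat)
        calls += 1
        # Copy each gradient's slice of the reduced buffer back out.
        offset = 0
        for name in bucket:
            n = len(rank_grads[0][name])
            reduced[name] = summed[offset:offset + n]
            offset += n
    return reduced, calls


if __name__ == "__main__":
    # 57 hypothetical 2 MB gradients with a 32 MB limit: 16 per bucket -> 4 buckets.
    sizes = [(f"g{i}", 2 * MB) for i in range(57)]
    print(len(bucket_grads(sizes, 32 * MB)))  # 4

    # Two simulated ranks: one fused call gives the same sums as two separate calls.
    ranks = [{"a": [1.0, 2.0], "b": [3.0]}, {"a": [10.0, 20.0], "b": [30.0]}]
    print(fused_allreduce(ranks, [["a", "b"]]))  # ({'a': [11.0, 22.0], 'b': [33.0]}, 1)
```

Because allreduce is element-wise, reducing a concatenated buffer gives the same result as reducing each gradient separately. Only the number of collective launches changes. The uniform 2 MB gradients are illustrative and do not reproduce the real GPT gradient sizes behind the counts reported below.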

Throughput test

Setup: GPT model on 8 × V100 GPUs, fuse_grad_in_size=32MB

Configuration: dp=4, sharding=2 (4 × 2 = 8 GPUs)

| | No fuse | Fused | Change |
|---|---|---|---|
| Throughput (tokens/s) | 135503 | 138892 | +2.5% |
| Number of allreduce ops | 57 | 15 | -74% |
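The change column is the relative difference (fused - no fuse) / no fuse. Here is a quick Python check using the raw numbers from the throughput table:

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# Raw values taken from the throughput table.
throughput_gain = relative_change(135503, 138892)  # tokens/s, no fuse -> fused
allreduce_change = relative_change(57, 15)         # allreduce op count

print(f"throughput: {throughput_gain:+.1f}%")      # throughput: +2.5%
print(f"allreduce ops: {allreduce_change:+.1f}%")  # allreduce ops: -73.7%
```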

Loss curve

[Figures: dp_hybrid_sharding, dp_pp_hybrid_sharding]

paddle-bot-old · 2021-07-29T06:34:43Z

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

python/paddle/distributed/fleet/meta_optimizers/sharding/utils.py

wangxicoding

LGTM

wangxicoding reviewed Jul 29, 2021


all reduce fusion for shardinug, test=develop

e6e26a5

FeixLiu force-pushed the allreduce_fuse_sharding branch from 862ffb4 to e6e26a5 on July 29, 2021 11:23

wangxicoding approved these changes Jul 30, 2021


wangxicoding merged commit 423ea97 into PaddlePaddle:develop Jul 30, 2021

FeixLiu deleted the allreduce_fuse_sharding branch July 30, 2021 02:06

FeixLiu changed the title ~~all reduce fusion for shardinug~~ [hybrid performance] all reduce fusion for shardinug Oct 11, 2021

FeixLiu changed the title ~~[hybrid performance] all reduce fusion for shardinug~~ [hybrid performance] all reduce fusion for sharding Oct 11, 2021

wangxicoding merged 1 commit into PaddlePaddle:develop from FeixLiu:allreduce_fuse_sharding
