[HybridParallel] Support 1f1b for PipelineParallel #34483
Merged
ForFishes merged 14 commits into PaddlePaddle:develop on Aug 2, 2021
Conversation
Thanks for your contribution!
wangxicoding approved these changes on Aug 2, 2021
paddle.autograd.backward(
    self.scaler.scale(self.caches['outputs'][cache_id]))
input_tensor_grad = self._backward_step(input_tensor, output_tensor,
                                        output_tensor_grad)
Contributor
output_tensor and output_tensor_grad are no longer needed after this point; it seems they could be released manually first.
Member, Author
We probably can't set them to None manually to release them: if the host side frees them early, the device may not have started computing yet.
Contributor
It's fine: when the GPU kernels were scheduled they already captured the addresses, so as long as the buffers are not overwritten (by this code or anything else) while the kernels run, releasing the references is safe. Worth a try 🌚
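For illustration, a minimal sketch of what the suggested early release might look like in the 1F1B steady-state step. Variable names follow the diff above, but _steady_state_step and _backward_step are assumed names here, not the PR's exact code:

    def _steady_state_step(self, input_tensor, output_tensor, output_tensor_grad):
        # Illustrative only; mirrors the backward call shown in the diff above.
        input_tensor_grad = self._backward_step(input_tensor, output_tensor,
                                                output_tensor_grad)
        # The backward kernels have already captured the buffer addresses, so
        # dropping the host-side references here only releases the Python
        # handles and does not race with the asynchronous device computation.
        output_tensor = None
        output_tensor_grad = None
        return input_tensor_grad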
paddle.distributed.send(dtype, dst=1, group=group)

def send_meta(self, tensor, group):
    if isinstance(tensor, paddle.Tensor):
Contributor
A suggestion: pipeline_parallel.py also contains a lot of isinstance(tensor, tuple) logic. It would be cleaner to wrap a single paddle.Tensor into a tuple and route everything through the tuple code path.
Member, Author
Indeed! This can be made more elegant when the code is rewritten later.
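For illustration, a rough sketch of that unification, assuming a small hypothetical helper (not in the PR) that wraps a lone paddle.Tensor into a one-element tuple so the send logic only ever deals with tuples:

    import paddle
    import paddle.distributed as dist

    def _to_tuple(x):
        # Hypothetical helper: normalize a single paddle.Tensor to a
        # one-element tuple so downstream code handles only tuples.
        return (x,) if isinstance(x, paddle.Tensor) else tuple(x)

    def send_meta(tensor, group, dst=1):
        tensors = _to_tuple(tensor)
        # One code path: tell the peer how many tensors follow, then send
        # each tensor's rank and shape in turn.
        dist.send(paddle.to_tensor([len(tensors)], dtype="int32"), dst=dst, group=group)
        for t in tensors:
            dist.send(paddle.to_tensor([len(t.shape)], dtype="int32"), dst=dst, group=group)
            dist.send(paddle.to_tensor(t.shape, dtype="int32"), dst=dst, group=group)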
python/paddle/distributed/fleet/meta_parallel/pp_utils/p2p_communication.py
PR types
New features
PR changes
Others
Describe
[HybridParallel] Support 1f1b for PipelineParallel
This PR changes the current pipeline-parallel scheduling to the more memory-efficient 1F1B schedule, similar to Megatron's https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/schedules.py.

The detailed scheduling diagram is as follows:
GPT-117M model, V100-32G, PP=8, microbatch=2
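For readers unfamiliar with 1F1B, a minimal sketch of the phase structure it implies. The stage attributes and per-microbatch step helpers below are assumed names for illustration, not the PR's actual methods:

    def _run_1f1b(self, num_micro_batches):
        # Warmup: run enough forward steps to fill the pipeline for this stage.
        num_warmup = min(self.num_stages - self.stage_id - 1, num_micro_batches)
        num_steady = num_micro_batches - num_warmup

        for _ in range(num_warmup):
            self._forward_step_and_send()   # hypothetical per-microbatch forward

        # Steady state: alternate one forward with one backward (1F1B), so at
        # most about num_stages activations are alive at once, instead of all
        # num_micro_batches activations as in the previous all-forward-then-
        # all-backward schedule; this is where the memory saving comes from.
        for _ in range(num_steady):
            self._forward_step_and_send()
            self._backward_step_and_send()  # hypothetical per-microbatch backward

        # Cooldown: drain the backward passes matching the warmup forwards.
        for _ in range(num_warmup):
            self._backward_step_and_send()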