

@wangxicoding (Contributor) commented Jul 16, 2021

PR types

Function optimization

PR changes

Others

Describe

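Optimize the GPU memory usage of pipeline parallelism.
Before this change, memory usage grows as the global batch size (gbs) increases; after it, memory usage stays constant regardless of gbs.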
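The effect can be illustrated with a toy model. The sketch below assumes a 1F1B schedule and that each micro-batch leaves one send buffer alive on a stage. The helper names (`one_f_one_b_schedule`, `peak_live_send_buffers`) are hypothetical and are not Paddle APIs, and which send ops were actually retained in develop is not stated here. Keeping every buffer until the end of the step makes the peak equal to the number of micro-batches (gbs / micro_batch). Freeing each buffer after its micro-batch's backward bounds the peak by the number of in-flight micro-batches, independent of gbs.

```python
"""Toy model: peak number of live send buffers on one pipeline stage.

Assumptions (hypothetical, for illustration only): a 1F1B schedule, and one
send buffer per micro-batch created during that micro-batch's forward pass.
"""


def one_f_one_b_schedule(num_micro_batches, num_stages, stage_id):
    """Return the (op, micro_batch_id) sequence run by one stage under 1F1B."""
    # Warm-up: earlier stages run extra forwards before their first backward.
    warmup = min(num_stages - stage_id - 1, num_micro_batches)
    ops = [("F", i) for i in range(warmup)]
    fwd, bwd = warmup, 0
    # Steady state: alternate one forward and one backward.
    while fwd < num_micro_batches:
        ops.append(("F", fwd))
        fwd += 1
        ops.append(("B", bwd))
        bwd += 1
    # Cool-down: drain the remaining backwards.
    while bwd < num_micro_batches:
        ops.append(("B", bwd))
        bwd += 1
    return ops


def peak_live_send_buffers(ops, free_after_backward):
    """Maximum number of send buffers alive at the same time.

    free_after_backward=False keeps every buffer until the end of the step;
    True frees a micro-batch's buffer once its backward has run.
    """
    live, peak = set(), 0
    for op, mb in ops:
        if op == "F":
            live.add(mb)
        elif free_after_backward:
            live.discard(mb)
        peak = max(peak, len(live))
    return peak


if __name__ == "__main__":
    num_stages, micro_batch = 8, 4  # pipeline_stage=8, micro_batch=4
    for gbs in (32, 256, 2048):
        n = gbs // micro_batch  # micro-batches per step (assumes no data parallelism)
        ops = one_f_one_b_schedule(n, num_stages, stage_id=1)
        print(gbs, peak_live_send_buffers(ops, False),
              peak_live_send_buffers(ops, True))
```

For stage 1 this prints peaks of 8, 64 and 512 buffers when nothing is released early, versus 7 at every gbs when each buffer is freed after its backward.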

Testing

Hardware: a single machine with 8 × V100 32GB GPUs.
Model: gpt2-medium-en (345M parameters), pipeline_stage=8, micro_batch=4.

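| gbs | GPU | develop (MB) | PR (MB) | Change (MB) |
|---|---|---|---|---|
| 32 | 0 | 24402 | 24424 | +22 |
| 32 | 1 | 21376 | 21398 | +22 |
| 32 | 7 | 7830 | 7834 | +4 |
| 64 | 0 | 24660 | 24682 | +22 |
| 64 | 1 | 21634 | 21398 | -236 |
| 64 | 7 | 7830 | 7834 | +4 |
| 256 | 0 | 24660 | unchanged | +22 |
| 256 | 1 | 22408 | unchanged | -1010 |
| 256 | 7 | 8168 | unchanged | -334 |
| 1024 | 0 | 24660 | unchanged | +22 |
| 1024 | 1 | 25504 | unchanged | -4106 |
| 1024 | 7 | 11770 | unchanged | -3936 |
| 2048 | 0 | 24600 | unchanged | +22 |
| 2048 | 1 | 29632 | unchanged | -8234 |
| 2048 | 7 | 15710 | unchanged | -7876 |
| 3072 | 0 | OOM | unchanged | |
| 3072 | 1 | OOM | unchanged | |
| 3072 | 7 | OOM | unchanged | |

In the PR column, "unchanged" means the same value as at gbs=64 on that GPU.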

Results

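With this PR, memory usage no longer grows with the global batch size, so in theory the global batch size can be increased without limit.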
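The develop numbers for GPU 1 grow linearly with the number of micro-batches, and a quick calculation on the table's values makes this explicit. The sketch below (function names are hypothetical) derives the growth per micro-batch from the gbs=256 and gbs=2048 rows and extrapolates to gbs=3072. It assumes the growth stays linear and that micro-batches per step = gbs / micro_batch.

```python
# develop-branch memory on GPU 1, in MB, copied from the table (gbs -> MB)
DEVELOP_GPU1 = {256: 22408, 1024: 25504, 2048: 29632}
PR_GPU1 = 21398  # PR-branch memory on GPU 1, identical at every gbs in the table
MICRO_BATCH = 4


def growth_per_micro_batch(points, micro_batch):
    """Memory (MB) added per extra micro-batch, between the smallest and largest gbs."""
    lo, hi = min(points), max(points)
    extra_micro_batches = (hi - lo) / micro_batch
    return (points[hi] - points[lo]) / extra_micro_batches


def extrapolate(points, micro_batch, gbs):
    """Linear extrapolation of develop memory to a larger gbs (assumes linear growth)."""
    hi = max(points)
    slope = growth_per_micro_batch(points, micro_batch)
    return points[hi] + (gbs - hi) / micro_batch * slope


slope = growth_per_micro_batch(DEVELOP_GPU1, MICRO_BATCH)
print(f"develop grows ~{slope:.3f} MB per micro-batch on GPU 1")
print(f"predicted develop at gbs=3072: {extrapolate(DEVELOP_GPU1, MICRO_BATCH, 3072):.0f} MB")
print(f"PR saving at gbs=2048: {DEVELOP_GPU1[2048] - PR_GPU1} MB")
```

The slope comes out at 16.125 MB per micro-batch (the gbs=256 to gbs=1024 segment gives the same value), and the extrapolated 33760 MB exceeds the 32 GB of a V100, which is consistent with the OOM rows at gbs=3072.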

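As for why GPU 0 uses 22 MB more than develop: PR #34086 frees the send buffers right after the recv in Backward, whereas this PR frees them after the whole Backward, so their lifetime is slightly longer. This could also be optimized by moving the release back to right after the Backward recv, but in static graph mode that requires substantial changes to the pipeline executor (in dynamic graph mode it is simple to implement). Because the engineering cost is high, it is left unoptimized for now.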
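Why a later release point costs a little extra memory can be shown with a toy allocation trace. In the sketch below, every size and event name is made up for illustration (none of it is measured or taken from Paddle's executor). It only shows that if a send buffer is still alive while Backward allocates its own temporaries, the peak rises by up to the buffer's size.

```python
def peak_memory(events):
    """Running peak of a list of (event_name, delta_mb) allocations and frees."""
    current = peak = 0
    for _name, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def backward_trace(send_mb, grad_mb, workspace_mb, free_send_after_recv):
    """Hypothetical allocation trace for one micro-batch's Backward on a stage."""
    events = [("send_buffer_from_forward", send_mb), ("recv_grad", grad_mb)]
    if free_send_after_recv:            # release point used by PR #34086
        events.append(("free_send_buffer", -send_mb))
    events += [("alloc_backward_workspace", workspace_mb),
               ("free_backward_workspace", -workspace_mb),
               ("free_recv_grad", -grad_mb)]
    if not free_send_after_recv:        # release point used by this PR
        events.append(("free_send_buffer", -send_mb))
    return events


early = peak_memory(backward_trace(16, 16, 100, free_send_after_recv=True))
late = peak_memory(backward_trace(16, 16, 100, free_send_after_recv=False))
print(early, late, late - early)  # 116 132 16
```

In this model the gap equals the send buffer's size only when the Backward peak falls inside the extended lifetime; otherwise it is smaller.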

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.

@wangxicoding force-pushed the optimize_pipeline_memory branch from eaf73d0 to 4a806a7 on July 16, 2021 13:33
@wangxicoding changed the title from "Optimize pipeline memory" to "[hybrid parallel] Optimize pipeline memory" on Jul 17, 2021
@wangxicoding deleted the optimize_pipeline_memory branch on July 20, 2021 01:04