[HybridParallel]Add Recompute for PipeLineParallel #34607
Conversation
Thanks for your contribution!
Force-pushed from f28cc8a to ae6ac75
Offload in dygraph is Sooooo easy!!! lol
I think fleet.utils.recompute could work the same way.
Yes; for now we support hybrid_parallel first.
Should we sync here?
Wait for the H2D copy to finish before running the following computation.
cpu() is a synchronous operation, so we don't need to do this.
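The point discussed above can be illustrated with a plain-Python sketch (not Paddle API): an asynchronous host-to-device (H2D) copy must be explicitly waited on before dependent computation runs, while a synchronous copy like `cpu()` needs no extra wait because it only returns once the data has arrived. Here the "stream" and "event" are simulated with a thread and a `threading.Event`.

```python
# Conceptual sketch of why an async H2D copy needs a sync point before the
# following computation. The thread plays the role of a copy stream and the
# Event plays the role of a CUDA event; none of this is Paddle's actual API.
import threading

def async_h2d_copy(src, dst, done_event):
    # Simulated asynchronous copy running on a separate "stream" (thread).
    def copy():
        dst.extend(src)       # the actual data movement
        done_event.set()      # signal completion, like recording a CUDA event
    threading.Thread(target=copy).start()
    return done_event

host_data = [1, 2, 3]
device_buf = []
evt = async_h2d_copy(host_data, device_buf, threading.Event())

# Without this wait, the computation below could observe an empty buffer.
evt.wait()                    # analogous to an event/stream synchronize
total = sum(device_buf)       # safe: the copy is known to be finished
```

A synchronous transfer would simply perform the copy inline, which is why no extra wait is needed in that case.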
JZ-LIANG
left a comment
LGTM
Force-pushed from f8eedf2 to 88ee4ad
Force-pushed from 88ee4ad to 5e50e53
JZ-LIANG
left a comment
LGTM
zhiqiu
left a comment
LGTM for op_function_generator
PR types
New features
PR changes
Others
Describe
Add Recompute for PipeLineParallel
1. Interface
2. Supported features
Compared with Paddle's native recompute, there are the following differences:
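The core idea of recompute (activation checkpointing) can be sketched in pure Python, independent of Paddle's actual `fleet.utils.recompute` implementation: instead of keeping intermediate activations alive from the forward pass, only the layer input is saved, and the forward is rerun during backward to regenerate the activations. The layer and gradient below are toy stand-ins, not real Paddle code.

```python
# Conceptual sketch of recompute / activation checkpointing. Only the layer
# input is saved during forward; activations are regenerated in backward by
# rerunning the forward function.

def layer_forward(x):
    # Stand-in for an expensive layer: returns the output and the
    # intermediate activation that recompute avoids storing.
    h = x * 2          # activation we would normally have to keep in memory
    y = h + 1
    return y, h

class RecomputeCheckpoint:
    def __init__(self, fn):
        self.fn = fn
        self.saved_input = None   # only the input is kept, not activations

    def forward(self, x):
        self.saved_input = x
        y, _ = self.fn(x)         # the activation is discarded here
        return y

    def backward(self, grad_out):
        # Rerun the forward pass to regenerate the activation `h`, then use
        # it for the (hand-written) gradient of this toy layer.
        _, h = self.fn(self.saved_input)
        # dy/dx = d(h + 1)/dh * dh/dx = 1 * 2
        return grad_out * 2, h

ckpt = RecomputeCheckpoint(layer_forward)
y = ckpt.forward(3)               # y = 3*2 + 1 = 7
grad_x, h = ckpt.backward(1.0)    # recomputes h = 6, gradient = 2.0
```

This trades extra forward computation in the backward pass for a large reduction in activation memory, which is what makes larger microbatches feasible in pipeline parallelism.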
3. Performance comparison
GPT-117M model, V100-32G, FP32, MP=4, PP=2, microbatch=2, global_batch_size=128; GPU memory on the middle card
Q: Why does the combination of recompute + offload + MP partitioning show higher memory usage?
The memory reported by nvidia-smi may already be freed but still cached by Paddle.
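The caching behavior mentioned above can be sketched with a toy allocator (not Paddle's actual allocator): freed blocks return to the framework's pool ("reserved") rather than to the CUDA driver, so the process-level number that nvidia-smi reports stays high even after the framework has released the memory internally.

```python
# Toy caching-allocator sketch showing why nvidia-smi can report more memory
# than the framework is actively using. "reserved" is what nvidia-smi would
# attribute to the process; "allocated" is what the framework actually uses.
class CachingAllocator:
    def __init__(self):
        self.reserved = 0    # held from the driver, visible in nvidia-smi
        self.allocated = 0   # actively in use by tensors

    def malloc(self, size):
        if self.reserved - self.allocated < size:
            self.reserved += size   # grow the pool from the driver
        self.allocated += size

    def free(self, size):
        self.allocated -= size      # block goes back to the pool, not driver

alloc = CachingAllocator()
alloc.malloc(100)
alloc.free(100)       # memory is "released" but stays cached in the pool
alloc.malloc(50)      # served from the cache; reserved does not shrink
# allocated is now 50, but reserved stays 100: nvidia-smi still shows 100.
```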
4. Accuracy comparison
Accuracy verified on GPT-117M under MP2_PP2
DP2_MP2_PP2 + AMP
5. TODO