[3D-parallelism] Hybrid Model Parallelism#32074
wangxicoding merged 18 commits into PaddlePaddle:develop from
Conversation
Thanks for your contribution!
✅ This PR's description meets the template requirements!
```proto
optional int32 sharding_degree = 3 [ default = 8 ];
optional int32 mp_degree = 4 [ default = 1 ];
optional string sharding_segment_strategy = 5
optional string sharding_segment_strategy = 1
```
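As a minimal sketch of how these degree fields interact (assuming the usual constraint that the parallel degrees must factor the world size exactly; the exact rule Paddle enforces may differ, and `validate_degrees` is a hypothetical helper, not part of the Paddle API):

```python
def validate_degrees(world_size, mp_degree, sharding_degree, pp_degree=1, dp_degree=1):
    """Check that the hybrid-parallel degrees factor the world size exactly.

    Field names mirror the proto options above; the constraint itself is an
    illustrative assumption, not necessarily Paddle's exact check.
    """
    product = mp_degree * sharding_degree * pp_degree * dp_degree
    if product != world_size:
        raise ValueError(
            f"mp*sharding*pp*dp = {product} must equal world_size = {world_size}"
        )
    return True

# 32 GPUs split as mp_degree=4, sharding_degree=8 (the proto defaults above)
validate_degrees(32, mp_degree=4, sharding_degree=8)
```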
Recorded; documentation will be added to fluiddoc and fleetx.
Also need to add comments to this code.
```proto
optional bool hybrid_dp = 7 [ default = false ];
optional int32 gradient_merge_acc_step = 8 [ default = 1 ];
optional bool optimize_offload = 9 [ default = false ];
optional bool pp_allreduce_in_optimize = 10 [ default = false ];
```
Add some comments: in 3D or 4D parallelism, pp_allreduce_in_optimize=True can reduce communication, while pp_allreduce_in_optimize=False can reduce memory.
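For illustration, the proto options discussed here can be collected into a plain Python dict mirroring the field names above (the dict literal is a hedged sketch; in Paddle such a dict is typically assigned to a fleet DistributedStrategy config attribute, which is an assumption here, not confirmed by this PR):

```python
# Hedged sketch: the proto fields above, expressed as a Python config dict.
# Key names are taken verbatim from the proto; the surrounding API usage
# (assigning this to a fleet DistributedStrategy) is assumed.
sharding_configs = {
    "sharding_degree": 8,               # proto default
    "mp_degree": 4,
    "hybrid_dp": False,
    "gradient_merge_acc_step": 1,
    "optimize_offload": False,
    # True trades extra memory for less communication in 3D/4D parallelism
    "pp_allreduce_in_optimize": False,
}
print(sorted(sharding_configs))
```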
Recorded; documentation will be added to fluiddoc, fleetx, and the .py file where the feature is called.
However, I think this should remain a feature for internal projects for now; should we avoid exposing it to users?
PR types
New features
PR changes
APIs
Describe
New features and performance optimization.
Example: assume we have 4 nodes with 8 GPUs per node (32 GPUs total):
mp-sharding-pp 3D parallelism
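The 32-GPU mp-sharding-pp layout can be sketched by mapping each global rank to its coordinate in the three parallel groups. This is a minimal sketch assuming mp varies fastest, then sharding, then pp, with degrees 4 x 2 x 4 = 32; Paddle's actual group ordering and degree choices may differ:

```python
def rank_coords(rank, mp=4, sharding=2, pp=4):
    """Map a global rank to (mp_rank, sharding_rank, pp_rank).

    Assumed ordering: mp innermost, then sharding, then pp.
    The degree split 4 x 2 x 4 over 32 GPUs is illustrative only.
    """
    assert mp * sharding * pp == 32, "degrees must factor the 32 GPUs exactly"
    mp_rank = rank % mp
    sharding_rank = (rank // mp) % sharding
    pp_rank = rank // (mp * sharding)
    return mp_rank, sharding_rank, pp_rank

print(rank_coords(0))   # origin of all three groups
print(rank_coords(31))  # last rank in every group
```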