[3D-parallelism] Hybrid Model Parallelism #32074
Changes from 16 commits
```diff
@@ -29,14 +29,18 @@ message RecomputeConfig {
 }

 message ShardingConfig {
-  optional float segment_broadcast_MB = 1 [ default = 32.0 ];
-  optional bool hybrid_dp = 2 [ default = false ];
-  optional int32 sharding_degree = 3 [ default = 8 ];
-  optional int32 mp_degree = 4 [ default = 1 ];
-  optional string sharding_segment_strategy = 5
+  optional string sharding_segment_strategy = 1
       [ default = 'segment_broadcast_MB' ];
-  repeated string segment_anchors = 6;
-  optional int32 gradient_merge_acc_step = 7 [ default = 1 ];
+  optional float segment_broadcast_MB = 2 [ default = 32.0 ];
+  repeated string segment_anchors = 3;
+  optional int32 sharding_degree = 4 [ default = 8 ];
+  optional int32 mp_degree = 5 [ default = 1 ];
+  optional int32 dp_degree = 6 [ default = 1 ];
+  optional bool hybrid_dp = 7 [ default = false ];
+  optional int32 gradient_merge_acc_step = 8 [ default = 1 ];
+  optional bool optimize_offload = 9 [ default = false ];
+  optional bool pp_allreduce_in_optimize = 10 [ default = false ];
+  optional int32 pp_degree = 11 [ default = 1 ];
 }

 message AMPConfig {
```

Review thread on `pp_allreduce_in_optimize`:

Contributor: Please add a comment here: in 3D or 4D parallelism, `allreduce_in_optimize = true` reduces communication, while `allreduce_in_optimize = false` reduces memory.

Contributor (Author): Recorded. Documentation will be added to fluiddoc, FleetX, and the `.py` file where the feature is called. For now, though, I think this is a feature for an internal project, so shouldn't we avoid exposing it to users?
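The reordered message adds `dp_degree` and `pp_degree` alongside the existing model-parallel and sharding degrees, so a job's world size factors as `dp_degree * pp_degree * mp_degree * sharding_degree`. A minimal sketch of how such degrees could index a flat rank space (the nesting order here is an assumption for illustration, not necessarily the mapping this PR implements):

```python
# Sketch (not PaddlePaddle code): decompose a global rank into the four
# parallel-group coordinates implied by ShardingConfig's degree fields.
# Assumed nesting: dp outermost, then pp, then mp, sharding innermost.

def decompose_rank(rank, dp_degree, pp_degree, mp_degree, sharding_degree):
    """Map a global rank to (dp_id, pp_id, mp_id, sharding_id)."""
    world_size = dp_degree * pp_degree * mp_degree * sharding_degree
    assert 0 <= rank < world_size, "rank out of range for these degrees"
    sharding_id = rank % sharding_degree
    rank //= sharding_degree
    mp_id = rank % mp_degree
    rank //= mp_degree
    pp_id = rank % pp_degree
    dp_id = rank // pp_degree
    return dp_id, pp_id, mp_id, sharding_id

# 16 workers split as dp=2, pp=2, mp=2, sharding=2
print(decompose_rank(0, 2, 2, 2, 2))   # (0, 0, 0, 0)
print(decompose_rank(15, 2, 2, 2, 2))  # (1, 1, 1, 1)
```

With all four degrees left at their proto defaults except `sharding_degree`, this reduces to plain sharded data parallelism; raising the other degrees carves the same rank space into the hybrid 3D/4D layout.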
Contributor: Please add enum comments.

Contributor (Author): Recorded; documentation will be added in fluiddoc and FleetX.

Contributor: Also need to add comments to this code.
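The review asks for inline documentation of the new fields. One way to address it would be comments directly in the proto; the sketch below paraphrases the trade-off stated in the review discussion, and its wording is mine, not the merged text:

```proto
message ShardingConfig {
  optional string sharding_segment_strategy = 1
      [ default = 'segment_broadcast_MB' ];
  // ...
  // In 3D/4D parallelism this is a communication/memory trade-off:
  // true reduces communication; false reduces memory.
  optional bool pp_allreduce_in_optimize = 10 [ default = false ];
  optional int32 pp_degree = 11 [ default = 1 ];
}
```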