[pir+auto parallel] add reshard op for input when needed #63072
Conversation
Your PR has been submitted successfully. Thank you for contributing to the open source project!
    def forward(self, x):
        x.stop_gradient = False
There is no need to make x require a gradient; the relu_grad in the backward pass will trigger the partial --> replicated allreduce.
It is needed; otherwise, relu_grad is not executed.
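For context, here is a minimal sketch of the kind of test layer this thread is about (the class name and surrounding code are illustrative, not taken from the diff): marking x with stop_gradient = False gives it a gradient path, so relu_grad is created in the backward program and can trigger the partial --> replicated allreduce.

```python
import paddle


class DemoNet(paddle.nn.Layer):  # illustrative name, not from this PR
    def __init__(self):
        super().__init__()
        self.relu = paddle.nn.ReLU()

    def forward(self, x):
        # Give x a gradient path so that relu_grad is created in the
        # backward program; without it, the backward allreduce that
        # converts partial to replicated would never run.
        x.stop_gradient = False
        return self.relu(x)
```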
    op.operands(), op.dist_attr().operand_dist_attrs()
):
    if (
        var.source().is_dist_dense_tensor_type()
In scenarios where src_dist_attr and dst_dist_attr have different meshes (e.g. pipeline parallelism), it would be better to insert two reshard ops: one whose mesh is src_dist_attr's mesh, and the other whose mesh is dst_dist_attr's mesh. Then, in the subsequent (pipeline-stage) pruning pass, each stage keeps the reshard op whose mesh it needs and removes the other.
This could be refined in the next PR.
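A rough sketch of the refinement proposed above; insert_reshard_op and the mesh keyword are hypothetical placeholders, not real Paddle APIs, and the real pass may structure this differently.

```python
def reshard_operand(value, src_dist_attr, dst_dist_attr):
    if src_dist_attr.process_mesh == dst_dist_attr.process_mesh:
        # Same mesh: a single reshard op is enough.
        return insert_reshard_op(value, src_dist_attr, dst_dist_attr)

    # Different meshes (pipeline parallelism): insert two reshard ops,
    # the first bound to the source mesh and the second bound to the
    # destination mesh. The later stage-pruning pass then keeps, on each
    # stage, only the reshard op whose mesh that stage belongs to.
    sent = insert_reshard_op(
        value, src_dist_attr, dst_dist_attr, mesh=src_dist_attr.process_mesh
    )
    return insert_reshard_op(
        sent, src_dist_attr, dst_dist_attr, mesh=dst_dist_attr.process_mesh
    )
```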
pkuzyc
left a comment
LGTM for spmd rule
jeff41404
left a comment
LGTM for API
sunzhongkai588
left a comment
LGTM
PR Category
Auto Parallel
PR Types
New features
Description
[pir+auto parallel] add reshard op for input when needed
This PR adds a pass named apply_partition_pass, which inserts a reshard op for an input when the value's dist_attr is not equal to the consuming op's operand dist_attr.
Pcard-76459
The program before:
The program after:
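A simplified sketch of what the pass does, following the snippet reviewed above; reshard and set_source here are illustrative stand-ins for the actual reshard-op insertion and rewiring logic, not confirmed Paddle APIs.

```python
def apply_partition_pass(program):
    for op in program.global_block().ops:
        if op.dist_attr() is None:
            continue
        for operand, expected_dist_attr in zip(
            op.operands(), op.dist_attr().operand_dist_attrs()
        ):
            value = operand.source()
            if (
                value.is_dist_dense_tensor_type()
                and value.dist_attr() != expected_dist_attr
            ):
                # Insert a reshard op that converts the value from its
                # current dist_attr to the one this operand expects, and
                # rewire the operand to consume the resharded value.
                operand.set_source(reshard(value, expected_dist_attr))
```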