[AutoParallel] fix the grad_clip logic of auto_hybrid_pp #74409
Conversation
Your PR has been submitted successfully. Thank you for your contribution to this open source project!
Codecov Report

❌ Patch coverage is 55.55%.

❌ Your patch status has failed because the patch coverage (55.55%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

```
@@           Coverage Diff            @@
##           develop    #74409   +/- ##
==========================================
  Coverage         ?    55.55%
==========================================
  Files            ?         1
  Lines            ?         9
  Branches         ?         0
==========================================
  Hits             ?         5
  Misses           ?         4
  Partials         ?         0
```

☔ View full report in Codecov by Sentry.
/re-run all-failed
Inline review comment on the following lines of the diff:

```python
    global_norm_var.process_mesh,
    global_norm_var.placements,
)
```
The is_pp_enable logic doesn't feel concise enough; it could be rewritten as the code below:
```python
# Check for auto hybrid pipeline parallelism and source mesh existence
if flag_auto_hybrid_pp and src_mesh is not None:
    g_mesh = dist.get_mesh()
    # Check if mesh exists and pipeline parallelism is enabled ("pp" dim size > 1)
    if g_mesh and "pp" in g_mesh.dim_names and g_mesh.get_dim_size("pp") > 1:
        # Get the pipeline parallelism subgroup for communication
        pp_group = g_mesh.get_submesh_with_dim("pp").get_group("pp")
        # Perform all-reduce on the local tensor value across the PP group
        global_norm_var_local = global_norm_var._local_value()
        dist.all_reduce(
            global_norm_var_local,
            op=dist.ReduceOp.SUM,
            group=pp_group,
        )
        # Re-shard the tensor with the reduced value
        global_norm_var = dist.shard_tensor(
            global_norm_var_local,
            global_norm_var.process_mesh,
            global_norm_var.placements,
        )
```
/re-run all-failed
xuxinyi389 left a comment
LGTM
From00 left a comment
LGTM
…e#74409) * fix the grad clip performance * add test * empty commit to rerun CI * modify the note * Simplify code logic
…dlePaddle#74409)" This reverts commit 8f77fa2.


PR Category
Auto Parallel
PR Types
Performance
Description