[Auto-parallel] Fix sharding all_gather overlap in auto_dy #73717
Xing-lil merged 14 commits into PaddlePaddle:develop
Conversation
```python
def fuse_all_gather_hook_func(param_storage, comm_group):
    @paddle.autograd.no_grad()
    def fuse_comm(*_):
        shard_size = param_storage._numel() // comm_group.nranks
```
What happens here if `param_storage._numel()` is not evenly divisible?
The `get_padded_size` call in `_build_fuse_param_view` guarantees that `param_storage._numel()` is an integer multiple of `comm_group.nranks`, so this case cannot occur.
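A minimal sketch of the padding invariant being described; the name `get_padded_size` comes from this thread, but the rounding rule shown is an assumption:

```python
def get_padded_size(numel: int, nranks: int) -> int:
    # Round numel up to the next multiple of nranks so the fused
    # buffer always splits into equal per-rank shards.
    return ((numel + nranks - 1) // nranks) * nranks

# e.g. 10 elements across 4 ranks are padded to 12, so shard_size == 3
assert get_padded_size(10, 4) == 12
```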
```python
task = paddle.distributed.all_gather(
    param_storage,
    slice_buffer,
    group=self._sharding_group,
```
Why is `comm_group` passed in but `self._sharding_group` actually used?
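If the intent is to use the group captured by the hook factory, the call would presumably read as below (a sketch of the reviewer's reading, not a confirmed fix; `sync_op=False` is an assumption for async overlap):

```python
task = paddle.distributed.all_gather(
    param_storage,
    slice_buffer,
    group=comm_group,  # the group passed into fuse_all_gather_hook_func
    sync_op=False,     # assumed async so the gather overlaps with compute
)
```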
```python
def _set_sharding_overlap(self, enable_sharding_overlap, layers):
    self.enable_sharding_overlap = enable_sharding_overlap
    self._layers = layers
```
1. `self._layers` is later used for parameter lookup and hook registration, so the `layers` argument should be validated here, e.g. checked to be a `paddle.nn.Layer` (see the sketch below).
2. This function is only called when `enable_sharding_overlap` is True in the first place, so is there any need to pass that flag as a parameter?
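A minimal sketch of the suggested validation, folding in both points above (the flag-free signature is the reviewer's proposal, not the merged code):

```python
import paddle

def _set_sharding_overlap(self, layers):
    # Validate early: later code walks layers.sublayers() and registers
    # forward pre-hooks, both of which require a paddle.nn.Layer.
    if not isinstance(layers, paddle.nn.Layer):
        raise TypeError(
            f"layers must be a paddle.nn.Layer, got {type(layers).__name__}"
        )
    # Calling this method already implies the feature is on,
    # so no enable_sharding_overlap parameter is needed.
    self.enable_sharding_overlap = True
    self._layers = layers
```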
```python
        'param'
    ]
    layer = _find_layer_containing_param(first_param)
    layer.register_forward_pre_hook(
```
- Each call to `_find_layer_containing_param` traverses all sublayers; consider caching the param-to-layer mapping (`param2layer`).
- Handle the case where `layer` is None.
Updated to cache the mapping in a local `param2layer = {}` dict; an error is already raised when `self._layers` is None.
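A minimal sketch of that caching approach (the loop structure and error message are assumptions based on this thread, not the merged code):

```python
if self._layers is None:
    raise ValueError("self._layers is None; call _set_sharding_overlap first")

# Build the param-name -> layer cache once instead of re-walking
# all sublayers on every lookup.
param2layer = {}
for layer in self._layers.sublayers():
    for p in layer.parameters(include_sublayers=False):
        param2layer[p.name] = layer

layer = param2layer.get(first_param.name)
```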
```python
def _set_tensor_fusion(self, enable_tensor_fusion):
    self.enable_tensor_fusion = enable_tensor_fusion
```
This function is only ever called when `enable_tensor_fusion` is True, so there is no need to pass `enable_tensor_fusion` as a parameter. Suggestion:

```python
def _enable_tensor_fusion(self):
    self.enable_tensor_fusion = True
```
```python
    )
    for layer in self._layers.sublayers():
        for p in layer.parameters(include_sublayers=False):
            if param.name == p.name:
```
Is matching by `name` the only option here? Could the parameter name be modified by the user?
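One identity-based alternative the question points at (a sketch; presenting `_find_layer_containing_param` as a method here is an assumption):

```python
def _find_layer_containing_param(self, param):
    # Compare by object identity rather than by the user-mutable name.
    for layer in self._layers.sublayers():
        for p in layer.parameters(include_sublayers=False):
            if p is param:
                return layer
    return None
```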
```
@@ -1516,6 +1531,16 @@ def _reduce_scatter_gradients(self, grad_storage):
            ).wait()

    def _async_reduce_scatter(self):
```
Codecov Report

Attention: Patch coverage is 55.81% with 19 lines in your changes missing coverage.

Additional details and impacted files

```
@@            Coverage Diff             @@
##             develop   #73717   +/-   ##
==========================================
  Coverage           ?   55.81%
==========================================
  Files              ?        1
  Lines              ?       43
  Branches           ?        0
==========================================
  Hits               ?       24
  Misses             ?       19
  Partials           ?        0
```

View full report in Codecov by Sentry.
PR Category
Auto Parallel
PR Types
Bug fixes
Description
Launching all `all_gather` ops at once blocks overlap with other sync/comm ops. Fix: prefetch 1 buffer ahead via a hook to enable overlap (see the sketch below).
Ref: Same fix in dynamic_hand #73406
Pcard-70448
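A minimal sketch of the prefetch-one-ahead pattern described above; `register_prefetch_hooks`, the `buffers` layout, and the hook bodies are illustrative assumptions, not the merged code:

```python
import paddle

def register_prefetch_hooks(layers, buffers, comm_group):
    # buffers[i] is the (param_storage, slice_buffer) pair needed by
    # layers[i]; buffers[0] is assumed to be gathered eagerly before the
    # first forward. Before layers[i] runs, launch the async all_gather
    # for buffers[i + 1] so communication overlaps with layers[i]'s compute.
    tasks = {}

    def make_hook(next_idx):
        @paddle.autograd.no_grad()
        def prefetch(layer, inputs):
            if next_idx < len(buffers):
                param_storage, slice_buffer = buffers[next_idx]
                tasks[next_idx] = paddle.distributed.all_gather(
                    param_storage,
                    slice_buffer,
                    group=comm_group,
                    sync_op=False,
                )
        return prefetch

    for i, layer in enumerate(layers):
        layer.register_forward_pre_hook(make_hook(i + 1))
    return tasks
```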