[RL] Support cpu tensor broadcast by Sunny-bot1 · Pull Request #7833 · PaddlePaddle/FastDeploy

Sunny-bot1 · 2026-05-15T10:20:25Z

Motivation

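When Paddle creates a communication group, the default backend is NCCL. With that backend, paddle.distributed.broadcast cannot broadcast a CPU tensor, and paddle.distributed.broadcast_object_list still launches GPU kernels and introduces a synchronous DtoH copy.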

Modifications

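When only a CPU tensor needs to be broadcast, the group can be created with the gloo backend:

import paddle
import paddle.distributed as dist
group = dist.new_group(list(range(world_size)), backend="gloo")
paddle.distributed.broadcast(signal_tensor, src=0, group=group)

gloo is a pure-CPU, socket-based implementation that bypasses NCCL and the GPU entirely, so nsys shows no CUDA kernels and no DtoH/HtoD copies.
Cost: gloo has lower bandwidth and higher latency than NCCL, but a single signal value is so small that the practical impact is negligible.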

When called without arguments, paddle.distributed.shutdown_process_group() iterates over every process group and calls .shutdown() on it, but ProcessGroupGloo does not implement that method, so an AttributeError is raised.

Solution: before calling the full shutdown_process_group(), walk Paddle's global group registry and delete the entries whose group has no shutdown method (that is, gloo and other CPU backends). The later iteration then skips them and no AttributeError is raised.

The gloo group itself does not need an explicit shutdown; it is cleaned up automatically when the process exits.
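
The cleanup described above can be sketched in plain Python. The registry dict, the three `Fake*` classes, and both helper functions are stand-ins invented for illustration; in the PR the registry comes from Paddle's private `_get_group_map_by_name()` and the check is made on each entry's `process_group` attribute.

```python
class FakeNcclProcessGroup:
    """Stand-in for a backend object that implements shutdown()."""

    def __init__(self):
        self.closed = False

    def shutdown(self):
        self.closed = True


class FakeGlooProcessGroup:
    """Stand-in for a backend object with no shutdown() method."""


class FakeGroup:
    """Stand-in for a registry entry that wraps a backend object."""

    def __init__(self, process_group):
        self.process_group = process_group


def prune_groups_without_shutdown(group_map):
    """Remove entries whose backend lacks shutdown(); return the removed names."""
    removed = []
    # Iterate over a snapshot because the dict is mutated inside the loop.
    for name, group in list(group_map.items()):
        pg = group.process_group
        if pg is not None and not hasattr(pg, "shutdown"):
            group_map.pop(name, None)
            removed.append(name)
    return removed


def shutdown_all(group_map):
    """Mimic a full shutdown that calls .shutdown() on every registered group."""
    for group in group_map.values():
        if group.process_group is not None:
            group.process_group.shutdown()


registry = {
    "default_nccl": FakeGroup(FakeNcclProcessGroup()),
    "signal_gloo": FakeGroup(FakeGlooProcessGroup()),
}
removed = prune_groups_without_shutdown(registry)
shutdown_all(registry)  # safe: the entry without shutdown() is already gone
```

Without the pruning step, `shutdown_all` in this sketch would raise `AttributeError` on the gloo stand-in, which mirrors the failure the description reports.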

Usage or Command

Accuracy Tests

Checklist

- [ ] Add at least a tag in the PR title.
  - Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run pre-commit before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-05-15T10:20:34Z

Thanks for your contribution!

PaddlePaddle-bot · 2026-05-15T10:38:41Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-18 15:04:19

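CI report generated from the following code (refreshed every 30 minutes):

PR commit: c9f30fd
Merge base: 9139986 (branch: develop)
View full diff
CI details

1 Task overview

1 required task failed; it must be fixed before the PR can be merged.

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Waiting	Skipped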
42(0)	42	37	4	1	0	0

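2 Task status summary

2.1 Required tasks: 9/10 passed

Required tasks block merging; failures should be handled first.

Status	Task	Duration	Root cause	Suggested fix	Log	Rerun
❌	`run_tests_with_coverage`	1h22m	PR issue: coverage of new code is 25%, below the 80% threshold	Add unit tests for dynamic_weight_manager.py L353-357 and the other uncovered lines	Job	-
✅	The other 9 required tasks passed	-	-	-	-	-

2.2 Optional tasks: 28/32 passed

Optional tasks do not block merging; failures are informational only.

Status	Task	Duration	Log	Rerun
❌	`Run iluvatar Tests / run_iluvatar_cases`	11m48s	Job	-
❌	`Check PR Template`	11s	Job	-
❌	`Trigger Jenkins for PR`	11m35s	Job	-
⏳	`CI_HPU` (running)	-	-	-
✅	The other 28 optional tasks passed	-	-	-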

3 Failure details (required only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: coverage below threshold (confidence: high)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage

Status: ❌ Failed
Error type: coverage below threshold
Confidence: high
Root cause summary: overall coverage of new code is 25%, below the 80% threshold
Analyzer: ci_analyze_unittest_fastdeploy

Coverage details:

File	Coverage	Uncovered lines
`fastdeploy/rl/dynamic_weight_manager.py`	0.0%	L353, L355, L356, L357
`fastdeploy/worker/worker_process.py`	37.5%	L319, L322, L323, L324, L475
Total	25%	9 of 12 lines uncovered
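
The percentages in the table above follow from simple line counts. The sketch below reproduces them; the helper name `patch_coverage` is hypothetical, and the per-file totals (4 and 8 changed lines) are inferred from the reported percentages rather than stated in the report.

```python
def patch_coverage(total_lines: int, uncovered_lines: int) -> float:
    """Percentage of changed lines that the tests executed."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return 100.0 * (total_lines - uncovered_lines) / total_lines


# Uncovered line numbers exactly as listed in the coverage table.
uncovered = {
    "fastdeploy/rl/dynamic_weight_manager.py": [353, 355, 356, 357],
    "fastdeploy/worker/worker_process.py": [319, 322, 323, 324, 475],
}

total_uncovered = sum(len(lines) for lines in uncovered.values())
overall = patch_coverage(12, total_uncovered)  # 12 changed lines in total

# Inferred per-file totals: 4 changed lines and 8 changed lines.
weight_manager = patch_coverage(4, len(uncovered["fastdeploy/rl/dynamic_weight_manager.py"]))
worker_process = patch_coverage(8, len(uncovered["fastdeploy/worker/worker_process.py"]))
```

With 9 of 12 changed lines uncovered, only 3 lines are exercised, which is where the 25% figure comes from.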

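Root cause details:
This PR, "[RL] Support cpu tensor broadcast", adds or modifies code in fastdeploy/rl/dynamic_weight_manager.py and fastdeploy/worker/worker_process.py without providing matching unit tests. Coverage of the new code in dynamic_weight_manager.py is 0% (all of L353-357 uncovered), and coverage in worker_process.py is only 37.5% (L319, L322-324, and L475 uncovered). As a result, total diff coverage is only 25%, far below the 80% threshold that CI requires.

Key log: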

COVERAGE_EXIT_CODE: 9
Coverage generation failed (exit code 9)
GPU Patch Coverage Details:
  total_num_lines: 12, total_num_violations: 9, total_percent_covered: 25
  fastdeploy/rl/dynamic_weight_manager.py: 0.0% (violations: L353,355,356,357)
  fastdeploy/worker/worker_process.py: 37.5% (violations: L319,322,323,324,475)
##[error]Process completed with exit code 9.

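Suggested fixes:

Add unit tests for the new code at fastdeploy/rl/dynamic_weight_manager.py L353, L355-L357 (current coverage 0%)
Add unit tests for the new code at fastdeploy/worker/worker_process.py L319, L322-L324, L475 (current coverage 37.5%)
If the new code is genuinely hard to cover with unit tests, request an exemption for the affected files in the CI configuration

Fix summary: add unit tests for the new code at dynamic_weight_manager.py L353-357

Related changes: fastdeploy/rl/dynamic_weight_manager.py (L353-357), fastdeploy/worker/worker_process.py (L319-324, L475)

Link: view log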

codecov-commenter · 2026-05-15T11:24:02Z

Codecov Report

❌ Patch coverage is 25.00000% with 9 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@9139986). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/worker/worker_process.py	37.50%	5 Missing ⚠️
fastdeploy/rl/dynamic_weight_manager.py	0.00%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7833   +/-   ##
==========================================
  Coverage           ?   63.31%           
==========================================
  Files              ?      462           
  Lines              ?    64284           
  Branches           ?     9854           
==========================================
  Hits               ?    40700           
  Misses             ?    20815           
  Partials           ?     2769

Flag	Coverage Δ
GPU	`72.43% <25.00%> (?)`
XPU	`7.11% <0.00%> (?)`


PaddlePaddle-bot · 2026-05-15T12:02:36Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-15 20:00:38

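CI report generated from the following code (refreshed every 30 minutes):

PR commit: f60365d
Merge base: 9139986 (branch: develop)
View full diff
CI details

1 Task overview

⚠️ 2 required tasks failed and 1 required task is still running; please handle these first.

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Waiting	Skipped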
41(0)	41	35	4	1	1	0

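2 Task status summary

2.1 Required tasks: 7/10 passed

Required tasks block merging; failures should be handled first.

Status	Task	Duration	Root cause	Suggested fix	Log	Rerun
❌	`Run Stable Tests / stable_tests`	2m8s	PR issue: worker broadcast switched to CPU tensor + gloo, which may affect the stable tests	Check whether the gloo backend is available, or confirm that the test environment supports it	Job	-
❌	`Extracted partial CE model tasks to run in CI. / run_ce_cases`	23m15s	PR issue: a gloo group is created on every iteration of event_loop, which may cause the CE test failure	Move gloo group creation outside the loop to avoid repeated creation	Job	-
⏳	`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	-	Running	-	Job	-
✅	The other 7 required tasks passed	-	-	-	-	-

2.2 Optional tasks: 28/31 passed

Optional tasks do not block merging; failures are informational only.

Status	Task	Duration	Log	Rerun
❌	`Run iluvatar Tests / run_iluvatar_cases`	10m36s	Job	-
❌	`Check PR Template`	10s	Job	-
⏸️	`CI_HPU`	-	-	-
✅	The other 28 optional tasks passed	-	-	-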

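3 Failure details (required only)

Run Stable Tests / stable_tests: test failure (confidence: low)

Run Stable Tests / stable_tests

Status: ❌ Failed
Error type: test failure
Confidence: low
Root cause summary: the PR changes worker broadcast to a CPU tensor + gloo group, which may affect the stable tests
Analyzer: ci_analyze_unittest_fastdeploy (log retrieval failed; analysis is based on the PR diff)

Root cause details:
This PR modifies the _broadcast_model_weights_signal method in fastdeploy/worker/worker_process.py, replacing the original paddle.distributed.broadcast_object_list(group=None) with a CPU tensor + paddle.distributed.broadcast + an explicit gloo group. Because the actual log could not be retrieved (the download failed), the specific failing test case cannot be confirmed. stable_tests failed after only 2m8s, which suggests the error occurred during initialization or an early test stage.

Key log:

(Log retrieval failed; no error message could be extracted)
Failed step: Run FastDeploy Stable Tests

Suggested fixes:

Confirm that the gloo backend is available in the CI environment; check whether the gloo group added in _broadcast_model_weights_signal in worker_process.py is compatible with the test environment
If gloo is unavailable, consider adding an environment check or a fallback before creating the group

Fix summary: confirm that the gloo backend is available, or add an environment compatibility check

Related changes: fastdeploy/worker/worker_process.py L312-L323 (_broadcast_model_weights_signal)
Link: view log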

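Extracted partial CE model tasks to run in CI. / run_ce_cases: test failure (confidence: low)

Extracted partial CE model tasks to run in CI. / run_ce_cases

Status: ❌ Failed
Error type: test failure
Confidence: low
Root cause summary: dist.new_group() is called repeatedly inside the event_loop loop, which may cause the CE test failure
Analyzer: ci_analyze_unittest_fastdeploy (log retrieval failed; analysis is based on the PR diff)

Root cause details:
Inside the while loop body of event_loop_normal (around L533), the PR adds the statement group = dist.new_group(list(range(self.ranks)), backend="gloo"). This statement creates a new process group on every loop iteration, which is incorrect: new_group should be called only once. Otherwise it may exhaust process-group resources or cause distributed coordination errors, which may in turn explain why the CE tests failed after running for about 23 minutes.

Key log:

(Log retrieval failed; no error message could be extracted)
Failed step: Run CI unittest

Suggested fixes:

Move the group = dist.new_group(...) statement inside the while loop near worker_process.py L533 to outside the loop, so the group is created only once
Follow the correct pattern at L468 in the same file (the group is created outside the loop and then passed in)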

Fix summary: move new_group() out of the while loop to avoid repeated creation

Related changes: fastdeploy/worker/worker_process.py L529-L540 (event_loop_normal while loop)
Link: view log

PaddlePaddle-bot · 2026-05-16T04:38:44Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-16 12:37:37

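CI report generated from the following code (refreshed every 30 minutes):

PR commit: f60365d
Merge base: 9139986 (branch: develop)
View full diff
CI details

1 Task overview

All required tasks passed ✅; merging is recommended (3 optional tasks failed, which does not block merging).

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Waiting	Skipped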
24(0)	24	21	3	0	0	0

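2 Task status summary

2.1 Required tasks: 4/4 passed

Required tasks block merging; failures should be handled first.

Status	Task	Duration	Root cause	Suggested fix	Log	Rerun
✅	All 4 required tasks passed	-	-	-	-	-

2.2 Optional tasks: 17/20 passed

Optional tasks do not block merging; failures are informational only.

Status	Task	Duration	Log	Rerun
❌	`Run iluvatar Tests / run_iluvatar_cases`	10m36s	Job	-
❌	`Check PR Template`	10s	Job	-
❌	`CI_HPU`	1h6m	Job	-
✅	The other 17 optional tasks passed	-	-	-

3 Failure details (required only)

No required tasks failed.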

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-18 12:15:33

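📋 Review summary

PR overview: switches CPU tensor signal broadcast in the RL scenario from the NCCL/broadcast_object_list path to a dedicated gloo backend, removing unnecessary GPU kernel launches and synchronous D2H copies

Scope of changes: fastdeploy/rl/dynamic_weight_manager.py, fastdeploy/worker/worker_process.py

Impact tag: [RL]

Issues

Severity	File	Summary
🟡 Suggestion	`fastdeploy/rl/dynamic_weight_manager.py:357`	The private API `_get_group_map_by_name()` is called repeatedly inside the loop; consider caching the reference
📝 PR conventions	-	The `## Modifications`, `## Usage or Command`, and `## Accuracy Tests` sections are empty, and no Checklist item is checked

📝 PR convention check

The title format is compliant (it includes the official [RL] tag). In the description, the ## Modifications, ## Usage or Command, and ## Accuracy Tests sections contain only placeholder comments and no real content, and no Checklist item is checked; these need to be filled in.

Suggested PR description (can be copied directly; it must reproduce the full structure of the checklist §D2 template):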

## Motivation

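When Paddle creates a communication group, the default backend is NCCL. With that backend, `paddle.distributed.broadcast` cannot broadcast a CPU tensor, and `paddle.distributed.broadcast_object_list` still launches GPU kernels and introduces a synchronous DtoH copy. Creating the group with the gloo backend instead gives a pure-CPU socket broadcast that bypasses NCCL and the GPU entirely, so nsys shows no CUDA kernels and no DtoH/HtoD operations.

## Modifications

- `fastdeploy/worker/worker_process.py`:
  - In `__init__` (when `ranks > 1`), create a dedicated communication group `self.gloo_group` with the `gloo` backend
  - `_broadcast_model_weights_signal` now creates a CPU tensor with `paddle.full(..., device="cpu")` and broadcasts it with `paddle.distributed.broadcast`, replacing the previous `broadcast_object_list`
  - In `event_loop_normal`, both call sites change from `group=None` to `group=self.gloo_group`
- `fastdeploy/rl/dynamic_weight_manager.py`:
  - In `clear_parameters`, before calling the global `shutdown_process_group()`, remove any ProcessGroupGloo that lacks a `shutdown()` method from Paddle's registry, to avoid AttributeError

## Usage or Command

N/A

## Accuracy Tests

N/A: this PR does not change the model's forward computation; it only modifies the inter-process communication mechanism.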

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

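Overall assessment

The overall approach is clear and correct: the gloo backend does bypass NCCL entirely and avoids GPU synchronization. The cleanup code uses a private API and calls it repeatedly inside the loop, so a small improvement is suggested; there are no blocking issues.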

PaddlePaddle-bot · 2026-05-18T04:19:54Z

+
+            for name, pg in list(_get_group_map_by_name().items()):
+                if pg.process_group is not None and not hasattr(pg.process_group, "shutdown"):
+                    _get_group_map_by_name().pop(name, None)


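🟡 Suggestion: _get_group_map_by_name() is called repeatedly inside the loop, and it is a private API

The current code calls _get_group_map_by_name() in two places: once for .items() to drive the iteration, and again for .pop() to delete each entry. If Paddle's internal implementation returns the same mutable dict reference, the code works as intended. If the implementation later changes to return a copy, pop will silently have no effect, the gloo group will not be removed, and shutdown_process_group() will still raise AttributeError.

In addition, _get_group_map_by_name is a Paddle-internal private API (its name starts with _), so it may change or disappear without notice when Paddle is upgraded.

Suggestion: cache the reference and add a defensive comment:

group_map = _get_group_map_by_name()  # internal API, cache once
for name, pg in list(group_map.items()):
    if pg.process_group is not None and not hasattr(pg.process_group, "shutdown"):
        group_map.pop(name, None)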

* support cpu tensor broadcast
* fix place
* fix group
* fix init
* fix shutdown process group

support cpu tensor broadcast

14e1c4b

Sunny-bot1 had a problem deploying to Metax_ci May 15, 2026 10:20 — with GitHub Actions Error

fix place

f60365d

Sunny-bot1 temporarily deployed to Metax_ci May 15, 2026 10:35 — with GitHub Actions Inactive

fix group

731f06d

Sunny-bot1 had a problem deploying to Metax_ci May 18, 2026 02:46 — with GitHub Actions Error

fix init

d15c1e5

Sunny-bot1 temporarily deployed to Metax_ci May 18, 2026 03:04 — with GitHub Actions Inactive

fix shutdown process group

c9f30fd

Sunny-bot1 had a problem deploying to Metax_ci May 18, 2026 04:03 — with GitHub Actions Failure

PaddlePaddle-bot mentioned this pull request May 18, 2026

[Cherry-Pick][RL] Support cpu tensor broadcast(#7833) #7840

Merged

5 tasks

PaddlePaddle-bot reviewed May 18, 2026

Deleter-D approved these changes May 18, 2026

liyonghua0910 reviewed May 18, 2026

Comment thread fastdeploy/rl/dynamic_weight_manager.py

Jiang-Jia-Jun pushed a commit that referenced this pull request May 18, 2026

[Cherry-Pick][RL] Support cpu tensor broadcast(#7833) (#7840)

9894b32

* support cpu tensor broadcast
* fix place
* fix group
* fix init
* fix shutdown process group

Jiang-Jia-Jun merged commit 6045f04 into PaddlePaddle:develop May 18, 2026
50 of 56 checks passed

Jiang-Jia-Jun merged 5 commits into PaddlePaddle:develop from Sunny-bot1:broadcast_cpu

6 participants