Commit 28f6e4a

[doc]fix: optimize ascend docs (#3063)
### What does this PR do?

- Fix several dependency version mismatches in ascend_quick_start.rst.
- Add actor.strategy and rollout.name columns to the support status tables.
- Rename ascend_profiling_en.rst and ascend_profiling_zh.rst so that the document titles look cleaner.

<img width="402" height="103" alt="image" src="https://github.com/user-attachments/assets/8f9ece22-315e-4f80-8157-04838f7467a3" />

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
1 parent bd756c1 commit 28f6e4a

3 files changed

+41
-41
lines changed

docs/ascend_tutorial/ascend_profiling_en.rst

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
Data collection based on FSDP (Fully Sharded Data Parallel) backend on Ascend devices(NPU)
1+
Data collection based on FSDP backend on Ascend devices(en)
22
==========================================================================================
33

44
Last updated: 07/24/2025.

docs/ascend_tutorial/ascend_profiling.rst renamed to docs/ascend_tutorial/ascend_profiling_zh.rst

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,8 @@
1-
Data collection based on the FSDP backend on Ascend devices
1+
Data collection based on FSDP backend on Ascend devices(zh)
22
====================================
33

4+
Data collection based on the FSDP backend on Ascend devices
5+
46
Last updated: 07/24/2025.
57

68
This is a tutorial on data collection on Ascend devices with the FSDP backend, using the GRPO or DAPO algorithm.

docs/ascend_tutorial/ascend_quick_start.rst

Lines changed: 37 additions & 39 deletions
@@ -1,7 +1,7 @@
11
verl x Ascend
22
===================================
33

4-
Last updated: 06/17/2025.
4+
Last updated: 08/15/2025.
55

66
We have added support for Huawei Ascend devices to verl.
77

@@ -28,9 +28,10 @@ Atlas 900 A2 PODc
2828
+-----------+-------------+
2929
| torch     | == 2.5.1    |
3030
+-----------+-------------+
31-
| torch_npu | == 2.5.1.RC1|
31+
| torch_npu | == 2.5.1    |
3232
+-----------+-------------+
3333

34+
To prepare the base environment, refer to this `document <https://gitee.com/ascend/pytorch>`_.
3435

3536
vllm & vllm-ascend
3637
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -80,14 +81,11 @@ vllm & vllm-ascend
8081
+--------------+---------------+
8182
| liger-kernel | not supported |
8283
+--------------+---------------+
83-
| tensordict   | 0.8.3 (ARM)   |
84-
+--------------+---------------+
8584

86-
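1. --flash_attention_2 can be enabled through transformers; transformers must be version 4.52.0 or later.
85+
1. --flash_attention_2 can be enabled through transformers; transformers must be exactly version 4.52.4.
8786
2. Enabling flash attention acceleration through flash_attn is not supported.
8887
3. Enabling liger-kernel is not supported.
89-
4. On ARM servers, tensordict 0.8.3 is required; it can be installed manually after the other dependencies are installed.
90-
5. On x86 servers, the CPU version of torchvision must be installed.
88+
4. On x86 servers, the CPU version of torchvision must be installed.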
9189

9290
.. code-block:: bash
9391
@@ -153,50 +151,50 @@ vllm & vllm-ascend
153151
trainer.total_epochs=1 \
154152
trainer.device=npu $@
155153
156-
MindSpeed training backend
154+
(Optional) MindSpeed backend setup
157155
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
158-
1. Install the MindSpeed acceleration library by following the `MindSpeed Readme <https://gitee.com/ascend/MindSpeed>`_.
156+
1. Install the MindSpeed acceleration library by following the `MindSpeed README <https://gitee.com/ascend/MindSpeed>`_.
159157

160-
2. Set the ``strategy`` of the Verl worker model to ``megatron``, for example ``actor_rollout_ref.actor.strategy=megatron``.
158+
2. Set the ``strategy`` of the verl worker model to ``megatron``, for example ``actor_rollout_ref.actor.strategy=megatron``.
161159

162160
3. Custom MindSpeed arguments can be passed through the ``override_transformer_config`` parameter; for example, to enable flash attention (FA) for the actor model, use ``+actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True``.
163161

164-
4. For more feature information, see the `MindSpeed Verl documentation <https://gitee.com/ascend/MindSpeed/blob/master/docs/user-guide/verl.md>`_.
162+
4. For more feature information, see the `MindSpeed+verl documentation <https://gitee.com/ascend/MindSpeed/blob/master/docs/user-guide/verl.md>`_.
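
Steps 2 and 3 of the MindSpeed list above can be sketched as a small helper that assembles the command-line overrides. This is only an illustration, not code from the commit: the function name `mindspeed_overrides` is hypothetical, the three keys are copied from the documentation text and launch script in this diff, and the leading `+` is assumed to be Hydra's syntax for adding a key that is absent from the default config.

```python
def mindspeed_overrides(use_flash_attn: bool = True) -> list[str]:
    """Build command-line overrides for the megatron strategy (hypothetical helper)."""
    overrides = [
        # Step 2: switch the actor worker to the megatron strategy.
        "actor_rollout_ref.actor.strategy=megatron",
        # Taken from the launch script in the same document.
        "trainer.device=npu",
    ]
    # Step 3: MindSpeed-specific arguments go through override_transformer_config.
    # Assumption: the "+" prefix adds a key that the default config does not define.
    overrides.append(
        "+actor_rollout_ref.actor.megatron.override_transformer_config."
        f"use_flash_attn={use_flash_attn}"
    )
    return overrides
```

The returned strings would be appended to whatever training command the quick start already uses; printing them one per line makes it easy to paste them into a launch script.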
165163

166164
Support status
167165
-----------------------------------
168166

169167
**Table 1** RL algorithms
170168

171-
+-----------+-------------------------+-------------+-------------------+----------------------+
172-
| algorithm | model                   | rewards mae | throughput ratio  | hardware             |
173-
+-----------+-------------------------+-------------+-------------------+----------------------+
174-
| GRPO      | Qwen2.5-7B-instruct     | 0.38%       | 0.588             | Atlas 200T A2 Box16  |
175-
+-----------+-------------------------+-------------+-------------------+----------------------+
176-
| GRPO      | Qwen2.5-32B-instruct    | 0.30%       | 0.685             | Atlas 200T A2 Box16  |
177-
+-----------+-------------------------+-------------+-------------------+----------------------+
178-
| GRPO      | Qwen2.5-VL-3B-instruct  | 3.14%       | 0.470             | Atlas 200T A2 Box16  |
179-
+-----------+-------------------------+-------------+-------------------+----------------------+
180-
| GRPO      | Qwen2.5-VL-7B-instruct  | 3.30%       | 0.380             | Atlas 200T A2 Box16  |
181-
+-----------+-------------------------+-------------+-------------------+----------------------+
182-
| GRPO      | Qwen2.5-VL-32B-instruct | 0.79%       | 0.568             | Atlas 200T A2 Box16  |
183-
+-----------+-------------------------+-------------+-------------------+----------------------+
184-
| DAPO      | Qwen2.5-7B-instruct     | 3.83%       | pending           | Atlas 200T A2 Box16  |
185-
+-----------+-------------------------+-------------+-------------------+----------------------+
186-
| DAPO      | Qwen3-8B-base           | 5.3%        | pending           | Atlas 200T A2 Box16  |
187-
+-----------+-------------------------+-------------+-------------------+----------------------+
188-
| DAPO      | Qwen3-14B-base          | 5.9%        | pending           | Atlas 200T A2 Box16  |
189-
+-----------+-------------------------+-------------+-------------------+----------------------+
169+
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
170+
| algorithm | model                   | rewards mae | throughput ratio  | actor.strategy    | rollout.name      | hardware                 |
171+
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------|
172+
| GRPO      | Qwen2.5-7B-instruct     | 0.38%       | 0.588             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
173+
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------|
174+
| GRPO      | Qwen2.5-32B-instruct    | 0.30%       | 0.685             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
175+
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------|
176+
| GRPO      | Qwen2.5-VL-3B-instruct  | 3.14%       | 0.470             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
177+
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------|
178+
| GRPO      | Qwen2.5-VL-7B-instruct  | 3.30%       | 0.380             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
179+
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------|
180+
| GRPO      | Qwen2.5-VL-32B-instruct | 0.79%       | 0.568             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
181+
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------|
182+
| DAPO      | Qwen2.5-7B-instruct     | 3.83%       | pending           | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
183+
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
184+
| DAPO      | Qwen3-8B-base           | 5.3%        | pending           | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
185+
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
186+
| DAPO      | Qwen3-14B-base          | 5.9%        | pending           | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
187+
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
190188

191189
**Table 2** SFT algorithms
192190

193-
+-----------+-------------------------+----------------+-------------------+----------------------+
194-
| algorithm | model                   | loss value mae | total time ratio  | hardware             |
195-
+-----------+-------------------------+----------------+-------------------+----------------------+
196-
| SFT-PEFT  | Qwen3-8B                | 0.09%          | 0.618             | Atlas 900 A2 PODc    |
197-
+-----------+-------------------------+----------------+-------------------+----------------------+
198-
| ReTool-SFT| Qwen2.5-7B-instruct     | 0.08%          | 0.775             | Atlas 900 A2 PODc    |
199-
+-----------+-------------------------+----------------+-------------------+----------------------+
191+
+-----------+-------------------------+----------------+-------------------+-------------------+----------------------+
192+
| algorithm | model                   | train loss mae | total time ratio  | actor.strategy    | hardware             |
193+
+-----------+-------------------------+----------------+-------------------+-------------------+----------------------+
194+
| SFT-PEFT  | Qwen3-8B                | 0.09%          | 0.618             | FSDP              | Atlas 900 A2 PODc    |
195+
+-----------+-------------------------+----------------+-------------------+-------------------+----------------------+
196+
| ReTool-SFT| Qwen2.5-7B-instruct     | 0.08%          | 0.775             | FSDP              | Atlas 900 A2 PODc    |
197+
+-----------+-------------------------+----------------+-------------------+-------------------+----------------------+
200198

201199
Notes on the accuracy comparison
202200
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -217,10 +215,10 @@ For Ascend npu and A100, the "perf/throughput" of the first 4 steps in the log is aver
217215
Roadmap
218216
-----------------------------------
219217

220-
See the `roadmap <https://github.com/volcengine/verl/discussions/900>`_ for the support progress of more features.
218+
See the `roadmap <https://github.com/volcengine/verl/discussions/2171>`_ for the support progress of more features.
221219

222220

223221

224222
Disclaimer
225223
-----------------------------------
226-
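The Ascend support code provided in verl is a reference example only. For commercial use, please get in touch through official channels. Thank you.
224+
The Ascend support code provided in verl is a reference example only. For use in production environments, please get in touch through official channels. Thank you.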
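
The version pins that this commit changes in ascend_quick_start.rst (torch == 2.5.1, torch_npu == 2.5.1, transformers equal to 4.52.4) could be checked before launching a job with a sketch like the one below. It is not part of the commit. The names `PINS`, `installed_version`, and `check_pins` are hypothetical, the distribution names are assumed to match the names used in the table, and a local build suffix such as `+cpu` is assumed to be irrelevant to the comparison.

```python
from importlib import metadata

# Pins as stated in the updated quick start shown in this diff.
PINS = {"torch": "2.5.1", "torch_npu": "2.5.1", "transformers": "4.52.4"}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins=PINS, lookup=installed_version):
    """Return {package: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        # Assumption: ignore a local build suffix such as "+cpu".
        base = found.split("+", 1)[0] if found else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `check_pins()` with no arguments inspects the current environment; an empty dictionary means every pinned package matches. The `lookup` parameter exists only so the comparison can be exercised without installing the packages.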
