Commit f854bf6

tongtong0613oseyosey
authored and committed
[perf] feat: add optional role selection in discrete mode for NPU Profiler (volcengine#2750)
### What does this PR do?

Currently, in both `end-to-end` mode and `discrete` mode, all roles are fully collected. As the sequence length increases, the volume of collected data grows large and parsing becomes slow. This PR therefore adds optional role selection to the NPU Profiler in `discrete` mode, so that specific roles can be collected quickly.

A new `roles` parameter in `npu_profile.yaml` specifies the roles to collect. The currently supported options are `all`, `rollout_generate`, `actor_compute_log_prob`, `actor_update`, and `ref_compute_log_prob`. Setting `roles` to `["all"]` collects all roles. The other options can be combined freely, for example `["actor_update", "ref_compute_log_prob"]`.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
1 parent dd0b934 commit f854bf6

9 files changed: +99 -13 lines changed

docs/ascend_tutorial/ascend_profiling.rst

Lines changed: 26 additions & 1 deletion

@@ -1,7 +1,7 @@
 Data collection on Ascend devices with the FSDP backend
 =======================================================
 
-Last updated: 07/14/2025.
+Last updated: 07/24/2025.
 
 This is a tutorial on data collection on Ascend devices with the FSDP backend, using the GRPO or DAPO algorithm.
 
@@ -32,6 +32,14 @@ Last updated: 07/14/2025.
 Use the parameters in npu_profile.yaml to control collection behavior:
 
 - save_path: storage path for the collected data
+- roles: roles to collect; the options are listed below
+
+  - rollout_generate: collect the generate_sequences phase of rollout
+  - actor_compute_log_prob: collect the compute_log_prob phase of the actor
+  - actor_update: collect the update_actor phase of the actor
+  - ref_compute_log_prob: collect the compute_ref_log_prob phase of ref
+  - all: collect all of the phases above
+
 - level: collection level; the options are level_none, level0, level1, and level2
 
   - level_none: collect none of the data controlled by the Level setting, i.e., disable profiler_level
@@ -86,6 +94,23 @@ Last updated: 07/14/2025.
          ranks: [0, 1]
 
 
+Collecting the actor in discrete mode
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: yaml
+
+   trainer:
+     profile_steps: [1, 2, 5]
+     npu_profile:
+       options:
+         roles: ["actor_compute_log_prob", "actor_update"]
+   actor_rollout_ref:
+     profiler:
+       discrete: True
+       all_ranks: False
+       ranks: [0, 1]
+
+
 Visualization
 -------------
 

docs/ascend_tutorial/ascend_profiling_en.rst

Lines changed: 29 additions & 1 deletion

@@ -1,7 +1,7 @@
 Data collection based on FSDP (Fully Sharded Data Parallel) backend on Ascend devices(NPU)
 ==========================================================================================
 
-Last updated: 07/14/2025.
+Last updated: 07/24/2025.
 
 This is a tutorial for data collection using the GRPO or DAPO algorithm
 based on FSDP on Ascend devices.
@@ -35,6 +35,17 @@ and steps.
 Use parameters in npu_profile.yaml to control collection behavior:
 
 - save_path: Storage path for collected data.
+- roles: Roles to collect. The following options are available
+
+  - rollout_generate: Collect the `generate_sequences` phase
+    of rollout worker.
+  - actor_compute_log_prob: Collect the `compute_log_prob` phase
+    of the actor worker.
+  - actor_update: Collect the `update_actor` phase of the actor worker.
+  - ref_compute_log_prob: Collect the `compute_ref_log_prob` phase
+    of the ref worker.
+  - all: Collect all of the above phases.
+
 - level: Collection level—options are level_none, level0, level1, and
   level2
 
@@ -94,6 +105,23 @@ Discrete Mode Collection
          ranks: [0, 1]
 
 
+Enable actor collection in discrete mode
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: yaml
+
+   trainer:
+     profile_steps: [1, 2, 5]
+     npu_profile:
+       options:
+         roles: ["actor_compute_log_prob", "actor_update"]
+   actor_rollout_ref:
+     profiler:
+       discrete: True
+       all_ranks: False
+       ranks: [0, 1]
+
+
 Visualization
 -------------
 
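
The role list in the tutorial pairs each `roles` value with one worker phase. A rough sketch of that pairing follows; the `ROLE_TO_PHASE` table and the `phases_for` helper are hypothetical names used for illustration and are not part of verl.

```python
# Role name -> worker phase, as listed in the tutorial's `roles` section.
ROLE_TO_PHASE = {
    "rollout_generate": "generate_sequences",
    "actor_compute_log_prob": "compute_log_prob",
    "actor_update": "update_actor",
    "ref_compute_log_prob": "compute_ref_log_prob",
}


def phases_for(roles):
    """Hypothetical helper: expand a `roles` list into the phases it selects."""
    if "all" in roles:
        # "all" selects every phase, regardless of what else is listed.
        return list(ROLE_TO_PHASE.values())
    return [ROLE_TO_PHASE[role] for role in roles if role in ROLE_TO_PHASE]


print(phases_for(["actor_compute_log_prob", "actor_update"]))
# ['compute_log_prob', 'update_actor']
```

With the YAML example from the tutorial, only the two actor phases would be collected in discrete mode; the rollout and ref phases would be skipped.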

examples/grpo_trainer/run_qwen2_5_7b_grpo_discrete_prof_npu.sh

Lines changed: 2 additions & 0 deletions

@@ -16,6 +16,7 @@ WITH_CPU=True
 WITH_MODULE=False
 WITH_STACK=False
 ANALYSIS=True
+ROLES=["all"]
 
 python3 -m verl.trainer.main_ppo \
     algorithm.adv_estimator=grpo \
@@ -59,6 +60,7 @@ python3 -m verl.trainer.main_ppo \
     trainer.npu_profile.options.with_module=$WITH_MODULE \
     trainer.npu_profile.options.with_stack=$WITH_STACK \
     trainer.npu_profile.options.analysis=$ANALYSIS \
+    trainer.npu_profile.options.roles=$ROLES \
     trainer.critic_warmup=0 \
     trainer.logger=console \
     trainer.project_name='verl_grpo_example_gsm8k' \
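
Passing a list-valued override through a shell needs some care: bash removes the inner double quotes from an unquoted `ROLES=["all"]`, and unquoted square brackets are glob characters. The sketch below assumes the trainer accepts Hydra-style `key=[a,b]` list overrides and shows one way to build a shell-safe argument for a subset of roles. `roles_override` is a hypothetical helper, not something this commit adds.

```python
import shlex


def roles_override(roles):
    """Hypothetical helper: build a shell-safe override for the roles option."""
    # Assumed override syntax: key=[item1,item2] with bare (unquoted) items.
    value = "[" + ",".join(roles) + "]"
    # shlex.quote wraps the argument in single quotes because of the brackets.
    return shlex.quote(f"trainer.npu_profile.options.roles={value}")


print(roles_override(["actor_update", "ref_compute_log_prob"]))
# 'trainer.npu_profile.options.roles=[actor_update,ref_compute_log_prob]'
```

The printed string, including its single quotes, can be pasted into the `python3 -m verl.trainer.main_ppo` command line in place of the `$ROLES` line.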

tests/trainer/config/legacy_ppo_megatron_trainer.yaml

Lines changed: 1 addition & 0 deletions

@@ -450,6 +450,7 @@ trainer:
   npu_profile:
     options:
       save_path: ./profiler_data
+      roles: ["all"]
       level: level1
       with_memory: False
       record_shapes: False

tests/trainer/config/legacy_ppo_trainer.yaml

Lines changed: 5 additions & 0 deletions

@@ -995,6 +995,11 @@ trainer:
       # Storage path of collected data.
       save_path: ./profiler_data
 
+      # The roles that will be profiled. Only takes effect in discrete mode.
+      # optional values: all, rollout_generate, actor_compute_log_prob, actor_update and ref_compute_log_prob.
+      # "all" means all roles will be profiled.
+      roles: ["all"]
+
       # Collection level, optional values: level_none, level0, level1, level2.
       level: level1
 

verl/trainer/config/_generated_ppo_megatron_trainer.yaml

Lines changed: 2 additions & 0 deletions

@@ -206,6 +206,8 @@ trainer:
   npu_profile:
     options:
       save_path: ./profiler_data
+      roles:
+      - all
       level: level1
       with_memory: false
       record_shapes: false

verl/trainer/config/_generated_ppo_trainer.yaml

Lines changed: 2 additions & 0 deletions

@@ -174,6 +174,8 @@ trainer:
   npu_profile:
     options:
       save_path: ./profiler_data
+      roles:
+      - all
       level: level1
       with_memory: false
       record_shapes: false

verl/trainer/config/npu_profile/npu_profile.yaml

Lines changed: 5 additions & 0 deletions

@@ -4,6 +4,11 @@ options:
   # Storage path of collected data.
   save_path: ./profiler_data
 
+  # The roles that will be profiled. Only takes effect in discrete mode.
+  # optional values: all, rollout_generate, actor_compute_log_prob, actor_update and ref_compute_log_prob.
+  # "all" means all roles will be profiled.
+  roles: ["all"]
+
   # Collection level, optional values: level_none, level0, level1, level2.
   level: level1
 
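
The config comment lists the accepted values, but nothing in this diff shows them being validated, so a misspelled role would presumably just match nothing in discrete mode. A hypothetical pre-flight check could catch that early; `SUPPORTED_ROLES` and `validate_roles` are illustrative names and not verl APIs.

```python
# Accepted values, copied from the comment in npu_profile.yaml.
SUPPORTED_ROLES = frozenset(
    {
        "all",
        "rollout_generate",
        "actor_compute_log_prob",
        "actor_update",
        "ref_compute_log_prob",
    }
)


def validate_roles(roles):
    """Hypothetical check: return the roles as a list, or raise on unknown names."""
    roles = list(roles)
    unknown = sorted(set(roles) - SUPPORTED_ROLES)
    if unknown:
        raise ValueError(f"unsupported profiler roles: {unknown}")
    return roles


validate_roles(["actor_update", "ref_compute_log_prob"])  # passes
try:
    validate_roles(["actor_updte"])  # typo on purpose
except ValueError as err:
    print(err)
```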

verl/utils/profiler/mstx_profile.py

Lines changed: 27 additions & 11 deletions

@@ -202,20 +202,36 @@ def decorator(func):
         @functools.wraps(func)
         def wrapper(self, *args, **kwargs):
             profile_name = message or func.__name__
-
-            if self.profiler.this_step and self.profile_option is not None:
-                if self.profiler.discrete:
-                    profile_npu = get_npu_profiler(option=self.profile_option, role=role)
-                    profile_npu.start()
-                mark_range = mark_start_range(message=profile_name)
+            profile_this_role = True
+            discrete_mode = self.profiler.discrete
+            profile_enable = self.profiler.this_step and self.profile_option is not None
+
+            if not profile_enable:
+                return func(self, *args, **kwargs)
+
+            if profile_enable and role is not None:
+                target_roles = self.profile_option.get("roles", [])
+                profile_this_role = "all" in target_roles or role in target_roles
+
+            if profile_enable:
+                if not discrete_mode:
+                    mark_range = mark_start_range(message=profile_name)
+                else:
+                    if profile_this_role:
+                        profile_npu = get_npu_profiler(option=self.profile_option, role=role)
+                        profile_npu.start()
+                        mark_range = mark_start_range(message=profile_name)
 
             result = func(self, *args, **kwargs)
 
-            if self.profiler.this_step and self.profile_option is not None:
-                mark_end_range(mark_range)
-                if self.profiler.discrete:
-                    profile_npu.step()
-                    profile_npu.stop()
+            if profile_enable:
+                if not discrete_mode:
+                    mark_end_range(mark_range)
+                else:
+                    if profile_this_role:
+                        mark_end_range(mark_range)
+                        profile_npu.step()
+                        profile_npu.stop()
 
             return result
 
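
To see the effect of the role gate in isolation, here is a simplified, self-contained sketch of the discrete-mode path. It is not the verl implementation: `FakeProfiler`, `annotate`, and `Worker` are stand-ins invented for this example, range marking and the `this_step` check are omitted, and, unlike the real decorator, a role of `None` is not treated as always selected.

```python
import functools


class FakeProfiler:
    """Stand-in for the NPU profiler: records calls instead of collecting data."""

    def __init__(self, log, role):
        self.log = log
        self.role = role

    def start(self):
        self.log.append(f"start:{self.role}")

    def step(self):
        pass

    def stop(self):
        self.log.append(f"stop:{self.role}")


def annotate(role):
    """Hypothetical decorator: profile the call only if `role` is selected."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            target_roles = self.profile_option.get("roles", [])
            selected = "all" in target_roles or role in target_roles
            if not (self.discrete and selected):
                # Role not selected (or not discrete mode): run unprofiled.
                return func(self, *args, **kwargs)
            profiler = FakeProfiler(self.log, role)
            profiler.start()
            result = func(self, *args, **kwargs)
            profiler.step()
            profiler.stop()
            return result

        return wrapper

    return decorator


class Worker:
    def __init__(self, roles):
        self.discrete = True
        self.profile_option = {"roles": roles}
        self.log = []

    @annotate(role="actor_update")
    def update_actor(self):
        return "updated"

    @annotate(role="rollout_generate")
    def generate_sequences(self):
        return "generated"


worker = Worker(roles=["actor_update"])
worker.update_actor()
worker.generate_sequences()
print(worker.log)  # ['start:actor_update', 'stop:actor_update']
```

Only the selected role starts and stops a profiler; `generate_sequences` still runs, just without collection, which matches the intent of skipping unselected roles to reduce data volume.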
