[perf] feat: add optional role selection in discrete mode for NPU Profiler (volcengine#2750)

tongtong0613 · oseyosey · commit f854bf605459 · 2025-07-27T17:06:48.000-07:00
### What does this PR do?

Currently, all roles are collected in full in both `end-to-end` mode and `discrete` mode. As sequence lengths grow, so does the volume of collected data, which makes parsing slow. This PR adds optional role selection to the NPU Profiler in `discrete` mode, so that only the roles of interest are collected.

A new `roles` parameter in `npu_profile.yaml` specifies the roles to collect. The currently supported options are `all`, `rollout_generate`, `actor_compute_log_prob`, `actor_update`, and `ref_compute_log_prob`. Setting `roles` to `["all"]` collects every role; the other options can be combined freely, for example `["actor_update", "ref_compute_log_prob"]`.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.
```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
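The role-selection rule described in this PR fits in a few lines. The sketch below mirrors the membership check added to `mstx_profile.py` (`"all" in target_roles or role in target_roles`), including the case where a decorated method carries no role and is therefore always selected. The function name `should_profile_role` is hypothetical and not part of verl.

```python
from typing import Optional, Sequence


def should_profile_role(role: Optional[str], target_roles: Sequence[str]) -> bool:
    """Return True if a method tagged with `role` should be profiled.

    Hypothetical helper mirroring the check in the patched wrapper:
    - methods without a role tag are always selected;
    - "all" selects every role;
    - otherwise the role must be listed explicitly.
    """
    if role is None:
        return True
    return "all" in target_roles or role in target_roles


# Collect everything (the default in npu_profile.yaml).
print(should_profile_role("rollout_generate", ["all"]))  # True

# Collect only the actor update and the reference log-prob computation.
selection = ["actor_update", "ref_compute_log_prob"]
print(should_profile_role("actor_update", selection))      # True
print(should_profile_role("rollout_generate", selection))  # False
```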
diff --git a/docs/ascend_tutorial/ascend_profiling.rst b/docs/ascend_tutorial/ascend_profiling.rst
@@ -1,7 +1,7 @@
 在昇腾设备上基于FSDP后端进行数据采集
 ====================================
 
-Last updated: 07/14/2025.
+Last updated: 07/24/2025.
 
 这是一份在昇腾设备上基于FSDP后端使用GRPO或DAPO算法进行数据采集的教程。
 
@@ -32,6 +32,14 @@ Last updated: 07/14/2025.
 通过 npu_profile.yaml 中的参数控制具体采集行为：
 
 -  save_path：采集数据的存放路径
+-  roles：采集的角色，下列为可选项
+
+   -  rollout_generate：采集rollout的generate_sequences阶段
+   -  actor_compute_log_prob：采集actor的compute_log_prob阶段
+   -  actor_update：采集actor的update_actor阶段
+   -  ref_compute_log_prob：采集ref的compute_ref_log_prob阶段
+   -  all：采集以上所有阶段
+
 -  level：采集等级，可选项为level_none、level0、level1和level2
 
    -  level_none：不采集所有Level层级控制的数据，即关闭profiler_level
@@ -86,6 +94,23 @@ Last updated: 07/14/2025.
                 ranks: [0, 1]
 
 
+离散模式采集actor
+~~~~~~~~~~~~~~~~~~
+
+.. code:: yaml
+
+       trainer:
+           profile_steps: [1, 2, 5]
+           npu_profile:
+                options:
+                    roles: ["actor_compute_log_prob", "actor_update"]
+       actor_rollout_ref:
+            profiler:
+                discrete: True
+                all_ranks: False
+                ranks: [0, 1]
+
+
 可视化
 ------
 
diff --git a/docs/ascend_tutorial/ascend_profiling_en.rst b/docs/ascend_tutorial/ascend_profiling_en.rst
@@ -1,7 +1,7 @@
 Data collection based on FSDP (Fully Sharded Data Parallel) backend on Ascend devices(NPU)
 ==========================================================================================
 
-Last updated: 07/14/2025.
+Last updated: 07/24/2025.
 
 This is a tutorial for data collection using the GRPO or DAPO algorithm
 based on FSDP on Ascend devices.
@@ -35,6 +35,17 @@ and steps.
 Use parameters in npu_profile.yaml to control collection behavior:
 
 -  save_path: Storage path for collected data.
+-  roles: Roles to collect; only takes effect in discrete mode. Available options:
+
+   -  rollout_generate: Collect the ``generate_sequences`` phase
+      of the rollout worker.
+   -  actor_compute_log_prob: Collect the ``compute_log_prob`` phase
+      of the actor worker.
+   -  actor_update: Collect the ``update_actor`` phase of the actor worker.
+   -  ref_compute_log_prob: Collect the ``compute_ref_log_prob`` phase
+      of the ref worker.
+   -  all: Collect all of the above phases.
+
 -  level: Collection level—options are level_none, level0, level1, and
    level2
 
@@ -94,6 +105,23 @@ Discrete Mode Collection
                 ranks: [0, 1]
 
 
+Enable actor collection in discrete mode
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: yaml
+
+       trainer:
+           profile_steps: [1, 2, 5]
+           npu_profile:
+                options:
+                    roles: ["actor_compute_log_prob", "actor_update"]
+       actor_rollout_ref:
+            profiler:
+                discrete: True
+                all_ranks: False
+                ranks: [0, 1]
+
+
 Visualization
 -------------
 
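To make the interplay of these options concrete, the following sketch evaluates the discrete-mode actor example from this diff on plain Python dicts. It assumes that `profile_steps` lists the training steps to profile, that `all_ranks: False` restricts collection to the ranks in `ranks`, and that `roles` filters only in discrete mode (as the config comment states). `profiled_roles`, `ROLE_NAMES` and `CONFIG` are illustrative names, not part of verl.

```python
from typing import Any, Dict, List

# Role names accepted by `roles`, excluding the "all" wildcard.
ROLE_NAMES = (
    "rollout_generate",
    "actor_compute_log_prob",
    "actor_update",
    "ref_compute_log_prob",
)

# Plain-dict mirror of the "Enable actor collection in discrete mode" YAML.
CONFIG: Dict[str, Any] = {
    "trainer": {
        "profile_steps": [1, 2, 5],
        "npu_profile": {"options": {"roles": ["actor_compute_log_prob", "actor_update"]}},
    },
    "actor_rollout_ref": {
        "profiler": {"discrete": True, "all_ranks": False, "ranks": [0, 1]},
    },
}


def profiled_roles(config: Dict[str, Any], step: int, rank: int) -> List[str]:
    """Return the roles that would produce profiling data on (step, rank)."""
    trainer = config["trainer"]
    profiler = config["actor_rollout_ref"]["profiler"]
    if step not in trainer["profile_steps"]:
        return []  # step not selected for profiling
    if not profiler["all_ranks"] and rank not in profiler["ranks"]:
        return []  # rank not selected for profiling
    if not profiler["discrete"]:
        # Outside discrete mode `roles` is ignored: every role is collected.
        return list(ROLE_NAMES)
    targets = trainer["npu_profile"]["options"]["roles"]
    return [r for r in ROLE_NAMES if "all" in targets or r in targets]


print(profiled_roles(CONFIG, step=1, rank=0))  # ['actor_compute_log_prob', 'actor_update']
print(profiled_roles(CONFIG, step=1, rank=3))  # []
print(profiled_roles(CONFIG, step=3, rank=0))  # []
```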
diff --git a/examples/grpo_trainer/run_qwen2_5_7b_grpo_discrete_prof_npu.sh b/examples/grpo_trainer/run_qwen2_5_7b_grpo_discrete_prof_npu.sh
@@ -16,6 +16,7 @@ WITH_CPU=True
 WITH_MODULE=False
 WITH_STACK=False
 ANALYSIS=True
+ROLES=["all"]
 
 python3 -m verl.trainer.main_ppo \
     algorithm.adv_estimator=grpo \
@@ -59,6 +60,7 @@ python3 -m verl.trainer.main_ppo \
     trainer.npu_profile.options.with_module=$WITH_MODULE \
     trainer.npu_profile.options.with_stack=$WITH_STACK \
     trainer.npu_profile.options.analysis=$ANALYSIS \
+    trainer.npu_profile.options.roles=$ROLES \
     trainer.critic_warmup=0 \
     trainer.logger=console \
     trainer.project_name='verl_grpo_example_gsm8k' \
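The script passes `ROLES` to `verl.trainer.main_ppo` as a command-line override. When the launch command is assembled from Python instead, the list can be rendered in Hydra's bracketed list syntax (`key=[a,b]`, assumed here to be how the override is parsed) and quoted with `shlex.quote`, so that a shell does not treat the brackets as a glob pattern. `roles_override` is a hypothetical helper, not part of verl.

```python
import shlex
from typing import Sequence


def roles_override(roles: Sequence[str]) -> str:
    """Render a shell-quoted override for trainer.npu_profile.options.roles."""
    value = "[" + ",".join(roles) + "]"  # Hydra list syntax, e.g. [a,b]
    return shlex.quote(f"trainer.npu_profile.options.roles={value}")


print(roles_override(["all"]))
# 'trainer.npu_profile.options.roles=[all]'
print(roles_override(["actor_compute_log_prob", "actor_update"]))
# 'trainer.npu_profile.options.roles=[actor_compute_log_prob,actor_update]'
```

Splitting the quoted string with `shlex.split` yields the single argument `trainer.npu_profile.options.roles=[all]`, the same value the unquoted `ROLES=["all"]` produces in the script after the shell removes the double quotes.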
diff --git a/tests/trainer/config/legacy_ppo_megatron_trainer.yaml b/tests/trainer/config/legacy_ppo_megatron_trainer.yaml
@@ -450,6 +450,7 @@ trainer:
   npu_profile:
     options:
       save_path: ./profiler_data
+      roles: ["all"]
       level: level1
       with_memory: False
       record_shapes: False
diff --git a/tests/trainer/config/legacy_ppo_trainer.yaml b/tests/trainer/config/legacy_ppo_trainer.yaml
@@ -995,6 +995,11 @@ trainer:
       # Storage path of collected data.
       save_path: ./profiler_data
 
+      # The roles that will be profiled. Only takes effect in discrete mode.
+      # optional values: all, rollout_generate, actor_compute_log_prob, actor_update and ref_compute_log_prob.
+      # "all" means all roles will be profiled.
+      roles: ["all"]
+
       # Collection level, optional values: level_none, level0, level1, level2.
       level: level1
 
diff --git a/verl/trainer/config/_generated_ppo_megatron_trainer.yaml b/verl/trainer/config/_generated_ppo_megatron_trainer.yaml
@@ -206,6 +206,8 @@ trainer:
   npu_profile:
     options:
       save_path: ./profiler_data
+      roles:
+      - all
       level: level1
       with_memory: false
       record_shapes: false
diff --git a/verl/trainer/config/_generated_ppo_trainer.yaml b/verl/trainer/config/_generated_ppo_trainer.yaml
@@ -174,6 +174,8 @@ trainer:
   npu_profile:
     options:
       save_path: ./profiler_data
+      roles:
+      - all
       level: level1
       with_memory: false
       record_shapes: false
diff --git a/verl/trainer/config/npu_profile/npu_profile.yaml b/verl/trainer/config/npu_profile/npu_profile.yaml
@@ -4,6 +4,11 @@ options:
   # Storage path of collected data.
   save_path: ./profiler_data
 
+  # The roles that will be profiled. Only takes effect in discrete mode.
+  # optional values: all, rollout_generate, actor_compute_log_prob, actor_update and ref_compute_log_prob.
+  # "all" means all roles will be profiled.
+  roles: ["all"]
+
   # Collection level, optional values: level_none, level0, level1, level2.
   level: level1
 
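The accepted values are documented only in config comments; nothing in this diff rejects a misspelled role, which in discrete mode would simply never match and silently produce no data. A small up-front check such as the hypothetical `validate_roles` below (not part of verl) could catch such typos before a long run starts.

```python
from typing import Iterable, List

# Accepted values, copied from the comment in npu_profile.yaml.
VALID_ROLES = frozenset(
    {"all", "rollout_generate", "actor_compute_log_prob", "actor_update", "ref_compute_log_prob"}
)


def validate_roles(roles: Iterable[str]) -> List[str]:
    """Return `roles` as a list, raising ValueError if any name is unknown."""
    role_list = list(roles)
    unknown = sorted(set(role_list) - VALID_ROLES)
    if unknown:
        raise ValueError(f"unknown profiler roles {unknown}; valid: {sorted(VALID_ROLES)}")
    return role_list


print(validate_roles(["actor_update", "ref_compute_log_prob"]))
# ['actor_update', 'ref_compute_log_prob']

try:
    validate_roles(["actor_updates"])  # typo: would otherwise match nothing
except ValueError as err:
    print(err)
```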
diff --git a/verl/utils/profiler/mstx_profile.py b/verl/utils/profiler/mstx_profile.py
@@ -202,20 +202,36 @@ def decorator(func):
             @functools.wraps(func)
             def wrapper(self, *args, **kwargs):
                 profile_name = message or func.__name__
-
-                if self.profiler.this_step and self.profile_option is not None:
-                    if self.profiler.discrete:
-                        profile_npu = get_npu_profiler(option=self.profile_option, role=role)
-                        profile_npu.start()
-                    mark_range = mark_start_range(message=profile_name)
+                profile_this_role = True
+                discrete_mode = self.profiler.discrete
+                profile_enable = self.profiler.this_step and self.profile_option is not None
+
+                if not profile_enable:
+                    return func(self, *args, **kwargs)
+
+                if profile_enable and role is not None:
+                    target_roles = self.profile_option.get("roles", [])
+                    profile_this_role = "all" in target_roles or role in target_roles
+
+                if profile_enable:
+                    if not discrete_mode:
+                        mark_range = mark_start_range(message=profile_name)
+                    else:
+                        if profile_this_role:
+                            profile_npu = get_npu_profiler(option=self.profile_option, role=role)
+                            profile_npu.start()
+                            mark_range = mark_start_range(message=profile_name)
 
                 result = func(self, *args, **kwargs)
 
-                if self.profiler.this_step and self.profile_option is not None:
-                    mark_end_range(mark_range)
-                    if self.profiler.discrete:
-                        profile_npu.step()
-                        profile_npu.stop()
+                if profile_enable:
+                    if not discrete_mode:
+                        mark_end_range(mark_range)
+                    else:
+                        if profile_this_role:
+                            mark_end_range(mark_range)
+                            profile_npu.step()
+                            profile_npu.stop()
 
                 return result
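The patched wrapper's control flow can be exercised without an NPU by swapping in stand-ins. In this sketch `FakeNPUProfiler`, `annotate_sketch`, `ToyWorker` and the `events` log are hypothetical replacements for `get_npu_profiler`, the decorator factory in `mstx_profile.py`, `mark_start_range`/`mark_end_range`, and a verl worker; only the branching is meant to mirror the diff. Outside discrete mode every annotated call gets an mstx range, while in discrete mode a per-call profiler is started, stepped and stopped only for selected roles.

```python
import functools
from types import SimpleNamespace
from typing import Any, Callable, List, Optional

events: List[str] = []  # log of what the stand-ins were asked to do


class FakeNPUProfiler:
    """Hypothetical stand-in for the object returned by get_npu_profiler."""

    def __init__(self, role: Optional[str]) -> None:
        self.role = role

    def start(self) -> None:
        events.append(f"start:{self.role}")

    def step(self) -> None:
        events.append(f"step:{self.role}")

    def stop(self) -> None:
        events.append(f"stop:{self.role}")


def annotate_sketch(message: Optional[str] = None, role: Optional[str] = None) -> Callable:
    """Hypothetical decorator factory reproducing the patched wrapper's branching."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(self, *args: Any, **kwargs: Any) -> Any:
            name = message or func.__name__
            if not (self.profiler.this_step and self.profile_option is not None):
                return func(self, *args, **kwargs)  # not a profiled step
            targets = self.profile_option.get("roles", [])
            selected = role is None or "all" in targets or role in targets
            discrete = self.profiler.discrete
            if discrete and not selected:
                return func(self, *args, **kwargs)  # discrete mode: role filtered out
            npu_prof = FakeNPUProfiler(role) if discrete else None
            if npu_prof is not None:
                npu_prof.start()
            events.append(f"range_start:{name}")  # stands in for mark_start_range
            result = func(self, *args, **kwargs)
            events.append(f"range_end:{name}")  # stands in for mark_end_range
            if npu_prof is not None:
                npu_prof.step()
                npu_prof.stop()
            return result

        return wrapper

    return decorator


class ToyWorker:
    """Hypothetical worker exposing the attributes the wrapper reads."""

    def __init__(self, discrete: bool, roles: List[str]) -> None:
        self.profiler = SimpleNamespace(this_step=True, discrete=discrete)
        self.profile_option = {"roles": roles}

    @annotate_sketch(role="rollout_generate")
    def generate_sequences(self) -> str:
        return "sequences"

    @annotate_sketch(role="actor_update")
    def update_actor(self) -> str:
        return "metrics"


worker = ToyWorker(discrete=True, roles=["actor_update"])
worker.generate_sequences()  # filtered out: no events recorded
worker.update_actor()
print(events)
# ['start:actor_update', 'range_start:update_actor', 'range_end:update_actor', 'step:actor_update', 'stop:actor_update']
```

As in the patch, `roles` is consulted only in discrete mode; outside it, an unselected role such as `rollout_generate` still gets its mstx range.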