[trainer] refactor: make main_ppo TaskRunner more modular #2885

eric-haibin-lin · 2025-08-03T18:37:50Z

What does this PR do?

- Added an __init__() method that initializes self.role_worker_mapping = {}
- Extracted the worker setup logic into dedicated methods (line numbers refer to verl/trainer/main_ppo.py):
  - add_actor_rollout_worker(): handles strategy-specific worker imports and setup (lines 130-153)
  - add_critic_worker(): sets up the critic worker role mapping (lines 170-176)
  - init_resource_pool_mgr(): creates the resource pool specifications (lines 178-187)
  - add_reward_model_worker(): conditionally adds reward model workers (lines 195-203)
  - add_ref_policy_worker(): conditionally adds reference policy workers (lines 205-208)
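The refactored flow can be pictured with a small, runnable outline. This is a hypothetical sketch, not the code from this PR. The Role enum, the TaskRunnerSketch class, the flattened config fields (actor.strategy, critic.strategy, reward_model.enable, use_kl, trainer.n_gpus_per_node, trainer.nnodes), and the strings that stand in for worker classes are all assumptions, chosen so the sketch runs without verl or Ray.

```python
from enum import Enum, auto
from types import SimpleNamespace


class Role(Enum):
    # Hypothetical stand-in for the trainer's role enum.
    ActorRollout = auto()
    Critic = auto()
    RefPolicy = auto()
    RewardModel = auto()


class TaskRunnerSketch:
    """Hypothetical outline of the refactored setup flow (strings stand in for worker classes)."""

    def __init__(self):
        # Role -> worker class, filled in by the add_* methods below.
        self.role_worker_mapping = {}

    def add_actor_rollout_worker(self, config):
        # Strategy-specific selection is isolated in one method.
        strategy = config.actor.strategy
        if strategy in ("fsdp", "fsdp2"):
            actor_rollout_cls = "FSDPActorRolloutRefWorker"
        elif strategy == "megatron":
            actor_rollout_cls = "MegatronActorRolloutRefWorker"
        else:
            raise NotImplementedError(f"unsupported actor strategy: {strategy}")
        self.role_worker_mapping[Role.ActorRollout] = actor_rollout_cls
        return actor_rollout_cls

    def add_critic_worker(self, config):
        self.role_worker_mapping[Role.Critic] = f"{config.critic.strategy}:CriticWorker"

    def init_resource_pool_mgr(self, config):
        # Simplified: returns the pool spec (one pool, n_gpus_per_node GPUs on each
        # of nnodes nodes) instead of building a resource pool manager object.
        return {"global_pool": [config.trainer.n_gpus_per_node] * config.trainer.nnodes}

    def add_reward_model_worker(self, config):
        # Registered only when a reward model is enabled.
        if config.reward_model.enable:
            self.role_worker_mapping[Role.RewardModel] = "RewardModelWorker"

    def add_ref_policy_worker(self, config, ref_policy_cls):
        # Registered only when a KL term needs the reference policy.
        if config.use_kl:
            self.role_worker_mapping[Role.RefPolicy] = ref_policy_cls

    def run(self, config):
        actor_rollout_cls = self.add_actor_rollout_worker(config)
        self.add_critic_worker(config)
        resource_pool_spec = self.init_resource_pool_mgr(config)
        self.add_reward_model_worker(config)
        # Assumption: the reference policy reuses the actor/rollout worker class.
        self.add_ref_policy_worker(config, actor_rollout_cls)
        return self.role_worker_mapping, resource_pool_spec


if __name__ == "__main__":
    cfg = SimpleNamespace(
        actor=SimpleNamespace(strategy="fsdp"),
        critic=SimpleNamespace(strategy="fsdp"),
        reward_model=SimpleNamespace(enable=False),
        use_kl=True,
        trainer=SimpleNamespace(n_gpus_per_node=8, nnodes=2),
    )
    mapping, spec = TaskRunnerSketch().run(cfg)
    print(sorted(role.name for role in mapping), spec)
```

One benefit of this shape is that a recipe can subclass the runner and override a single step, such as add_critic_worker, without copying the whole run() body.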

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: ...
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
- If this PR involves multiple modules, separate them with a comma, like [megatron, fsdp, doc]
- {type} is one of: feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][fsdp, megatron] feat: dynamic batching
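The title format above is regular enough to check locally before pushing. Below is a minimal sketch, assuming the module and type lists shown above are the complete vocabulary. check_pr_title is a hypothetical helper, not the check that verl's CI actually runs.

```python
import re

# Vocabularies copied from the checklist above.
MODULES = {
    "fsdp", "megatron", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data",
}
TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Optional [BREAKING] prefix, then [modules], then "type: description".
TITLE_RE = re.compile(r"^(\[BREAKING\])?\[([^\]]+)\] (\w+): (.+)$")


def check_pr_title(title: str) -> bool:
    """Hypothetical local check of the "[{modules}] {type}: {description}" format."""
    match = TITLE_RE.match(title)
    if match is None:
        return False
    modules = [m.strip() for m in match.group(2).split(",")]
    return all(m in MODULES for m in modules) and match.group(3) in TYPES


if __name__ == "__main__":
    print(check_pr_title("[trainer] refactor: make main_ppo TaskRunner more modular"))
```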

Test

Relying on existing unit tests.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)

- Add __init__ method to initialize self.role_worker_mapping
- Extract add_actor_rollout_worker and add_critic_worker methods
- Extract add_ref_policy_worker and add_reward_model_worker methods
- Extract init_resource_pool_mgr method
- Maintain same logic flow and variable dependencies

Co-Authored-By: H <[email protected]>

- add_actor_rollout_worker no longer returns CriticWorker
- add_critic_worker now imports CriticWorker based on config strategies
- Proper separation of concerns between actor and critic worker setup

Co-Authored-By: H <[email protected]>
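The commit above moves strategy-based selection of CriticWorker into add_critic_worker. A minimal, table-driven sketch of that idea follows. The strategy keys ("fsdp", "fsdp2", "megatron") and the module paths (verl.workers.fsdp_workers, verl.workers.megatron_workers) are assumptions about verl's layout that should be checked against the source you are using; critic_worker_path and load_critic_worker are hypothetical helpers.

```python
import importlib

# Assumed strategy -> module layout; verify against the verl version in use.
CRITIC_WORKER_MODULES = {
    "fsdp": "verl.workers.fsdp_workers",
    "fsdp2": "verl.workers.fsdp_workers",
    "megatron": "verl.workers.megatron_workers",
}


def critic_worker_path(strategy: str) -> str:
    """Return the dotted path of the critic worker class for a strategy."""
    try:
        module = CRITIC_WORKER_MODULES[strategy]
    except KeyError:
        raise NotImplementedError(f"unsupported critic strategy: {strategy}") from None
    return f"{module}.CriticWorker"


def load_critic_worker(strategy: str):
    """Import lazily so only the selected backend's dependencies are loaded."""
    module_name, _, cls_name = critic_worker_path(strategy).rpartition(".")
    return getattr(importlib.import_module(module_name), cls_name)


if __name__ == "__main__":
    print(critic_worker_path("megatron"))
```

Importing inside the function, rather than at module top level, means an FSDP run never has to import Megatron-only dependencies, which is a common reason for keeping such imports local.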

…meters
- Add self.mapping initialization in __init__ with documentation
- Remove mapping parameter from add_reward_model_worker and add_ref_policy_worker
- Update method calls to use self.mapping instead of passing mapping argument
- Improve encapsulation by making mapping an instance variable

Co-Authored-By: H <[email protected]>

- Fix F821 undefined name 'ref_policy' error in add_ref_policy_worker method
- Change ref_policy to ref_policy_cls to match the parameter name
- Resolves pre-commit (3.12) CI failure

Co-Authored-By: H <[email protected]>
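After these two commits, the role-to-pool mapping lives on the instance and add_ref_policy_worker refers to its parameter by its actual name. Below is a hypothetical, simplified sketch of that shape. The Role enum, the RunnerSketch class, the config fields use_kl_in_reward and use_kl_loss, and the pool id "global_pool" are assumptions made for illustration.

```python
from enum import Enum, auto
from types import SimpleNamespace


class Role(Enum):
    # Hypothetical stand-in for the trainer's role enum.
    ActorRollout = auto()
    RefPolicy = auto()


GLOBAL_POOL_ID = "global_pool"  # assumed pool id


class RunnerSketch:
    def __init__(self):
        # Role -> worker class.
        self.role_worker_mapping = {}
        # Role -> resource pool id; an instance attribute instead of a
        # parameter threaded through every add_* call.
        self.mapping = {}

    def add_ref_policy_worker(self, config, ref_policy_cls):
        # The reference policy is only needed when a KL term is used (assumed config fields).
        if config.use_kl_in_reward or config.use_kl_loss:
            # Use the parameter's real name; referencing an undefined `ref_policy`
            # here is what linters report as F821 (undefined name).
            self.role_worker_mapping[Role.RefPolicy] = ref_policy_cls
            self.mapping[Role.RefPolicy] = GLOBAL_POOL_ID


if __name__ == "__main__":
    runner = RunnerSketch()
    runner.add_ref_policy_worker(SimpleNamespace(use_kl_in_reward=False, use_kl_loss=True), "RefWorker")
    print(runner.role_worker_mapping, runner.mapping)
```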

gemini-code-assist

Code Review

This pull request refactors the TaskRunner class in verl/trainer/main_ppo.py to improve modularity by extracting worker setup logic into separate methods. The changes are well-structured and enhance code organization. However, I've identified a critical bug in the init_resource_pool_mgr method where self.mapping is incorrectly re-assigned, which would discard previously set mappings for reward and reference policy workers, leading to a KeyError during runtime. A fix is provided to address this issue.
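The failure mode the review describes comes down to plain Python dict semantics. Assigning a new dict to self.mapping rebinds the attribute, so entries added earlier by other methods are lost, whereas assigning individual keys (or calling update) mutates the existing dict. The sketch below reproduces both variants with hypothetical names (Role, BuggyRunner, FixedRunner, pool_for, "global_pool"). It illustrates the reported bug and is not the exact code in the PR.

```python
from enum import Enum, auto


class Role(Enum):
    # Hypothetical stand-in for the trainer's role enum.
    ActorRollout = auto()
    Critic = auto()
    RefPolicy = auto()


GLOBAL_POOL_ID = "global_pool"


class BuggyRunner:
    def __init__(self):
        self.mapping = {}

    def add_ref_policy_worker(self):
        # Registers the reference policy in the shared role -> pool mapping.
        self.mapping[Role.RefPolicy] = GLOBAL_POOL_ID

    def init_resource_pool_mgr(self):
        # BUG: rebinding self.mapping discards entries added earlier.
        self.mapping = {
            Role.ActorRollout: GLOBAL_POOL_ID,
            Role.Critic: GLOBAL_POOL_ID,
        }
        return self.mapping


class FixedRunner(BuggyRunner):
    def init_resource_pool_mgr(self):
        # FIX: mutate the existing dict so earlier registrations survive.
        self.mapping[Role.ActorRollout] = GLOBAL_POOL_ID
        self.mapping[Role.Critic] = GLOBAL_POOL_ID
        return self.mapping


def pool_for(runner, role):
    # Looking up a dropped role raises KeyError, mirroring the reported failure.
    return runner.mapping[role]


if __name__ == "__main__":
    for cls in (BuggyRunner, FixedRunner):
        runner = cls()
        runner.add_ref_policy_worker()
        runner.init_resource_pool_mgr()
        try:
            print(cls.__name__, pool_for(runner, Role.RefPolicy))
        except KeyError:
            print(cls.__name__, "KeyError: RefPolicy mapping was discarded")
```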

verl/trainer/main_ppo.py

…#2885) Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

devin-ai-integration bot and others added 5 commits August 3, 2025 18:03

Update main_ppo.py (d9b1254)

fix: Correct undefined variable ref_policy to ref_policy_cls (fa9ad12)

eric-haibin-lin requested review from PeterSH6, tongyx361 and vermouth1992 as code owners August 3, 2025 18:37

gemini-code-assist bot reviewed Aug 3, 2025

verl/trainer/main_ppo.py (outdated, resolved)

eric-haibin-lin added 3 commits August 5, 2025 10:18

Update main_ppo.py (65fb902)

Merge branch 'main' into devin/1754244044-refactor-taskrunner-methods (fcc6c87)

fix lint (da3ac9b)

ccclyu approved these changes Aug 5, 2025

vermouth1992 approved these changes Aug 8, 2025

vermouth1992 merged commit 01b4a29 into volcengine:main Aug 8, 2025
35 checks passed

eric-haibin-lin deleted the devin/1754244044-refactor-taskrunner-methods branch August 8, 2025 16:24

3 participants