-
Notifications
You must be signed in to change notification settings - Fork 2.4k
[trainer] refactor: make main_ppo TaskRunner more modular #2885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[trainer] refactor: make main_ppo TaskRunner more modular #2885
Conversation
- Add __init__ method to initialize self.role_worker_mapping - Extract add_actor_rollout_worker and add_critic_worker methods - Extract add_ref_policy_worker and add_reward_model_worker methods - Extract init_resource_pool_mgr method - Maintain same logic flow and variable dependencies Co-Authored-By: H <[email protected]>
- add_actor_rollout_worker no longer returns CriticWorker - add_critic_worker now imports CriticWorker based on config strategies - Proper separation of concerns between actor and critic worker setup Co-Authored-By: H <[email protected]>
…meters - Add self.mapping initialization in __init__ with documentation - Remove mapping parameter from add_reward_model_worker and add_ref_policy_worker - Update method calls to use self.mapping instead of passing mapping argument - Improve encapsulation by making mapping an instance variable Co-Authored-By: H <[email protected]>
- Fix F821 undefined name 'ref_policy' error in add_ref_policy_worker method - Change ref_policy to ref_policy_cls to match the parameter name - Resolves pre-commit (3.12) CI failure Co-Authored-By: H <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the TaskRunner class in verl/trainer/main_ppo.py to improve modularity by extracting worker setup logic into separate methods. The changes are well-structured and enhance code organization. However, I've identified a critical bug in the init_resource_pool_mgr method where self.mapping is incorrectly re-assigned, which would discard previously set mappings for reward and reference policy workers, leading to a KeyError during runtime. A fix is provided to address this issue.
…#2885) ### What does this PR do? - Added `__init__()` method to initialize `self.role_worker_mapping = {}` - Extracted worker setup logic into dedicated methods: - `add_actor_rollout_worker()` - handles strategy-specific worker imports and setup (lines 130-153) - `add_critic_worker()` - sets up critic worker role mapping (lines 170-176) - `init_resource_pool_mgr()` - creates resource pool specifications (lines 178-187) - `add_reward_model_worker()` - conditionally adds reward model workers (lines 195-203) - `add_ref_policy_worker()` - conditionally adds reference policy workers (lines 205-208) ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test relying on existing unit tests ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…#2885) ### What does this PR do? - Added `__init__()` method to initialize `self.role_worker_mapping = {}` - Extracted worker setup logic into dedicated methods: - `add_actor_rollout_worker()` - handles strategy-specific worker imports and setup (lines 130-153) - `add_critic_worker()` - sets up critic worker role mapping (lines 170-176) - `init_resource_pool_mgr()` - creates resource pool specifications (lines 178-187) - `add_reward_model_worker()` - conditionally adds reward model workers (lines 195-203) - `add_ref_policy_worker()` - conditionally adds reference policy workers (lines 205-208) ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test relying on existing unit tests ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…#2885) ### What does this PR do? - Added `__init__()` method to initialize `self.role_worker_mapping = {}` - Extracted worker setup logic into dedicated methods: - `add_actor_rollout_worker()` - handles strategy-specific worker imports and setup (lines 130-153) - `add_critic_worker()` - sets up critic worker role mapping (lines 170-176) - `init_resource_pool_mgr()` - creates resource pool specifications (lines 178-187) - `add_reward_model_worker()` - conditionally adds reward model workers (lines 195-203) - `add_ref_policy_worker()` - conditionally adds reference policy workers (lines 205-208) ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test relying on existing unit tests ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…#2885) ### What does this PR do? - Added `__init__()` method to initialize `self.role_worker_mapping = {}` - Extracted worker setup logic into dedicated methods: - `add_actor_rollout_worker()` - handles strategy-specific worker imports and setup (lines 130-153) - `add_critic_worker()` - sets up critic worker role mapping (lines 170-176) - `init_resource_pool_mgr()` - creates resource pool specifications (lines 178-187) - `add_reward_model_worker()` - conditionally adds reward model workers (lines 195-203) - `add_ref_policy_worker()` - conditionally adds reference policy workers (lines 205-208) ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test relying on existing unit tests ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
What does this PR do?
__init__()method to initializeself.role_worker_mapping = {}add_actor_rollout_worker()- handles strategy-specific worker imports and setup (lines 130-153)add_critic_worker()- sets up critic worker role mapping (lines 170-176)init_resource_pool_mgr()- creates resource pool specifications (lines 178-187)add_reward_model_worker()- conditionally adds reward model workers (lines 195-203)add_ref_policy_worker()- conditionally adds reference policy workers (lines 205-208)Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}includefsdp,megatron,sglang,vllm,rollout,trainer,ci,training_utils,recipe,hardware,deployment,ray,worker,single_controller,misc,perf,model,algo,env,tool,ckpt,doc,data,like[megatron, fsdp, doc]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][fsdp, megatron] feat: dynamic batchingTest
relying on existing unit tests
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)