Skip to content

Conversation

@frrad
Copy link
Contributor

@frrad frrad commented Jul 23, 2025

What does this PR do?

currently the vanilla policy loss mode is special cased. this moves vanilla onto the shared interface and stops speical-casing it.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the policy loss computation by removing the special case for the 'vanilla' mode. The vanilla policy loss is now registered and accessed through the same shared interface as other loss functions, which improves modularity and reduces code duplication in the actor workers.

My review focuses on the new shared interface. I've identified a couple of high-severity issues related to incorrect optional type hinting and a redundant assertion in the new compute_policy_loss_vanilla function. Addressing these will improve code clarity, maintainability, and static analysis correctness.

Comment on lines 723 to 730
def compute_policy_loss_vanilla(
old_log_prob: torch.Tensor,
log_prob: torch.Tensor,
advantages: torch.Tensor,
response_mask: torch.Tensor,
loss_agg_mode: str = "token-mean",
):
config: Optional[AlgoConfig] = None,
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The type hint for the config parameter is Optional[AlgoConfig], but this appears to be incorrect. The function accesses attributes like config.clip_ratio, which are defined in the AlgoConfig class. The config object passed from the callers in dp_actor.py and megatron_actor.py is the actor's configuration, which contains the algorithm's configuration.

Additionally, the parameter is marked as Optional with a default of None, but it's treated as mandatory within the function (see the assertion on line 750). Since callers always provide this configuration, it's better to make it a required, non-optional parameter.

Suggested change
def compute_policy_loss_vanilla(
old_log_prob: torch.Tensor,
log_prob: torch.Tensor,
advantages: torch.Tensor,
response_mask: torch.Tensor,
loss_agg_mode: str = "token-mean",
):
config: Optional[AlgoConfig] = None,
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
def compute_policy_loss_vanilla(
old_log_prob: torch.Tensor,
log_prob: torch.Tensor,
advantages: torch.Tensor,
response_mask: torch.Tensor,
loss_agg_mode: str = "token-mean",
config: AlgoConfig,
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:

Aggregation mode for `agg_loss`. Defaults to "token-mean".
"""

assert config is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This assertion is redundant if the config parameter in the function signature is made non-optional, as suggested in a separate comment. Since callers always provide the configuration, making the parameter mandatory in the signature is a cleaner approach and removes the need for this runtime check.

@tongyx361 tongyx361 changed the title [algo: refactor]: don't special-case compute_policy_loss [algo] refactor: don't special-case compute_policy_loss Jul 23, 2025
@zhaochenyang20
Copy link
Collaborator

@frederrx let me help on the merge! Thanks!

cliprange=None,
cliprange_low=None,
cliprange_high=None,
clip_ratio_c=3.0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

despite that verl does not list compute_policy_loss as an official API, we do notice that many verl extensions import this function in their recipe. I'd recommend mark it with verl.utils.import_utils.deprecated, add a new function compute_policy_loss_vanilla, note it in #2744. We then remove compute_policy_loss in the next release.

@eric-haibin-lin eric-haibin-lin self-assigned this Jul 24, 2025
@frrad
Copy link
Contributor Author

frrad commented Jul 25, 2025

changed. ptal

@eric-haibin-lin eric-haibin-lin merged commit ea44424 into volcengine:main Jul 26, 2025
48 of 53 checks passed
@frrad frrad deleted the compute_policy_loss branch July 26, 2025 17:30
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jul 28, 2025
…#2701)

### What does this PR do?

currently the vanilla policy loss mode is special cased. this moves
vanilla onto the shared interface and stops speical-casing it.


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Fred <[email protected]>
SumanthRH pushed a commit to SumanthRH/verl that referenced this pull request Jul 29, 2025
…#2701)

### What does this PR do?

currently the vanilla policy loss mode is special cased. this moves
vanilla onto the shared interface and stops speical-casing it.


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Fred <[email protected]>
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Jul 31, 2025
…#2701)

### What does this PR do?

currently the vanilla policy loss mode is special cased. this moves
vanilla onto the shared interface and stops speical-casing it.


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Fred <[email protected]>
Juniper1021 pushed a commit to Juniper1021/verl that referenced this pull request Aug 7, 2025
…#2701)

### What does this PR do?

currently the vanilla policy loss mode is special cased. this moves
vanilla onto the shared interface and stops speical-casing it.


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Fred <[email protected]>
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
…#2701)

### What does this PR do?

currently the vanilla policy loss mode is special cased. this moves
vanilla onto the shared interface and stops speical-casing it.


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Fred <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants