@Mighten
Contributor

@Mighten Mighten commented Sep 2, 2025

Allow user customization of repetition_penalty to avoid watchdog timeout during GRPO rollout

What does this PR do?

This PR adds an interface for users to specify repetition_penalty, which helps avoid repetition in LLM generation and prevents watchdog timeouts during GRPO rollout. If not specified, repetition_penalty will remain at its default value of 1.0.
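For context, a repetition penalty rescales the logits of tokens that have already appeared so that they become less likely to be sampled again. The sketch below shows the commonly used formulation (divide positive logits by the penalty, multiply negative ones). It is an illustration only: `apply_repetition_penalty` and the list-based logits are hypothetical and are not code from this PR, vLLM, or SGLang.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.0):
    """Return a copy of `logits` with already-seen tokens penalized.

    Hypothetical helper for illustration; not part of verl, vLLM, or SGLang.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    out = list(logits)
    for tok in set(seen_token_ids):
        # Positive logits shrink and negative logits become more negative,
        # so a penalty > 1.0 always lowers the token's probability.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


logits = [2.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, seen_token_ids=[0, 1], penalty=2.0)
unchanged = apply_repetition_penalty(logits, seen_token_ids=[0, 1], penalty=1.0)
```

With a penalty of `1.0` the logits come back unchanged, which is why `1.0` can serve as a backward-compatible default.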

Checklist Before Starting

  • Search for similar PRs. No similar PRs found.
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

This PR can be vetted by existing CI test cases.

API and Usage Example

Previously, users could not specify repetition_penalty, but this PR adds support for it.

For example, users can now start GRPO training with a command like:

python -m verl.trainer.main_ppo \
    +actor_rollout_ref.rollout.repetition_penalty=1.05 \
    # other params here...

Design & Code Changes

This PR adds an interface allowing users to specify the repetition_penalty (e.g., 1.05), while maintaining backward compatibility with the default value of 1.0.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

… to avoid watchdog timeout during GRPO rollout
@Mighten Mighten marked this pull request as ready for review September 2, 2025 10:20
@vermouth1992
Collaborator

Could you also modify the args of server mode and sglang? Thanks!

@Mighten
Contributor Author

Mighten commented Sep 2, 2025

> Could you also modify the args of server mode and sglang? Thanks!

Sure, let me read the related code.

@Mighten Mighten changed the title from "[rollout, vllm] fix: allow user customization of repetition_penalty to avoid watchdog timeout during GRPO rollout" to "[rollout, vllm, sglang] fix: allow user customization of repetition_penalty to avoid watchdog timeout during GRPO rollout" on Sep 2, 2025
@Mighten
Contributor Author

Mighten commented Sep 2, 2025

> Could you also modify the args of server mode and sglang? Thanks!

Could you grant me CI test permissions to execute CI pipeline test cases?

And please take another look, thank you.

@Mighten
Contributor Author

Mighten commented Sep 3, 2025

/gemini how to fix the failed sgl CI testcase?

@gemini-code-assist
Contributor

Hi Mighten,

I understand you're looking for help with a failed sgl CI test case. As an AI assistant, I don't have the ability to directly access or debug your CI environment, nor can I execute commands or grant permissions.

However, I can offer some general guidance on how to approach debugging CI failures, especially for sglang related tests:

  1. Examine the CI Logs: The most crucial first step is to thoroughly review the detailed logs of the failed sgl CI run. Look for specific error messages, stack traces, or any output that indicates why the test failed. This often points directly to the problematic line of code or configuration.
  2. Reproduce Locally: If possible, try to reproduce the exact CI failure on your local development environment. This allows for more interactive debugging, setting breakpoints, and inspecting variables. Ensure your local environment matches the CI environment as closely as possible (e.g., dependencies, Python version).
  3. Review Relevant Test Files: Based on the changes you've made, specifically in verl/workers/rollout/sglang_rollout/, you should look at the corresponding test files. For sglang related tests, you might find relevant tests in directories like tests/workers/rollout/rollout_sglang/ or tests/special_e2e/ that cover sglang functionality.
  4. Check the sgl.yml Workflow: The CI workflow definition file, located at .github/workflows/sgl.yml, can provide insights into the exact steps and commands being executed during the sgl CI run. Understanding these steps might help you pinpoint where the failure occurs.

Focusing on the error messages in the CI logs will be key to understanding the root cause. Let me know if you have specific code snippets or error messages you'd like me to review from a code correctness or best practices perspective, but I cannot directly debug the CI system itself.

@Mighten
Contributor Author

Mighten commented Sep 3, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a configurable repetition_penalty for SGLang and vLLM rollouts, which is a useful addition to prevent generation loops. The implementation correctly reads the new configuration in the initialization paths. However, in the asynchronous server implementations for both SGLang and vLLM, the new logic incorrectly overrides any repetition_penalty provided by the caller, instead of using the configured value as a default. I've added critical comments with suggestions to fix this behavior to ensure that per-request customization is still possible.
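The distinction raised in this review, overriding versus defaulting, can be sketched as follows. This is a minimal illustration and not the code in this PR: both function names, `config_penalty`, and the plain-dict request format are hypothetical names chosen for the example.

```python
def build_sampling_params_override(request_params, config_penalty=1.0):
    """Problematic pattern: the configured value clobbers the caller's value."""
    params = dict(request_params)  # copy so the caller's dict is not mutated
    params["repetition_penalty"] = config_penalty
    return params


def build_sampling_params_default(request_params, config_penalty=1.0):
    """Suggested pattern: the configured value is used only as a fallback."""
    params = dict(request_params)
    # setdefault writes the key only when the caller did not supply it.
    params.setdefault("repetition_penalty", config_penalty)
    return params


per_request = {"temperature": 0.7, "repetition_penalty": 1.2}
overridden = build_sampling_params_override(per_request, config_penalty=1.05)
respected = build_sampling_params_default(per_request, config_penalty=1.05)
fallback = build_sampling_params_default({"temperature": 0.7}, config_penalty=1.05)
```

Here `overridden` ends up with the configured `1.05` even though the caller asked for `1.2`, while `respected` keeps `1.2` and `fallback` picks up `1.05` only because the request did not set a value.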

@vermouth1992 vermouth1992 merged commit 0b533f7 into volcengine:main Sep 5, 2025
55 of 59 checks passed
cczitong123 pushed a commit to cczitong123/verl that referenced this pull request Sep 5, 2025
…penalty` to avoid watchdog timeout during GRPO rollout (volcengine#3309)

Allow user customization of `repetition_penalty` to avoid watchdog
timeout during GRPO rollout

### What does this PR do?

This PR adds an interface for users to specify `repetition_penalty`,
which helps avoid repetition in LLM generation and prevents watchdog
timeouts during GRPO rollout. If not specified, `repetition_penalty`
will remain at its default value of `1.0`.


### Checklist Before Starting

- [X] Search for similar PRs. No similar PRs found.
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

This PR can be vetted by existing CI test cases.

### API and Usage Example

Previously, users could not specify `repetition_penalty`, but this PR
adds support for it.

For example, users can now start GRPO training with a command like:

```bash
python -m verl.trainer.main_ppo \
    +actor_rollout_ref.rollout.repetition_penalty=1.05 \
    # other params here...
```

### Design & Code Changes

This PR adds an interface allowing users to specify the
`repetition_penalty` (e.g., `1.05`), while maintaining backward
compatibility with the default value of `1.0`.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Sep 5, 2025
…penalty` to avoid watchdog timeout during GRPO rollout (volcengine#3309)

WncFht pushed a commit to WncFht/verl that referenced this pull request Oct 10, 2025
…penalty` to avoid watchdog timeout during GRPO rollout (volcengine#3309)

masoudhashemi pushed a commit to masoudhashemi/verl that referenced this pull request Oct 19, 2025
…penalty` to avoid watchdog timeout during GRPO rollout (volcengine#3309)

techkang pushed a commit to techkang/verl that referenced this pull request Oct 31, 2025
…penalty` to avoid watchdog timeout during GRPO rollout (volcengine#3309)
