[rollout, vllm, sglang] fix: allow user customization of `repetition_penalty` to avoid watchdog timeout during GRPO rollout #3309

Mighten · 2025-09-02T10:07:56Z

Allow user customization of repetition_penalty to avoid watchdog timeout during GRPO rollout

What does this PR do?

This PR adds an interface for users to specify repetition_penalty, which helps avoid repetition in LLM generation and prevents watchdog timeouts during GRPO rollout. If not specified, repetition_penalty will remain at its default value of 1.0.
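
For readers unfamiliar with the parameter, the sketch below shows one common way a repetition penalty is applied to logits: tokens that have already appeared have positive logits divided by the penalty and negative logits multiplied by it, so a value above 1.0 makes them less likely and 1.0 leaves them unchanged. This is an illustrative assumption, not code from this PR or from the vLLM or SGLang sources; `apply_repetition_penalty` and its arguments are hypothetical names.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.0):
    """Return a copy of `logits` with already-seen tokens made less likely.

    Hypothetical helper for illustration only; real engines apply the
    penalty on tensors inside their own sampling code.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # With penalty > 1, dividing a positive logit or multiplying a
        # negative logit both lower the token's score.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


logits = [2.0, -1.0, 0.5]
# penalty=1.0 (the default) leaves the logits untouched.
print(apply_repetition_penalty(logits, seen_token_ids=[0, 1], penalty=1.0))
# penalty=2.0 lowers the scores of tokens 0 and 1 only.
print(apply_repetition_penalty(logits, seen_token_ids=[0, 1], penalty=2.0))
```

A value only slightly above 1.0, such as the 1.05 used in the example later in this description, nudges the model away from loops without strongly distorting the distribution.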

Checklist Before Starting

Search for similar PRs. No similar PRs found.
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
- If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
- {type} is in feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

This PR can be vetted by existing CI test cases.

API and Usage Example

Previously, users could not specify repetition_penalty, but this PR adds support for it.

For example, users can now start GRPO training with a command like:

python -m verl.trainer.main_ppo \
    +actor_rollout_ref.rollout.repetition_penalty=1.05 \
    # other params here...

Design & Code Changes

This PR adds an interface allowing users to specify the repetition_penalty (e.g., 1.05), while maintaining backward compatibility with the default value of 1.0.
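
As a rough sketch of the backward-compatible behavior described above, the rollout code could read the value from its config and fall back to 1.0 when the key is absent. The helper name `build_sampling_kwargs` and the plain-dict config are assumptions made for illustration; the actual change in this PR may be structured differently.

```python
DEFAULT_REPETITION_PENALTY = 1.0  # default named in the PR description


def build_sampling_kwargs(rollout_config):
    """Collect sampling keyword arguments from a rollout config mapping.

    Hypothetical helper: `rollout_config` stands in for the
    `actor_rollout_ref.rollout` section of the training config.
    """
    return {
        # Fall back to the default so existing configs behave as before.
        "repetition_penalty": rollout_config.get(
            "repetition_penalty", DEFAULT_REPETITION_PENALTY
        ),
    }


# No key set: the default is used.
print(build_sampling_kwargs({}))
# Key set by the user, e.g. via the command-line override shown earlier.
print(build_sampling_kwargs({"repetition_penalty": 1.05}))
```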

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)

vermouth1992 · 2025-09-02T13:40:28Z

Could you also modify the args of server mode and sglang? Thanks!

Mighten · 2025-09-02T13:43:10Z

> Could you also modify the args of server mode and sglang? Thanks!

Sure, let me read the related code.

Mighten · 2025-09-02T14:21:07Z

> Could you also modify the args of server mode and sglang? Thanks!

Could you grant me CI test permissions to execute CI pipeline test cases?

And please take another look, thank you.

Mighten · 2025-09-03T12:10:47Z

/gemini how to fix the failed sgl CI testcase?

gemini-code-assist · 2025-09-03T12:11:02Z

Hi Mighten,

I understand you're looking for help with a failed sgl CI test case. As an AI assistant, I don't have the ability to directly access or debug your CI environment, nor can I execute commands or grant permissions.

However, I can offer some general guidance on how to approach debugging CI failures, especially for sglang related tests:

Examine the CI Logs: The most crucial first step is to thoroughly review the detailed logs of the failed sgl CI run. Look for specific error messages, stack traces, or any output that indicates why the test failed. This often points directly to the problematic line of code or configuration.
Reproduce Locally: If possible, try to reproduce the exact CI failure on your local development environment. This allows for more interactive debugging, setting breakpoints, and inspecting variables. Ensure your local environment matches the CI environment as closely as possible (e.g., dependencies, Python version).
Review Relevant Test Files: Based on the changes you've made, specifically in verl/workers/rollout/sglang_rollout/, you should look at the corresponding test files. For sglang related tests, you might find relevant tests in directories like tests/workers/rollout/rollout_sglang/ or tests/special_e2e/ that cover sglang functionality.
Check the sgl.yml Workflow: The CI workflow definition file, located at .github/workflows/sgl.yml, can provide insights into the exact steps and commands being executed during the sgl CI run. Understanding these steps might help you pinpoint where the failure occurs.

Focusing on the error messages in the CI logs will be key to understanding the root cause. Let me know if you have specific code snippets or error messages you'd like me to review from a code correctness or best practices perspective, but I cannot directly debug the CI system itself.

Mighten · 2025-09-03T14:59:41Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a configurable repetition_penalty for SGLang and vLLM rollouts, which is a useful addition to prevent generation loops. The implementation correctly reads the new configuration in the initialization paths. However, in the asynchronous server implementations for both SGLang and vLLM, the new logic incorrectly overrides any repetition_penalty provided by the caller, instead of using the configured value as a default. I've added critical comments with suggestions to fix this behavior to ensure that per-request customization is still possible.

verl/workers/rollout/sglang_rollout/async_sglang_server.py

verl/workers/rollout/vllm_rollout/vllm_async_server.py
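
The difference the review points out, overriding the caller's value versus using the configured value only as a default, can be sketched with plain dictionaries. Both function names are hypothetical and the request parameters are modeled as a dict for simplicity; this is not the code from either async server file.

```python
def merge_overriding(request_params, configured_penalty):
    """Problematic pattern: the configured value always wins."""
    merged = dict(request_params)
    merged["repetition_penalty"] = configured_penalty
    return merged


def merge_as_default(request_params, configured_penalty):
    """Suggested pattern: the configured value fills in only when absent."""
    merged = dict(request_params)
    merged.setdefault("repetition_penalty", configured_penalty)
    return merged


request = {"temperature": 0.7, "repetition_penalty": 1.2}
# The caller's 1.2 is silently replaced by the configured 1.05.
print(merge_overriding(request, 1.05))
# The caller's 1.2 is kept; 1.05 is used only if the caller sent nothing.
print(merge_as_default(request, 1.05))
print(merge_as_default({"temperature": 0.7}, 1.05))
```

With the second pattern, per-request customization still works while the configured value covers requests that do not specify a penalty.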

[rollout, vllm] fix: allow user customization of repetition_penalty to avoid watchdog timeout during GRPO rollout

bcaeaf6

Mighten marked this pull request as ready for review September 2, 2025 10:20

Mighten requested review from PeterSH6, chenhaiq and wuxibin89 as code owners September 2, 2025 10:20

vermouth1992 approved these changes Sep 2, 2025

Mighten added 3 commits September 2, 2025 22:07

put user-specified repetition_penalty to vllm async server

bb9e2c7

put user-specified repetition_penalty to sglang rollout engine

6566dba

put user-specified repetition_penalty to sglang async server

481eacb

Mighten requested review from SwordFaith and zhaochenyang20 as code owners September 2, 2025 14:08

Mighten changed the title ~~[rollout, vllm] fix: allow user customization of repetition_penalty to avoid watchdog timeout during GRPO rollout~~ [rollout, vllm, sglang] fix: allow user customization of repetition_penalty to avoid watchdog timeout during GRPO rollout Sep 2, 2025

gemini-code-assist bot reviewed Sep 3, 2025

verl/workers/rollout/sglang_rollout/async_sglang_server.py (outdated)

verl/workers/rollout/vllm_rollout/vllm_async_server.py (outdated)

Mighten added 2 commits September 3, 2025 23:24

prioritize user-provided repetition_penalty over the default value of async server

1b83e3c

Merge branch 'refs/heads/main' into vllm-rollout-repetition_penalty

6418a63

vermouth1992 approved these changes Sep 5, 2025

vermouth1992 merged commit 0b533f7 into volcengine:main Sep 5, 2025
55 of 59 checks passed
