[misc] feat: support logging rollout prob vs. actor probs for debugging purpose #1712

vermouth1992 · 2025-05-27T03:14:33Z

Checklist Before Starting

Search for similar PR(s).

What does this PR do?

Support logging rollout probs vs. actor probs for debugging purpose
Support both vllm and sglang async
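To make the comparison concrete, here is a minimal sketch of the kind of debugging metric this enables: the absolute difference between the probability the rollout engine assigned to each sampled token and the probability the actor recomputes for it, summarized over non-padding positions. The function name, metric keys, and the pure-Python list representation are assumptions for illustration, not the code merged in this PR.

```python
import math
from statistics import mean, pstdev


def rollout_actor_prob_diff(rollout_log_probs, actor_log_probs, response_mask):
    """Summarize |p_rollout - p_actor| over the unmasked response tokens.

    All three arguments are nested lists of shape (batch, response_length).
    Mask entries are 1 for real tokens and 0 for padding.
    Hypothetical helper, not part of verl.
    """
    diffs = []
    for r_row, a_row, m_row in zip(rollout_log_probs, actor_log_probs, response_mask):
        for r_lp, a_lp, m in zip(r_row, a_row, m_row):
            if m:  # skip padding positions
                diffs.append(abs(math.exp(r_lp) - math.exp(a_lp)))
    if not diffs:
        return {"diff_max": 0.0, "diff_mean": 0.0, "diff_std": 0.0}
    return {
        "diff_max": max(diffs),
        "diff_mean": mean(diffs),
        "diff_std": pstdev(diffs),
    }
```

A large `diff_max` on a step would suggest the inference engine and the training engine disagree about the same tokens, which is the situation this logging is meant to surface.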

PeterSH6 · 2025-05-27T03:18:41Z

Do you observe that setting logprob=1 in vllm may be slower than logprob=0?

I observed this behavior in vllm v0.7, whereas vllm 0.6.3 did not incur much overhead when setting logprob>0.

Not sure if it's still a potential problem in vllm v0.8.

PeterSH6 · 2025-05-27T03:23:52Z

verl/workers/rollout/vllm_rollout/vllm_rollout.py

            # if n = 1: (bs, response_length) ; if n > 1: (bs * n, response_length)
            response = output[0].to(idx.device)
-            # log_probs = output[1].to(idx.device)
+            log_probs = output[1].to(idx.device)


Do we need to check if logprob is >0 here?

logprob will never be > 0?

What I'm referring to is here:

    kwargs = dict(
        n=1,
        logprobs=0,  # can be set to 0 and let actor to recompute
        max_tokens=config.response_length,
    )

We may need to set logprobs > 0 to get logprobs returned by vllm

logprobs == 0 still returns the log probability of the sampled token, so it would be fine here.
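The point of the thread above is that only the log probability of the token actually sampled is needed, not the top-k alternatives. The sketch below shows how per-token values could be pulled out of a per-position mapping. It assumes, based on my reading of the vLLM documentation, that each position's entry maps token id to an object with a `.logprob` attribute and always contains the sampled token. `FakeLogprob` and `sampled_token_logprobs` are hypothetical stand-ins so the example runs without vLLM installed.

```python
from dataclasses import dataclass


@dataclass
class FakeLogprob:
    """Stand-in for an entry carrying a `.logprob` float (assumed shape)."""
    logprob: float


def sampled_token_logprobs(token_ids, per_position_logprobs):
    """Return the log probability of each sampled token.

    per_position_logprobs[i] is assumed to map token id -> object with a
    `.logprob` attribute, and to contain the sampled token at position i.
    Hypothetical helper, not part of vLLM or verl.
    """
    return [
        per_position_logprobs[i][token_id].logprob
        for i, token_id in enumerate(token_ids)
    ]


token_ids = [11, 42]
per_position = [
    {11: FakeLogprob(-0.1)},
    {42: FakeLogprob(-2.3), 7: FakeLogprob(-0.2)},  # extra candidates are ignored
]
sampled = sampled_token_logprobs(token_ids, per_position)
```

Here `sampled` holds one value per generated token regardless of how many extra candidates each position carries.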

vermouth1992 · 2025-05-27T03:24:11Z

> Do you observe that setting logprob=1 in vllm may be slower than logprob=0?
>
> I observed this behavior in vllm v0.7, whereas vllm 0.6.3 did not incur much overhead when setting logprob>0.
>
> Not sure if it's still a potential problem in vllm v0.8.

The overhead seems minimal in 0.8.5.post1

GHGmc2 · 2025-06-17T08:43:37Z

Can we add an option (maybe defaulting to False) for it? Sometimes we want as little data as possible to be sent/received over the Ray controller on large clusters.

vermouth1992 · 2025-06-17T08:49:38Z

Sure. I guess you can draft a PR that only passes useful keys to workers to minimize communication cost.
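One way to read this suggestion is as a key filter applied before a batch is handed to workers, with the rollout log probs included only when explicitly requested. The sketch below is an illustration under that assumption. The function name, the `include_rollout_log_probs` flag, and the plain-dict batch are hypothetical and do not reflect the option name or data structure used in verl.

```python
def select_batch_keys(batch, needed_keys, include_rollout_log_probs=False):
    """Return a copy of `batch` restricted to the keys a worker needs.

    `rollout_log_probs` is dropped unless the (hypothetical) flag is set,
    so the extra tensor is not shipped by default.
    """
    keys = set(needed_keys)
    if include_rollout_log_probs:
        keys.add("rollout_log_probs")
    else:
        keys.discard("rollout_log_probs")
    return {name: value for name, value in batch.items() if name in keys}


batch = {"responses": [1, 2], "rollout_log_probs": [-0.1, -0.4], "extra": 0}
default_view = select_batch_keys(batch, ["responses"])
debug_view = select_batch_keys(batch, ["responses"], include_rollout_log_probs=True)
```

Defaulting the flag to off keeps the debugging feature opt-in, which matches the concern about communication cost on large clusters.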

GHGmc2 · 2025-06-18T01:50:06Z

> Sure. I guess you can draft a PR that only passes useful keys to workers to minimize communication cost.

Please help to review: #2072

…`False` (#2072)

### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
  - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore`
  - can involve multiple modules, separated by `,` or space, like `[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

> As discussed in #1712, we may want to minimize communication cost on large clusters. This PR adds an option for it, defaulting to `False`.

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this
```

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that cover the code path.

Co-authored-by: Chi Zhang

…turn** for debugging purpose, follow up of #1712 (#2808)

### What does this PR do?

This PR is a follow-up to #1712.

- adds support for recording rollout log-probs in multi-turn conversations
- moves the diff-computation code into a separate file.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

vermouth1992 added 4 commits May 27, 2025 08:26

return rollout log probs

04002ae

remove breakpoint

1a300ea

add sglang rollout logprobs

3f4d9bb

remove verbose

663031d

vermouth1992 requested review from PeterSH6 and tongyx361 May 27, 2025 03:14

update

0623c38

vermouth1992 changed the title ~~[misc] feat: support logging rollout prob vs. actor probs~~ [misc] feat: support logging rollout prob vs. actor probs for debugging purpose May 27, 2025

PeterSH6 reviewed May 27, 2025

vermouth1992 added 3 commits May 27, 2025 14:26

update dapo scripts

aa70dac

fix max_position_embeddings

3592251

add comments

3dcb6d5

PeterSH6 approved these changes May 27, 2025

vermouth1992 merged commit 16a13d8 into main May 28, 2025
38 of 39 checks passed

vermouth1992 deleted the chi/dev/rollout_log_probs branch May 28, 2025 00:14

GHGmc2 mentioned this pull request Jun 18, 2025

[rollout] refactor: Add option for rollout_log_probs, and default as False #2072

Merged

8 tasks

TomQunChao mentioned this pull request Jul 30, 2025

[misc] feat: support logging rollout prob vs. actor probs in **multi-turn** for debugging purpose, follow up of #1712 #2808

Merged

6 tasks
