Skip to content

[sglang, recipe] feat: add SGLang as rollout engine for one-step-off-policy#3531

Merged
wuxibin89 merged 2 commits into
verl-project:mainfrom
KAMiPan:main
Oct 14, 2025
Merged

[sglang, recipe] feat: add SGLang as rollout engine for one-step-off-policy#3531
wuxibin89 merged 2 commits into
verl-project:mainfrom
KAMiPan:main

Conversation

@KAMiPan
Copy link
Copy Markdown
Contributor

@KAMiPan KAMiPan commented Sep 19, 2025

What does this PR do?

This PR extends the one-step-off-policy recipe by adding SGLang as an alternative rollout engine to vLLM, allowing flexible backend selection and improving training efficiency.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: Add async vllm backend support for one-step-off-policy training in disaggregated architecture #3460
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

To validate this solution, we adopted the existing experimental configuration from the recipe one-step-off-policy.

The evaluation demonstrates that the proposed SGLang rollout engine integration achieves effective acceleration in one-step-off-policy asynchronous training, providing users with enhanced rollout engine options for diverse deployment scenarios.

Experimental Results

  • Machine Configuration: 2 nodes with 16 H20 GPUs each
    • Generation: 4 GPUs
    • Training: 12 GPUs
  • Model: Qwen2.5-Math-7B
  • Max Response Length: 8,192 tokens
  • Algorithm: DAPO
  • Rollout Engine: vLLM, SGLang
training mode engine step gen wait_prev_gen generate_sequences old_log_prob update_actor total time acc/best@32/mean acc/maj@32/mean
colocate sync SGLang+FSDP2 452 131 - 125 54 199 12h25m 0.6560 0.4471
one-step-overlap async SGLang+FSDP2 406 - 12 305 58 245 11h12m (+11%) 0.6303 0.4443
  • colocate sync: step ≈ gen + old_log_prob + update_actor
  • one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences, old_log_prob + update_actor)
image

API and Usage Example

Configuration Example

# Using SGLang engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=sglang \
    # ... other configuration parameters

# Using vLLM engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    # ... other configuration parameters

Script Usage

# Using SGLang engine
bash dapo_7b_math_fsdp2_sglang_4_12.sh
bash dapo_7b_math_fsdp2_sglang_colocate.sh

# Using vLLM engine
bash dapo_7b_math_fsdp2_4_12.sh
bash dapo_7b_math_fsdp2_colocate.sh

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Sep 19, 2025

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds SGLang as a rollout engine option for the one-step-off-policy recipe, providing an alternative to vLLM. The changes include new configuration files and experiment scripts for SGLang, and modifications to fsdp_workers.py to handle weight synchronization for the new engine. While the implementation is mostly correct, I've identified a significant performance issue in the weight synchronization logic for both SGLang and the existing vLLM implementation. My review includes a comment detailing this issue and a recommendation for refactoring to improve efficiency.

Comment on lines 104 to +117
if self._is_rollout:
inference_model.load_weights([(key, tensor)])
if rollout_name == "vllm":
inference_model.load_weights([(key, tensor)])
elif rollout_name == "sglang":
loop.run_until_complete(self.update_weights(inference_model, [(key, tensor)]))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This block for updating weights is inside a for loop that iterates over each weight tensor. Calling loop.run_until_complete for sglang and inference_model.load_weights for vllm on each tensor individually is inefficient.

For sglang, run_until_complete has significant overhead from setting up and tearing down the event loop. For vllm, it results in many small load_weights calls.

To improve performance, you should refactor this to batch the weight updates. Collect all (key, tensor) pairs in a list within the loop, and then make a single call to load_weights (for vllm) or run_until_complete (for sglang) with the entire list of tensors after the loop finishes. This will significantly reduce overhead.

vermouth1992 pushed a commit that referenced this pull request Sep 22, 2025
…olicy (#3556)

### What does this PR do?

Due to updated in the main package, the rollout worker calls
`self.model_config` during `generate_sequences`
(https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L869)
which hasn't been initialized in current one-step-off recipe. This will
through out runtime errors.

Similar code in the default fsdp worker:
https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L563

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[...](#3531)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
@KAMiPan
Copy link
Copy Markdown
Contributor Author

KAMiPan commented Sep 22, 2025

VocabVictor pushed a commit to VocabVictor/verl-plus that referenced this pull request Sep 24, 2025
…olicy (#3556)

### What does this PR do?

Due to updated in the main package, the rollout worker calls
`self.model_config` during `generate_sequences`
(https://github.com/volcengine/verl/blob/38edf0862a6f0c5612f2560b0eb765ebe0133c5a/verl/workers/fsdp_workers.py#L869)
which hasn't been initialized in current one-step-off recipe. This will
through out runtime errors.

Similar code in the default fsdp worker:
https://github.com/volcengine/verl/blob/38edf0862a6f0c5612f2560b0eb765ebe0133c5a/verl/workers/fsdp_workers.py#L563

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[...](verl-project/verl#3531)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
@KAMiPan KAMiPan marked this pull request as draft September 25, 2025 09:16
@KAMiPan KAMiPan marked this pull request as ready for review September 25, 2025 09:16
import torch
import torch.distributed
from omegaconf import DictConfig, OmegaConf
from sglang.srt.weight_sync.utils import update_weights as sgl_update_weights
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this import into local scope.

if rollout_name == "vllm":
from .vllm_sharding_manager import VLLMShardingManager

rollout_sharding_manager = VLLMShardingManager(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All ShardingManagers have been deprecated in #3285 and will be removed in release v0.7

@wuxibin89 wuxibin89 merged commit 3abcc09 into verl-project:main Oct 14, 2025
7 checks passed
masoudhashemi pushed a commit to masoudhashemi/verl that referenced this pull request Oct 19, 2025
…olicy (verl-project#3556)

### What does this PR do?

Due to updated in the main package, the rollout worker calls
`self.model_config` during `generate_sequences`
(https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L869)
which hasn't been initialized in current one-step-off recipe. This will
through out runtime errors.

Similar code in the default fsdp worker:
https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L563

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[...](verl-project#3531)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
masoudhashemi pushed a commit to masoudhashemi/verl that referenced this pull request Oct 19, 2025
…policy (verl-project#3531)

### What does this PR do?

This PR extends the one-step-off-policy recipe by adding SGLang as an
alternative rollout engine to vLLM, allowing flexible backend selection
and improving training efficiency.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
verl-project#3460
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

To validate this solution, we adopted the existing experimental
configuration from the recipe one-step-off-policy.

The evaluation demonstrates that the proposed SGLang rollout engine
integration achieves effective acceleration in one-step-off-policy
asynchronous training, providing users with enhanced rollout engine
options for diverse deployment scenarios.

**Experimental Results**

- **Machine Configuration**: 2 nodes with 16 H20 GPUs each
    - Generation: 4 GPUs
    - Training: 12 GPUs
- **Model**: Qwen2.5-Math-7B
- **Max Response Length**: 8,192 tokens
- **Algorithm**: DAPO
- **Rollout Engine**: vLLM, SGLang

| training mode | engine | step | gen | wait_prev_gen |
generate_sequences | old_log_prob | update_actor | total time |
acc/best@32/mean | acc/maj@32/mean |

|------------------------|----------------|------|-----|---------------|--------------------|--------------|--------------|---------------|------------------|-----------------|
| colocate sync | SGLang+FSDP2 | 452 | 131 | - | 125 | 54 | 199 | 12h25m
| 0.6560 | 0.4471 |
| one-step-overlap async | SGLang+FSDP2 | 406 | - | 12 | 305 | 58 | 245
| 11h12m (+11%) | 0.6303 | 0.4443 |

* colocate sync: step ≈ gen + old_log_prob + update_actor
* one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences,
old_log_prob + update_actor)

<img width="1218" height="777" alt="image"
src="https://github.com/user-attachments/assets/58734164-2534-492f-bf00-1e80faae0fe7"
/>

### API and Usage Example

**Configuration Example**
```bash
# Using SGLang engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=sglang \
    # ... other configuration parameters

# Using vLLM engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    # ... other configuration parameters
```

**Script Usage**
```bash
# Using SGLang engine
bash dapo_7b_math_fsdp2_sglang_4_12.sh
bash dapo_7b_math_fsdp2_sglang_colocate.sh

# Using vLLM engine
bash dapo_7b_math_fsdp2_4_12.sh
bash dapo_7b_math_fsdp2_colocate.sh
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: wuxibin <wuxibin@bytedance.com>
techkang pushed a commit to techkang/verl that referenced this pull request Oct 31, 2025
…olicy (verl-project#3556)

### What does this PR do?

Due to updated in the main package, the rollout worker calls
`self.model_config` during `generate_sequences`
(https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L869)
which hasn't been initialized in current one-step-off recipe. This will
through out runtime errors.

Similar code in the default fsdp worker:
https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L563

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[...](verl-project#3531)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
mtian8 pushed a commit to mtian8/verl that referenced this pull request Nov 1, 2025
…olicy (verl-project#3556)

### What does this PR do?

Due to updated in the main package, the rollout worker calls
`self.model_config` during `generate_sequences`
(https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L869)
which hasn't been initialized in current one-step-off recipe. This will
through out runtime errors.

Similar code in the default fsdp worker:
https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L563

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[...](verl-project#3531)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
mtian8 pushed a commit to mtian8/verl that referenced this pull request Nov 1, 2025
…policy (verl-project#3531)

### What does this PR do?

This PR extends the one-step-off-policy recipe by adding SGLang as an
alternative rollout engine to vLLM, allowing flexible backend selection
and improving training efficiency.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
verl-project#3460
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

To validate this solution, we adopted the existing experimental
configuration from the recipe one-step-off-policy.

The evaluation demonstrates that the proposed SGLang rollout engine
integration achieves effective acceleration in one-step-off-policy
asynchronous training, providing users with enhanced rollout engine
options for diverse deployment scenarios.

**Experimental Results**

- **Machine Configuration**: 2 nodes with 16 H20 GPUs each
    - Generation: 4 GPUs
    - Training: 12 GPUs
- **Model**: Qwen2.5-Math-7B
- **Max Response Length**: 8,192 tokens
- **Algorithm**: DAPO
- **Rollout Engine**: vLLM, SGLang

| training mode | engine | step | gen | wait_prev_gen |
generate_sequences | old_log_prob | update_actor | total time |
acc/best@32/mean | acc/maj@32/mean |

|------------------------|----------------|------|-----|---------------|--------------------|--------------|--------------|---------------|------------------|-----------------|
| colocate sync | SGLang+FSDP2 | 452 | 131 | - | 125 | 54 | 199 | 12h25m
| 0.6560 | 0.4471 |
| one-step-overlap async | SGLang+FSDP2 | 406 | - | 12 | 305 | 58 | 245
| 11h12m (+11%) | 0.6303 | 0.4443 |

* colocate sync: step ≈ gen + old_log_prob + update_actor
* one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences,
old_log_prob + update_actor)

<img width="1218" height="777" alt="image"
src="https://github.com/user-attachments/assets/58734164-2534-492f-bf00-1e80faae0fe7"
/>

### API and Usage Example

**Configuration Example**
```bash
# Using SGLang engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=sglang \
    # ... other configuration parameters

# Using vLLM engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    # ... other configuration parameters
```

**Script Usage**
```bash
# Using SGLang engine
bash dapo_7b_math_fsdp2_sglang_4_12.sh
bash dapo_7b_math_fsdp2_sglang_colocate.sh

# Using vLLM engine
bash dapo_7b_math_fsdp2_4_12.sh
bash dapo_7b_math_fsdp2_colocate.sh
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: wuxibin <wuxibin@bytedance.com>
wangboxiong320 pushed a commit to wangboxiong320/verl that referenced this pull request Nov 1, 2025
…policy (verl-project#3531)

### What does this PR do?

This PR extends the one-step-off-policy recipe by adding SGLang as an
alternative rollout engine to vLLM, allowing flexible backend selection
and improving training efficiency.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
verl-project#3460
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

To validate this solution, we adopted the existing experimental
configuration from the recipe one-step-off-policy.

The evaluation demonstrates that the proposed SGLang rollout engine
integration achieves effective acceleration in one-step-off-policy
asynchronous training, providing users with enhanced rollout engine
options for diverse deployment scenarios.

**Experimental Results**

- **Machine Configuration**: 2 nodes with 16 H20 GPUs each
    - Generation: 4 GPUs
    - Training: 12 GPUs
- **Model**: Qwen2.5-Math-7B
- **Max Response Length**: 8,192 tokens
- **Algorithm**: DAPO
- **Rollout Engine**: vLLM, SGLang

| training mode | engine | step | gen | wait_prev_gen |
generate_sequences | old_log_prob | update_actor | total time |
acc/best@32/mean | acc/maj@32/mean |

|------------------------|----------------|------|-----|---------------|--------------------|--------------|--------------|---------------|------------------|-----------------|
| colocate sync | SGLang+FSDP2 | 452 | 131 | - | 125 | 54 | 199 | 12h25m
| 0.6560 | 0.4471 |
| one-step-overlap async | SGLang+FSDP2 | 406 | - | 12 | 305 | 58 | 245
| 11h12m (+11%) | 0.6303 | 0.4443 |

* colocate sync: step ≈ gen + old_log_prob + update_actor
* one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences,
old_log_prob + update_actor)

<img width="1218" height="777" alt="image"
src="https://github.com/user-attachments/assets/58734164-2534-492f-bf00-1e80faae0fe7"
/>

### API and Usage Example

**Configuration Example**
```bash
# Using SGLang engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=sglang \
    # ... other configuration parameters

# Using vLLM engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    # ... other configuration parameters
```

**Script Usage**
```bash
# Using SGLang engine
bash dapo_7b_math_fsdp2_sglang_4_12.sh
bash dapo_7b_math_fsdp2_sglang_colocate.sh

# Using vLLM engine
bash dapo_7b_math_fsdp2_4_12.sh
bash dapo_7b_math_fsdp2_colocate.sh
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: wuxibin <wuxibin@bytedance.com>
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
…olicy (verl-project#3556)

### What does this PR do?

Due to updated in the main package, the rollout worker calls
`self.model_config` during `generate_sequences`
(https://github.com/volcengine/verl/blob/b68b9f7bc2e85545f85ac7266e9ae02dba6b282f/verl/workers/fsdp_workers.py#L869)
which hasn't been initialized in current one-step-off recipe. This will
through out runtime errors.

Similar code in the default fsdp worker:
https://github.com/volcengine/verl/blob/b68b9f7bc2e85545f85ac7266e9ae02dba6b282f/verl/workers/fsdp_workers.py#L563

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[...](verl-project#3531)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
…policy (verl-project#3531)

### What does this PR do?

This PR extends the one-step-off-policy recipe by adding SGLang as an
alternative rollout engine to vLLM, allowing flexible backend selection
and improving training efficiency.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
verl-project#3460
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

To validate this solution, we adopted the existing experimental
configuration from the recipe one-step-off-policy.

The evaluation demonstrates that the proposed SGLang rollout engine
integration achieves effective acceleration in one-step-off-policy
asynchronous training, providing users with enhanced rollout engine
options for diverse deployment scenarios.

**Experimental Results**

- **Machine Configuration**: 2 nodes with 16 H20 GPUs each
    - Generation: 4 GPUs
    - Training: 12 GPUs
- **Model**: Qwen2.5-Math-7B
- **Max Response Length**: 8,192 tokens
- **Algorithm**: DAPO
- **Rollout Engine**: vLLM, SGLang

| training mode | engine | step | gen | wait_prev_gen |
generate_sequences | old_log_prob | update_actor | total time |
acc/best@32/mean | acc/maj@32/mean |

|------------------------|----------------|------|-----|---------------|--------------------|--------------|--------------|---------------|------------------|-----------------|
| colocate sync | SGLang+FSDP2 | 452 | 131 | - | 125 | 54 | 199 | 12h25m
| 0.6560 | 0.4471 |
| one-step-overlap async | SGLang+FSDP2 | 406 | - | 12 | 305 | 58 | 245
| 11h12m (+11%) | 0.6303 | 0.4443 |

* colocate sync: step ≈ gen + old_log_prob + update_actor
* one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences,
old_log_prob + update_actor)

<img width="1218" height="777" alt="image"
src="https://github.com/user-attachments/assets/58734164-2534-492f-bf00-1e80faae0fe7"
/>

### API and Usage Example

**Configuration Example**
```bash
# Using SGLang engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=sglang \
    # ... other configuration parameters

# Using vLLM engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    # ... other configuration parameters
```

**Script Usage**
```bash
# Using SGLang engine
bash dapo_7b_math_fsdp2_sglang_4_12.sh
bash dapo_7b_math_fsdp2_sglang_colocate.sh

# Using vLLM engine
bash dapo_7b_math_fsdp2_4_12.sh
bash dapo_7b_math_fsdp2_colocate.sh
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: wuxibin <wuxibin@bytedance.com>
chenhaiq pushed a commit to The-Hierophant/verl-1 that referenced this pull request Nov 18, 2025
…policy (verl-project#3531)

### What does this PR do?

This PR extends the one-step-off-policy recipe by adding SGLang as an
alternative rollout engine to vLLM, allowing flexible backend selection
and improving training efficiency.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
verl-project#3460
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

To validate this solution, we adopted the existing experimental
configuration from the recipe one-step-off-policy.

The evaluation demonstrates that the proposed SGLang rollout engine
integration achieves effective acceleration in one-step-off-policy
asynchronous training, providing users with enhanced rollout engine
options for diverse deployment scenarios.

**Experimental Results**

- **Machine Configuration**: 2 nodes with 16 H20 GPUs each
    - Generation: 4 GPUs
    - Training: 12 GPUs
- **Model**: Qwen2.5-Math-7B
- **Max Response Length**: 8,192 tokens
- **Algorithm**: DAPO
- **Rollout Engine**: vLLM, SGLang

| training mode | engine | step | gen | wait_prev_gen |
generate_sequences | old_log_prob | update_actor | total time |
acc/best@32/mean | acc/maj@32/mean |

|------------------------|----------------|------|-----|---------------|--------------------|--------------|--------------|---------------|------------------|-----------------|
| colocate sync | SGLang+FSDP2 | 452 | 131 | - | 125 | 54 | 199 | 12h25m
| 0.6560 | 0.4471 |
| one-step-overlap async | SGLang+FSDP2 | 406 | - | 12 | 305 | 58 | 245
| 11h12m (+11%) | 0.6303 | 0.4443 |

* colocate sync: step ≈ gen + old_log_prob + update_actor
* one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences,
old_log_prob + update_actor)

<img width="1218" height="777" alt="image"
src="https://github.com/user-attachments/assets/58734164-2534-492f-bf00-1e80faae0fe7"
/>

### API and Usage Example

**Configuration Example**
```bash
# Using SGLang engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=sglang \
    # ... other configuration parameters

# Using vLLM engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    # ... other configuration parameters
```

**Script Usage**
```bash
# Using SGLang engine
bash dapo_7b_math_fsdp2_sglang_4_12.sh
bash dapo_7b_math_fsdp2_sglang_colocate.sh

# Using vLLM engine
bash dapo_7b_math_fsdp2_4_12.sh
bash dapo_7b_math_fsdp2_colocate.sh
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: wuxibin <wuxibin@bytedance.com>
NenoL2001 pushed a commit to NenoL2001/verl that referenced this pull request Nov 26, 2025
…olicy (verl-project#3556)

### What does this PR do?

Due to updated in the main package, the rollout worker calls
`self.model_config` during `generate_sequences`
(https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L869)
which hasn't been initialized in current one-step-off recipe. This will
through out runtime errors.

Similar code in the default fsdp worker:
https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L563

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[...](verl-project#3531)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
NenoL2001 pushed a commit to NenoL2001/verl that referenced this pull request Nov 26, 2025
…policy (verl-project#3531)

### What does this PR do?

This PR extends the one-step-off-policy recipe by adding SGLang as an
alternative rollout engine to vLLM, allowing flexible backend selection
and improving training efficiency.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
verl-project#3460
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

To validate this solution, we adopted the existing experimental
configuration from the recipe one-step-off-policy.

The evaluation demonstrates that the proposed SGLang rollout engine
integration achieves effective acceleration in one-step-off-policy
asynchronous training, providing users with enhanced rollout engine
options for diverse deployment scenarios.

**Experimental Results**

- **Machine Configuration**: 2 nodes with 16 H20 GPUs each
    - Generation: 4 GPUs
    - Training: 12 GPUs
- **Model**: Qwen2.5-Math-7B
- **Max Response Length**: 8,192 tokens
- **Algorithm**: DAPO
- **Rollout Engine**: vLLM, SGLang

| training mode | engine | step | gen | wait_prev_gen |
generate_sequences | old_log_prob | update_actor | total time |
acc/best@32/mean | acc/maj@32/mean |

|------------------------|----------------|------|-----|---------------|--------------------|--------------|--------------|---------------|------------------|-----------------|
| colocate sync | SGLang+FSDP2 | 452 | 131 | - | 125 | 54 | 199 | 12h25m
| 0.6560 | 0.4471 |
| one-step-overlap async | SGLang+FSDP2 | 406 | - | 12 | 305 | 58 | 245
| 11h12m (+11%) | 0.6303 | 0.4443 |

* colocate sync: step ≈ gen + old_log_prob + update_actor
* one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences,
old_log_prob + update_actor)

<img width="1218" height="777" alt="image"
src="https://github.com/user-attachments/assets/58734164-2534-492f-bf00-1e80faae0fe7"
/>

### API and Usage Example

**Configuration Example**
```bash
# Using SGLang engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=sglang \
    # ... other configuration parameters

# Using vLLM engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    # ... other configuration parameters
```

**Script Usage**
```bash
# Using SGLang engine
bash dapo_7b_math_fsdp2_sglang_4_12.sh
bash dapo_7b_math_fsdp2_sglang_colocate.sh

# Using vLLM engine
bash dapo_7b_math_fsdp2_4_12.sh
bash dapo_7b_math_fsdp2_colocate.sh
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: wuxibin <wuxibin@bytedance.com>
paolo328 added a commit to paolo328/Verl that referenced this pull request Nov 27, 2025
…olicy (#3556)

### What does this PR do?

Due to updated in the main package, the rollout worker calls
`self.model_config` during `generate_sequences`
(https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L869)
which hasn't been initialized in current one-step-off recipe. This will
through out runtime errors.

Similar code in the default fsdp worker:
https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L563

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[...](verl-project/verl#3531)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
…olicy (verl-project#3556)

### What does this PR do?

Due to updated in the main package, the rollout worker calls
`self.model_config` during `generate_sequences`
(https://github.com/volcengine/verl/blob/c97cbaa7dc69d20605b8e0287dbd24fdbe8b70cb/verl/workers/fsdp_workers.py#L869)
which hasn't been initialized in current one-step-off recipe. This will
through out runtime errors.

Similar code in the default fsdp worker:
https://github.com/volcengine/verl/blob/c97cbaa7dc69d20605b8e0287dbd24fdbe8b70cb/verl/workers/fsdp_workers.py#L563

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[...](verl-project#3531)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
…policy (verl-project#3531)

### What does this PR do?

This PR extends the one-step-off-policy recipe by adding SGLang as an
alternative rollout engine to vLLM, allowing flexible backend selection
and improving training efficiency.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
verl-project#3460
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

To validate this solution, we adopted the existing experimental
configuration from the recipe one-step-off-policy.

The evaluation demonstrates that the proposed SGLang rollout engine
integration achieves effective acceleration in one-step-off-policy
asynchronous training, providing users with enhanced rollout engine
options for diverse deployment scenarios.

**Experimental Results**

- **Machine Configuration**: 2 nodes with 16 H20 GPUs each
    - Generation: 4 GPUs
    - Training: 12 GPUs
- **Model**: Qwen2.5-Math-7B
- **Max Response Length**: 8,192 tokens
- **Algorithm**: DAPO
- **Rollout Engine**: vLLM, SGLang

| training mode | engine | step | gen | wait_prev_gen |
generate_sequences | old_log_prob | update_actor | total time |
acc/best@32/mean | acc/maj@32/mean |

|------------------------|----------------|------|-----|---------------|--------------------|--------------|--------------|---------------|------------------|-----------------|
| colocate sync | SGLang+FSDP2 | 452 | 131 | - | 125 | 54 | 199 | 12h25m
| 0.6560 | 0.4471 |
| one-step-overlap async | SGLang+FSDP2 | 406 | - | 12 | 305 | 58 | 245
| 11h12m (+11%) | 0.6303 | 0.4443 |

* colocate sync: step ≈ gen + old_log_prob + update_actor
* one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences,
old_log_prob + update_actor)

<img width="1218" height="777" alt="image"
src="https://github.com/user-attachments/assets/58734164-2534-492f-bf00-1e80faae0fe7"
/>

### API and Usage Example

**Configuration Example**
```bash
# Using SGLang engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=sglang \
    # ... other configuration parameters

# Using vLLM engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    # ... other configuration parameters
```

**Script Usage**
```bash
# Using SGLang engine
bash dapo_7b_math_fsdp2_sglang_4_12.sh
bash dapo_7b_math_fsdp2_sglang_colocate.sh

# Using vLLM engine
bash dapo_7b_math_fsdp2_4_12.sh
bash dapo_7b_math_fsdp2_colocate.sh
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: wuxibin <wuxibin@bytedance.com>
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026
…olicy (verl-project#3556)

### What does this PR do?

Due to updated in the main package, the rollout worker calls
`self.model_config` during `generate_sequences`
(https://github.com/volcengine/verl/blob/180beb7dbda6cba8dfb9b9a3e2654056342769e2/verl/workers/fsdp_workers.py#L869)
which hasn't been initialized in current one-step-off recipe. This will
through out runtime errors.

Similar code in the default fsdp worker:
https://github.com/volcengine/verl/blob/180beb7dbda6cba8dfb9b9a3e2654056342769e2/verl/workers/fsdp_workers.py#L563

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[...](verl-project#3531)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026
…policy (verl-project#3531)

### What does this PR do?

This PR extends the one-step-off-policy recipe by adding SGLang as an
alternative rollout engine to vLLM, allowing flexible backend selection
and improving training efficiency.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
verl-project#3460
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

To validate this solution, we adopted the existing experimental
configuration from the recipe one-step-off-policy.

The evaluation demonstrates that the proposed SGLang rollout engine
integration achieves effective acceleration in one-step-off-policy
asynchronous training, providing users with enhanced rollout engine
options for diverse deployment scenarios.

**Experimental Results**

- **Machine Configuration**: 2 nodes with 16 H20 GPUs each
    - Generation: 4 GPUs
    - Training: 12 GPUs
- **Model**: Qwen2.5-Math-7B
- **Max Response Length**: 8,192 tokens
- **Algorithm**: DAPO
- **Rollout Engine**: vLLM, SGLang

| training mode | engine | step | gen | wait_prev_gen |
generate_sequences | old_log_prob | update_actor | total time |
acc/best@32/mean | acc/maj@32/mean |

|------------------------|----------------|------|-----|---------------|--------------------|--------------|--------------|---------------|------------------|-----------------|
| colocate sync | SGLang+FSDP2 | 452 | 131 | - | 125 | 54 | 199 | 12h25m
| 0.6560 | 0.4471 |
| one-step-overlap async | SGLang+FSDP2 | 406 | - | 12 | 305 | 58 | 245
| 11h12m (+11%) | 0.6303 | 0.4443 |

* colocate sync: step ≈ gen + old_log_prob + update_actor
* one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences,
old_log_prob + update_actor)

<img width="1218" height="777" alt="image"
src="https://github.com/user-attachments/assets/58734164-2534-492f-bf00-1e80faae0fe7"
/>

### API and Usage Example

**Configuration Example**
```bash
# Using SGLang engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=sglang \
    # ... other configuration parameters

# Using vLLM engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    # ... other configuration parameters
```

**Script Usage**
```bash
# Using SGLang engine
bash dapo_7b_math_fsdp2_sglang_4_12.sh
bash dapo_7b_math_fsdp2_sglang_colocate.sh

# Using vLLM engine
bash dapo_7b_math_fsdp2_4_12.sh
bash dapo_7b_math_fsdp2_colocate.sh
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: wuxibin <wuxibin@bytedance.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants