@KAMiPan
Contributor

@KAMiPan KAMiPan commented Sep 19, 2025

What does this PR do?

This PR extends the one-step-off-policy recipe by adding SGLang as an alternative rollout engine to vLLM, allowing flexible backend selection and improving training efficiency.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: Add async vllm backend support for one-step-off-policy training in disaggregated architecture #3460
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

To validate this solution, we adopted the existing experimental configuration from the recipe one-step-off-policy.

The evaluation shows that the SGLang rollout engine integration speeds up one-step-off-policy asynchronous training: in the run reported below, total time drops from 12h25m (colocate sync) to 11h12m (one-step-overlap async). It also gives users another rollout engine option for different deployment scenarios.

Experimental Results

  • Machine Configuration: 2 nodes with 16 H20 GPUs each
    • Generation: 4 GPUs
    • Training: 12 GPUs
  • Model: Qwen2.5-Math-7B
  • Max Response Length: 8,192 tokens
  • Algorithm: DAPO
  • Rollout Engine: vLLM, SGLang

| training mode | engine | step | gen | wait_prev_gen | generate_sequences | old_log_prob | update_actor | total time | acc/best@32/mean | acc/maj@32/mean |
|---|---|---|---|---|---|---|---|---|---|---|
| colocate sync | SGLang+FSDP2 | 452 | 131 | - | 125 | 54 | 199 | 12h25m | 0.6560 | 0.4471 |
| one-step-overlap async | SGLang+FSDP2 | 406 | - | 12 | 305 | 58 | 245 | 11h12m (+11%) | 0.6303 | 0.4443 |

  • colocate sync: step ≈ gen + old_log_prob + update_actor
  • one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences, old_log_prob + update_actor)
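
As a rough check, the two approximations can be evaluated against the numbers reported in the results table. This is a minimal sketch: the dictionary keys mirror the table's column names, the values are copied from its two rows, and the helper names are hypothetical. Both estimates come out below the measured `step` values, which suggests the formulas leave out some per-step overhead that the table does not break down. The total-time ratio is consistent with the table's "+11%" entry.

```python
# Timings copied from the results table (same units as the table).
colocate_sync = {"step": 452, "gen": 131, "old_log_prob": 54, "update_actor": 199}
one_step_async = {
    "step": 406,
    "wait_prev_gen": 12,
    "generate_sequences": 305,
    "old_log_prob": 58,
    "update_actor": 245,
}


def estimate_sync_step(t):
    """Colocate sync: generation and training run back to back."""
    return t["gen"] + t["old_log_prob"] + t["update_actor"]


def estimate_async_step(t):
    """One-step-overlap async: both sides overlap, so the slower side dominates."""
    rollout_side = t["wait_prev_gen"] + t["generate_sequences"]
    training_side = t["old_log_prob"] + t["update_actor"]
    return max(rollout_side, training_side)


def to_minutes(hours, minutes):
    """Convert an XhYm duration to minutes."""
    return hours * 60 + minutes


sync_estimate = estimate_sync_step(colocate_sync)      # 131 + 54 + 199
async_estimate = estimate_async_step(one_step_async)   # max(12 + 305, 58 + 245)

# Relative total-time gain: 12h25m (colocate sync) versus 11h12m (async).
speedup = to_minutes(12, 25) / to_minutes(11, 12) - 1

print(f"sync estimate:  {sync_estimate} (measured {colocate_sync['step']})")
print(f"async estimate: {async_estimate} (measured {one_step_async['step']})")
print(f"total-time speedup: {speedup:.1%}")
```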

API and Usage Example

Configuration Example

# Using SGLang engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=sglang \
    # ... other configuration parameters

# Using vLLM engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    # ... other configuration parameters
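
When the launch is scripted, the engine choice can be assembled and validated before the process starts. The sketch below is illustrative only: `build_launch_command` and `SUPPORTED_ENGINES` are hypothetical names, and the only details taken from this PR are the module path, the `actor_rollout_ref.rollout.name` override, and the two engine names.

```python
import shlex

ENTRYPOINT = "recipe.one_step_off_policy.main_ppo"
SUPPORTED_ENGINES = ("vllm", "sglang")  # the two rollout engines named in this PR


def build_launch_command(engine, extra_overrides=()):
    """Return the argv list for a run with the chosen rollout engine."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f"unsupported rollout engine {engine!r}; expected one of {SUPPORTED_ENGINES}"
        )
    return [
        "python3",
        "-m",
        ENTRYPOINT,
        f"actor_rollout_ref.rollout.name={engine}",
        *extra_overrides,  # any other overrides are passed through untouched
    ]


# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(build_launch_command("sglang")))
```

Rejecting an unknown engine name up front avoids starting a multi-node job that would only fail later during worker initialization.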

Script Usage

# Using SGLang engine
bash dapo_7b_math_fsdp2_sglang_4_12.sh
bash dapo_7b_math_fsdp2_sglang_colocate.sh

# Using vLLM engine
bash dapo_7b_math_fsdp2_4_12.sh
bash dapo_7b_math_fsdp2_colocate.sh

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@CLAassistant

CLAassistant commented Sep 19, 2025

CLA assistant check
All committers have signed the CLA.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds SGLang as a rollout engine option for the one-step-off-policy recipe, providing an alternative to vLLM. The changes include new configuration files and experiment scripts for SGLang, and modifications to fsdp_workers.py to handle weight synchronization for the new engine. While the implementation is mostly correct, I've identified a significant performance issue in the weight synchronization logic for both SGLang and the existing vLLM implementation. My review includes a comment detailing this issue and a recommendation for refactoring to improve efficiency.

Comment on lines 104 to +116

      if self._is_rollout:
    -     inference_model.load_weights([(key, tensor)])
    +     if rollout_name == "vllm":
    +         inference_model.load_weights([(key, tensor)])
    +     elif rollout_name == "sglang":
    +         loop.run_until_complete(self.update_weights(inference_model, [(key, tensor)]))

Contributor

high

This block for updating weights is inside a for loop that iterates over each weight tensor. Calling loop.run_until_complete for sglang and inference_model.load_weights for vllm on each tensor individually is inefficient.

For sglang, run_until_complete has significant overhead from setting up and tearing down the event loop. For vllm, it results in many small load_weights calls.

To improve performance, you should refactor this to batch the weight updates. Collect all (key, tensor) pairs in a list within the loop, and then make a single call to load_weights (for vllm) or run_until_complete (for sglang) with the entire list of tensors after the loop finishes. This will significantly reduce overhead.
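
The batching suggested above can be sketched as follows. This is a minimal illustration, not the recipe's actual code: `RecordingEngine`, `sync_weights`, and `bucket_size` are hypothetical, the engine stand-in only counts calls, and plain lists stand in for tensors. The optional `bucket_size` is an added assumption that bounds how many tensors are held at once, since collecting a whole model before a single call may raise peak memory.

```python
import asyncio


class RecordingEngine:
    """Hypothetical stand-in for an inference engine; it only counts update calls."""

    def __init__(self):
        self.calls = 0
        self.loaded = []

    def load_weights(self, named_tensors):
        # vLLM-style synchronous update.
        self.calls += 1
        self.loaded.extend(name for name, _ in named_tensors)

    async def update_weights(self, named_tensors):
        # SGLang-style coroutine update.
        self.calls += 1
        self.loaded.extend(name for name, _ in named_tensors)


def sync_weights(named_tensors, engine, rollout_name, loop, bucket_size=None):
    """Push weights in batches instead of one engine call per tensor."""
    bucket = []

    def flush():
        if not bucket:
            return
        if rollout_name == "vllm":
            engine.load_weights(list(bucket))
        elif rollout_name == "sglang":
            loop.run_until_complete(engine.update_weights(list(bucket)))
        else:
            raise ValueError(f"unknown rollout engine: {rollout_name}")
        bucket.clear()

    for key, tensor in named_tensors:
        bucket.append((key, tensor))
        if bucket_size is not None and len(bucket) >= bucket_size:
            flush()
    flush()  # send whatever is left after the loop


weights = [(f"layer.{i}.weight", [0.0] * 4) for i in range(5)]
loop = asyncio.new_event_loop()
try:
    single = RecordingEngine()
    sync_weights(weights, single, "sglang", loop)  # one call for all tensors
    bucketed = RecordingEngine()
    sync_weights(weights, bucketed, "vllm", loop, bucket_size=2)  # buckets of 2, 2, 1
    print(single.calls, bucketed.calls)
finally:
    loop.close()
```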

vermouth1992 pushed a commit that referenced this pull request Sep 22, 2025
…olicy (#3556)

### What does this PR do?

Due to updates in the main package, the rollout worker calls
`self.model_config` during `generate_sequences`
(https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L869),
which hasn't been initialized in the current one-step-off recipe. This
causes runtime errors.

Similar code in the default fsdp worker:
https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L563

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[...](#3531)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
@KAMiPan
Contributor Author

KAMiPan commented Sep 22, 2025

VocabVictor pushed a commit to VocabVictor/verl-plus that referenced this pull request Sep 24, 2025
…olicy (#3556)

@KAMiPan KAMiPan marked this pull request as draft September 25, 2025 09:16
@KAMiPan KAMiPan marked this pull request as ready for review September 25, 2025 09:16
import torch
import torch.distributed
from omegaconf import DictConfig, OmegaConf
from sglang.srt.weight_sync.utils import update_weights as sgl_update_weights
Collaborator

Please move this import into local scope.
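
One way to follow this request is to defer the import until the SGLang branch is actually taken, so that vLLM-only environments never need the package installed. The sketch below is an assumption about how that could look: `resolve_sglang_updater` is a hypothetical helper, and only the import path itself comes from the diff under review.

```python
def resolve_sglang_updater(rollout_name):
    """Return SGLang's weight-update helper only when SGLang is the selected engine."""
    if rollout_name != "sglang":
        return None
    # Local import: evaluated only on the SGLang path, so importing this
    # module does not require sglang to be installed.
    from sglang.srt.weight_sync.utils import update_weights as sgl_update_weights

    return sgl_update_weights


# On the vLLM path the import statement is never reached.
print(resolve_sglang_updater("vllm"))
```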

    if rollout_name == "vllm":
        from .vllm_sharding_manager import VLLMShardingManager

        rollout_sharding_manager = VLLMShardingManager(
Collaborator

All ShardingManagers have been deprecated in #3285 and will be removed in release v0.7

@wuxibin89 wuxibin89 merged commit 3abcc09 into volcengine:main Oct 14, 2025
7 checks passed
masoudhashemi pushed a commit to masoudhashemi/verl that referenced this pull request Oct 19, 2025
…olicy (volcengine#3556)

masoudhashemi pushed a commit to masoudhashemi/verl that referenced this pull request Oct 19, 2025
…policy (volcengine#3531)

### What does this PR do?

This PR extends the one-step-off-policy recipe by adding SGLang as an
alternative rollout engine to vLLM, allowing flexible backend selection
and improving training efficiency.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
volcengine#3460
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

To validate this solution, we adopted the existing experimental
configuration from the recipe one-step-off-policy.

The evaluation demonstrates that the proposed SGLang rollout engine
integration achieves effective acceleration in one-step-off-policy
asynchronous training, providing users with enhanced rollout engine
options for diverse deployment scenarios.

**Experimental Results**

- **Machine Configuration**: 2 nodes with 16 H20 GPUs each
    - Generation: 4 GPUs
    - Training: 12 GPUs
- **Model**: Qwen2.5-Math-7B
- **Max Response Length**: 8,192 tokens
- **Algorithm**: DAPO
- **Rollout Engine**: vLLM, SGLang

| training mode | engine | step | gen | wait_prev_gen | generate_sequences | old_log_prob | update_actor | total time | acc/best@32/mean | acc/maj@32/mean |
|------------------------|----------------|------|-----|---------------|--------------------|--------------|--------------|---------------|------------------|-----------------|
| colocate sync | SGLang+FSDP2 | 452 | 131 | - | 125 | 54 | 199 | 12h25m | 0.6560 | 0.4471 |
| one-step-overlap async | SGLang+FSDP2 | 406 | - | 12 | 305 | 58 | 245 | 11h12m (+11%) | 0.6303 | 0.4443 |

* colocate sync: step ≈ gen + old_log_prob + update_actor
* one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences,
old_log_prob + update_actor)

<img width="1218" height="777" alt="image"
src="https://github.com/user-attachments/assets/58734164-2534-492f-bf00-1e80faae0fe7"
/>

### API and Usage Example

**Configuration Example**
```bash
# Using SGLang engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=sglang \
    # ... other configuration parameters

# Using vLLM engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    # ... other configuration parameters
```

**Script Usage**
```bash
# Using SGLang engine
bash dapo_7b_math_fsdp2_sglang_4_12.sh
bash dapo_7b_math_fsdp2_sglang_colocate.sh

# Using vLLM engine
bash dapo_7b_math_fsdp2_4_12.sh
bash dapo_7b_math_fsdp2_colocate.sh
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: wuxibin <[email protected]>
techkang pushed a commit to techkang/verl that referenced this pull request Oct 31, 2025
…olicy (volcengine#3556)

mtian8 pushed a commit to mtian8/verl that referenced this pull request Nov 1, 2025
…olicy (volcengine#3556)

mtian8 pushed a commit to mtian8/verl that referenced this pull request Nov 1, 2025
…policy (volcengine#3531)

wangboxiong320 pushed a commit to wangboxiong320/verl that referenced this pull request Nov 1, 2025
…policy (volcengine#3531)
