[sglang, recipe] feat: add SGLang as rollout engine for one-step-off-policy by KAMiPan · Pull Request #3531 · verl-project/verl

KAMiPan · 2025-09-19T04:26:42Z

What does this PR do?

This PR extends the one-step-off-policy recipe by adding SGLang as an alternative rollout engine to vLLM, allowing flexible backend selection and improving training efficiency.

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: Add async vllm backend support for one-step-off-policy training in disaggregated architecture #3460
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
- If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
- {type} is in feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

To validate this solution, we reused the existing experimental configuration from the one-step-off-policy recipe.

The evaluation shows that, with the SGLang rollout engine, one-step-off-policy asynchronous training finishes in 11h12m versus 12h25m for the colocated synchronous SGLang baseline (about 11% faster; see the table below), giving users an additional rollout engine option for different deployment scenarios.

Experimental Results

Machine Configuration: 2 nodes with 16 H20 GPUs each
- Generation: 4 GPUs
- Training: 12 GPUs
Model: Qwen2.5-Math-7B
Max Response Length: 8,192 tokens
Algorithm: DAPO
Rollout Engine: vLLM, SGLang

training mode	engine	step	gen	wait_prev_gen	generate_sequences	old_log_prob	update_actor	total time	acc/best@32/mean	acc/maj@32/mean
colocate sync	SGLang+FSDP2	452	131	-	125	54	199	12h25m	0.6560	0.4471
one-step-overlap async	SGLang+FSDP2	406	-	12	305	58	245	11h12m (+11%)	0.6303	0.4443

colocate sync: step ≈ gen + old_log_prob + update_actor
one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences, old_log_prob + update_actor)
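
As a quick sanity check, the two approximations above can be evaluated against the table rows. This is only a sketch: the numbers are copied from the table, the dictionary and function names are illustrative, and the reading of "+11%" as the ratio of total times is an assumption rather than something the PR states.

```python
# Per-step timings copied from the results table (units as reported there).
sync_row = {"step": 452, "gen": 131, "old_log_prob": 54, "update_actor": 199}
async_row = {
    "step": 406,
    "wait_prev_gen": 12,
    "generate_sequences": 305,
    "old_log_prob": 58,
    "update_actor": 245,
}


def estimate_sync_step(row):
    # colocate sync: the phases run back to back, so their times add up.
    return row["gen"] + row["old_log_prob"] + row["update_actor"]


def estimate_async_step(row):
    # one-step-overlap async: rollout and training overlap, so the slower side dominates.
    rollout_side = row["wait_prev_gen"] + row["generate_sequences"]
    training_side = row["old_log_prob"] + row["update_actor"]
    return max(rollout_side, training_side)


def to_minutes(hours, minutes):
    return 60 * hours + minutes


sync_est = estimate_sync_step(sync_row)
async_est = estimate_async_step(async_row)
# Assumed reading of "+11%": sync total time divided by async total time, minus one.
speedup = to_minutes(12, 25) / to_minutes(11, 12) - 1

print(f"sync estimate:  {sync_est} (reported step: {sync_row['step']})")
print(f"async estimate: {async_est} (reported step: {async_row['step']})")
print(f"total-time speedup: {speedup:.1%}")
```

Both estimates come out below the reported step values, so the formulas presumably leave out other per-step phases. Under the async formula the rollout side (317) is slightly larger than the training side (303), which suggests generation is the limiting side in this 4/12 GPU split.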

API and Usage Example

Configuration Example

# Using SGLang engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=sglang \
    # ... other configuration parameters

# Using vLLM engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    # ... other configuration parameters

Script Usage

# Using SGLang engine
bash dapo_7b_math_fsdp2_sglang_4_12.sh
bash dapo_7b_math_fsdp2_sglang_colocate.sh

# Using vLLM engine
bash dapo_7b_math_fsdp2_4_12.sh
bash dapo_7b_math_fsdp2_colocate.sh

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)

CLAassistant · 2025-09-19T04:26:49Z

All committers have signed the CLA.

gemini-code-assist

Code Review

This pull request adds SGLang as a rollout engine option for the one-step-off-policy recipe, providing an alternative to vLLM. The changes include new configuration files and experiment scripts for SGLang, and modifications to fsdp_workers.py to handle weight synchronization for the new engine. While the implementation is mostly correct, I've identified a significant performance issue in the weight synchronization logic for both SGLang and the existing vLLM implementation. My review includes a comment detailing this issue and a recommendation for refactoring to improve efficiency.

gemini-code-assist · 2025-09-19T04:29:28Z

            if self._is_rollout:
-                inference_model.load_weights([(key, tensor)])
+                if rollout_name == "vllm":
+                    inference_model.load_weights([(key, tensor)])
+                elif rollout_name == "sglang":
+                    loop.run_until_complete(self.update_weights(inference_model, [(key, tensor)]))


This block for updating weights is inside a for loop that iterates over each weight tensor. Calling loop.run_until_complete for sglang and inference_model.load_weights for vllm on each tensor individually is inefficient.

For sglang, run_until_complete has significant overhead from setting up and tearing down the event loop. For vllm, it results in many small load_weights calls.

To improve performance, you should refactor this to batch the weight updates. Collect all (key, tensor) pairs in a list within the loop, and then make a single call to load_weights (for vllm) or run_until_complete (for sglang) with the entire list of tensors after the loop finishes. This will significantly reduce overhead.
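
The suggestion above could be sketched roughly as follows. This is not the code from the PR: `FakeEngine`, `iter_buckets`, `sync_weights`, and the `bucket_size` parameter are hypothetical names, and the engine is a stand-in that only counts calls. A bucket size larger than the number of tensors gives the single call the review describes.

```python
import asyncio
from typing import Iterable, Iterator, List, Tuple

NamedTensor = Tuple[str, object]  # (parameter name, tensor stand-in)


class FakeEngine:
    """Hypothetical stand-in for a rollout engine; it only counts update calls."""

    def __init__(self) -> None:
        self.calls = 0
        self.loaded: List[str] = []

    def load_weights(self, weights: List[NamedTensor]) -> None:
        # Mimics the synchronous vLLM-style call from the diff.
        self.calls += 1
        self.loaded.extend(name for name, _ in weights)

    async def update_weights(self, weights: List[NamedTensor]) -> None:
        # Mimics the asynchronous SGLang-style call from the diff.
        self.calls += 1
        self.loaded.extend(name for name, _ in weights)


def iter_buckets(items: Iterable[NamedTensor], bucket_size: int) -> Iterator[List[NamedTensor]]:
    """Group items into lists of at most bucket_size entries, preserving order."""
    if bucket_size < 1:
        raise ValueError("bucket_size must be >= 1")
    bucket: List[NamedTensor] = []
    for item in items:
        bucket.append(item)
        if len(bucket) == bucket_size:
            yield bucket
            bucket = []
    if bucket:
        yield bucket


def sync_weights(engine, named_tensors, rollout_name, loop, bucket_size):
    """Push weights to the engine one bucket at a time instead of one tensor at a time."""
    for bucket in iter_buckets(named_tensors, bucket_size):
        if rollout_name == "vllm":
            engine.load_weights(bucket)
        elif rollout_name == "sglang":
            loop.run_until_complete(engine.update_weights(bucket))
        else:
            raise ValueError(f"unsupported rollout engine: {rollout_name}")


demo_loop = asyncio.new_event_loop()
demo_engine = FakeEngine()
demo_weights = [(f"layer.{i}.weight", None) for i in range(10)]
sync_weights(demo_engine, demo_weights, "sglang", demo_loop, bucket_size=4)
demo_loop.close()
print(demo_engine.calls)  # one call per bucket rather than one per tensor
```

Bucketing is a design choice added here, not part of the review comment: collecting every tensor before a single call keeps all of them alive at once, so a bounded bucket trades a few extra calls for lower peak memory.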

…olicy (#3556) ### What does this PR do? Due to updates in the main package, the rollout worker uses `self.model_config` during `generate_sequences` (https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L869), which has not been initialized in the current one-step-off recipe. This causes runtime errors. Similar code in the default fsdp worker: https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L563 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: [...](#3531) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. 
- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

KAMiPan · 2025-09-22T03:52:38Z

@vermouth1992 @wuxibin89 @yushengsu-thu

wuxibin89 · 2025-10-14T02:06:11Z

 import torch
 import torch.distributed
 from omegaconf import DictConfig, OmegaConf
+from sglang.srt.weight_sync.utils import update_weights as sgl_update_weights


Please move this import into local scope.
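
One way to follow this request is to import the SGLang helper only inside the branch that needs it. The sketch below assumes the motivation is that environments using only vLLM should not need SGLang installed, which the comment does not state. `resolve_sglang_updater` is a hypothetical helper; only the import line is taken from the diff.

```python
def resolve_sglang_updater(rollout_name: str):
    """Return the SGLang weight-update helper, or None for the vLLM path.

    Hypothetical helper: the module-level import from the diff is moved here
    so that it runs only when SGLang is the selected rollout engine.
    """
    if rollout_name == "sglang":
        # Local-scope import: a vLLM-only environment never executes this line.
        from sglang.srt.weight_sync.utils import update_weights as sgl_update_weights

        return sgl_update_weights
    if rollout_name == "vllm":
        # The vLLM branch in the diff calls inference_model.load_weights directly.
        return None
    raise ValueError(f"unsupported rollout engine: {rollout_name}")
```

With this shape, selecting `vllm` never touches the `sglang` package, and a missing SGLang installation only surfaces as an `ImportError` when `sglang` is actually requested.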

wuxibin89 · 2025-10-14T02:15:36Z

+        if rollout_name == "vllm":
+            from .vllm_sharding_manager import VLLMShardingManager
+
+            rollout_sharding_manager = VLLMShardingManager(


All ShardingManagers have been deprecated in #3285 and will be removed in release v0.7

…policy (verl-project#3531) ### What does this PR do? This PR extends the one-step-off-policy recipe by adding SGLang as an alternative rollout engine to vLLM, allowing flexible backend selection and improving training efficiency. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: verl-project#3460 - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test To validate this solution, we adopted the existing experimental configuration from the recipe one-step-off-policy. The evaluation demonstrates that the proposed SGLang rollout engine integration achieves effective acceleration in one-step-off-policy asynchronous training, providing users with enhanced rollout engine options for diverse deployment scenarios. 
**Experimental Results** - **Machine Configuration**: 2 nodes with 16 H20 GPUs each - Generation: 4 GPUs - Training: 12 GPUs - **Model**: Qwen2.5-Math-7B - **Max Response Length**: 8,192 tokens - **Algorithm**: DAPO - **Rollout Engine**: vLLM, SGLang | training mode | engine | step | gen | wait_prev_gen | generate_sequences | old_log_prob | update_actor | total time | acc/best@32/mean | acc/maj@32/mean | |------------------------|----------------|------|-----|---------------|--------------------|--------------|--------------|---------------|------------------|-----------------| | colocate sync | SGLang+FSDP2 | 452 | 131 | - | 125 | 54 | 199 | 12h25m | 0.6560 | 0.4471 | | one-step-overlap async | SGLang+FSDP2 | 406 | - | 12 | 305 | 58 | 245 | 11h12m (+11%) | 0.6303 | 0.4443 | * colocate sync: step ≈ gen + old_log_prob + update_actor * one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences, old_log_prob + update_actor) <img width="1218" height="777" alt="image" src="https://github.com/user-attachments/assets/58734164-2534-492f-bf00-1e80faae0fe7" /> ### API and Usage Example **Configuration Example** ```bash # Using SGLang engine python3 -m recipe.one_step_off_policy.main_ppo \ actor_rollout_ref.rollout.name=sglang \ # ... other configuration parameters # Using vLLM engine python3 -m recipe.one_step_off_policy.main_ppo \ actor_rollout_ref.rollout.name=vllm \ # ... other configuration parameters ``` **Script Usage** ```bash # Using SGLang engine bash dapo_7b_math_fsdp2_sglang_4_12.sh bash dapo_7b_math_fsdp2_sglang_colocate.sh # Using vLLM engine bash dapo_7b_math_fsdp2_4_12.sh bash dapo_7b_math_fsdp2_colocate.sh ``` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). 
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: wuxibin <wuxibin@bytedance.com>

…olicy (verl-project#3556) ### What does this PR do? Due to updated in the main package, the rollout worker calls `self.model_config` during `generate_sequences` (https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L869) which hasn't been initialized in current one-step-off recipe. This will through out runtime errors. Similar code in the default fsdp worker: https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L563 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: [...](verl-project#3531) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. 
### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

…policy (verl-project#3531) ### What does this PR do? This PR extends the one-step-off-policy recipe by adding SGLang as an alternative rollout engine to vLLM, allowing flexible backend selection and improving training efficiency. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: verl-project#3460 - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test To validate this solution, we adopted the existing experimental configuration from the recipe one-step-off-policy. The evaluation demonstrates that the proposed SGLang rollout engine integration achieves effective acceleration in one-step-off-policy asynchronous training, providing users with enhanced rollout engine options for diverse deployment scenarios. 
**Experimental Results** - **Machine Configuration**: 2 nodes with 16 H20 GPUs each - Generation: 4 GPUs - Training: 12 GPUs - **Model**: Qwen2.5-Math-7B - **Max Response Length**: 8,192 tokens - **Algorithm**: DAPO - **Rollout Engine**: vLLM, SGLang | training mode | engine | step | gen | wait_prev_gen | generate_sequences | old_log_prob | update_actor | total time | acc/best@32/mean | acc/maj@32/mean | |------------------------|----------------|------|-----|---------------|--------------------|--------------|--------------|---------------|------------------|-----------------| | colocate sync | SGLang+FSDP2 | 452 | 131 | - | 125 | 54 | 199 | 12h25m | 0.6560 | 0.4471 | | one-step-overlap async | SGLang+FSDP2 | 406 | - | 12 | 305 | 58 | 245 | 11h12m (+11%) | 0.6303 | 0.4443 | * colocate sync: step ≈ gen + old_log_prob + update_actor * one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences, old_log_prob + update_actor) <img width="1218" height="777" alt="image" src="https://github.com/user-attachments/assets/58734164-2534-492f-bf00-1e80faae0fe7" /> ### API and Usage Example **Configuration Example** ```bash # Using SGLang engine python3 -m recipe.one_step_off_policy.main_ppo \ actor_rollout_ref.rollout.name=sglang \ # ... other configuration parameters # Using vLLM engine python3 -m recipe.one_step_off_policy.main_ppo \ actor_rollout_ref.rollout.name=vllm \ # ... other configuration parameters ``` **Script Usage** ```bash # Using SGLang engine bash dapo_7b_math_fsdp2_sglang_4_12.sh bash dapo_7b_math_fsdp2_sglang_colocate.sh # Using vLLM engine bash dapo_7b_math_fsdp2_4_12.sh bash dapo_7b_math_fsdp2_colocate.sh ``` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). 
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: wuxibin <wuxibin@bytedance.com>

…olicy (verl-project#3556) ### What does this PR do? Due to updated in the main package, the rollout worker calls `self.model_config` during `generate_sequences` (https://github.com/volcengine/verl/blob/b68b9f7bc2e85545f85ac7266e9ae02dba6b282f/verl/workers/fsdp_workers.py#L869) which hasn't been initialized in current one-step-off recipe. This will through out runtime errors. Similar code in the default fsdp worker: https://github.com/volcengine/verl/blob/b68b9f7bc2e85545f85ac7266e9ae02dba6b282f/verl/workers/fsdp_workers.py#L563 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: [...](verl-project#3531) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. 
### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

…policy (verl-project#3531)

### What does this PR do?

This PR extends the one-step-off-policy recipe by adding SGLang as an alternative rollout engine to vLLM, allowing flexible backend selection and improving training efficiency.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: verl-project#3460
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

To validate this solution, we adopted the existing experimental configuration from the recipe one-step-off-policy. The evaluation demonstrates that the proposed SGLang rollout engine integration achieves effective acceleration in one-step-off-policy asynchronous training, providing users with enhanced rollout engine options for diverse deployment scenarios.

**Experimental Results**

- **Machine Configuration**: 2 nodes with 16 H20 GPUs each
  - Generation: 4 GPUs
  - Training: 12 GPUs
- **Model**: Qwen2.5-Math-7B
- **Max Response Length**: 8,192 tokens
- **Algorithm**: DAPO
- **Rollout Engine**: vLLM, SGLang

| training mode          | engine       | step | gen | wait_prev_gen | generate_sequences | old_log_prob | update_actor | total time    | acc/best@32/mean | acc/maj@32/mean |
|------------------------|--------------|------|-----|---------------|--------------------|--------------|--------------|---------------|------------------|-----------------|
| colocate sync          | SGLang+FSDP2 | 452  | 131 | -             | 125                | 54           | 199          | 12h25m        | 0.6560           | 0.4471          |
| one-step-overlap async | SGLang+FSDP2 | 406  | -   | 12            | 305                | 58           | 245          | 11h12m (+11%) | 0.6303           | 0.4443          |

* colocate sync: step ≈ gen + old_log_prob + update_actor
* one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences, old_log_prob + update_actor)

<img width="1218" height="777" alt="image" src="https://github.com/user-attachments/assets/58734164-2534-492f-bf00-1e80faae0fe7" />

### API and Usage Example

**Configuration Example**

```bash
# Using SGLang engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=sglang \
    # ... other configuration parameters

# Using vLLM engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    # ... other configuration parameters
```

**Script Usage**

```bash
# Using SGLang engine
bash dapo_7b_math_fsdp2_sglang_4_12.sh
bash dapo_7b_math_fsdp2_sglang_colocate.sh

# Using vLLM engine
bash dapo_7b_math_fsdp2_4_12.sh
bash dapo_7b_math_fsdp2_colocate.sh
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: wuxibin <wuxibin@bytedance.com>
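The two step-time approximations in the Test section of this commit message can be checked against the reported table values. The following is a minimal sketch, not code from the PR: the dictionary layout and function names are hypothetical, and the timing units are not stated in the PR, so they are left unlabeled.

```python
# Values copied from the results table in the PR description.
# Units are not stated in the PR; the helper names below are illustrative only.
colocate = {"step": 452, "gen": 131, "old_log_prob": 54, "update_actor": 199}
overlap = {
    "step": 406,
    "wait_prev_gen": 12,
    "generate_sequences": 305,
    "old_log_prob": 58,
    "update_actor": 245,
}


def colocate_step_estimate(t):
    """colocate sync: the three phases run one after another."""
    return t["gen"] + t["old_log_prob"] + t["update_actor"]


def overlap_step_estimate(t):
    """one-step-overlap async: rollout and training overlap, so the slower side dominates."""
    rollout_side = t["wait_prev_gen"] + t["generate_sequences"]
    train_side = t["old_log_prob"] + t["update_actor"]
    return max(rollout_side, train_side)


def to_minutes(hours, minutes):
    return 60 * hours + minutes


# Relative speed of the async run, computed from the two "total time" cells.
sync_total = to_minutes(12, 25)
async_total = to_minutes(11, 12)
speedup_pct = (sync_total / async_total - 1) * 100

print("colocate estimate:", colocate_step_estimate(colocate), "reported:", colocate["step"])
print("overlap estimate:", overlap_step_estimate(overlap), "reported:", overlap["step"])
print(f"speedup: {speedup_pct:.1f}%")
```

Both estimates come out below the reported `step` values, which fits the "≈" in the formulas: the listed phases account for most of a step but not all of it. The ratio of the two total times works out to roughly 11%, which appears consistent with the "(+11%)" annotation in the table.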

…olicy (verl-project#3556)

### What does this PR do?

Due to updates in the main package, the rollout worker accesses `self.model_config` during `generate_sequences` (https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L869), which is not initialized in the current one-step-off recipe. This causes runtime errors. Similar code in the default fsdp worker: https://github.com/volcengine/verl/blob/d33c85e2c779da1203e54275645b7b30f7fe3ce1/verl/workers/fsdp_workers.py#L563

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: [...](verl-project#3531)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
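The failure mode described in #3556 (a method reading `self.model_config` on a worker whose init path never set it) can be illustrated with a toy example. This is a sketch under stated assumptions: every class and function name here is hypothetical and none of it is verl code; it only shows why the attribute has to be assigned in the recipe's own initialization.

```python
class BaseWorker:
    """Hypothetical stand-in for a default worker that sets model_config during init."""

    def init_model(self):
        self.model_config = {"name": "base"}

    def generate_sequences(self, prompts):
        # Reads self.model_config, mirroring the access the PR description mentions.
        _ = self.model_config
        return [p + "!" for p in prompts]


class RecipeWorkerBroken(BaseWorker):
    """Hypothetical recipe worker whose custom init skips model_config."""

    def init_model(self):
        self.rollout = "engine"  # model_config is never assigned here


class RecipeWorkerFixed(BaseWorker):
    """Same worker with the attribute initialized, as the fix's title describes."""

    def init_model(self):
        self.rollout = "engine"
        self.model_config = {"name": "recipe"}


def try_generate(worker_cls):
    """Run init + generate and report an AttributeError instead of raising it."""
    worker = worker_cls()
    worker.init_model()
    try:
        return worker.generate_sequences(["hi"])
    except AttributeError as exc:
        return f"AttributeError: {exc}"


print(try_generate(RecipeWorkerBroken))
print(try_generate(RecipeWorkerFixed))
```

The broken variant fails only when `generate_sequences` runs, not at construction time, which is why a change in the shared base code can surface as a runtime error in a recipe that overrides initialization.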

gemini-code-assist Bot reviewed Sep 19, 2025

KAMiPan force-pushed the main branch from 0ca89f5 to ac8f3ad Compare September 19, 2025 06:32

zlwang-cs mentioned this pull request Sep 22, 2025

[recipe] fix: init self.model_config in fsdp worker of one-step-off policy #3556

Merged

7 tasks

KAMiPan marked this pull request as draft September 25, 2025 09:16

KAMiPan marked this pull request as ready for review September 25, 2025 09:16

wuxibin89 reviewed Oct 14, 2025

KAMiPan and others added 2 commits October 14, 2025 10:10

[sglang, recipe] feat: add SGLang as rollout engine for one-step-off-…

98b0072

…policy

move sglang import to local scope

ffa770c

wuxibin89 force-pushed the main branch from 02c5325 to ffa770c Compare October 14, 2025 02:13

wuxibin89 reviewed Oct 14, 2025

wuxibin89 approved these changes Oct 14, 2025

wuxibin89 merged commit 3abcc09 into verl-project:main Oct 14, 2025
7 checks passed

moehanabi mentioned this pull request Nov 17, 2025

[WIP][sglang, recipe] feat: Add SGLang as rollout engine for one-step-off-policy with megatron #4169

Closed

7 tasks

wuxibin89 merged 2 commits into verl-project:main from KAMiPan:main
