[sglang] fix: remove unused padding in SGLang rollout #3138
Merged
wuxibin89
merged 1 commit into
volcengine:main
from
PopSoda2002:feat/fix_over_sampling_padding
Aug 21, 2025
Conversation
zhaochenyang20
approved these changes
Aug 20, 2025
Good to go. pass the CI plz
wuxibin89
approved these changes
Aug 21, 2025
yellowbee686
pushed a commit
to yellowbee686/verl
that referenced
this pull request
Aug 23, 2025
### What does this PR do?

This PR removes the unused padding discussed in this issue: zhaochenyang20/Awesome-ML-SYS-Tutorial#193

- Only the key fields below need to be returned after rollout (example from `agent_loop`):

```python
batch = TensorDict(
    {
        "prompts": prompt_ids,  # [bsz, prompt_length]
        "responses": response_ids,  # [bsz, response_length]
        "response_mask": response_mask,  # [bsz, response_length]
        "input_ids": input_ids,  # [bsz, prompt_length + response_length]
        "attention_mask": attention_mask,  # [bsz, prompt_length + response_length]
        # position_ids: [bsz, 3, prompt_length + response_length] or [bsz, prompt_length + response_length]
        "position_ids": position_ids,
    },
    batch_size=len(inputs),
)
```

- Remove unused variables such as `prompt_loss_mask`.
- Make `response_position_id` an all-zero tensor.
- Copy the class to avoid constructing a new one.

### Test

`over_sample = 0.1`: [wandb](https://wandb.ai/popsoda-university-of-washington/multi-turn-grpo-qwen2.5-3b-sglang/runs/1p87zi7v?nw=nwuserpopsoda)

<img width="1555" height="680" alt="image" src="https://github.com/user-attachments/assets/b837acab-824d-42c6-ad3d-8342d06397d1" />

No issue.

`over_sample = 0.0`: [wandb](https://wandb.ai/popsoda-university-of-washington/multi-turn-grpo-qwen2.5-3b-sglang/runs/xloii5wm?nw=nwuserpopsoda)

<img width="1532" height="683" alt="image" src="https://github.com/user-attachments/assets/fd69be47-8182-4461-86d0-86063e6f8e1a" />

As expected too.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
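To make the shape comments in the `TensorDict` example from the description concrete, here is a minimal, standard-library-only sketch that assembles the same six fields from ragged token lists. It is not the code from this PR. The function name `build_rollout_batch`, the pad id `PAD_ID = 0`, left-padding for prompts, right-padding for responses, and deriving `position_ids` from a running count over `attention_mask` are all assumptions made for illustration. In particular, it does not model the all-zero `response_position_id` change listed above.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical pad token id, chosen only for this sketch


def build_rollout_batch(
    prompts: List[List[int]],
    responses: List[List[int]],
    prompt_length: int,
    response_length: int,
) -> Dict[str, List[List[int]]]:
    """Pad every sample to fixed lengths and collect the six rollout fields."""
    keys = ("prompts", "responses", "response_mask",
            "input_ids", "attention_mask", "position_ids")
    batch: Dict[str, List[List[int]]] = {key: [] for key in keys}

    for prompt, response in zip(prompts, responses):
        if len(prompt) > prompt_length or len(response) > response_length:
            raise ValueError("sequence longer than the configured length")

        p_pad = prompt_length - len(prompt)
        r_pad = response_length - len(response)

        # Assumption: prompts are left-padded, responses are right-padded.
        prompt_ids = [PAD_ID] * p_pad + prompt
        response_ids = response + [PAD_ID] * r_pad
        response_mask = [1] * len(response) + [0] * r_pad
        attention_mask = [0] * p_pad + [1] * len(prompt) + response_mask

        # Assumption: position = (number of real tokens seen so far) - 1,
        # clipped at 0 so that leading pad positions do not go negative.
        position_ids, seen = [], 0
        for bit in attention_mask:
            seen += bit
            position_ids.append(max(seen - 1, 0))

        batch["prompts"].append(prompt_ids)                # [prompt_length]
        batch["responses"].append(response_ids)            # [response_length]
        batch["response_mask"].append(response_mask)       # [response_length]
        batch["input_ids"].append(prompt_ids + response_ids)
        batch["attention_mask"].append(attention_mask)
        batch["position_ids"].append(position_ids)

    return batch


if __name__ == "__main__":
    demo = build_rollout_batch([[5, 6], [7, 8, 9]], [[1], [2, 3]], 3, 2)
    for name, rows in demo.items():
        print(name, rows)
```

Every row of `input_ids`, `attention_mask`, and `position_ids` has length `prompt_length + response_length`, matching the 2-D shapes in the description. The `[bsz, 3, ...]` variant of `position_ids` is not covered by this sketch.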
cczitong123
pushed a commit
to cczitong123/verl
that referenced
this pull request
Sep 5, 2025
DDVD233
pushed a commit
to DDVD233/mirl
that referenced
this pull request
Sep 5, 2025
WncFht
pushed a commit
to WncFht/verl
that referenced
this pull request
Oct 10, 2025
techkang
pushed a commit
to techkang/verl
that referenced
this pull request
Oct 31, 2025