Contributor

@PopSoda2002 PopSoda2002 commented Aug 20, 2025

What does this PR do?

This PR addresses the unused padding discussed in this issue:
zhaochenyang20/Awesome-ML-SYS-Tutorial#193

  • Only the following key fields need to be returned after rollout (example in agent_loop):
batch = TensorDict(
    {
        "prompts": prompt_ids,  # [bsz, prompt_length]
        "responses": response_ids,  # [bsz, response_length]
        "response_mask": response_mask,  # [bsz, response_length]
        "input_ids": input_ids,  # [bsz, prompt_length + response_length]
        "attention_mask": attention_mask,  # [bsz, prompt_length + response_length]
        # position_ids: [bsz, 3, prompt_length + response_length] or [bsz, prompt_length + response_length]
        "position_ids": position_ids,
    },
    batch_size=len(inputs),
)
  • Remove unused variables such as prompt_loss_mask
  • Make response_position_id an all-zero tensor
  • Copy the class to avoid constructing a new one
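
The shape comments in the snippet above can be turned into a quick sanity check. The following is only a sketch, not code from this PR: `expected_shapes`, `check_batch`, and the `mrope` flag are hypothetical names introduced here for illustration, and the check relies only on the batch exposing `.keys()` and each field exposing `.shape`.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def expected_shapes(bsz: int, prompt_length: int, response_length: int,
                    mrope: bool = False) -> Dict[str, Shape]:
    """Shapes of the rollout fields, taken from the comments in the PR snippet.

    `mrope` is a hypothetical flag that selects the 3-axis position_ids layout.
    """
    total = prompt_length + response_length
    return {
        "prompts": (bsz, prompt_length),
        "responses": (bsz, response_length),
        "response_mask": (bsz, response_length),
        "input_ids": (bsz, total),
        "attention_mask": (bsz, total),
        "position_ids": (bsz, 3, total) if mrope else (bsz, total),
    }


def check_batch(batch, bsz: int, prompt_length: int, response_length: int,
                mrope: bool = False) -> List[str]:
    """Return a list of shape problems; an empty list means every field matches."""
    problems = []
    wanted = expected_shapes(bsz, prompt_length, response_length, mrope)
    for name, want in wanted.items():
        # Membership is tested on .keys() so that dict-like containers work.
        if name not in batch.keys():
            problems.append(f"missing field: {name}")
            continue
        got = tuple(batch[name].shape)
        if got != want:
            problems.append(f"{name}: expected {want}, got {got}")
    return problems
```

Calling `check_batch(batch, bsz, prompt_length, response_length)` on a rollout output should return an empty list when the fields follow the layout described above; any entry in the result names the field whose shape differs.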

Test

over_sample = 0.1
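wandb: https://wandb.ai/popsoda-university-of-washington/multi-turn-grpo-qwen2.5-3b-sglang/runs/1p87zi7v?nw=nwuserpopsoda

No issues observed.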

over_sample = 0.0
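wandb: https://wandb.ai/popsoda-university-of-washington/multi-turn-grpo-qwen2.5-3b-sglang/runs/xloii5wm?nw=nwuserpopsoda

Also as expected.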

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Collaborator

@zhaochenyang20 zhaochenyang20 left a comment

Good to go. pass the CI plz

@PopSoda2002 PopSoda2002 marked this pull request as ready for review August 21, 2025 02:50
@wuxibin89 wuxibin89 merged commit 0e15c9b into volcengine:main Aug 21, 2025
59 of 61 checks passed
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Aug 23, 2025
### What does this PR do?

This PR addresses the unused padding discussed in this issue:
zhaochenyang20/Awesome-ML-SYS-Tutorial#193
- Only the following key fields need to be returned after
rollout (example in `agent_loop`):
```python
batch = TensorDict(
    {
        "prompts": prompt_ids,  # [bsz, prompt_length]
        "responses": response_ids,  # [bsz, response_length]
        "response_mask": response_mask,  # [bsz, response_length]
        "input_ids": input_ids,  # [bsz, prompt_length + response_length]
        "attention_mask": attention_mask,  # [bsz, prompt_length + response_length]
        # position_ids: [bsz, 3, prompt_length + response_length] or [bsz, prompt_length + response_length]
        "position_ids": position_ids,
    },
    batch_size=len(inputs),
)
``` 
- Remove unused variables such as `prompt_loss_mask`
- Make `response_position_id` an all-zero tensor
- Copy the class to avoid constructing a new one

### Test

`over_sample = 0.1` 

[wandb](https://wandb.ai/popsoda-university-of-washington/multi-turn-grpo-qwen2.5-3b-sglang/runs/1p87zi7v?nw=nwuserpopsoda)

<img width="1555" height="680" alt="image"
src="https://github.com/user-attachments/assets/b837acab-824d-42c6-ad3d-8342d06397d1"
/>

No issues observed.

`over_sample = 0.0` 

[wandb](https://wandb.ai/popsoda-university-of-washington/multi-turn-grpo-qwen2.5-3b-sglang/runs/xloii5wm?nw=nwuserpopsoda)

<img width="1532" height="683" alt="image"
src="https://github.com/user-attachments/assets/fd69be47-8182-4461-86d0-86063e6f8e1a"
/>

Also as expected.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
cczitong123 pushed a commit to cczitong123/verl that referenced this pull request Sep 5, 2025
DDVD233 pushed a commit to DDVD233/mirl that referenced this pull request Sep 5, 2025
WncFht pushed a commit to WncFht/verl that referenced this pull request Oct 10, 2025
techkang pushed a commit to techkang/verl that referenced this pull request Oct 31, 2025
3 participants