
@EduardDurech
Collaborator

Vectorize RLOO advantage estimator
130ms -> 6ms
A similar method could be applied to the other advantage estimators; I just don't have time.

Implements

$$r_i - \frac{\sum_{j\ne i} r_j}{G-1} = \frac{(G-1)r_i - \sum_{j\ne i} r_j}{G-1} = \frac{G r_i - \sum_{j\in g} r_j}{G-1}$$
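The identity above can be sketched with two `torch.bincount` calls. This is a minimal illustration for one scalar reward per response, not the code from this PR: the function name `rloo_scalar_advantages` and the argument names `rewards` and `group_ids` are assumptions made for the example.

```python
import torch


def rloo_scalar_advantages(rewards: torch.Tensor, group_ids: torch.Tensor) -> torch.Tensor:
    """Leave-one-out advantage per response, without Python loops.

    rewards:   float tensor of shape (N,), one scalar reward per response
    group_ids: long tensor of shape (N,), non-negative group index per response
    Both names are hypothetical and chosen only for this sketch.
    """
    # G for every group: an unweighted bincount counts the members
    counts = torch.bincount(group_ids)
    # sum_{j in g} r_j for every group: a weighted bincount adds the rewards
    sums = torch.bincount(group_ids, weights=rewards, minlength=counts.numel())

    # Broadcast the per-group values back to each response
    g = counts[group_ids].to(rewards.dtype)
    total = sums[group_ids].to(rewards.dtype)

    # (G * r_i - sum_j r_j) / (G - 1). When G == 1 the numerator is
    # r_i - r_i = 0, so clamping the denominator to 1 yields an advantage of 0.
    return (g * rewards - total) / (g - 1).clamp(min=1)


rewards = torch.tensor([1.0, 2.0, 3.0, 5.0])
group_ids = torch.tensor([0, 0, 0, 1])
print(rloo_scalar_advantages(rewards, group_ids))  # tensor([-1.5000, 0.0000, 1.5000, 0.0000])
```

Group 0 has G = 3 and a reward sum of 6, so the first response gets (3·1 − 6)/2 = −1.5; the single-response group 1 gets 0.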

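<img width="2199" height="628" alt="image" src="https://github.com/user-attachments/assets/339e5bd2-6949-4460-a297-34268ffc1764" />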

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request vectorizes the RLOO advantage estimator, leading to a significant performance improvement. The previous implementation, which used Python loops and dictionaries, has been replaced with a vectorized approach built on torch.bincount. Besides being faster, the new code handles groups with a single response correctly: their advantage is now set to zero. It is also more concise and more idiomatic for tensor computations. Overall, this is an excellent and well-executed optimization.

@vermouth1992
Collaborator

Could you help add a CI test to ensure that the two implementations are consistent?

@vermouth1992
Collaborator

We may want to introduce a new advantage estimator called rloo_vector instead of directly overwriting the original one.

@EduardDurech
Collaborator Author

EduardDurech commented Sep 22, 2025

@vermouth1992 I don't know your standard workflow YAML, but there is a pytest test if you want to add it to CI:

$ pytest tests/trainer/ppo/test_core_algos_on_cpu.py::test_rloo_and_vectorized_equivalence -q -s
> [RLOO] seed=0 groups=5 shape=torch.Size([64, 128]) mask_tokens=4147 adv_max_diff=1.907e-06 ret_max_diff=1.907e-06
> [RLOO] seed=1 groups=8 shape=torch.Size([128, 256]) mask_tokens=16364 adv_max_diff=1.907e-06 ret_max_diff=1.907e-06
> [RLOO] seed=2 groups=10 shape=torch.Size([512, 512]) mask_tokens=130968 adv_max_diff=3.815e-06 ret_max_diff=3.815e-06
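For readers who want to reproduce this kind of check in isolation, here is a rough sketch of an equivalence comparison between a loop-based reference and a bincount-based version. It is not the test from `tests/trainer/ppo/test_core_algos_on_cpu.py`; the helpers `rloo_loop`, `rloo_vectorized`, and `max_abs_diff` are hypothetical and work on one scalar reward per response rather than on token-level tensors.

```python
import torch


def rloo_loop(rewards: torch.Tensor, group_ids: torch.Tensor) -> torch.Tensor:
    """Reference: subtract the mean reward of the other responses in the group."""
    out = torch.zeros_like(rewards)
    for i in range(rewards.numel()):
        same = group_ids == group_ids[i]
        same[i] = False  # leave response i out of its own baseline
        if same.any():   # single-response groups keep an advantage of 0
            out[i] = rewards[i] - rewards[same].mean()
    return out


def rloo_vectorized(rewards: torch.Tensor, group_ids: torch.Tensor) -> torch.Tensor:
    """Same quantity via (G * r_i - sum_j r_j) / (G - 1), with no Python loop."""
    counts = torch.bincount(group_ids)
    sums = torch.bincount(group_ids, weights=rewards, minlength=counts.numel())
    g = counts[group_ids].to(rewards.dtype)
    total = sums[group_ids].to(rewards.dtype)
    return (g * rewards - total) / (g - 1).clamp(min=1)


def max_abs_diff(seed: int, n: int, groups: int) -> float:
    """Largest absolute difference between the two versions on random data."""
    gen = torch.Generator().manual_seed(seed)
    rewards = torch.randn(n, generator=gen)
    group_ids = torch.randint(0, groups, (n,), generator=gen)
    diff = rloo_loop(rewards, group_ids) - rloo_vectorized(rewards, group_ids)
    return diff.abs().max().item()


for seed, n, groups in [(0, 64, 5), (1, 128, 8)]:
    print(f"seed={seed} groups={groups} max_diff={max_abs_diff(seed, n, groups):.3e}")
```

The two versions sum in a different order, so small float32 rounding differences like the ones in the log above are expected rather than exact zeros.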

@vermouth1992 vermouth1992 merged commit 26a734e into volcengine:main Sep 24, 2025
70 of 75 checks passed
masoudhashemi pushed a commit to masoudhashemi/verl that referenced this pull request Oct 19, 2025
…olcengine#3555)

techkang pushed a commit to techkang/verl that referenced this pull request Oct 31, 2025
…olcengine#3555)

mtian8 pushed a commit to mtian8/verl that referenced this pull request Nov 1, 2025
…olcengine#3555)

wangboxiong320 pushed a commit to wangboxiong320/verl that referenced this pull request Nov 1, 2025
…olcengine#3555)
