[algo, perf] feat: Vectorize RLOO Advantage Estimator - 20x Speedup #3555

EduardDurech · 2025-09-21T18:55:06Z

Vectorizes the RLOO advantage estimator.
Runtime: 130 ms -> 6 ms (roughly 20x).
A similar approach could be applied to the other advantage estimators; I just don't have time.

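Implements the leave-one-out baseline in closed form, where $r_i$ is the reward of response $i$ and $G$ is the number of responses in its group $g$: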

$$r_i - \frac{\sum_{j\ne i} r_j}{G-1} = \frac{(G-1)r_i - \sum_{j\ne i} r_j}{G-1} = \frac{G r_i - \sum_{j\in g} r_j}{G-1}$$
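The identity above can be checked numerically. The following is a minimal standard-library sketch, not code from this PR; the function names `loo_advantage` and `closed_form_advantage` are hypothetical and exist only for illustration.

```python
import random


def loo_advantage(rewards, i):
    """Left-hand side: r_i minus the mean of the other rewards in the group."""
    others = [r for j, r in enumerate(rewards) if j != i]
    return rewards[i] - sum(others) / len(others)


def closed_form_advantage(rewards, i):
    """Right-hand side: (G * r_i - sum of all rewards in the group) / (G - 1)."""
    G = len(rewards)
    return (G * rewards[i] - sum(rewards)) / (G - 1)


# Compare both forms on one random group of 8 rewards.
random.seed(0)
rewards = [random.random() for _ in range(8)]
max_diff = max(
    abs(loo_advantage(rewards, i) - closed_form_advantage(rewards, i))
    for i in range(len(rewards))
)
print(f"max difference between the two forms: {max_diff:.3e}")
```

The closed form only needs the group sum and the group size, which is what makes a per-group reduction followed by a gather sufficient.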

gemini-code-assist

Code Review

This pull request vectorizes the RLOO advantage estimator, leading to a significant performance improvement. The previous implementation, which used Python loops and dictionaries, has been replaced with an efficient, vectorized approach using torch.bincount. This not only boosts performance but also improves correctness for groups with a single response, whose advantage is now properly set to zero. The new code is more concise and idiomatic for tensor computations. Overall, this is an excellent and well-executed optimization.
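As a rough illustration of the approach the review describes, here is a minimal sketch of a `torch.bincount`-based estimator. It is not the code merged in this PR: the function name `rloo_advantages_sketch` is hypothetical, group ids are assumed to be an integer tensor, and it works on one scalar reward per response without the token-level broadcasting or masking a real trainer would need.

```python
import torch


def rloo_advantages_sketch(scores: torch.Tensor, group_ids: torch.Tensor) -> torch.Tensor:
    """Leave-one-out advantages for 1-D `scores`, grouped by integer `group_ids`.

    Assumption: `scores` is a float tensor of shape (N,) and `group_ids` is an
    int64 tensor of shape (N,). Both names are illustrative.
    """
    # Map arbitrary ids onto 0..K-1 so they can be used as bincount bins.
    _, inv = torch.unique(group_ids, return_inverse=True)
    # Per-group size G and per-group reward sum, gathered back to each response.
    g = torch.bincount(inv)[inv].to(scores.dtype)
    total = torch.bincount(inv, weights=scores)[inv].to(scores.dtype)
    # (G * r_i - sum) / (G - 1). For a single-response group the numerator is
    # r_i - r_i = 0, and clamping the denominator to 1 avoids dividing by zero,
    # so the advantage comes out as exactly 0.
    return (g * scores - total) / (g - 1).clamp(min=1)


scores = torch.tensor([1.0, 2.0, 3.0, 5.0])
group_ids = torch.tensor([7, 7, 7, 9])  # group 9 has a single response
advantages = rloo_advantages_sketch(scores, group_ids)
print(advantages)
```

In this example the three responses in group 7 get advantages -1.5, 0.0 and 1.5, and the lone response in group 9 gets 0.0.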

vermouth1992 · 2025-09-22T03:07:47Z

Could you help add a CI test to ensure that the two implementations are consistent?

vermouth1992 · 2025-09-22T03:08:47Z

We may want to introduce a new advantage estimator called rloo_vector instead of directly overwriting the original one.

EduardDurech · 2025-09-22T19:27:51Z

@vermouth1992 I don't know your standard workflow YAML, but there is a pytest test if you want to add it to CI:

$ pytest tests/trainer/ppo/test_core_algos_on_cpu.py::test_rloo_and_vectorized_equivalence -q -s
> [RLOO] seed=0 groups=5 shape=torch.Size([64, 128]) mask_tokens=4147 adv_max_diff=1.907e-06 ret_max_diff=1.907e-06
> [RLOO] seed=1 groups=8 shape=torch.Size([128, 256]) mask_tokens=16364 adv_max_diff=1.907e-06 ret_max_diff=1.907e-06
> [RLOO] seed=2 groups=10 shape=torch.Size([512, 512]) mask_tokens=130968 adv_max_diff=3.815e-06 ret_max_diff=3.815e-06
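For readers who want to see what such a parity check involves, here is a small self-contained sketch. It is not the test from this PR: `rloo_loop`, `rloo_vectorized` and `max_abs_diff` are hypothetical names, and the inputs are random per-response scalars rather than the token-level tensors reported in the log above.

```python
import torch


def rloo_loop(scores, group_ids):
    """Reference: Python loop over groups collected in a dictionary."""
    groups = {}
    for i, g in enumerate(group_ids.tolist()):
        groups.setdefault(g, []).append(i)
    out = torch.zeros_like(scores)
    for idxs in groups.values():
        n = len(idxs)
        if n == 1:
            continue  # single-response group: advantage stays 0
        total = scores[idxs].sum()
        for i in idxs:
            out[i] = scores[i] - (total - scores[i]) / (n - 1)
    return out


def rloo_vectorized(scores, group_ids):
    """Closed form (G * r_i - sum) / (G - 1) using torch.bincount."""
    _, inv = torch.unique(group_ids, return_inverse=True)
    g = torch.bincount(inv)[inv].to(scores.dtype)
    total = torch.bincount(inv, weights=scores)[inv].to(scores.dtype)
    return (g * scores - total) / (g - 1).clamp(min=1)


def max_abs_diff(seed, num_groups, batch):
    """Largest absolute difference between the two versions on random inputs."""
    gen = torch.Generator().manual_seed(seed)
    scores = torch.randn(batch, generator=gen)
    group_ids = torch.randint(0, num_groups, (batch,), generator=gen)
    diff = rloo_loop(scores, group_ids) - rloo_vectorized(scores, group_ids)
    return diff.abs().max().item()


for seed in range(3):
    print(f"seed={seed} adv_max_diff={max_abs_diff(seed, 5, 64):.3e}")
```

The two versions sum in a different order, so small float32 differences are expected; comparing with a tolerance rather than exact equality is the usual choice for this kind of test.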

EduardDurech requested review from PeterSH6, eric-haibin-lin, tongyx361 and vermouth1992 as code owners September 21, 2025 18:55

gemini-code-assist bot reviewed Sep 21, 2025

EduardDurech added 2 commits September 22, 2025 20:53

Vectorized RLOO

dcff47f

RLOO Parity Test

e73112b

EduardDurech force-pushed the feat/vectorized_rloo branch from 59e6c0f to e73112b Compare September 22, 2025 19:26

EduardDurech requested a review from zhaochenyang20 as a code owner September 22, 2025 19:26

ci fix

27b0bcc

vermouth1992 approved these changes Sep 24, 2025

vermouth1992 merged commit 26a734e into volcengine:main Sep 24, 2025
70 of 75 checks passed

CedricHwong mentioned this pull request Sep 26, 2025

[Feature] Vectorized GRPO with Group-Wise Helpers #3634

Closed
