[Feature] Vectorized GRPO with Group-Wise Helpers

### Feature request

### Prior art

This work borrows the core idea from PR [#3555](https://github.com/volcengine/verl/pull/3555) (vectorized RLOO).


### Motivation

1. Add `AdvantageEstimator.GRPO_VECTORIZED`.
2. Add group-wise vector helpers.
3. Add a CI test.


### Your contribution

#### Group-wise helpers

To support GRPO (and future per-group estimators) cleanly and efficiently:
- `segment_sum(values, group_idx, G)`: computes per-group sums using `torch.bincount(..., minlength=G)` (CPU/GPU friendly, no external deps).
- `segment_count(group_idx, G)`: computes per-group counts via `bincount`.
- `gather_by_group(stat, group_idx)`: maps a per-group tensor back to per-sample via `stat.index_select(0, group_idx)`.
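A minimal sketch of the three helpers, written only from the descriptions above. The signatures match the names in the list; passing `values` as the `weights` argument of `torch.bincount`, the 1-D float `values`, the integer `group_idx` in `[0, G)`, and the cast back to `values.dtype` are assumptions, not necessarily what the PR does.

```python
import torch


def segment_count(group_idx: torch.Tensor, G: int) -> torch.Tensor:
    """Number of samples in each of the G groups (int64, shape [G])."""
    return torch.bincount(group_idx, minlength=G)


def segment_sum(values: torch.Tensor, group_idx: torch.Tensor, G: int) -> torch.Tensor:
    """Sum of `values` within each group (shape [G]).

    Assumption: `values` is 1-D floating point and aligned with `group_idx`.
    """
    out = torch.bincount(group_idx, weights=values, minlength=G)
    return out.to(values.dtype)


def gather_by_group(stat: torch.Tensor, group_idx: torch.Tensor) -> torch.Tensor:
    """Broadcast a per-group statistic (shape [G]) back to per-sample (shape [B])."""
    return stat.index_select(0, group_idx)


# Usage: six samples in three groups of sizes 2, 3, and 1.
values = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
group_idx = torch.tensor([0, 0, 1, 1, 1, 2])
G = 3

sums = segment_sum(values, group_idx, G)        # per-group sums: 3, 12, 6
counts = segment_count(group_idx, G)            # per-group counts: 2, 3, 1
means = sums / counts.to(values.dtype)          # per-group means: 1.5, 4, 6
per_sample_mean = gather_by_group(means, group_idx)  # shape [6]
```

Groups with no samples get a sum and count of zero because of `minlength=G`, so a caller that divides by the count would need to guard against empty groups.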

This achieves:
- No Python loops over groups or samples.
- `O(B)` time for a batch of `B` samples, with minimal kernel launches (just a few reductions and gathers).
- Drop-in reuse by other estimators in future PRs.
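To illustrate the loop-free pattern, here is a hedged sketch of group-normalized GRPO advantages built from the same `bincount` and `index_select` primitives. The function name `grpo_advantages_vectorized`, the unbiased (`n - 1`) variance, the `eps` value, and the choice to give single-sample groups an advantage of zero are all assumptions for illustration, not the PR's confirmed behavior.

```python
import torch


def grpo_advantages_vectorized(
    scores: torch.Tensor,     # shape [B], one scalar reward per sample
    group_idx: torch.Tensor,  # shape [B], int64 group id in [0, G)
    G: int,
    eps: float = 1e-6,        # assumed stabilizer, not taken from the PR
) -> torch.Tensor:
    """(score - group mean) / (group std + eps), with no Python loops."""
    dtype = scores.dtype
    counts = torch.bincount(group_idx, minlength=G).to(dtype)
    sums = torch.bincount(group_idx, weights=scores, minlength=G).to(dtype)

    # Per-group mean; clamp avoids dividing by zero for empty groups.
    mean = sums / counts.clamp(min=1)
    centered = scores - mean.index_select(0, group_idx)

    # Unbiased per-group variance from squared deviations (assumption).
    sq_sums = torch.bincount(
        group_idx, weights=centered * centered, minlength=G
    ).to(dtype)
    std = (sq_sums / (counts - 1).clamp(min=1)).sqrt()

    # Single-sample groups have centered == 0, so their advantage is 0.
    return centered / (std.index_select(0, group_idx) + eps)


# Usage: three groups of sizes 2, 3, and 1.
scores = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
group_idx = torch.tensor([0, 0, 1, 1, 1, 2])
adv = grpo_advantages_vectorized(scores, group_idx, G=3)
```

The whole computation is three `bincount` reductions and two gathers regardless of how many groups there are, which is where the `O(B)` cost and the small number of kernel launches come from.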
