Support completion bootstrap for VLM in GRPO/RLOO #4452

SolarWindRider · 2025-11-05T07:34:08Z

1. Fix a little bug in GRPOTrainer.

When inputs are conversational, bootstrap is a list not str, causing a str concat error.

I add a bootstrap type check to fix the bug while maintain the original program

… into trpo

albertvillanova

I think you mixed 2 contributions in this PR:

Fix bug in GRPOTrainer
Add experimental RTPO

I would suggest you split your contributions in 2 PRs, and keep here only the bug fix.

trl/trainer/grpo_trainer.py

Co-authored-by: Albert Villanova del Moral <[email protected]>

SolarWindRider · 2025-11-05T10:35:09Z

Thank you for your suggestion

SolarWindRider · 2025-11-05T11:21:41Z

I think you mixed 2 contributions in this PR:
* Fix bug in GRPOTrainer

* Add experimental RTPO
I would suggest you split your contributions in 2 PRs, and keep here only the bug fix.

The problem is I don't know how to split it into 2 PR2 properly.

This reverts commit 4590205.

trl/trainer/grpo_trainer.py

qgallouedec

LGTM thanks!

HuggingFaceDocBuilderDev · 2025-11-06T21:46:53Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

SolarWindRider · 2025-11-07T06:30:21Z

Thank you guys! @qgallouedec @albertvillanova
♪(･ω･)ﾉ

SolarWindRider and others added 3 commits November 5, 2025 15:25

fix a little bug in completions with bootstrap

4a643b0

feature: rtpo trainer

4590205

Merge branch 'main' into grpo-bug-fix

28703db

SolarWindRider mentioned this pull request Nov 5, 2025

[GRPOTrainer bug fix] a little bug in completions with bootstrap #4442

Closed

SolarWindRider and others added 2 commits November 5, 2025 15:45

Merge branch 'huggingface:main' into trpo

7e9e121

Merge branch 'grpo-bug-fix' of https://github.com/SolarWindRider/trl-my…

16cf457

… into trpo

albertvillanova requested changes Nov 5, 2025

View reviewed changes

trl/trainer/grpo_trainer.py Outdated Show resolved Hide resolved

trl/trainer/grpo_trainer.py Outdated Show resolved Hide resolved

SolarWindRider and others added 2 commits November 5, 2025 18:33

Update trl/trainer/grpo_trainer.py

9f01d00

Co-authored-by: Albert Villanova del Moral <[email protected]>

Update trl/trainer/grpo_trainer.py

7b3cfd6

Co-authored-by: Albert Villanova del Moral <[email protected]>

SolarWindRider and others added 3 commits November 5, 2025 19:21

Merge branch 'main' into grpo-bug-fix

f58b1d7

Revert "feature: rtpo trainer"

2c39a19

This reverts commit 4590205.

Merge branch 'main' into grpo-bug-fix

5e455ea

SolarWindRider requested a review from albertvillanova November 6, 2025 07:25

qgallouedec reviewed Nov 6, 2025

View reviewed changes

trl/trainer/grpo_trainer.py Outdated Show resolved Hide resolved

qgallouedec and others added 3 commits November 6, 2025 14:40

Apply suggestion from @qgallouedec

7efd13b

Merge branch 'main' into grpo-bug-fix

84a5308

propagate to other trainers

89c2a33

qgallouedec changed the title ~~fix: fix a little bug in GRPOTrainer~~ Support completion bootstrap for VLM in GRPO/RLOO Nov 6, 2025

qgallouedec approved these changes Nov 6, 2025

View reviewed changes

qgallouedec enabled auto-merge (squash) November 6, 2025 22:54

qgallouedec approved these changes Nov 6, 2025

View reviewed changes

qgallouedec disabled auto-merge November 6, 2025 22:55

qgallouedec merged commit 43b6541 into huggingface:main Nov 6, 2025
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support completion bootstrap for VLM in GRPO/RLOO #4452

Support completion bootstrap for VLM in GRPO/RLOO #4452

Uh oh!

SolarWindRider commented Nov 5, 2025

Uh oh!

albertvillanova left a comment

Uh oh!

Uh oh!

Uh oh!

SolarWindRider commented Nov 5, 2025 •

edited

Loading

Uh oh!

SolarWindRider commented Nov 5, 2025

Uh oh!

Uh oh!

qgallouedec left a comment

Uh oh!

HuggingFaceDocBuilderDev commented Nov 6, 2025

Uh oh!

Uh oh!

SolarWindRider commented Nov 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Support completion bootstrap for VLM in GRPO/RLOO #4452

Support completion bootstrap for VLM in GRPO/RLOO #4452

Uh oh!

Conversation

SolarWindRider commented Nov 5, 2025

1. Fix a little bug in GRPOTrainer.

Uh oh!

albertvillanova left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

SolarWindRider commented Nov 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SolarWindRider commented Nov 5, 2025

Uh oh!

Uh oh!

qgallouedec left a comment

Choose a reason for hiding this comment

Uh oh!

HuggingFaceDocBuilderDev commented Nov 6, 2025

Uh oh!

Uh oh!

SolarWindRider commented Nov 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

SolarWindRider commented Nov 5, 2025 •

edited

Loading