Commit 9530472
[rollout, tool] feat: export rollout rewards to total rewards (volcengine#3563)
### What does this PR do?
This PR exports rollout rewards including tool calling rewards and
interaction rewards to `compute_score` fn.
Currently, rollout reward_scores is calculated but not used in the final
`compute_score`.
https://github.com/volcengine/verl/blob/96e7071de1bc6a7c0e12f4999f97556da9310cc3/verl/workers/rollout/sglang_rollout/sglang_rollout.py#L1320-L1324
Fix volcengine#3525
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)1 parent 2b69c1a commit 9530472
File tree
5 files changed
+32
-10
lines changed- verl
- experimental/agent_loop
- workers/reward_manager
5 files changed
+32
-10
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
701 | 701 | | |
702 | 702 | | |
703 | 703 | | |
704 | | - | |
| 704 | + | |
| 705 | + | |
| 706 | + | |
705 | 707 | | |
706 | 708 | | |
707 | 709 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
68 | 68 | | |
69 | 69 | | |
70 | 70 | | |
| 71 | + | |
71 | 72 | | |
72 | 73 | | |
73 | 74 | | |
| |||
175 | 176 | | |
176 | 177 | | |
177 | 178 | | |
178 | | - | |
| 179 | + | |
179 | 180 | | |
180 | 181 | | |
181 | 182 | | |
| |||
268 | 269 | | |
269 | 270 | | |
270 | 271 | | |
271 | | - | |
| 272 | + | |
272 | 273 | | |
273 | 274 | | |
274 | 275 | | |
| |||
321 | 322 | | |
322 | 323 | | |
323 | 324 | | |
| 325 | + | |
| 326 | + | |
| 327 | + | |
324 | 328 | | |
325 | 329 | | |
326 | 330 | | |
| |||
403 | 407 | | |
404 | 408 | | |
405 | 409 | | |
406 | | - | |
| 410 | + | |
| 411 | + | |
| 412 | + | |
407 | 413 | | |
408 | 414 | | |
409 | 415 | | |
| |||
413 | 419 | | |
414 | 420 | | |
415 | 421 | | |
416 | | - | |
| 422 | + | |
417 | 423 | | |
418 | 424 | | |
419 | | - | |
420 | | - | |
| 425 | + | |
| 426 | + | |
| 427 | + | |
| 428 | + | |
| 429 | + | |
| 430 | + | |
421 | 431 | | |
422 | 432 | | |
423 | 433 | | |
| |||
443 | 453 | | |
444 | 454 | | |
445 | 455 | | |
446 | | - | |
| 456 | + | |
447 | 457 | | |
448 | 458 | | |
449 | 459 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
61 | 61 | | |
62 | 62 | | |
63 | 63 | | |
64 | | - | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
65 | 69 | | |
66 | 70 | | |
67 | 71 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
92 | 92 | | |
93 | 93 | | |
94 | 94 | | |
95 | | - | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
96 | 100 | | |
97 | 101 | | |
98 | 102 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
82 | 82 | | |
83 | 83 | | |
84 | 84 | | |
| 85 | + | |
85 | 86 | | |
| 87 | + | |
86 | 88 | | |
87 | 89 | | |
88 | 90 | | |
| |||
0 commit comments