fix: math reward fn #26

yueyang130 · 2025-02-25T06:21:47Z

Currently, the reward fn is intended to give a +0.1 reward to encourage the model to output its answer in the \boxed{} format.

However, the CustomRewardManager class uses sequences = torch.cat((valid_prompt_ids, valid_response_ids)) as the input to self.compute_score. As a result, the variable sequences includes the system prompt "Please reason step by step, and put your final answer within \boxed{}.".

Note that the system prompt already contains \boxed{}. Thus the reward fn gives +0.1 even when the model's actual output does not contain \boxed{}. An example is below:

Fix this bug by passing only the response, without the system prompt, into self.compute_score.
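To make the failure mode concrete, here is a minimal sketch in plain Python. It works on decoded strings rather than token ids, and format_reward is a hypothetical stand-in for the format part of the reward fn, not the repository's actual implementation; only the system prompt text is taken from this PR.

```python
import re

# System prompt quoted in this PR; note that it already contains \boxed{}.
SYSTEM_PROMPT = r"Please reason step by step, and put your final answer within \boxed{}."


def format_reward(text: str) -> float:
    # Hypothetical format check (an assumption for illustration):
    # return 0.1 if the text contains \boxed{...} anywhere, otherwise 0.0.
    return 0.1 if re.search(r"\\boxed\{.*?\}", text, re.DOTALL) else 0.0


# A response that does NOT follow the requested format.
response = "The answer is 42."

# Buggy behaviour: scoring prompt + response matches the \boxed{} in the prompt.
buggy = format_reward(SYSTEM_PROMPT + " " + response)

# Fixed behaviour: scoring the response alone.
fixed = format_reward(response)

print(buggy, fixed)
```

With the prompt included, the format bonus is granted regardless of what the model wrote; scoring the response alone only rewards outputs that actually contain \boxed{...}.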

yueyang130 · 2025-02-25T07:00:04Z

Besides training, the bug also affects the test score, making it ~0.1 higher than the actual one at the initial stage of training.

hiyouga

Nice catch!


fix math reward fn

d0af1c9

hiyouga self-requested a review February 25, 2025 10:29

hiyouga approved these changes Feb 25, 2025


hiyouga merged commit 5b6b628 into hiyouga:main Feb 25, 2025

hiyouga pushed a commit that referenced this pull request Oct 4, 2025

[reward] fix math reward fn (#26)

bfeb4d5

malhajar17 pushed a commit to malhajar17/EasyR1_ex that referenced this pull request Oct 21, 2025

Add workflow to Build and Push FCS experiment RAG Docker image (hiyou…

958dbbd

…ga#26) Refs: PAAS-1985

malhajar17 pushed a commit to malhajar17/EasyR1_ex that referenced this pull request Oct 21, 2025

RAG demo app - update UI (hiyouga#26)

662a7de

- Update the CSS - Change the UI: - use tabs - use file upload Refs: ART-95
