
Conversation

@davidheineman (Member) commented on Mar 10, 2025

This PR adds BPB (bits per byte) over the correct continuation for GSM, Minerva, MBPP, and HumanEval.

The BPB task configs use the setup implemented in https://github.com/allenai/oe-eval-internal/pull/374, and the implementation is the same as in allenai/OLMo#777.

Evaluation Setup

The "gold continuation" appears as follows:

Question: Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
Answer: Janet sells 16 - 3 - 4 = 9 duck eggs a day. She makes 9 * 2 = $18 every day at the farmer’s market. So the answer is 18.
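
For context, BPB here is the model's negative log-likelihood over the gold continuation, converted to bits and normalized by the continuation's length in UTF-8 bytes. Below is a minimal sketch of that computation, assuming the continuation is the answer text scored given the question as a prefix; the function and argument names are illustrative, not the task-config code in this PR or in oe-eval-internal.

```python
import math

def bits_per_byte(token_logprobs: list[float], continuation: str) -> float:
    """Bits-per-byte of a gold continuation (illustrative sketch).

    token_logprobs: natural-log probabilities the model assigns to each token
        of the gold continuation, conditioned on the question prefix.
    continuation: the gold continuation string (e.g. the "Answer: ..." text).
    """
    total_nll_nats = -sum(token_logprobs)          # negative log-likelihood in nats
    total_nll_bits = total_nll_nats / math.log(2)  # convert nats -> bits
    num_bytes = len(continuation.encode("utf-8"))  # normalize by UTF-8 byte count
    return total_nll_bits / num_bytes
```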

Sanity check

I tested this with the in-loop trainer:

torchrun --nproc-per-node=4 src/scripts/train/OLMo2-1B.py train test_run .


@undfined left a comment


lgtm

Nit: do the JSON configs need to be unformatted? If not, formatting them would make them a lot easier to read/scan down the road.

One other nit: we're mixing and matching the codex_ prefix for these tasks here and in oe-eval-internal. It would be nice to be consistent.

@davidheineman (Member, Author)

Thanks for taking a look! Updated mbpp -> codex_mbpp.

Agreed on your first nit, but I won't format the .json files in this PR, since the other .json files are not formatted.

@davidheineman merged commit 2f01ec8 into main on Mar 10, 2025
7 checks passed
@davidheineman deleted the new-gen-tasks branch on March 10, 2025 at 23:22