@davidheineman
Member

Mayee noticed that the in-loop evals became 4x slower as a result of commit b635bb9.

The culprit is the logic that runs when we don't use fast_mc. Instead of running the eval once, it runs the eval once for each token in the continuation (and then throws out all but one pass).

choice_ids = cont_tokens # tensor: (toks, 1)

for choice_idx, choice_token in enumerate(choice_ids):
    ...
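
To make the cost of that loop concrete, here is a minimal toy sketch in plain Python. It is not the actual OLMo-in-loop-evals code: `score_once`, `score_buggy`, and `score_fixed` are hypothetical names, and a simple sum of log-probs stands in for one eval pass. It only illustrates that looping over the continuation tokens repeats the same pass once per token while the kept result is unchanged.

```python
# Toy illustration only: `score_once` is a hypothetical stand-in for one eval pass.
calls = {"n": 0}

def score_once(cont_log_probs):
    """Pretend eval pass: sum the log-probs of the continuation tokens."""
    calls["n"] += 1
    return sum(cont_log_probs)

def score_buggy(cont_tokens, cont_log_probs):
    # Iterating over the continuation tokens repeats the full pass per token.
    results = [score_once(cont_log_probs) for _ in cont_tokens]
    return results[0]  # every pass but one is thrown away

def score_fixed(cont_tokens, cont_log_probs):
    # One pass per continuation, regardless of its length.
    return score_once(cont_log_probs)

cont_tokens = [101, 102, 103, 104]          # made-up token ids
cont_log_probs = [-0.5, -1.0, -0.25, -2.0]  # made-up log-probs

calls["n"] = 0
buggy = score_buggy(cont_tokens, cont_log_probs)
buggy_calls = calls["n"]

calls["n"] = 0
fixed = score_fixed(cont_tokens, cont_log_probs)
fixed_calls = calls["n"]

# Same score, but the buggy path did one pass per continuation token.
print(buggy == fixed, buggy_calls, fixed_calls)
```

In this toy, the wasted work grows with the continuation length, which is why the slowdown shows up on tasks with multi-token continuations.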

In testing, the metrics are identical before and after the fix, but the fixed version is much faster:

# before fix:
INFO    Finished downstream evals in 56.1 seconds. Metrics:
    minerva_math_precalculus_gold_bpb_0shot (BPB)=7.557
    minerva_math_precalculus_gold_bpb_0shot (BPB v2)=7.539

# after fix:
INFO    Finished downstream evals in 14.8 seconds. Metrics:
    minerva_math_precalculus_gold_bpb_0shot (BPB)=7.557
    minerva_math_precalculus_gold_bpb_0shot (BPB v2)=7.539

To be clear, the eval results are the same before and after the fix; the evals are just faster.
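
As a quick sanity check on the timings above, the ratio of the two wall-clock numbers from the logs works out to a little under 4x, which is consistent with the slowdown Mayee reported. The variable names below are just for illustration.

```python
# Wall-clock times taken from the two log excerpts above.
before_s = 56.1  # seconds, before the fix
after_s = 14.8   # seconds, after the fix

speedup = before_s / after_s
print(f"speedup: {speedup:.2f}x")
```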

@davidheineman davidheineman merged commit fa83ee4 into main May 27, 2025
8 checks passed
@davidheineman davidheineman deleted the bpb-rc-fix branch May 27, 2025 15:41
@davidheineman
Member Author

I ran in-loop evals over the full "recommended" set before/after the fix to verify the numbers are the same. Here are the logs for both:

logs_after_fix.txt

logs_before_fix.txt

epwalsh pushed a commit to allenai/OLMo-core that referenced this pull request May 27, 2025
Incorporate this one-line PR:
allenai/OLMo-in-loop-evals#12

TL;DR: #281 made in-loop RC and
BPB slower; this fixes that bug. **The RC/BPB in-loop evals run with
`ai2-olmo-eval~=0.8.0` are correct evals, just slower.**
TianhuaTao pushed a commit to allenai/OLMo-core that referenced this pull request May 28, 2025