@le-big-mac
Contributor

Fixes #1807

Currently, when using GRPOTrainer, a KeyError occurs in the loss computation due to a missing [mode] key. This PR adds this mode key back.

@danielhanchen
Contributor

Thanks! Oh this is for the nightly release of TRL right?
Would it be possible to first check whether self._metrics["completion_length"] exists, and/or whether self._metrics["train"] / self._metrics["eval"] exists, so that older versions of TRL keep working? Also, it looks like I might need to update the notebook logging.

@le-big-mac
Contributor Author

I've added a check for "train" as a key in self._metrics. From looking at the history of TRL's GRPOTrainer, this should be enough to distinguish between the versions: it used to be self._metrics = defaultdict(list), and now it's self._metrics = {"train": defaultdict(list), "eval": defaultdict(list)}. I can't see evidence of a "train" key ever being used in the old version.
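
For readers following along, the check described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the helper name `metrics_for_mode` is hypothetical, and the two dictionaries only mimic the old and new `_metrics` layouts mentioned in this comment.

```python
from collections import defaultdict


def metrics_for_mode(metrics, mode):
    """Return the mapping that metric values should be appended to.

    Hypothetical helper (not part of TRL or Unsloth): it supports both
    the flat layout and the per-mode layout of `_metrics`.
    """
    # Newer layout: {"train": defaultdict(list), "eval": defaultdict(list)}
    if "train" in metrics:
        return metrics[mode]
    # Older layout: a flat defaultdict(list) with no mode level.
    # Note that `in` does not create the key on a defaultdict.
    return metrics


# Stand-ins for the two layouts described in the comment above.
old_style = defaultdict(list)
new_style = {"train": defaultdict(list), "eval": defaultdict(list)}

# The same logging call works against either layout.
metrics_for_mode(old_style, "train")["completion_length"].append(12.0)
metrics_for_mode(new_style, "train")["completion_length"].append(12.0)
```

Testing membership with `in` rather than indexing matters here: indexing a `defaultdict(list)` with `"train"` would silently create an empty list under that key and make the old layout look like the new one.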

@danielhanchen
Contributor

Very good work thanks!

@danielhanchen danielhanchen merged commit 2c0f501 into unslothai:main Feb 25, 2025
zhzLuke96 pushed a commit to zhzLuke96/unsloth that referenced this pull request Apr 1, 2025
* fix keyerror in GRPOTrainer

* check for train in _metrics

Development

Successfully merging this pull request may close these issues.

KeyError: 'completion_length' in GRPO trainer
