fix: Remove chat template setting from non-SFT trainer scripts #4437
Summary
Resolves #4404
This PR removes the `SIMPLE_CHAT_TEMPLATE` import and chat template setting from all non-SFT trainer scripts. Setting chat templates only makes sense for SFT (supervised fine-tuning / instruction tuning), not for preference optimization or reward-based training methods.

Changes
Removed chat template setting from:
- `examples/scripts/online_dpo.py` - Online DPO (preference optimization)
- `examples/scripts/orpo.py` - ORPO (preference optimization)
- `examples/scripts/cpo.py` - CPO (preference optimization)
- `examples/scripts/nash_md.py` - Nash-MD (online preference optimization)
- `examples/scripts/xpo.py` - XPO (preference optimization)
- `examples/scripts/ppo/ppo.py` - PPO (reward-based training)
- `examples/scripts/ppo/ppo_tldr.py` - PPO TLDR (reward-based training)

For each script:
- Removed the `from trl.trainer.utils import SIMPLE_CHAT_TEMPLATE` import
- Removed the `tokenizer.chat_template = SIMPLE_CHAT_TEMPLATE` assignment
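For reference, a minimal sketch of the removed pattern (the surrounding tokenizer setup is simplified and the model name is a placeholder; the exact context differs per script):

```python
from transformers import AutoTokenizer

from trl.trainer.utils import SIMPLE_CHAT_TEMPLATE  # <-- import removed by this PR

tokenizer = AutoTokenizer.from_pretrained("your-model")  # placeholder model name
tokenizer.chat_template = SIMPLE_CHAT_TEMPLATE  # <-- assignment removed by this PR
```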
Rationale

Chat templates are used to format conversational data for instruction tuning (SFT). They don't apply to:

- Preference optimization methods (Online DPO, ORPO, CPO, XPO, Nash-MD)
- Reward-based training methods (PPO)
Setting chat templates in these contexts was unnecessary and could cause confusion about the expected data format.
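For context, a minimal sketch of where a chat template is actually needed: SFT-style formatting of conversational data with a Hugging Face tokenizer (the model name and messages below are illustrative, not taken from the scripts above):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-sft-model")  # placeholder model name

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]

# apply_chat_template renders the conversation into the model's expected prompt
# format; this formatting step is what a chat template exists for in SFT.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```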