fix: Remove chat template setting from non-SFT trainer scripts #4437
Summary
Resolves #4404
This PR removes the `SIMPLE_CHAT_TEMPLATE` import and chat template setting from all non-SFT trainer scripts. Setting chat templates only makes sense for SFT (supervised fine-tuning / instruction tuning), not for preference optimization or reward-based training methods.

Changes
Removed chat template setting from:
- `examples/scripts/online_dpo.py` - Online DPO (preference optimization)
- `examples/scripts/orpo.py` - ORPO (preference optimization)
- `examples/scripts/cpo.py` - CPO (preference optimization)
- `examples/scripts/nash_md.py` - Nash-MD (online preference optimization)
- `examples/scripts/xpo.py` - XPO (preference optimization)
- `examples/scripts/ppo/ppo.py` - PPO (reward-based training)
- `examples/scripts/ppo/ppo_tldr.py` - PPO TLDR (reward-based training)

For each script:
- Removed the `from trl.trainer.utils import SIMPLE_CHAT_TEMPLATE` import
- Removed the `tokenizer.chat_template = SIMPLE_CHAT_TEMPLATE` assignment
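For reference, a minimal sketch of the removed pattern (the surrounding tokenizer setup is simplified and the model name is a placeholder; the exact context differs per script):

```python
from transformers import AutoTokenizer

from trl.trainer.utils import SIMPLE_CHAT_TEMPLATE  # <-- import removed by this PR

tokenizer = AutoTokenizer.from_pretrained("your-model")  # placeholder model name
tokenizer.chat_template = SIMPLE_CHAT_TEMPLATE  # <-- assignment removed by this PR
```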
Rationale

Chat templates are used to format conversational data for instruction tuning (SFT). They don't apply to:

- Preference optimization methods (Online DPO, ORPO, CPO, XPO, Nash-MD)
- Reward-based training methods (PPO)
Setting chat templates in these contexts was unnecessary and could cause confusion about the expected data format.
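For context, a minimal sketch of where a chat template is actually needed: SFT-style formatting of conversational data with a Hugging Face tokenizer (the model name and messages below are illustrative, not taken from the scripts above):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-sft-model")  # placeholder model name

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]

# apply_chat_template renders the conversation into the model's expected prompt
# format; this formatting step is what a chat template exists for in SFT.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```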