Skip to content

Conversation

@behroozazarkhalili
Copy link
Collaborator

Summary

Resolves #4404

This PR removes the SIMPLE_CHAT_TEMPLATE import and chat template setting from all non-SFT trainer scripts. Setting chat templates only makes sense for SFT (supervised fine-tuning/instruction tuning), not for preference optimization or reward-based training methods.

Changes

Removed chat template setting from:

  • examples/scripts/online_dpo.py - Online DPO (preference optimization)
  • examples/scripts/orpo.py - ORPO (preference optimization)
  • examples/scripts/cpo.py - CPO (preference optimization)
  • examples/scripts/nash_md.py - Nash-MD (multi-objective RL)
  • examples/scripts/xpo.py - XPO (preference optimization)
  • examples/scripts/ppo/ppo.py - PPO (reward-based training)
  • examples/scripts/ppo/ppo_tldr.py - PPO TLDR (reward-based training)

For each script:

  • Removed from trl.trainer.utils import SIMPLE_CHAT_TEMPLATE import
  • Removed conditional block that sets tokenizer.chat_template = SIMPLE_CHAT_TEMPLATE

Rationale

Chat templates are used to format conversational data for instruction tuning (SFT). They don't apply to:

  • Preference optimization (DPO, CPO, ORPO, XPO): These methods optimize based on preference pairs, not conversational format
  • Reward-based training (PPO, Nash-MD): These use reward signals, not chat formatting

Setting chat templates in these contexts was unnecessary and could cause confusion about the expected data format.

Resolves #4404

- Remove SIMPLE_CHAT_TEMPLATE import from 7 trainer scripts
- Remove chat template setting for non-SFT trainers (DPO, CPO, ORPO, PPO, Nash-MD, XPO, Online DPO)
- Chat templates only make sense for SFT (instruction tuning), not for preference optimization or reward-based training
- Scripts modified:
  - examples/scripts/online_dpo.py
  - examples/scripts/orpo.py
  - examples/scripts/cpo.py
  - examples/scripts/nash_md.py
  - examples/scripts/xpo.py
  - examples/scripts/ppo/ppo.py
  - examples/scripts/ppo/ppo_tldr.py
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Member

@qgallouedec qgallouedec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@behroozazarkhalili behroozazarkhalili merged commit 6f41b18 into main Nov 3, 2025
11 of 12 checks passed
@behroozazarkhalili behroozazarkhalili deleted the fix/remove-chat-template-non-sft branch November 3, 2025 18:57
qgallouedec added a commit that referenced this pull request Nov 4, 2025
commit 4677cf2
Author: Harras Mansoor <[email protected]>
Date:   Wed Nov 5 04:06:13 2025 +0500

    Removed Sentiment Tuning Examples (#4424)

commit 7a9592b
Author: Quentin Gallouédec <[email protected]>
Date:   Tue Nov 4 14:32:04 2025 -0700

    🐍 Drop Python 3.9 (#4183)

commit 7f15a7f
Author: Harras Mansoor <[email protected]>
Date:   Wed Nov 5 02:06:31 2025 +0500

    Removed outdated warning about batch contamination (#4423)

commit 8b0a3ce
Author: Albert Villanova del Moral <[email protected]>
Date:   Tue Nov 4 21:37:39 2025 +0100

    Update tokenizer apply_chat_template with return_dict=True default (#4448)

commit d9f9e2b
Author: Pramodith Ballapuram <[email protected]>
Date:   Tue Nov 4 19:56:58 2025 +0000

    Support casting to fp32 when word embeddings are tied to lm_head (#4446)

commit 4e138ab
Author: Sergio Paniego Blanco <[email protected]>
Date:   Tue Nov 4 15:15:23 2025 +0100

    Upload notebook with T4 selected (#4449)

commit 43253b2
Author: Pramodith Ballapuram <[email protected]>
Date:   Mon Nov 3 21:07:31 2025 +0000

    Add On-Policy Distillation from thinking labs to paper index. (#4410)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 6f41b18
Author: Behrooz Azarkhalili <[email protected]>
Date:   Mon Nov 3 10:57:51 2025 -0800

    fix: Remove chat template setting from non-SFT trainer scripts (#4437)

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Quentin Gallouédec <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove setting chat template in most trainers

4 participants