@sergiopaniego
Member

What does this PR do?

In the `sft` example script, the tokenizer is currently created before the trainer. This causes errors for VLMs, since they use a processor rather than a tokenizer. The tokenizer creation can safely be removed because the trainer creates it internally.
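
To make the reasoning concrete, here is a minimal toy sketch of the pattern. It does not use TRL or transformers: `ToyTokenizer`, `ToyProcessor`, `ToyTrainer`, `load_processing_class`, and the model ids are all hypothetical stand-ins invented for illustration. The only assumption carried over from this PR is that the trainer builds the processing class itself when none is passed.

```python
# Toy illustration only: every name below is a hypothetical stand-in,
# not TRL or transformers code.
from dataclasses import dataclass
from typing import Union


@dataclass
class ToyTokenizer:
    """Stand-in for a text-only tokenizer."""
    model_id: str


@dataclass
class ToyProcessor:
    """Stand-in for a VLM processor that handles text and images."""
    model_id: str


# Hypothetical registry of model ids that are vision-language models.
VLM_MODEL_IDS = {"toy/vlm-model"}


def load_processing_class(model_id: str) -> Union[ToyTokenizer, ToyProcessor]:
    """Pick the processing class that matches the model type."""
    if model_id in VLM_MODEL_IDS:
        return ToyProcessor(model_id)
    return ToyTokenizer(model_id)


class ToyTrainer:
    """Creates the processing class itself when the caller passes none."""

    def __init__(self, model_id, processing_class=None):
        if processing_class is None:
            processing_class = load_processing_class(model_id)
        self.processing_class = processing_class


# Old style: the script builds a tokenizer up front, so a VLM ends up
# with a tokenizer instead of the processor it needs.
old_style = ToyTrainer("toy/vlm-model", processing_class=ToyTokenizer("toy/vlm-model"))

# New style: the script passes nothing and the trainer picks the right class.
new_style = ToyTrainer("toy/vlm-model")
text_only = ToyTrainer("toy/text-model")
```

With the up-front creation removed, the same script works for both text-only models and VLMs, because the choice of processing class is made in one place.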

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sergiopaniego
Member Author

We also have sft_vlm.py and more example scripts. I think we can keep those for some time and then drop them, creating a generic one for all the cases. wdyt?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@qgallouedec qgallouedec left a comment

lgtm!

@qgallouedec
Member

qgallouedec commented Oct 6, 2025

> We also have sft_vlm.py and more example scripts. I think we can keep those for some time and then drop them, creating a generic one for all the cases. wdyt?

I think that unless we have a good reason to create the tokenizer outside the trainer, we should let the trainer create it. I'm fine with removing it from the other scripts now if you want.

@sergiopaniego sergiopaniego merged commit 7ad9ce8 into main Oct 6, 2025
4 of 12 checks passed
@sergiopaniego sergiopaniego deleted the fix-sft branch October 6, 2025 09:04
qgallouedec added a commit that referenced this pull request Oct 6, 2025
commit ae6837f
Author: Sergio Paniego Blanco <[email protected]>
Date:   Mon Oct 6 18:40:18 2025 +0200

    Removed tokenizer/processor creation from example scripts (#4211)

commit 56a8f11
Author: Albert Villanova del Moral <[email protected]>
Date:   Mon Oct 6 17:45:44 2025 +0200

    Replace setup with pyproject and fix packaging unintended modules (#4194)

commit 5291015
Author: Sergio Paniego Blanco <[email protected]>
Date:   Mon Oct 6 16:04:06 2025 +0200

    Remove `Optional` from `processing_class` in `PPOTrainer` (#4212)

commit 0588b1f
Author: Sergio Paniego Blanco <[email protected]>
Date:   Mon Oct 6 15:57:17 2025 +0200

    Updated vLLM integration guide (#4162)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 45ee98b
Author: Albert Villanova del Moral <[email protected]>
Date:   Mon Oct 6 11:14:54 2025 +0200

    Replace unittest with pytest (#4188)

commit 3800a6e
Author: Albert Villanova del Moral <[email protected]>
Date:   Mon Oct 6 11:13:21 2025 +0200

    Hotfix: Exclude transformers 4.57.0 for Python 3.9 (#4209)

    Co-authored-by: Sergio Paniego Blanco <[email protected]>

commit 7ad9ce8
Author: Sergio Paniego Blanco <[email protected]>
Date:   Mon Oct 6 11:04:20 2025 +0200

    Remove tokenizer creation from `sft` example script (#4197)

commit 0c2dc14
Author: Albert Villanova del Moral <[email protected]>
Date:   Mon Oct 6 08:31:58 2025 +0200

    Remove custome_container for building the docs (#4198)

commit ced8b33
Author: burtenshaw <[email protected]>
Date:   Mon Oct 6 08:23:11 2025 +0200

    [DOCS/FIX] lora without regrets - fix lr (#4207)
qgallouedec added a commit that referenced this pull request Oct 6, 2025
commit 65eb45c
Author: Quentin Gallouédec <[email protected]>
Date:   Mon Oct 6 13:07:18 2025 -0600

    Apply style and revert change in `sft_video_llm` example (#4214)
