Patch Mistral Common Tokenizer 2 #41962

juliendenize · 2025-10-31T11:07:04Z

What does this PR do?

Fixes the behavior of add_special_tokens to match the goals of PreTrainedTokenizer:

if the mode is finetuning: add both the BOS and EOS tokens.
if the mode is test: add only the BOS token, so that the model can generate freely.
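The mode-dependent rule above can be sketched as follows. This is a minimal illustration, not the code in this PR: the Mode enum is a hypothetical stand-in for mistral-common's ValidationMode, and add_special_tokens_for_mode, bos_id and eos_id are hypothetical names. Only the two modes described in this PR are modeled; any other mode is rejected here as an assumption of the sketch.

```python
from enum import Enum
from typing import List


class Mode(str, Enum):
    # Hypothetical stand-in for mistral-common's ValidationMode;
    # only the two modes described in this PR are modeled.
    finetuning = "finetuning"
    test = "test"


def add_special_tokens_for_mode(
    token_ids: List[int], bos_id: int, eos_id: int, mode: Mode
) -> List[int]:
    """Wrap token_ids with special tokens according to mode (hypothetical helper)."""
    if mode == Mode.finetuning:
        # Training sequences get an explicit end so the model learns when to stop.
        return [bos_id, *token_ids, eos_id]
    if mode == Mode.test:
        # Leave the sequence open-ended so generation can continue freely.
        return [bos_id, *token_ids]
    # Assumption of this sketch: other modes are not handled.
    raise ValueError(f"Unsupported mode: {mode!r}")


print(add_special_tokens_for_mode([10, 11], bos_id=1, eos_id=2, mode=Mode.finetuning))
print(add_special_tokens_for_mode([10, 11], bos_id=1, eos_id=2, mode=Mode.test))
```

With bos_id=1 and eos_id=2, finetuning yields [1, 10, 11, 2] while test yields [1, 10, 11]. Omitting EOS at test time is what lets the model keep generating after the prompt.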

Adds support for passing ValidationMode as a string, so users do not need to import it from mistral-common.
Fixes decode when the user passes a single int instead of a sequence of ints.
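Both input-normalization fixes above follow a common pattern: coerce a convenient user input into the canonical type before the real work happens. The sketch below is illustrative only. Mode is a hypothetical stand-in for mistral-common's ValidationMode, and to_mode and normalize_ids are hypothetical helper names, not functions from transformers or mistral-common.

```python
from enum import Enum
from typing import List, Sequence, Union


class Mode(str, Enum):
    # Hypothetical stand-in for mistral-common's ValidationMode.
    finetuning = "finetuning"
    test = "test"


def to_mode(mode: Union[str, Mode]) -> Mode:
    """Accept either a plain string or an enum member (hypothetical helper)."""
    if isinstance(mode, Mode):
        return mode
    try:
        # Enum lookup by value: Mode("test") -> Mode.test
        return Mode(mode)
    except ValueError:
        valid = ", ".join(m.value for m in Mode)
        raise ValueError(f"Unknown mode {mode!r}; expected one of: {valid}") from None


def normalize_ids(token_ids: Union[int, Sequence[int]]) -> List[int]:
    """Wrap a single int so decode-style code always receives a list of ids."""
    if isinstance(token_ids, int):
        return [token_ids]
    return list(token_ids)


print(to_mode("test"))
print(normalize_ids(5), normalize_ids((1, 2)))
```

Normalizing at the entry point keeps the downstream decoding logic unchanged: it only ever sees a list of ints and an enum member, whatever the caller passed.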

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

cc @patrickvonplaten for visibility

@ArthurZucker and @itazap

juliendenize added 6 commits November 5, 2025 16:59

3b6d1db Patch Mistral Common Tokenizer 2: fix add_special_tokens
d25467c Fix typos
ab9c745 Fix typing issue
2c7ac1e Make _get_validation_mode static
f892b7e wip
0b817f9 Add int support to decode

juliendenize force-pushed the patch_mistral_tokenizer_2 branch from ef7d5fb to 0b817f9 on November 5, 2025 15:59

juliendenize changed the title from "Patch Mistral Common Tokenizer 2: fix add_special_tokens" to "Patch Mistral Common Tokenizer 2" on Nov 5, 2025
