Running AutoTokenizer.from_pretrained with Mistral V3 is actually loading LlamaTokenizer #31375

@matheus-prandini

Description

System Info

  • Ubuntu 22.04
  • Python 3.10.12
  • Transformers Version: from source

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I'm trying to load the Mistral tokenizer via AutoTokenizer for a Mistral model with the following code snippet:

from transformers import AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.3"
auto_tokenizer = AutoTokenizer.from_pretrained(model_id)

When I inspect the auto_tokenizer variable, I get a LlamaTokenizerFast instead:

LlamaTokenizerFast(name_or_path='mistralai/Mistral-7B-Instruct-v0.3', vocab_size=32768, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>'}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	...
}

Unless I'm missing something, this loads a different tokenizer than I expected.

Expected behavior

IMHO it should instantiate a MistralTokenizer.v3() tokenizer as implemented in mistral-common. I checked the TOKENIZER_MAPPING object, and Mistral isn't even listed there.
