Llama-4-Scout-17B-16E-Instruct model perplexity anomaly when transformers==4.55.1

### System Info


The hardware environment and other major software packages used are as follows.

```
gpu          mi300
rocm         6.10.5
python       3.12.11
numpy        2.1.3
tokenizers   0.21.4
torch        2.7.1+rocm6.3
torchaudio   2.7.1+rocm6.3
torchvision  0.22.1+rocm6.3
triton       3.2.0+gite5begpu
```

### Who can help?

_No response_

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

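1. Load the model
```
from transformers import Llama4ForConditionalGeneration

# The checkpoint ID is passed as the first positional argument of from_pretrained
model = Llama4ForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    device_map="auto",
    torch_dtype="auto",
)
```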
2. Run the evaluation with the `lm_eval` library (lm-evaluation-harness)
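
The exact evaluation command is not given in this report. As a rough sketch only, a zero-shot `wikitext` run through the lm-evaluation-harness CLI could be assembled as below. The helper `build_lm_eval_command` is hypothetical, and the choice of the `hf` backend with `dtype=auto` is an assumption; the original run may have passed the already-loaded model object to the harness instead.

```python
import shlex


def build_lm_eval_command(model_id, task="wikitext", num_fewshot=0):
    """Assemble an lm_eval CLI invocation (hypothetical helper, not from the report)."""
    # Assumption: the generic Hugging Face backend ("hf") is used.
    model_args = f"pretrained={model_id},dtype=auto"
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", task,
        "--num_fewshot", str(num_fewshot),
    ]


cmd = build_lm_eval_command("meta-llama/Llama-4-Scout-17B-16E-Instruct")
# Print a copy-pasteable shell command; run it once per transformers version.
print(shlex.join(cmd))
```

Running the same command in two environments that differ only in the installed transformers version keeps the comparison limited to that one variable.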

### Expected behavior

With transformers==4.53.0, the harness reports the following perplexity for `Llama-4-Scout-17B-16E-Instruct`:
```
| Tasks  |Version|Filter|n-shot|    Metric     |   |Value |   |Stderr|
|--------|------:|------|-----:|---------------|---|-----:|---|------|
|wikitext|      2|none  |     0|bits_per_byte  |_  |0.6006|_  |   N/A|
|        |       |none  |     0|byte_perplexity|_  |1.5164|_  |   N/A|
|        |       |none  |     0|word_perplexity|_  |9.2650|_  |   N/A|
```
After upgrading transformers to 4.55.1, the perplexity becomes:
```
| Tasks  |Version|Filter|n-shot|    Metric     |   | Value |   |Stderr|
|--------|------:|------|-----:|---------------|---|------:|---|------|
|wikitext|      2|none  |     0|bits_per_byte  |_  | 1.2306|_  |   N/A|
|        |       |none  |     0|byte_perplexity|_  | 2.3466|_  |   N/A|
|        |       |none  |     0|word_perplexity|_  |95.7053|_  |   N/A|
```
All metrics have deteriorated; word_perplexity in particular rises from 9.2650 to 95.7053, roughly tenfold.
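
As a quick sanity check on the two tables, the sketch below recomputes `bits_per_byte` from `byte_perplexity` (the former is the base-2 logarithm of the latter) and reports how much each metric changed between versions. The numbers are copied from the tables above; the dictionary layout and variable names are illustrative only.

```python
import math

# Values copied from the two wikitext tables in this report.
results = {
    "4.53.0": {"bits_per_byte": 0.6006, "byte_perplexity": 1.5164, "word_perplexity": 9.2650},
    "4.55.1": {"bits_per_byte": 1.2306, "byte_perplexity": 2.3466, "word_perplexity": 95.7053},
}

# bits_per_byte should equal log2(byte_perplexity), up to rounding in the tables.
derived_bpb = {
    version: math.log2(metrics["byte_perplexity"])
    for version, metrics in results.items()
}
for version, value in derived_bpb.items():
    reported = results[version]["bits_per_byte"]
    print(f"{version}: reported={reported:.4f} derived={value:.4f}")

# Ratio of each metric after the upgrade to its value before the upgrade.
ratios = {
    name: results["4.55.1"][name] / results["4.53.0"][name]
    for name in results["4.53.0"]
}
for name, ratio in ratios.items():
    print(f"{name}: x{ratio:.2f}")
```

Both tables are internally consistent under this relation, which suggests the regression lies in the model's log-likelihoods rather than in how the harness aggregates them.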

Llama-4-Scout-17B-16E-Instruct model perplexity anomaly when transformers==4.55.1 #40642