Build 8070 regression: prefill ~75 tok/s instead of ~250 tok/s #58

rocm7-nightlies build 8070 (cc45f2ada) has a severe prefill regression.

Setup:
- GMKtec EVO-X2 (Ryzen AI Max+ 395, 128GB)
- Kernel 6.18.7-200.fc43, firmware 20260110
- Model: Qwen3-Coder-30B-A3B-Instruct BF16

Expected (yesterday): ~250 tok/s prefill
Actual (build 8070): ~75 tok/s prefill
Generation speed: 27 tok/s (as expected, not affected)
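
For scale, the two prefill figures above can be turned into a slowdown factor with a small helper. This is only an illustrative sketch: `prefill_slowdown` is a hypothetical function name, not part of llama.cpp or the image, and the inputs are the approximate numbers from this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of llama.cpp): compare two prefill
# throughput figures given in tok/s and print the relative slowdown.
prefill_slowdown() {
    # $1 = baseline tok/s (previous build), $2 = regressed tok/s (build 8070)
    awk -v base="$1" -v cur="$2" 'BEGIN {
        printf "slowdown: %.2fx, throughput drop: %.0f%%\n", base / cur, (1 - cur / base) * 100
    }'
}

# Approximate figures from this report: ~250 tok/s before, ~75 tok/s now.
prefill_slowdown 250 75
```

With the reported numbers this comes out to roughly a 3.3x slowdown in prefill, while generation speed is unchanged.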

Tested with both ROCBLAS_USE_HIPBLASLT=1 and ROCBLAS_USE_HIPBLASLT=0; the result is the same.

Command:
llama-server -m model.gguf --jinja --no-mmap -fa 1 -ngl 999 -c 65536

Image digest: sha256:6c1fc630d304c02c9c726a17407923f9951ac8e18e3e8b84f0c1685d0c15c2ba
