Build 8070 regression: prefill ~75 tok/s instead of ~250 tok/s #58

@vincentsider

rocm7-nightlies build 8070 (cc45f2ada) has a severe prefill regression.

Setup:

  • GMKtec EVO-X2 (Ryzen AI Max+ 395, 128GB)
  • Kernel 6.18.7-200.fc43, firmware 20260110
  • Model: Qwen3-Coder-30B-A3B-Instruct BF16

Expected (yesterday): ~250 tok/s prefill
Actual (build 8070): ~75 tok/s prefill
Generation speed: 27 tok/s (as expected, unaffected)
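
For scale, here is a rough sketch of what the drop means for a prompt that fills the whole context. It uses only the approximate figures in this report (~250 and ~75 tok/s, and the 65536-token context from the command's -c flag) and assumes a constant prefill rate. Real runs will not hold that exactly, so treat the results as ballpark values. The function names are illustrative.

```python
# Approximate figures taken from this report; not new measurements.
EXPECTED_TOK_S = 250.0   # prefill rate before build 8070
ACTUAL_TOK_S = 75.0      # prefill rate on build 8070
CONTEXT_TOKENS = 65536   # worst case: a prompt that fills the -c 65536 context


def slowdown(expected: float, actual: float) -> float:
    """Return how many times slower the actual rate is than the expected one."""
    return expected / actual


def prefill_seconds(tokens: int, tok_per_s: float) -> float:
    """Return the time to prefill `tokens`, assuming a constant rate."""
    return tokens / tok_per_s


if __name__ == "__main__":
    ratio = slowdown(EXPECTED_TOK_S, ACTUAL_TOK_S)
    drop_pct = (1 - ACTUAL_TOK_S / EXPECTED_TOK_S) * 100
    before = prefill_seconds(CONTEXT_TOKENS, EXPECTED_TOK_S)
    after = prefill_seconds(CONTEXT_TOKENS, ACTUAL_TOK_S)
    print(f"slowdown: {ratio:.2f}x ({drop_pct:.0f}% lower throughput)")
    print(f"full-context prefill before: {before:.0f} s")
    print(f"full-context prefill after:  {after:.0f} s")
```

Under that assumption, a full-context prompt goes from roughly four and a half minutes of prefill to over fourteen.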

Tested with both ROCBLAS_USE_HIPBLASLT=1 and =0, same result.

Command:
llama-server -m model.gguf --jinja --no-mmap -fa 1 -ngl 999 -c 65536
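
One way to read a prefill rate for this kind of comparison is from the `timings` object that llama-server includes in a `/completion` response. The sketch below assumes that object carries `prompt_n` (prompt tokens processed) and `prompt_ms` (milliseconds spent on them). The helper name is hypothetical and the sample values are made up for illustration, not measured on this machine.

```python
def prefill_tok_per_s(timings: dict) -> float:
    """Compute prefill throughput from a llama-server style timings dict.

    Assumes the keys `prompt_n` and `prompt_ms` are present.
    """
    prompt_n = timings["prompt_n"]
    prompt_ms = timings["prompt_ms"]
    if prompt_ms <= 0:
        raise ValueError("prompt_ms must be positive")
    return prompt_n / (prompt_ms / 1000.0)


if __name__ == "__main__":
    # Illustrative values only: 1500 prompt tokens processed in 20 seconds.
    sample = {"prompt_n": 1500, "prompt_ms": 20000.0}
    print(f"{prefill_tok_per_s(sample):.1f} tok/s")
```

If prompt caching is active, `prompt_n` may count only the newly processed tokens, so compare builds using a fresh prompt each time.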

Image digest: sha256:6c1fc630d304c02c9c726a17407923f9951ac8e18e3e8b84f0c1685d0c15c2ba
