Biased mmvq: minor optimization by ikawrakow · Pull Request #880 · ikawrakow/ik_llama.cpp

ikawrakow · 2025-10-30T09:24:00Z

This PR is derived from PR 16847 in mainline llama.cpp.

On my GPU (RTX 4080) the improvement over the main branch is very minor: about 0.5% better token generation (TG) performance for GPT-OSS-20B-MXFP4, and less for other models. Based on the discussion in the mainline PR, however, it may yield larger gains on GPUs with lower memory bandwidth.

The PR also adds the -mmvq | --merge-qkv option (see #878) to llama-bench.

Biased mmvq: minor optimization

bb4752d

ikawrakow merged commit fd3757d into main Oct 31, 2025

Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Nov 6, 2025

Revert "Biased mmvq: minor optimization (ikawrakow#880)"

e49462b

This reverts commit fd3757d.

Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Nov 6, 2025

Revert "Biased mmvq: minor optimization (ikawrakow#880)"

f59c391

This reverts commit fd3757d.

ikawrakow pushed a commit that referenced this pull request Nov 11, 2025

Opt from #880 also for iqk cuda gemv

5266eee

ikawrakow added a commit that referenced this pull request Nov 11, 2025

Opt from #880 also for iqk cuda gemv (#938)

463c694

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

ikawrakow merged 1 commit into main from ik/biased_mmvq
