Enable AVX-VNNI 256-bit path for IQ3_XXS and IQ3_S R4 matmul by accaldwell · Pull Request #1474 · ikawrakow/ik_llama.cpp

accaldwell · 2026-03-20T08:21:34Z

Adds HAVE_VNNI256-optimized paths to the mul_mat_iq3_xxs_r4_q8_k and mul_mat_iq3_s_r4_q8_k kernels.

The approach closely follows some of my previous PRs: add an optimized dpbusd path and use it conditionally in place of the multi-instruction AVX2 alternative.
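
For readers unfamiliar with the trade-off, here is a minimal scalar sketch of what the two paths compute for a single 32-bit lane. It is not the kernel code from this PR: the function names `lane_dpbusd`, `lane_avx2`, and `saturate_i16` are hypothetical, and the model assumes the AVX2 alternative is the usual `_mm256_maddubs_epi16` + `_mm256_madd_epi16` + `_mm256_add_epi32` sequence. The two paths agree whenever each pair of products fits in a signed 16-bit value.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of ONE 32-bit lane. A 256-bit register holds 8 such lanes.
// All names here are illustrative and do not come from the PR.

// VNNI path: models the vpdpbusd instruction (e.g. _mm256_dpbusd_epi32).
// Four unsigned-byte * signed-byte products are summed and added to the
// accumulator in a single step, with no intermediate saturation.
inline int32_t lane_dpbusd(int32_t acc, const uint8_t u[4], const int8_t s[4]) {
    int32_t sum = 0;
    for (int i = 0; i < 4; ++i) {
        sum += int32_t(u[i]) * int32_t(s[i]);
    }
    return acc + sum;
}

// Clamp a 32-bit value to the signed 16-bit range, as maddubs does.
inline int32_t saturate_i16(int32_t v) {
    return v > 32767 ? 32767 : (v < -32768 ? -32768 : v);
}

// AVX2 path (assumed): three instructions instead of one.
//   1. _mm256_maddubs_epi16: adjacent u8*s8 products summed, saturated to int16
//   2. _mm256_madd_epi16 with a vector of ones: adjacent int16 pairs summed to int32
//   3. _mm256_add_epi32: result added to the accumulator
inline int32_t lane_avx2(int32_t acc, const uint8_t u[4], const int8_t s[4]) {
    const int32_t p0 = saturate_i16(int32_t(u[0]) * s[0] + int32_t(u[1]) * s[1]);
    const int32_t p1 = saturate_i16(int32_t(u[2]) * s[2] + int32_t(u[3]) * s[3]);
    return acc + p0 + p1;
}
```

With made-up inputs such as `u = {4, 12, 20, 62}` and `s = {-127, 127, 5, -3}`, both functions return the same accumulator value. They diverge only when a pair sum exceeds the int16 range (for example `u = {255, 255, 0, 0}`, `s = {127, 127, 0, 0}`), where the maddubs step saturates and dpbusd does not.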

The FANCY path is handled separately, above each of the new VNNI code blocks, and shouldn't reach this new code.

Performance

Sweep bench (Qwen3.5-2B IQ3_XS, 6 P-core threads, --run-time-repack, N_KV 0-16384):

A small but solid boost to prompt processing (pp), and the data suggest a tiny increase in token generation (tg) as well.

QA

Token generation: identical output across 4 prompts (deterministic, seed=42)
Perplexity: identical, 14.0944 +/- 0.10305

Enable AVX-VNNI 256-bit path for IQ3_XXS and IQ3_S R4 matmul

41a03c2

accaldwell force-pushed the ac/vnni_iq3_r4 branch from 9a7edd5 to 41a03c2 March 20, 2026 08:26

accaldwell marked this pull request as ready for review March 20, 2026 09:38

ikawrakow approved these changes Mar 20, 2026

ikawrakow merged commit ac4d6b9 into ikawrakow:main Mar 20, 2026
