Enable AVX-VNNI 256-bit path for IQ4_NL R4 matmul #1467
Merged
ikawrakow merged 1 commit into ikawrakow:main on Mar 20, 2026
Conversation
ikawrakow approved these changes on Mar 20, 2026
IQ4_NL has a special kernel in repacked (R4) mode (mul_mat_iq4_nl_r4_q8_2).
It currently has a FANCY_SIMD path that requires AVX-512; this PR updates the fallback AVX2 path to use VNNI acceleration conditionally on CPUs that support AVX-VNNI. A minimal sketch of the dispatch pattern is shown below.
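For context, this is the general shape of such conditional dispatch, as a minimal sketch; the helper name and exact guards are illustrative assumptions, not the actual kernel code. The point is that vpdpbusd fuses into one instruction the multiply-widen-accumulate sequence that plain AVX2 needs three instructions for.

```cpp
#include <immintrin.h>

// Illustrative helper (hypothetical, not the actual mul_mat_iq4_nl_r4_q8_2 code):
// accumulate the dot product of 32 unsigned 8-bit values in x with 32 signed
// 8-bit values in y into the eight 32-bit lanes of acc.
static inline __m256i mul_sum_u8s8(__m256i acc, __m256i x, __m256i y) {
#if defined(__AVX512VNNI__) && defined(__AVX512VL__)
    return _mm256_dpbusd_epi32(acc, x, y);      // 256-bit vpdpbusd (AVX-512 VL form)
#elif defined(__AVXVNNI__)
    return _mm256_dpbusd_avx_epi32(acc, x, y);  // 256-bit vpdpbusd (AVX-VNNI form)
#else
    // Plain AVX2 fallback: u8*s8 -> 16-bit pairs with saturation, then pairwise
    // add to 32-bit lanes. Real kernels arrange operand magnitudes so the
    // saturating step cannot overflow, keeping results identical to VNNI.
    __m256i p16 = _mm256_maddubs_epi16(x, y);
    __m256i p32 = _mm256_madd_epi16(p16, _mm256_set1_epi16(1));
    return _mm256_add_epi32(acc, p32);
#endif
}
```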
Benchmarks
Model: Qwen3.5-2B IQ4_NL, pp512
rtr 0 (control: a different kernel that is already VNNI-optimized)
rtr 1 (runtime repack: uses the newly optimized kernel)
The rtr 1 path sees a big improvement, though rtr 0 is still faster on my hardware for this quant. (Reproduction commands are sketched below.)
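For reproduction, invocations along these lines should exercise both configurations. This assumes ik_llama.cpp's llama-bench and its -rtr run-time-repack flag; the model filename is a placeholder.

```sh
# Placeholder model path; -rtr 1 repacks IQ4_NL into the R4 layout at load time
./llama-bench -m qwen3.5-2b-iq4_nl.gguf -p 512 -n 0 -rtr 0
./llama-bench -m qwen3.5-2b-iq4_nl.gguf -p 512 -n 0 -rtr 1
```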
Text generation QA
Text generation QA with llama-cli across multiple prompts shows bit-identical results. Full perplexity on wikitext-2 is unchanged as well (13.1025 +/- 0.09740).
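For reference, the perplexity check was presumably something along these lines (binary name, flag spellings, and file paths are assumptions, not taken from the PR):

```sh
# Placeholder paths; evaluates full perplexity with the runtime-repacked kernel
./llama-perplexity -m qwen3.5-2b-iq4_nl.gguf -f wikitext-2-raw/wiki.test.raw -rtr 1
```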