Enable AVX-VNNI 256-bit path for Q6_K R4 matmul #1482

Merged: ikawrakow merged 1 commit into ikawrakow:main from accaldwell:ac/vnni-q6k-r4 on Mar 23, 2026

@accaldwell (Contributor) commented Mar 20, 2026

Summary

Add a HAVE_VNNI256 code path for the Q6_K R4 kernel. It replaces the AVX2 sequence _mm256_maddubs_epi16 + _mm256_madd_epi16 with a single _mm256_dpbusd_epi32, and uses _mm256_dpwssd_epi32 in place of _mm256_madd_epi16 for the bsums correction. The existing HAVE_FANCY_SIMD (AVX-512 VNNI) path is preserved unchanged.

This follows the v2 approach used in PR #1472 (Q3_K R4 VNNI256), which is currently awaiting review.
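
To illustrate why the swap described in the summary can leave results unchanged, here is a scalar model of a single 32-bit lane. It is a sketch, not code from the PR: the function names are hypothetical, and the 6-bit range (0..63) for the unsigned operand is an assumption based on the Q6_K format. Under that assumption the int16 saturation in _mm256_maddubs_epi16 never triggers, so both sequences produce the same sum.

```cpp
#include <cstdint>

// Clamp to the int16 range, modelling the saturating add in _mm256_maddubs_epi16.
inline int32_t saturate_i16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return v;
}

// One 32-bit lane of the AVX2 sequence:
//   _mm256_maddubs_epi16(u, s)   -> two saturated int16 pair sums
//   _mm256_madd_epi16(..., ones) -> the two pair sums added as int32
inline int32_t lane_avx2(const uint8_t u[4], const int8_t s[4]) {
    const int32_t lo = saturate_i16(int32_t(u[0]) * s[0] + int32_t(u[1]) * s[1]);
    const int32_t hi = saturate_i16(int32_t(u[2]) * s[2] + int32_t(u[3]) * s[3]);
    return lo + hi;
}

// One 32-bit lane of _mm256_dpbusd_epi32(acc, u, s):
// four u8 * s8 products added to the accumulator without saturation.
inline int32_t lane_vnni(int32_t acc, const uint8_t u[4], const int8_t s[4]) {
    for (int i = 0; i < 4; ++i) acc += int32_t(u[i]) * int32_t(s[i]);
    return acc;
}

// Hypothetical check: sweep a subset of inputs whose unsigned operand stays
// within 0..63 (assumed 6-bit quants) and compare the two lane models.
inline bool paths_agree_for_6bit() {
    for (int a = 0; a < 64; ++a) {
        for (int b = -128; b < 128; ++b) {
            const uint8_t u[4] = {uint8_t(a), 63, 63, uint8_t(a)};
            const int8_t  s[4] = {int8_t(b), int8_t(b), -128, 127};
            if (lane_avx2(u, s) != lane_vnni(0, u, s)) return false;
        }
    }
    return true;
}
```

With full-range unsigned bytes the two models can differ: for u = {255, 255, 0, 0} and s = {127, 127, 0, 0}, the AVX2 model saturates the first pair sum while the VNNI model does not.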

Benchmark

Qwen3.5-2B Q6_K, rtr=1:

[Figure: sweep_q6k_comparison]

QA

Qwen3.5-2B Q6_K, --run-time-repack, comparing the baseline build (56e026f) with the PR build (ce35079):

  • 4/4 llama-cli prompts produce identical output
  • Perplexity identical: 12.7941 +/- 0.09497

Add a separate HAVE_VNNI256 code path using _mm256_dpwssd_epi32 and
_mm256_dpbusd_epi32 for the Q6_K R4 kernel. The existing HAVE_FANCY_SIMD
(AVX-512 VNNI) path is preserved unchanged.
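
As background on the bsums correction that _mm256_dpwssd_epi32 handles here, the scalar sketch below shows the identity such a correction typically relies on. It is an illustration under stated assumptions, not the kernel's code: it assumes Q6_K quants are stored as q + 32 in the range 0..63 (so they can feed the unsigned operand of the dot product), it omits per-block scales, and both function names are hypothetical. The identity is sum((q - 32) * y) = sum(q * y) - 32 * sum(y), where sum(y) is the block sum.

```cpp
#include <cstddef>
#include <cstdint>

// Reference: dot product with signed quants q in [-32, 31].
inline int32_t dot_signed(const int8_t* q, const int8_t* y, std::size_t n) {
    int32_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) acc += int32_t(q[i]) * int32_t(y[i]);
    return acc;
}

// Unsigned path: quants stored as q + 32 in [0, 63] (assumed layout),
// followed by a correction of -32 * bsum, where bsum is the sum of y.
inline int32_t dot_unsigned_corrected(const uint8_t* q_plus_32, const int8_t* y,
                                      std::size_t n) {
    int32_t dot = 0;
    int32_t bsum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        dot  += int32_t(q_plus_32[i]) * int32_t(y[i]);  // u8 * s8, as in dpbusd
        bsum += int32_t(y[i]);                          // block sum of activations
    }
    return dot - 32 * bsum;                             // bsums correction
}
```

In a vectorized kernel the block sums are usually precomputed 16-bit values, so the correction becomes a 16-bit multiply-add into a 32-bit accumulator, which is the kind of operation _mm256_dpwssd_epi32 fuses into one instruction.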
Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Mar 21, 2026
@accaldwell accaldwell marked this pull request as ready for review March 21, 2026 03:17
@ikawrakow ikawrakow merged commit 87e4b92 into ikawrakow:main Mar 23, 2026