Enable AVX-VNNI 256-bit path for Q6_K R4 matmul #1482
Merged
ikawrakow merged 1 commit into ikawrakow:main on Mar 23, 2026
Add a separate HAVE_VNNI256 code path using _mm256_dpwssd_epi32 and _mm256_dpbusd_epi32 for the Q6_K R4 kernel. The existing HAVE_FANCY_SIMD (AVX-512 VNNI) path is preserved unchanged.
Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request on Mar 21, 2026
ikawrakow approved these changes on Mar 23, 2026
Summary
Add a HAVE_VNNI256 code path for the Q6_K R4 kernel, replacing the AVX2 sequence _mm256_maddubs_epi16 + _mm256_madd_epi16 with _mm256_dpbusd_epi32, and _mm256_madd_epi16 with _mm256_dpwssd_epi32 for the bsums correction. The existing HAVE_FANCY_SIMD (AVX-512 VNNI) path is preserved unchanged.

This follows the v2 approach used in PR #1472 (Q3_K R4 VNNI256), which is currently awaiting review.
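For readers unfamiliar with the instructions involved, the following is a minimal scalar sketch of what one 32-bit lane computes in each case. It is not code from this PR: the function names are hypothetical, and it assumes (as the quant name suggests) that the unsigned operand holds 6-bit values in 0..63, so the 16-bit saturation inside _mm256_maddubs_epi16 can never trigger and both sequences give the same result.

```cpp
#include <cassert>
#include <cstdint>

// Saturate a 32-bit value to the int16 range, as _mm256_maddubs_epi16 does.
static int32_t sat16(int32_t v) {
    return v > 32767 ? 32767 : (v < -32768 ? -32768 : v);
}

// Model of one 32-bit lane of _mm256_maddubs_epi16 followed by
// _mm256_madd_epi16 against a vector of ones (the AVX2 sequence).
static int32_t lane_maddubs_madd(const uint8_t u[4], const int8_t s[4]) {
    int32_t lo = sat16(int32_t(u[0]) * s[0] + int32_t(u[1]) * s[1]);
    int32_t hi = sat16(int32_t(u[2]) * s[2] + int32_t(u[3]) * s[3]);
    return lo + hi;
}

// Model of one 32-bit lane of _mm256_dpbusd_epi32: four u8*s8 products
// added to a 32-bit accumulator, with no intermediate 16-bit saturation.
static int32_t lane_dpbusd(int32_t acc, const uint8_t u[4], const int8_t s[4]) {
    for (int i = 0; i < 4; ++i) acc += int32_t(u[i]) * s[i];
    return acc;
}

// Model of one 32-bit lane of _mm256_dpwssd_epi32: two s16*s16 products
// added to a 32-bit accumulator. With acc == 0 this matches _mm256_madd_epi16.
static int32_t lane_dpwssd(int32_t acc, const int16_t a[2], const int16_t b[2]) {
    return acc + int32_t(a[0]) * b[0] + int32_t(a[1]) * b[1];
}

// Compare the two u8*s8 models over 6-bit unsigned inputs (assumed range)
// and every int8 value. Returns true when they agree everywhere tested.
static bool lanes_agree_for_6bit() {
    for (int u = 0; u < 64; ++u) {
        for (int s = -128; s < 128; ++s) {
            const uint8_t uu[4] = {uint8_t(u), 63, uint8_t(63 - u), 0};
            const int8_t  ss[4] = {int8_t(s), 127, int8_t(-128), int8_t(s)};
            if (lane_maddubs_madd(uu, ss) != lane_dpbusd(0, uu, ss)) return false;
        }
    }
    return true;
}
```

Calling lanes_agree_for_6bit() from a main function is enough to exercise the comparison. With full 8-bit unsigned inputs the two models can diverge, because the AVX2 sequence saturates the pairwise sum at 32767 while the VNNI form does not; under the 6-bit assumption the largest pairwise sum is 2 * 63 * 128 = 16128, well inside the int16 range.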
Benchmark
Qwen3.5-2B Q6_K, rtr=1:
QA
Qwen3.5-2B Q6_K, --run-time-repack, comparing baseline (56e026f) and PR builds (ce35079):