get_rows & dequantize function implementation for repacked weights of type q6_K (q6_Kx8) #16743
swetha097 wants to merge 2 commits into ggml-org:master from
Conversation
@swetha097 I've come across an issue due to the lack of support for GET_ROWS in CPU_REPACK (described in #20396). I think your PR essentially solves the problem, but it's been open for a while. Can I help you move this forward somehow (testing, rebasing)? This seemed blocked by the lack of support for Q6_K repack, but despite #15275 still being open, other contributions have already enabled q6_K, so it should be fine to review and merge this in.
Hi @Alcpz |
I can give a hand on the other PR. This one should be pretty straightforward. As I mentioned, #20396 is essentially this PR, adapted a bit. Once you are confident it's good to go, ping ggerganov. Slaren is taking a break, so he won't be able to help with the review. Edit: Also ping me if you need help here as well.
@Alcpz It's difficult to extend the repack logic without having a testing infrastructure for the extra buffer types in place first (see the discussion in ggml-org/whisper.cpp#3223). So for now we avoid such changes.
I understand. It's been a pain to test my PRs and, honestly, I'm grateful that you agreed to merge those (which add a few new repack types). I had a chat with @tdakhran and he found that models with tied embeddings currently duplicate the tensors; after investigating, I landed on this PR. The lack of GET_ROWS causes this duplication, as we now need one repacked copy of the tensor and one without repacking. For small models this results in a very high memory footprint, which heavily affects low-memory devices, so I was looking into whether I could help move this forward somehow.
I think so, though it's quite low priority on my end. We basically need a mechanism to exercise and verify all of the repack logic. This also requires CI workflows and respective hardware that would run it regularly. |
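To illustrate why GET_ROWS needs dedicated support here: with x8 repacking, the quantized blocks of 8 consecutive rows are interleaved into one stream, so a plain row-wise copy no longer works and the operation has to de-interleave before (or while) dequantizing. Below is a minimal, hypothetical Python sketch of the layout idea only; the constants, helper names, and block contents are illustrative and are not the ggml implementation:

```python
# Conceptual sketch (NOT the actual ggml code): in an x8 repacked buffer,
# block b of rows r..r+7 is stored consecutively, for b = 0, 1, 2, ...
INTERLEAVE = 8       # rows interleaved per group (the "x8" in q6_Kx8)
BLOCKS_PER_ROW = 4   # blocks per row; toy value for illustration

def repack(rows):
    """Interleave the blocks of every group of 8 rows."""
    packed = []
    for g in range(0, len(rows), INTERLEAVE):
        group = rows[g:g + INTERLEAVE]
        for b in range(BLOCKS_PER_ROW):
            for row in group:
                packed.append(row[b])
    return packed

def get_row(packed, i):
    """Recover row i from the interleaved stream (what GET_ROWS must do)."""
    g, r = divmod(i, INTERLEAVE)
    base = g * INTERLEAVE * BLOCKS_PER_ROW
    return [packed[base + b * INTERLEAVE + r] for b in range(BLOCKS_PER_ROW)]

rows = [[f"r{i}b{b}" for b in range(BLOCKS_PER_ROW)] for i in range(16)]
packed = repack(rows)
assert all(get_row(packed, i) == rows[i] for i in range(16))
```

In the real kernel the "blocks" are quantized q6_K super-blocks and the extraction is fused with dequantization, but the indexing problem is the same.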
NOTE: Creating the PR with the changes required for whisper.cpp here, as llama.cpp already includes test-backend-ops.
system_info: n_threads = 4 / 12 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
master branch commit - swetha097/whisper.cpp@fc45bb8
q6_K repacking commit (block interleaving approach for Q6_K quantization on x64/x86 SIMD architectures) - swetha097/whisper.cpp@d89aaf2
development (get_rows) branch commit - swetha097/whisper.cpp@de9839e
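For reference, the repacked (q6_Kx8) path has to reproduce the scalar Q6_K dequantization, which works on 256-value super-blocks: 6-bit quants split into 128 low-nibble bytes (`ql`) and 64 bytes holding the upper 2 bits (`qh`), plus 16 signed 8-bit scales and one fp16 super-block scale `d`. Below is a Python sketch of that scalar loop, modeled after ggml's reference dequantization; treat the exact offsets as an approximation rather than authoritative:

```python
QK_K = 256  # values per Q6_K super-block

def dequantize_q6_K(d, ql, qh, scales):
    """Scalar dequantization of one Q6_K super-block (sketch of the
    reference layout: 128 low-nibble bytes, 64 high-2-bit bytes,
    16 int8 scales, fp16 super-block scale d)."""
    y = [0.0] * QK_K
    yo, qlo, qho, so = 0, 0, 0, 0
    for _ in range(0, QK_K, 128):  # two 128-value halves
        for l in range(32):
            s = l // 16
            # reassemble each 6-bit value from 4 low bits + 2 high bits,
            # then shift to the signed range [-32, 31]
            q1 = ((ql[qlo + l]      & 0xF) | (((qh[qho + l] >> 0) & 3) << 4)) - 32
            q2 = ((ql[qlo + l + 32] & 0xF) | (((qh[qho + l] >> 2) & 3) << 4)) - 32
            q3 = ((ql[qlo + l]      >> 4)  | (((qh[qho + l] >> 4) & 3) << 4)) - 32
            q4 = ((ql[qlo + l + 32] >> 4)  | (((qh[qho + l] >> 6) & 3) << 4)) - 32
            y[yo + l]      = d * scales[so + s]     * q1
            y[yo + l + 32] = d * scales[so + s + 2] * q2
            y[yo + l + 64] = d * scales[so + s + 4] * q3
            y[yo + l + 96] = d * scales[so + s + 6] * q4
        yo += 128; qlo += 64; qho += 32; so += 8
    return y
```

The repacked kernels perform this same reconstruction with AVX2/AVX512 intrinsics over the interleaved block layout.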
Model for performance tests downloaded from: https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-base.en.bin and quantized to q6_K
This patch was also tested with the llama.cpp repository, and the perplexity of Q6_K models was verified to be the same before and after the changes:
Final estimate: PPL = 5.3669 +/- 0.13305
Model used for the perplexity test, quantized from: https://huggingface.co/meta-llama/Llama-2-7b
This PR is to be merged after: Q6_K - Block Interleaving Implementation for x86 SIMD (AVX512/AVX2) #15275