Adding IQ5_KS - 5.25 bpw quants #422

ikawrakow · 2025-05-15T13:02:20Z

For motivation, see the CUDA performance graphs in #417 and #418.

Implementation for AVX2, Zen4, ARM_NEON, CUDA, Metal.

The AVX2 implementation suffers from int16_t overflow, as do IQ4_K, IQ5_K, IQ6_K and IQ4_KS, so I will have to fix all of these in a follow-up PR.
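To illustrate the kind of int16_t overflow mentioned here, the following is a minimal sketch, not the code from this PR. The PR does not say where the overflow occurs. The sketch assumes it comes from a pairwise multiply-add of unsigned 8-bit by signed 8-bit values into a 16-bit lane, which is what the AVX2 intrinsic `_mm256_maddubs_epi16` does, with signed saturation. The helper names `saturate_int16` and `maddubs_lane` are hypothetical.

```python
# Scalar emulation of one 16-bit lane of an unsigned-by-signed 8-bit
# pairwise multiply-add with signed saturation (the behaviour of
# _mm256_maddubs_epi16). Whether this is the exact source of the overflow
# in the AVX2 kernels is an assumption made for illustration only.

INT16_MIN, INT16_MAX = -32768, 32767


def saturate_int16(x: int) -> int:
    """Clamp an exact integer to the int16_t range."""
    return max(INT16_MIN, min(INT16_MAX, x))


def maddubs_lane(u: list[int], s: list[int]) -> tuple[int, int]:
    """Return (exact sum, saturated int16 result) for one pair of products.

    u holds two unsigned 8-bit values (0..255),
    s holds two signed 8-bit values (-128..127).
    """
    assert len(u) == 2 and len(s) == 2
    assert all(0 <= v <= 255 for v in u)
    assert all(-128 <= v <= 127 for v in s)
    exact = u[0] * s[0] + u[1] * s[1]
    return exact, saturate_int16(exact)


# Small operands: the 16-bit lane holds the exact result.
small_exact, small_lane = maddubs_lane([10, 20], [3, -4])

# Large operands: the exact sum no longer fits in int16_t, so the lane
# silently saturates and the dot product comes out wrong.
big_exact, big_lane = maddubs_lane([255, 255], [127, 127])

print(small_exact, small_lane)
print(big_exact, big_lane)
```

A result that is wrong only for large operand pairs is easy to miss in quick testing, since typical values stay within range.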

I also want to add the interleaved variant IQ5_KS_R4 before giving more performance and accuracy details.
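As a rough check on the 5.25 bpw figure in the title, the sketch below computes bits per weight for a block-quantized layout. The layout used here (5 bits per weight, blocks of 32 weights, one 8-bit scale per block, any per-row overhead ignored) is an assumption that happens to reproduce 5.25. It is not taken from this PR. The function name `bits_per_weight` is hypothetical.

```python
from fractions import Fraction


def bits_per_weight(weight_bits: int, block_size: int, block_overhead_bits: int) -> Fraction:
    """Average storage cost per weight for a block-quantized tensor.

    weight_bits:         bits used for each quantized weight
    block_size:          number of weights sharing one block of metadata
    block_overhead_bits: metadata bits per block (for example a scale)
    """
    return Fraction(weight_bits) + Fraction(block_overhead_bits, block_size)


# Assumed layout: 5-bit weights, 32 weights per block, 8-bit block scale.
bpw = bits_per_weight(weight_bits=5, block_size=32, block_overhead_bits=8)
print(float(bpw))
```

Under the same assumptions, a 4-bit variant would come out at 4.25 bpw.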

ubergarm · 2025-05-18T21:18:35Z

Just did some testing of a mixed IQ5_KS / IQ4_KS quant of the dense Qwen3-14B model, with perplexity and speed comparisons for full CUDA offload, in this new quant cookers guide (just scroll to the bottom, can't link anchors in gh discussions...)

Thanks for adding, the quality looks really good for the size!

Iwan Kawrakow added 10 commits May 15, 2025 09:38

iq5_ks: basics

560820c

iq5_ks: quantize

d6eb80d

iq5_ks: CUDA dequantize works

ecfbaba

iq5_ks: dot product works on CUDA

31ecbaa

iq5_ks: MMQ works

f0355f2

iq5_ks: Zen4

65b9d33

iq5_ks: AVX2

e2ecb1a

But it is not quite right, just like iq4_k, iq5_k, iq6_k, iq4_ks. All these need fixing on AVX2.

iq5_ks: NEON

b8db611

iq5_ks: Metal dequantize

cf93e69

iq5_ks: Metal dot product

a7ceba3

ikawrakow merged commit 3d92d7f into main May 15, 2025

ikawrakow mentioned this pull request May 16, 2025

Fix AVX2 implementation of IQ4_K, IQ4_KS, IQ5_K, IQ6_K #427

Merged
