Even better Q4_0 KV cache (CPU)#1556

Merged
ikawrakow merged 1 commit into main from ik/better_q40_kv_cache_cpu
Mar 31, 2026
@ikawrakow
Owner

Continuation of #1547, which was CUDA-only.

This PR does the same for the CPU back-end. While at it, I also SIMD-ified Q4_0 quantization on AVX2, which gives a minor performance benefit when using a Q4_0 KV cache.
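For readers unfamiliar with Q4_0, here is a minimal scalar sketch of the block quantization scheme as it is commonly implemented in ggml-style code: 32 values per block, one scale per block, two 4-bit quants packed per byte. This is not the code from this PR and does not show what the PR changes. The names `block_q4_0_sketch`, `quantize_row_q4_0_sketch`, and `dequantize_row_q4_0_sketch` are hypothetical, and the scale is kept as a `float` for simplicity, whereas real implementations typically store it as fp16. An AVX2 version would vectorize the max search and the rounding/packing loops shown here.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

#define QK4_0 32 /* values per block */

/* Hypothetical block type for illustration only.
 * Assumption: scale stored as float; real code typically uses fp16. */
typedef struct {
    float   d;             /* per-block scale */
    uint8_t qs[QK4_0 / 2]; /* 32 4-bit quants, two per byte */
} block_q4_0_sketch;

/* Scalar reference: quantize n floats (n must be a multiple of 32). */
static void quantize_row_q4_0_sketch(const float *x, block_q4_0_sketch *y, int n) {
    assert(n % QK4_0 == 0);
    const int nb = n / QK4_0;
    for (int i = 0; i < nb; i++) {
        /* Find the value with the largest magnitude, keeping its sign. */
        float amax = 0.0f, max = 0.0f;
        for (int j = 0; j < QK4_0; j++) {
            const float v = x[i * QK4_0 + j];
            if (amax < fabsf(v)) { amax = fabsf(v); max = v; }
        }
        /* Scale chosen so that the largest-magnitude value maps to quant 0 (i.e. -8). */
        const float d  = max / -8.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;
        for (int j = 0; j < QK4_0 / 2; j++) {
            /* Element j goes to the low nibble, element j+16 to the high nibble. */
            int q0 = (int)(x[i * QK4_0 + j] * id + 8.5f);
            int q1 = (int)(x[i * QK4_0 + QK4_0 / 2 + j] * id + 8.5f);
            if (q0 > 15) q0 = 15;
            if (q1 > 15) q1 = 15;
            y[i].qs[j] = (uint8_t)(q0 | (q1 << 4));
        }
    }
}

/* Inverse mapping: value = (quant - 8) * scale. */
static void dequantize_row_q4_0_sketch(const block_q4_0_sketch *y, float *out, int n) {
    assert(n % QK4_0 == 0);
    const int nb = n / QK4_0;
    for (int i = 0; i < nb; i++) {
        for (int j = 0; j < QK4_0 / 2; j++) {
            out[i * QK4_0 + j]             = (float)((y[i].qs[j] & 0x0F) - 8) * y[i].d;
            out[i * QK4_0 + QK4_0 / 2 + j] = (float)((y[i].qs[j] >> 4) - 8) * y[i].d;
        }
    }
}
```

Because the quantizer runs on every new K/V row when the cache type is Q4_0, it sits on the hot path during generation, which is presumably why vectorizing it yields a measurable, if small, gain.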

ikawrakow merged commit bd7daa7 into main on Mar 31, 2026