AVX IQ Quants #7845
ggerganov
left a comment
Make sure the test-backend-ops passes
Yeah, it runs and passes with … Considering how it only takes a minute to run, I think it's worth adding the CPU version of …
The goal of …
I compared the AVX CPU vs the GPU results on my Linux box and the tests are passing. Should be good to merge.
Update hv/matmul up to:

commit 557b653 (HEAD -> master, origin/master, origin/HEAD)
Author: k.h.lai <adrian.k.h.lai@outlook.com>
Date: Fri Jun 21 16:28:20 2024 +0800

    vulkan: detect multiple devices by deviceUUID instead of deviceID (ggml-org#8022)

commit 7d5e877
Author: Eve <139727413+netrunnereve@users.noreply.github.com>
Date: Fri Jun 21 05:57:36 2024 +0000

    ggml : AVX IQ quants (ggml-org#7845)

...
I finally had the time to work on original AVX versions of the IQ quants ggml_vec_dot for Sandy Bridge and Ivy Bridge users.

Master:
PR:
Some example benchmarks:
The scalar IQ code is really slow on my computer, even with an 8B model. Pretty much any K quant of equivalent size can beat it with a 30B model! I mostly followed the original AVX2 implementation and converted the new 256-bit instructions into two 128-bit ones when required.
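
To illustrate the kind of conversion described above, here is a minimal sketch of one 256-bit integer operation split into two 128-bit halves. It is not code from this PR: the names add_epi8_avx and add_bytes_32 are hypothetical, byte-wise addition was picked only because it is easy to check, and the target attribute assumes GCC or Clang. With AVX2 the same work would be a single _mm256_add_epi8.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the PR: add 32 packed 8-bit integers
 * using only AVX (256-bit load/store, extract, insert) and SSE2 (128-bit
 * integer add). */
__attribute__((target("avx")))
static __m256i add_epi8_avx(__m256i a, __m256i b) {
    /* Split each 256-bit register into its low and high 128-bit lanes. */
    __m128i a_lo = _mm256_castsi256_si128(a);
    __m128i a_hi = _mm256_extractf128_si256(a, 1);
    __m128i b_lo = _mm256_castsi256_si128(b);
    __m128i b_hi = _mm256_extractf128_si256(b, 1);

    /* Run the 128-bit integer instruction once per lane. */
    __m128i lo = _mm_add_epi8(a_lo, b_lo);
    __m128i hi = _mm_add_epi8(a_hi, b_hi);

    /* Recombine the two lanes into one 256-bit result. */
    return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), hi, 1);
}

/* Hypothetical wrapper that works on plain byte arrays of length 32,
 * so callers do not need to handle vector types themselves. */
__attribute__((target("avx")))
void add_bytes_32(const uint8_t *x, const uint8_t *y, uint8_t *out) {
    __m256i a = _mm256_loadu_si256((const __m256i *)x);
    __m256i b = _mm256_loadu_si256((const __m256i *)y);
    _mm256_storeu_si256((__m256i *)out, add_epi8_avx(a, b));
}
```

The split costs extra extract and insert instructions compared with AVX2, which is why it only makes sense where no 256-bit integer form exists on the target CPU.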