Skip to content

feat: CUDA port of TurboQuant3 KV cache — 3.47x compression, 98.5% of F16 decode speed on RTX 5090#3

Open
signalnine wants to merge 4 commits intoTheTom:feature/turboquant-kv-cachefrom
signalnine:feature/turboquant-kv-cache
Open

feat: CUDA port of TurboQuant3 KV cache — 3.47x compression, 98.5% of F16 decode speed on RTX 5090#3
signalnine wants to merge 4 commits intoTheTom:feature/turboquant-kv-cachefrom
signalnine:feature/turboquant-kv-cache

Commits

Commits on Mar 26, 2026

Commits on Mar 27, 2026