Skip to content

perf: sparse V dequant — skip negligible attention weights in VEC kernel

a5efe54
Select commit
Loading
Failed to load commit list.
Sign in for the full log view
Open

feat: CUDA port of TurboQuant3 KV cache — 3.47x compression, 98.5% of F16 decode speed on RTX 5090 #3

perf: sparse V dequant — skip negligible attention weights in VEC kernel
a5efe54
Select commit
Loading
Failed to load commit list.
labeler
succeeded Mar 28, 2026 in 12s