iq2_tn: slightly faster PP on Zen4 by ikawrakow · Pull Request #43 · ikawrakow/ik_llama.cpp

ikawrakow · 2024-09-08T09:31:23Z

With this change we get PP512 = 494 t/s (using flash attention) on a Ryzen-7950X CPU, up from 468 t/s (a ~5% improvement).
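For reference, the quoted percentage can be checked directly from the two throughput figures. The snippet below is only a small arithmetic sketch, not part of the PR: the helper names `relative_speedup` and `seconds_per_prompt` are made up for illustration, and it assumes PP512 means processing a 512-token prompt.

```python
def relative_speedup(new_tps: float, old_tps: float) -> float:
    """Fractional throughput gain of new_tps over old_tps."""
    return new_tps / old_tps - 1.0


def seconds_per_prompt(tps: float, n_tokens: int = 512) -> float:
    """Wall time to process n_tokens at a throughput of tps tokens/second."""
    return n_tokens / tps


# Figures quoted in the PR description (Ryzen-7950X, flash attention enabled).
new_tps, old_tps = 494.0, 468.0

gain = relative_speedup(new_tps, old_tps)
print(f"relative gain: {gain:.1%}")
print(
    "time per 512-token prompt: "
    f"{seconds_per_prompt(old_tps):.3f} s -> {seconds_per_prompt(new_tps):.3f} s"
)
```

The gain comes out slightly above 5%, consistent with the "~5%" stated above.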

Compared to the initial IQ2_TN PR #13, the cumulative improvement is 15%.

Compared to TQ2_0 in llama.cpp, which has since been merged, we are now 80% faster.

iq2_tn: slightly faster PP

b7f7eed

ikawrakow merged commit bf4b19b into main Sep 8, 2024
