iq2_tn: slightly faster PP on Zen4 #43

Merged
ikawrakow merged 1 commit into main from ik/iq2_tn_faster_pp
Sep 8, 2024
Conversation

ikawrakow (Owner) commented Sep 8, 2024

With this change we get PP512 = 494 t/s (using flash attention), up from 468 t/s (~5% improvement) running on a Ryzen-7950X CPU.

Compared to the initial IQ2_TN PR #13 the cumulative improvement is 15%.

Compared to TQ2_0 in llama.cpp, which has now been merged, we are now 80% faster.
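
As a quick check of the arithmetic behind the ~5% figure, the relative gain can be computed from the two PP512 throughputs quoted above. This is only a minimal sketch: the helper name `percent_improvement` is hypothetical and is not part of the repository.

```python
def percent_improvement(new_tps: float, old_tps: float) -> float:
    """Relative throughput gain of new_tps over old_tps, in percent.

    Hypothetical helper for illustration only.
    """
    if old_tps <= 0:
        raise ValueError("old_tps must be positive")
    return (new_tps / old_tps - 1.0) * 100.0


# PP512 figures quoted in this PR (tokens per second, flash attention enabled).
gain = percent_improvement(494.0, 468.0)
print(f"PP512 gain: {gain:.1f}%")  # prints "PP512 gain: 5.6%"
```

The same helper can be applied to any pair of before/after t/s measurements taken under identical settings.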

ikawrakow merged commit bf4b19b into main on Sep 8, 2024