add python/pytorch version compat notes by wizzard0 · Pull Request #44 · ggml-org/llama.cpp

wizzard0 · 2023-03-12T10:53:42Z

see #32 (comments)

* RAM usage reduction and calculations
  * Removed the -b batch limit (1024); tested up to -b 8192
  * Fixed an integer overflow in ggml matmul (it occurred at around nbatch 3000)
  * Added a dynamic calculation of batched scratch memory consumption
  * Overall, reduced RAM buffer sizes by orders of magnitude for normal settings
  * RAM usage scales quadratically with increasing context size * batch
  * Using a small batch (or the default of 1) results in a very small memory footprint, even with thousands of tokens processed
  * Tested with prompts of up to 13,000 tokens and a batch size of 8k
  * Needs more testing on various platforms
* removed debug
* minor
---------
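
The matmul overflow item gives no details about which computation overflowed. As a generic illustration only (an assumption, not ggml's actual code or tensor shapes), this sketch shows how a size computed as a product of dimensions stops fitting in a signed 32-bit `int` once the batch grows, and what the wrapped value looks like. The helper names and the `row_elems` value are hypothetical.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit C int can hold


def int32_product_overflows(*dims: int) -> bool:
    """True if the exact product of dims exceeds INT32_MAX.

    Python ints never overflow, so the product computed here is exact.
    """
    product = 1
    for d in dims:
        product *= d
    return product > INT32_MAX


def wrapped_int32(value: int) -> int:
    """Value after truncation to 32 bits (two's complement).

    ctypes.c_int32 performs no overflow checking, so this mimics the
    wraparound typically observed in C (where signed overflow is formally
    undefined behavior).
    """
    return ctypes.c_int32(value).value


def first_overflowing_batch(row_elems: int) -> int:
    """Smallest batch size n such that n * row_elems no longer fits in int32."""
    return INT32_MAX // row_elems + 1


# Hypothetical shape: 1,000,000 elements per batch row (not a ggml figure).
row_elems = 1_000_000
n = first_overflowing_batch(row_elems)
print(n)
print(int32_product_overflows(n - 1, row_elems))
print(wrapped_int32(n * row_elems))
```

With these made-up numbers the threshold lands at a batch of 2148 and the wrapped product turns negative; the "around nbatch 3000" figure in the commit message depends on the real tensor shapes, which are not given. In C, the usual remedy is to widen one operand (for example to `int64_t` or `size_t`) before multiplying so the product is computed in 64 bits.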

* Adding iq1_tn - 1.6875 bpw for TriLM ternary models
* iq1_tn: NEON
* iq1_tn: faster NEON
* iq2_bn: improve performance on NEON
  We now get TG-128 = 100 t/s for Bitnet-3B-1.58b!
* iq1_tn: improve AVX2
  PP-512 goes to 533 t/s, up from 455. TG-128 @ 2 threads goes to 16.6 t/s, up from 14.2. However, we seem to have a bottleneck somewhere, as TG saturates at 8 threads.
* iq1_tn: improve Zen4
  PP-512 goes to 485 t/s, up from 352. With FA we get 545 t/s, up from 380. TG-128 @ 1 thread goes to 12.4 t/s, up from 10.4. However, we seem to have a bottleneck somewhere, as TG saturates at 8 threads.
* iq2_bn: improve on Zen4
  We now get PP-512 = 614 t/s, up from 542 t/s.
* iq2_bn: improve AVX2 implementation
  We now get PP-512 = 753 t/s, up from 680 t/s.
* Remove unnecessary barrier in ggml_compute_forward_mul_mat
---------
Co-authored-by: Iwan Kawrakow <[email protected]>
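
To put the before/after throughput figures in the iq1_tn/iq2_bn commit message on a common scale, this short sketch computes the relative speedup, (after / before - 1) * 100, for each pair. The numbers are copied from the message; the labels and the helper name `speedup_percent` are illustrative only.

```python
# (label, before, after) in tokens/second, copied from the commit message.
MEASUREMENTS = [
    ("iq1_tn AVX2 PP-512", 455, 533),
    ("iq1_tn AVX2 TG-128 @ 2 threads", 14.2, 16.6),
    ("iq1_tn Zen4 PP-512", 352, 485),
    ("iq1_tn Zen4 PP-512 with FA", 380, 545),
    ("iq1_tn Zen4 TG-128 @ 1 thread", 10.4, 12.4),
    ("iq2_bn Zen4 PP-512", 542, 614),
    ("iq2_bn AVX2 PP-512", 680, 753),
]


def speedup_percent(before: float, after: float) -> float:
    """Relative throughput gain of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


for label, before, after in MEASUREMENTS:
    print(f"{label}: {before} -> {after} t/s ({speedup_percent(before, after):+.1f}%)")
```

The gains range from roughly +11% (iq2_bn AVX2 PP-512) to roughly +43% (iq1_tn Zen4 PP-512 with FA).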

python/pytorch compat notes

97a25c1

ggerganov merged commit b9bd1d0 into ggml-org:master Mar 12, 2023

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Closed

ggerganov merged 1 commit into ggml-org:master from wizzard0:master

wizzard0 commented Mar 12, 2023

2 participants
