add python/pytorch version compat notes#44

Merged
ggerganov merged 1 commit into ggml-org:master from wizzard0:master
Mar 12, 2023

@wizzard0
Contributor

see #32 (comments)

@ggerganov ggerganov merged commit b9bd1d0 into ggml-org:master Mar 12, 2023
44670 pushed a commit to 44670/llama.cpp that referenced this pull request Aug 2, 2023
* RAM usage reduction and calculations
Removed the -b batch limit (1024) (tested up to -b 8192)
Fixed an integer overflow in ggml matmul (happened at around nbatch 3000)
Added a dynamic calculation for batched scratch memory consumption
Overall reduced RAM buffer sizes by orders of magnitude for normal settings
RAM usage scales quadratically with increasing context size * batch
Using a small batch (or default 1) will result in a very small memory footprint even at thousands of tokens processed
Tested up to 13,000 tokens prompt and 8k batch
Needs more tests on various platforms

* removed debug

* minor

---------
SamuelOliveirads pushed a commit to SamuelOliveirads/llama.cpp that referenced this pull request Dec 29, 2025
* Adding iq1_tn - 1.6875 bpw for TriLM ternary models

* iq1_tn: NEON

* iq1_tn: faster NEON

* iq2_bn: improve performance on NEON

We now get TG-128 = 100 t/s for Bitnet-3B-1.58b!

* iq1_tn: improve AVX2

PP-512 goes to 533 t/s up from 455.
TG-128 @ 2 threads goes to 16.6 t/s up from 14.2.
However, we seem to have a bottleneck somewhere as
TG saturates at 8 threads.

* iq1_tn: improve Zen4

PP-512 goes to 485 t/s up from 352. With FA we get 545 t/s up from 380.
TG-128 @ 1 thread goes to 12.4 t/s up from 10.4.
However, we seem to have a bottleneck somewhere as
TG saturates at 8 threads.

* iq2_bn: improve on Zen4

We now get PP-512 = 614 t/s up from 542 t/s

* iq2_bn: improve AVX2 implementation

We now get PP-512 = 753 t/s up from 680 t/s.

* Remove unnecessary barrier in ggml_compute_forward_mul_mat

---------

Co-authored-by: Iwan Kawrakow <[email protected]>