llama-bench: enable having different number of threads for tg and pp by ikawrakow · Pull Request #284 · ikawrakow/ik_llama.cpp

ikawrakow · 2025-03-24T16:13:01Z

All applications in the examples folder except llama-bench accept -t (to specify the number of threads for token generation) and -tb (to specify the number of threads for prompt processing, a.k.a. prefill) as command line arguments. This is handy because peak TG performance is often reached at a lower number of threads, so one wants to use that for generation instead of the number of cores, which is what gives maximum prompt processing speed. llama-bench, inherited from upstream, has its own command line argument parsing, where only -t is available but not -tb.

This PR adds a new command line argument to llama-bench: -tgb (or --threads-gen-batch). One can use it as, e.g.,

./bin/llama-bench -tgb 4,16 -p 512 -n 128 other_arguments

where 4 threads will be used for the tg128 test, and 16 threads will be used for the pp512 test. For tests that are a combination of prefill and gen (-pg, -gp), the batch number of threads will be used for prefill, and the gen number of threads will be used for token generation. One can also specify multiple pairs of {t_gen, t_batch} for the -tgb argument, separating them with a semicolon. E.g.,

./bin/llama-bench -tgb "2,16;4,16;8,32"

The -t argument continues to work as before. Each value given to it adds a pair with the same integer for both entries to the list of {t_gen, t_batch} thread pairs.
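To make the pair format concrete, here is a minimal Python sketch of how a -tgb value and a -t value could be turned into {t_gen, t_batch} pairs. This is not the code from the PR (llama-bench is written in C++); the function names `parse_tgb` and `pairs_from_t` are hypothetical, and the sketch assumes -t takes a comma-separated list of integers.

```python
from typing import List, Tuple


def parse_tgb(arg: str) -> List[Tuple[int, int]]:
    """Parse a string like "2,16;4,16;8,32" into (t_gen, t_batch) pairs.

    Hypothetical helper for illustration only, not the PR's parser.
    """
    pairs = []
    for item in arg.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        parts = item.split(",")
        if len(parts) != 2:
            raise ValueError("expected 't_gen,t_batch', got %r" % item)
        t_gen, t_batch = int(parts[0]), int(parts[1])
        pairs.append((t_gen, t_batch))
    return pairs


def pairs_from_t(arg: str) -> List[Tuple[int, int]]:
    """Map each -t value to a pair with the same integer for gen and batch.

    Assumes -t is a comma-separated list of integers.
    """
    return [(int(t), int(t)) for t in arg.split(",") if t.strip()]


if __name__ == "__main__":
    print(parse_tgb("2,16;4,16;8,32"))
    print(pairs_from_t("4,8"))
```

Note that in most shells an unquoted semicolon ends the command, so a multi-pair value typically needs to be quoted on the command line.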

Caveat: For -p the batch number of threads is printed in the results table. For all other tests the gen number of threads is printed. This is of course appropriate for -n and -gp, but it becomes confusing for -pg, where both the batch and the gen number of threads matter for the reported performance. I guess it would be better to print both thread numbers in this case, but this is not done in this PR.
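The rules described above can be summarized in a small Python sketch. It only restates the behavior as described in this PR text; the function names and the `kind` labels ("p", "n", "pg", "gp") are hypothetical and do not correspond to identifiers in llama-bench.

```python
from typing import Dict


def threads_used(kind: str, t_gen: int, t_batch: int) -> Dict[str, int]:
    """Return which thread count each phase of a test runs with.

    kind is a hypothetical label for the test type:
    "p" (prompt only), "n" (generation only), "pg" or "gp" (both phases).
    """
    used = {}
    if kind in ("p", "pg", "gp"):
        used["prefill"] = t_batch  # prompt processing uses the batch threads
    if kind in ("n", "pg", "gp"):
        used["gen"] = t_gen  # token generation uses the gen threads
    return used


def reported_threads(kind: str, t_gen: int, t_batch: int) -> int:
    """Return the single thread count shown in the results table.

    Per the caveat: -p rows show t_batch, all other rows show t_gen.
    """
    return t_batch if kind == "p" else t_gen


if __name__ == "__main__":
    for kind in ("p", "n", "pg", "gp"):
        print(kind, threads_used(kind, 4, 16), reported_threads(kind, 4, 16))
```

For a -pg test with the pair 4,16, the sketch shows the mismatch the caveat refers to: both 16 (prefill) and 4 (gen) are in use, but only 4 appears in the table.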

ubergarm · 2025-03-25T16:27:02Z

Thanks for this one, should help optimize the big xeon 6980P given previous testing suggests that pp likes more threads than tg.

Iwan Kawrakow added 2 commits March 24, 2025 17:48

llama-bench: enable having different number of threads for tg and pp

52fd0ac

Add -tgb to usage

e3ebf3c

ikawrakow merged commit a22250d into main Mar 25, 2025

ubergarm mentioned this pull request Mar 25, 2025

Is this better for DeepSeek-R1? #287

Closed

ikawrakow merged 2 commits into main from ik/llama_bench_tgb
