
Faster adaptive_p sampling#1165

Merged
ikawrakow merged 8 commits into main from ik/adaptive_p_2 on Jan 19, 2026
Conversation

@ikawrakow
Owner

This PR further optimizes adaptive_p sampling compared to PR #1161. For more context, see the discussion there.
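For background, samplers such as adaptive_p operate on the probability distribution obtained from the model's logits, and the probability computation is essentially a softmax. A scalar Python sketch of that step (an illustration only, not the PR's C++ code, which as discussed below uses AVX2):

```python
import math

def softmax_probs(logits):
    """Scalar reference for turning logits into probabilities.
    Subtracting the maximum logit first keeps exp() from overflowing;
    a vectorized (e.g. AVX2) version processes several logits per
    instruction but computes the same result."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]
```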

To actually measure the time spent in the adaptive_p sampler, one needs to add up the time spent in all of its functions, not just the final sampling time, which is fast. This is done in this PR and also in #1161, but I also modified the main branch (not pushed here) to be able to compare.
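One way to do this kind of bookkeeping is to wrap every sampler entry point with a timer and accumulate across all calls. A minimal Python sketch (hypothetical names, not the actual llama.cpp timing code):

```python
import time

class TimedSampler:
    """Accumulates time across *all* of a sampler's functions, not just
    the final (fast) token draw. Hypothetical illustration only."""

    def __init__(self, sampler):
        self.sampler = sampler
        self.total_ns = 0   # total time spent inside the sampler
        self.calls = 0      # number of timed calls

    def _timed(self, fn, *args):
        t0 = time.perf_counter_ns()
        result = fn(*args)
        self.total_ns += time.perf_counter_ns() - t0
        self.calls += 1
        return result

    def apply(self, logits):
        # e.g. the adaptive_p filtering pass over the candidates
        return self._timed(self.sampler.apply, logits)

    def sample(self, probs):
        # the final sampling step (fast on its own)
        return self._timed(self.sampler.sample, probs)

    def ms_per_token(self, n_tokens):
        return self.total_ns / 1e6 / max(n_tokens, 1)
```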

Here are the results of a quick experiment with Qwen3-30B-A3B-Q8_0, adaptive_p enabled, and the prompt "Give me an extended summary of the history of Bulgaria". We see a massive improvement (~17X) between the main branch and #1161, and an additional 2.4X speedup in this PR.

Main branch

llama_print_timings:        load time =    3036.16 ms
llama_print_timings:      sample time =   26234.31 ms /  2617 runs   (   10.02 ms per token,    99.75 tokens per second)
llama_print_timings: prompt eval time =      69.45 ms /    19 tokens (    3.66 ms per token,   273.59 tokens per second)
llama_print_timings:        eval time =   16612.06 ms /  2616 runs   (    6.35 ms per token,   157.48 tokens per second)
llama_print_timings:       total time =   56298.48 ms /  2635 tokens

PR #1161

llama_print_timings:        load time =    2938.55 ms
llama_print_timings:      sample time =    1524.91 ms /  2617 runs   (    0.58 ms per token,  1716.16 tokens per second)
llama_print_timings: prompt eval time =      70.19 ms /    19 tokens (    3.69 ms per token,   270.69 tokens per second)
llama_print_timings:        eval time =   16584.15 ms /  2616 runs   (    6.34 ms per token,   157.74 tokens per second)
llama_print_timings:       total time =   33213.94 ms /  2635 tokens

This PR

llama_print_timings:        load time =    2954.32 ms
llama_print_timings:      sample time =     627.39 ms /  2667 runs   (    0.24 ms per token,  4250.91 tokens per second)
llama_print_timings: prompt eval time =      69.66 ms /    19 tokens (    3.67 ms per token,   272.75 tokens per second)
llama_print_timings:        eval time =   16910.88 ms /  2666 runs   (    6.34 ms per token,   157.65 tokens per second)
llama_print_timings:       total time =   38529.50 ms /  2685 tokens

@ikawrakow
Owner Author

OK, I added an AVX2 implementation of the probability computation. With that, I get for the above test case:

llama_print_timings:        load time =    2985.43 ms
llama_print_timings:      sample time =     476.95 ms /  2667 runs   (    0.18 ms per token,  5591.72 tokens per second)
llama_print_timings: prompt eval time =      70.72 ms /    19 tokens (    3.72 ms per token,   268.67 tokens per second)
llama_print_timings:        eval time =   16940.82 ms /  2666 runs   (    6.35 ms per token,   157.37 tokens per second)
llama_print_timings:       total time =   33074.71 ms /  2685 tokens

This is, somewhat disappointingly, only 0.58/0.18 = 3.2 times faster than #1161. But if we take into account the time spent in the other samplers (0.07 ms per token in this example), the speedup compared to #1161 becomes (0.58 - 0.07)/(0.18 - 0.07) = 4.6 times. If I compare to the previous main branch, the speedup is (10.02 - 0.07)/(0.18 - 0.07) = 90.5 times!
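The arithmetic above, spelled out (all figures are the per-token sample times from the logs; 0.07 ms is the portion spent in samplers other than adaptive_p):

```python
other = 0.07     # ms/token in samplers other than adaptive_p
pr1161 = 0.58    # ms/token, PR #1161
this_pr = 0.18   # ms/token, this PR with AVX2
main = 10.02     # ms/token, previous main branch

naive = pr1161 / this_pr                        # raw ratio, ~3.2
vs_1161 = (pr1161 - other) / (this_pr - other)  # net of other samplers, ~4.6
vs_main = (main - other) / (this_pr - other)    # vs old main, ~90.5

print(f"{naive:.1f}x, {vs_1161:.1f}x, {vs_main:.1f}x")  # → 3.2x, 4.6x, 90.5x
```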

@ikawrakow merged commit 98b30e5 into main on Jan 19, 2026