sampling: add Top-nσ sampler #11223
Conversation
Thank you for this implementation! Top-nσ is definitely special and needs a lot of testing. I like the results so far, especially since high temperature is not a problem, as shown in the paper, and I'm going to test it more and see what its limitations are.
Sorry for the ping, but what is needed so that this approved PR gets merged? I like the new feature very much, as it allows the benefits of a higher temperature without the usual drawbacks.
@hdu-hh I agree! This would be a cool feature to have but I don't think it's a priority for the maintainers at the moment. |
* initial sampling changes
* completed top nsigma sampler implementation
* apply parameter to only llama-cli
* updated readme
* added tests and fixed nsigma impl
* cleaned up pr
* format
* removed commented tests
* cleanup pr and remove explicit floats
* added top-k sampler to improve performance
* changed sigma to float
* fixed string format to float
* Update src/llama-sampling.cpp and common/sampling.cpp (review suggestions)
* added llama_sampler_init

Co-authored-by: Georgi Gerganov <[email protected]>
- update libllama to match `llama.cpp/include/llama.h@71e90e8813f90097701e62f7fce137d96ddf41e2`
- add top-n-sigma sampler (ref: [llama.cpp#11223](ggml-org/llama.cpp#11223))
- add missing newlines to Gemma3 prompt format
- add new `Llama.n_head_kv()`, `Llama.bpw()`, `Llama.warmup()`
- add sampler parameter info to the sampler chain string
- adopt new sampler parameter defaults (temp=1.0, min_p=0.1, all others neutral)
- improve logic for applying the penalties sampler
- warming up the model now does a single-token decode as well as a full batch decode
- fix `Llama.chat_template()`
- fix return type annotations for some token methods of `Llama` (`int` -> `Optional[int]`)
- remove old sampler presets
- continue working on new server + webui
We are just a small step away from bringing it to users: #14637
Top-nσ: Not All Logits Are You Need
https://arxiv.org/pdf/2411.07641
The authors of this paper propose a new sampling method known as Top-nσ. The main feature of this sampler is that, "unlike existing methods (e.g., top-p, min-p) that inadvertently include more noise tokens at higher temperatures, top-nσ maintains a stable sampling space regardless of temperature scaling". They discovered that logits naturally separate into a Gaussian-distributed noisy region and an informative region.
This PR implements the sampling method proposed in the paper. Here is the algorithm as implemented from the paper:

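In short: with logit maximum M and standard deviation σ, only tokens whose logit satisfies l ≥ M − n·σ are kept, and sampling proceeds over the survivors. Below is a minimal sketch of that filtering step, reconstructed from the paper rather than taken from this PR's code:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sketch of the top-n-sigma filter from arXiv:2411.07641 (a reconstruction,
// not the code in this PR). Tokens whose logit falls more than n standard
// deviations below the maximum logit are masked out; the caller then
// softmaxes and samples over whatever remains.
std::vector<float> top_n_sigma_filter(std::vector<float> logits, float n) {
    const size_t k = logits.size();

    // maximum and mean of the logits
    float  max_l = logits[0];
    double sum   = 0.0;
    for (float l : logits) {
        max_l = std::max(max_l, l);
        sum  += l;
    }
    const float mean = (float) (sum / k);

    // standard deviation of the logits
    double acc = 0.0;
    for (float l : logits) {
        acc += (double) (l - mean) * (l - mean);
    }
    const float sigma = std::sqrt((float) (acc / k));

    // mask out everything below the threshold M - n*sigma
    const float thresh = max_l - n * sigma;
    for (float & l : logits) {
        if (l < thresh) {
            l = -INFINITY; // excluded from the subsequent softmax
        }
    }
    return logits;
}
```

Because the threshold is defined relative to the logits' own spread, scaling the temperature rescales M, σ, and the threshold together, which is why the retained set stays stable as temperature rises.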
Since the manipulation is done directly on the logits pre-softmax, I added it as a stand-alone sampler instead of chaining it with the common samplers. The changes only add support for llama-cli.

Sampler chain: logits -> logit-bias -> temp -> top-n-sigma -> dist

I'm aware that this algorithm is still in its early phases, so we could tag this as a demo for now, but I'll leave that choice up to the maintainers.
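For reference, assembling that chain through the public sampler API would look roughly like the sketch below. This is a minimal sketch assuming the PR exposes `llama_sampler_init_top_n_sigma(float n)`; the logit-bias stage is omitted for brevity:

```cpp
#include "llama.h"

// Sketch of building the sampler chain shown above with llama.cpp's
// sampler API (assumes llama_sampler_init_top_n_sigma is the entry
// point added by this PR).
static llama_sampler * make_top_n_sigma_chain(void) {
    llama_sampler_chain_params params = llama_sampler_chain_default_params();
    llama_sampler * chain = llama_sampler_chain_init(params);

    llama_sampler_chain_add(chain, llama_sampler_init_temp(1.5f));               // temp
    llama_sampler_chain_add(chain, llama_sampler_init_top_n_sigma(1.0f));        // top-n-sigma, n = 1.0
    llama_sampler_chain_add(chain, llama_sampler_init_dist(LLAMA_DEFAULT_SEED)); // dist

    return chain; // release later with llama_sampler_free(chain)
}
```

`llama_sampler_free` on the chain also frees every sampler that was added to it, so the individual samplers don't need separate cleanup.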
resolves #11057
Relevant Links:
https://huggingface.co/papers/2411.07641
https://arxiv.org/pdf/2411.07641
https://github.com/Tomorrowdawn/top_nsigma
#11057