sampling: add Top-nσ sampler #11223
Conversation
Thank you for this implementation! Top-nσ is definitely special and needs a lot of testing. I like the results so far, especially since high temperature is not a problem, as shown in the paper, and I'm going to test it more and see what its limitations are.
Sorry for the ping, but what is needed so that this approved PR gets merged? I like the new feature very much, as it allows the benefits of a higher temperature without the usual drawbacks.
@hdu-hh I agree! This would be a cool feature to have but I don't think it's a priority for the maintainers at the moment. |
* initial sampling changes
* completed top nsigma sampler implementation
* apply parameter to only llama-cli
* updated readme
* added tests and fixed nsigma impl
* cleaned up pr
* format
* removed commented tests
* cleanup pr and remove explicit floats
* added top-k sampler to improve performance
* changed sigma to float
* fixed string format to float
* Update src/llama-sampling.cpp and common/sampling.cpp (review suggestions)
* added llama_sampler_init

Co-authored-by: Georgi Gerganov <[email protected]>
- update libllama to match `llama.cpp/include/llama.h@71e90e8813f90097701e62f7fce137d96ddf41e2`
- add top-n-sigma sampler (ref: [llama.cpp#11223](ggml-org/llama.cpp#11223))
- add missing newlines to Gemma3 prompt format
- add new `Llama.n_head_kv()`, `Llama.bpw()`, `Llama.warmup()`
- add sampler parameter info to the sampler chain string
- adopt new sampler parameter defaults (temp=1.0, min_p=0.1, all others neutral)
- improve logic for applying the penalties sampler
- warming up the model now does a single-token decode as well as a full batch decode
- fix `Llama.chat_template()`
- fix return type annotations for some token methods of `Llama` (`int` -> `Optional[int]`)
- remove old sampler presets
- continue working on new server + webui
We are just a small step away from bringing it to users: #14637
Top-nσ: Not All Logits Are You Need
https://arxiv.org/pdf/2411.07641
The authors of this paper propose a new sampling method known as Top-nσ. The main feature of this sampler is that, "unlike existing methods (e.g., top-p, min-p) that inadvertently include more noise tokens at higher temperatures, top-nσ maintains a stable sampling space regardless of temperature scaling". They discovered that logits naturally separate into a Gaussian-distributed noisy region and an informative region.
This PR implements the sampling method proposed in the paper. Here is the algorithm as implemented from the paper:

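In short: with logit maximum M and standard deviation σ, only tokens whose logit satisfies l ≥ M − n·σ are kept, and sampling proceeds over the survivors. Below is a minimal sketch of that filtering step, reconstructed from the paper rather than taken from this PR's code:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sketch of the top-n-sigma filter from arXiv:2411.07641 (a reconstruction,
// not the code in this PR). Tokens whose logit falls more than n standard
// deviations below the maximum logit are masked out; the caller then
// softmaxes and samples over whatever remains.
std::vector<float> top_n_sigma_filter(std::vector<float> logits, float n) {
    const size_t k = logits.size();

    // maximum and mean of the logits
    float  max_l = logits[0];
    double sum   = 0.0;
    for (float l : logits) {
        max_l = std::max(max_l, l);
        sum  += l;
    }
    const float mean = (float) (sum / k);

    // standard deviation of the logits
    double acc = 0.0;
    for (float l : logits) {
        acc += (double) (l - mean) * (l - mean);
    }
    const float sigma = std::sqrt((float) (acc / k));

    // mask out everything below the threshold M - n*sigma
    const float thresh = max_l - n * sigma;
    for (float & l : logits) {
        if (l < thresh) {
            l = -INFINITY; // excluded from the subsequent softmax
        }
    }
    return logits;
}
```

Because the threshold is defined relative to the logits' own spread, scaling the temperature rescales M, σ, and the threshold together, which is why the retained set stays stable as temperature rises.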
Since the manipulation is done directly on the logits pre-softmax, I added it as a stand-alone sampler instead of chaining it with the common samplers. The changes only add support for llama-cli.

Sampler chain: logits -> logit-bias -> temp -> top-n-sigma -> dist

I'm aware that this algorithm is still in its early phases, so we could tag this as a demo for now, but I'll leave that choice up to the maintainers.
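For reference, assembling that chain through the public sampler API would look roughly like the sketch below. This is a minimal sketch assuming the PR exposes `llama_sampler_init_top_n_sigma(float n)`; the logit-bias stage is omitted for brevity:

```cpp
#include "llama.h"

// Sketch of building the sampler chain shown above with llama.cpp's
// sampler API (assumes llama_sampler_init_top_n_sigma is the entry
// point added by this PR).
static llama_sampler * make_top_n_sigma_chain(void) {
    llama_sampler_chain_params params = llama_sampler_chain_default_params();
    llama_sampler * chain = llama_sampler_chain_init(params);

    llama_sampler_chain_add(chain, llama_sampler_init_temp(1.5f));               // temp
    llama_sampler_chain_add(chain, llama_sampler_init_top_n_sigma(1.0f));        // top-n-sigma, n = 1.0
    llama_sampler_chain_add(chain, llama_sampler_init_dist(LLAMA_DEFAULT_SEED)); // dist

    return chain; // release later with llama_sampler_free(chain)
}
```

`llama_sampler_free` on the chain also frees every sampler that was added to it, so the individual samplers don't need separate cleanup.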
resolves #11057
Relevant Links:
https://huggingface.co/papers/2411.07641
https://arxiv.org/pdf/2411.07641
https://github.com/Tomorrowdawn/top_nsigma
#11057