Merged
92 commits
774cf23
initial commit for branch
ddh0 Dec 11, 2025
5ab4ff7
simplify constants
ddh0 Dec 11, 2025
66e2d17
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 11, 2025
88fb0f3
add params to `struct common_params_sampling`, add reference to PR
ddh0 Dec 11, 2025
374bfd4
explicitly clamp `min_target` and `max_target` to `[0.0, 1.0]`
ddh0 Dec 11, 2025
ffe1639
add args, rename `queue_size` -> `window_size`
ddh0 Dec 11, 2025
4959878
improved comments
ddh0 Dec 11, 2025
f3457a8
minor
ddh0 Dec 11, 2025
9316959
remove old unused code from algorithm
ddh0 Dec 11, 2025
b3aea57
minor
ddh0 Dec 11, 2025
cd7de7c
add power law case to `common_sampler_init`, add sampler name mappings
ddh0 Dec 11, 2025
534cb4f
clarify behaviour when `window_size = 0`
ddh0 Dec 11, 2025
dcada03
add missing enums
ddh0 Dec 11, 2025
2d62bbe
remove `target_range` param, make `target == 1` no-op, cleanup code
ddh0 Dec 12, 2025
5c78b79
oops, straggler
ddh0 Dec 12, 2025
53380c1
add missing parameters in `server-task.cpp`
ddh0 Dec 13, 2025
94cb883
copy from author
ddh0 Dec 13, 2025
0a19a3f
remove old debug log, style nit
ddh0 Dec 13, 2025
824bb3a
fix compiler warning, add commented-out logging per token
ddh0 Dec 13, 2025
1879fc6
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 13, 2025
67a7336
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 13, 2025
a96ddd7
re-write + change parameters + simplify
ddh0 Dec 14, 2025
b8a9626
oops forgot args.cpp
ddh0 Dec 14, 2025
965bcc9
fix leftover `window_size`
ddh0 Dec 14, 2025
d1e5c60
add missing values to `common_params_sampling::print()`
ddh0 Dec 14, 2025
9613c48
with logging
ddh0 Dec 14, 2025
2a3f579
does this fix it?
ddh0 Dec 14, 2025
ec54fe5
no, but does this?
ddh0 Dec 14, 2025
667b70f
update default decay
ddh0 Dec 14, 2025
36b526d
Merge branch 'master' into power-law-sampler
ddh0 Dec 14, 2025
6934780
optimize
ddh0 Dec 14, 2025
f5d0872
fix bad merge
ddh0 Dec 15, 2025
493bf30
silence `missing initializer for member`
ddh0 Dec 15, 2025
6854325
update default decay to 0.9
ddh0 Dec 15, 2025
b5ed673
fix logging
ddh0 Dec 15, 2025
4e28eb2
format (double)
ddh0 Dec 15, 2025
1c58e9a
add power law to the new `samplers` vector
ddh0 Dec 15, 2025
4e04bd1
log sampler init values
ddh0 Dec 15, 2025
6e66095
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 15, 2025
9c50b57
improve logging messages in llama_sampler_power_law
ddh0 Dec 15, 2025
0344068
remove extraneous logging
ddh0 Dec 15, 2025
1c2d2e9
simplify target computation
ddh0 Dec 16, 2025
85b6e52
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 16, 2025
fcb5129
remove debug logging, explicitly clamp params at init
ddh0 Dec 16, 2025
58aa1c6
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 16, 2025
27dda80
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 17, 2025
7752998
add `use_power_law` flag + logic, minor cleanup
ddh0 Dec 17, 2025
6023572
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 18, 2025
dedbe36
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 18, 2025
f4703d4
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 19, 2025
89ebdf0
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 21, 2025
55ad4a8
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 21, 2025
6bad4ae
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 22, 2025
295d1d8
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 23, 2025
ed2890e
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 25, 2025
51070e0
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 26, 2025
90f3bfb
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 27, 2025
b95b088
update `power-law` -> `adaptive-p`
ddh0 Dec 27, 2025
f0d3f13
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 29, 2025
e7a8920
fix cold start EMA
ddh0 Dec 29, 2025
05d7dc9
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 30, 2025
2d67b1c
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 30, 2025
c6a6f63
update `SHARPNESS` constant to `10.0f`
ddh0 Dec 30, 2025
0807499
minor style fixes
ddh0 Dec 30, 2025
eb854e7
minor style fixes cont.
ddh0 Dec 30, 2025
55757dc
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Dec 31, 2025
660a3b2
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Jan 2, 2026
7173e84
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Jan 4, 2026
c27df51
update `llama_sampler_adaptive_p_i` for backend sampling (ref: #17004)
ddh0 Jan 4, 2026
5fdc530
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Jan 5, 2026
0400611
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Jan 5, 2026
684c5ff
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Jan 6, 2026
7ffd3a8
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Jan 7, 2026
f48413c
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Jan 7, 2026
bef75d9
separate into `apply` + `accept` functions
ddh0 Jan 8, 2026
8b1292a
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Jan 8, 2026
e99a4a6
`pending_token_idx`: switch from `llama_token` to `int32`
ddh0 Jan 9, 2026
af0596c
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Jan 10, 2026
5f04265
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Jan 11, 2026
7f40928
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Jan 12, 2026
3aa23f3
don't transform logits <= -1e9f
ddh0 Jan 12, 2026
1eff502
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Jan 13, 2026
d21c87e
fix masking in backend top-p, min-p
ddh0 Jan 13, 2026
4b92e3a
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Jan 13, 2026
33c635e
address review comments
ddh0 Jan 14, 2026
4b06e08
typo in comments `RND` -> `RNG`
ddh0 Jan 14, 2026
42af39d
add docs
ddh0 Jan 14, 2026
81af54c
add recommended values in completion docs
ddh0 Jan 14, 2026
40fd48f
address PR feedback
ddh0 Jan 14, 2026
b6041b1
remove trailing whitespace (for CI `editorconfig`)
ddh0 Jan 15, 2026
f222e17
Merge branch 'ggml-org:master' into power-law-sampler
ddh0 Jan 15, 2026
d7e3b86
add to adaptive-p to `common_sampler_types_from_chars`
ddh0 Jan 15, 2026
20 changes: 20 additions & 0 deletions common/arg.cpp
@@ -1729,6 +1729,26 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
}
}
).set_sparam());
add_opt(common_arg(
{"--adaptive-target"}, "N",
string_format("adaptive-p: select tokens near this probability (valid range 0.0 "
"to 1.0; negative = disabled) (default: %.2f)\n"
"[(more info)](https://github.com/ggml-org/llama.cpp/pull/17927)",
(double)params.sampling.adaptive_target),
[](common_params & params, const std::string & value) {
params.sampling.adaptive_target = std::stof(value);
}
).set_sparam());
add_opt(common_arg(
{"--adaptive-decay"}, "N",
string_format("adaptive-p: decay rate for target adaptation over time. lower values "
"are more reactive, higher values are more stable.\n"
"(valid range 0.0 to 0.99) (default: %.2f)",
(double)params.sampling.adaptive_decay),
[](common_params & params, const std::string & value) {
params.sampling.adaptive_decay = std::stof(value);
}
).set_sparam());
add_opt(common_arg(
{"--dynatemp-range"}, "N",
string_format("dynamic temperature range (default: %.1f, 0.0 = disabled)", (double)params.sampling.dynatemp_range),
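Putting the two new options above into practice: the invocation below is an illustrative sketch, not taken from the PR — the binary name, model path, and the `--samplers`/`--min-p` syntax are the usual llama.cpp CLI conventions and may differ in your build; only `--adaptive-target` and `--adaptive-decay` (and their ranges) come from the diff above. It follows the PR's recommendation of mild min-p truncation ahead of adaptive-p:

```shell
# Hypothetical invocation: enable adaptive-p targeting tokens near p = 0.3,
# with the default decay of 0.9 and light min-p truncation before it.
# adaptive-p selects the token itself, so it goes last in the chain.
./llama-cli -m model.gguf \
    --samplers "min_p;adaptive_p" \
    --min-p 0.05 \
    --adaptive-target 0.3 \
    --adaptive-decay 0.9
```

Note that `--adaptive-target` is clamped to `[0.0, 1.0]` (negative disables the sampler) and `--adaptive-decay` to `[0.0, 0.99]`, per the help strings above.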
53 changes: 28 additions & 25 deletions common/common.h
@@ -119,6 +119,7 @@ enum common_sampler_type {
COMMON_SAMPLER_TYPE_INFILL = 9,
COMMON_SAMPLER_TYPE_PENALTIES = 10,
COMMON_SAMPLER_TYPE_TOP_N_SIGMA = 11,
COMMON_SAMPLER_TYPE_ADAPTIVE_P = 12,
};

// dimensionality reduction methods, used by cvector-generator
@@ -166,32 +167,34 @@ enum common_params_sampling_config : uint64_t {
struct common_params_sampling {
uint32_t seed = LLAMA_DEFAULT_SEED; // the seed used to initialize llama_sampler

int32_t n_prev = 64; // number of previous tokens to remember
int32_t n_probs = 0; // if greater than 0, output the probabilities of top n_probs tokens.
int32_t min_keep = 0; // 0 = disabled, otherwise samplers should return at least min_keep tokens
int32_t top_k = 40; // <= 0 to use vocab size
float top_p = 0.95f; // 1.0 = disabled
float min_p = 0.05f; // 0.0 = disabled
float xtc_probability = 0.00f; // 0.0 = disabled
float xtc_threshold = 0.10f; // > 0.5 disables XTC
float typ_p = 1.00f; // typical_p, 1.0 = disabled
float temp = 0.80f; // <= 0.0 to sample greedily, 0.0 to not output probabilities
float dynatemp_range = 0.00f; // 0.0 = disabled
float dynatemp_exponent = 1.00f; // controls how entropy maps to temperature in dynamic temperature sampler
int32_t penalty_last_n = 64; // last n tokens to penalize (0 = disable penalty, -1 = context size)
float penalty_repeat = 1.00f; // 1.0 = disabled
float penalty_freq = 0.00f; // 0.0 = disabled
float penalty_present = 0.00f; // 0.0 = disabled
float dry_multiplier = 0.0f; // 0.0 = disabled; DRY repetition penalty for tokens extending repetition:
float dry_base = 1.75f; // 0.0 = disabled; multiplier * base ^ (length of sequence before token - allowed length)
int32_t dry_allowed_length = 2; // tokens extending repetitions beyond this receive penalty
int32_t dry_penalty_last_n = -1; // how many tokens to scan for repetitions (0 = disable penalty, -1 = context size)
int32_t mirostat = 0; // 0 = disabled, 1 = mirostat, 2 = mirostat 2.0
float top_n_sigma = -1.00f;// -1.0 = disabled
float mirostat_tau = 5.00f; // target entropy
float mirostat_eta = 0.10f; // learning rate
int32_t n_prev = 64; // number of previous tokens to remember
int32_t n_probs = 0; // if greater than 0, output the probabilities of top n_probs tokens.
int32_t min_keep = 0; // 0 = disabled, otherwise samplers should return at least min_keep tokens
int32_t top_k = 40; // <= 0 to use vocab size
float top_p = 0.95f; // 1.0 = disabled
float min_p = 0.05f; // 0.0 = disabled
float xtc_probability = 0.00f; // 0.0 = disabled
float xtc_threshold = 0.10f; // > 0.5 disables XTC
float typ_p = 1.00f; // typical_p, 1.0 = disabled
float temp = 0.80f; // <= 0.0 to sample greedily, 0.0 to not output probabilities
float dynatemp_range = 0.00f; // 0.0 = disabled
float dynatemp_exponent = 1.00f; // controls how entropy maps to temperature in dynamic temperature sampler
int32_t penalty_last_n = 64; // last n tokens to penalize (0 = disable penalty, -1 = context size)
float penalty_repeat = 1.00f; // 1.0 = disabled
float penalty_freq = 0.00f; // 0.0 = disabled
float penalty_present = 0.00f; // 0.0 = disabled
float dry_multiplier = 0.0f; // 0.0 = disabled; DRY repetition penalty for tokens extending repetition:
float dry_base = 1.75f; // 0.0 = disabled; multiplier * base ^ (length of sequence before token - allowed length)
int32_t dry_allowed_length = 2; // tokens extending repetitions beyond this receive penalty
int32_t dry_penalty_last_n = -1; // how many tokens to scan for repetitions (0 = disable penalty, -1 = context size)
float adaptive_target = -1.0f; // select tokens near this probability (valid range 0.0 to 1.0; negative = disabled)
float adaptive_decay = 0.90f; // EMA decay for adaptation; history ≈ 1/(1-decay) tokens (0.0 - 0.99)
int32_t mirostat = 0; // 0 = disabled, 1 = mirostat, 2 = mirostat 2.0
float top_n_sigma = -1.00f; // -1.0 = disabled
float mirostat_tau = 5.00f; // target entropy
float mirostat_eta = 0.10f; // learning rate
bool ignore_eos = false;
bool no_perf = false; // disable performance metrics
bool no_perf = false; // disable performance metrics
bool timing_per_token = false;

uint64_t user_sampling_config = 0; // bitfield to track user-specified samplers
47 changes: 33 additions & 14 deletions common/sampling.cpp
@@ -167,11 +167,11 @@ std::string common_params_sampling::print() const {
"\trepeat_last_n = %d, repeat_penalty = %.3f, frequency_penalty = %.3f, presence_penalty = %.3f\n"
"\tdry_multiplier = %.3f, dry_base = %.3f, dry_allowed_length = %d, dry_penalty_last_n = %d\n"
"\ttop_k = %d, top_p = %.3f, min_p = %.3f, xtc_probability = %.3f, xtc_threshold = %.3f, typical_p = %.3f, top_n_sigma = %.3f, temp = %.3f\n"
"\tmirostat = %d, mirostat_lr = %.3f, mirostat_ent = %.3f",
"\tmirostat = %d, mirostat_lr = %.3f, mirostat_ent = %.3f, adaptive_target = %.3f, adaptive_decay = %.3f",
penalty_last_n, penalty_repeat, penalty_freq, penalty_present,
dry_multiplier, dry_base, dry_allowed_length, dry_penalty_last_n,
top_k, top_p, min_p, xtc_probability, xtc_threshold, typ_p, top_n_sigma, temp,
mirostat, mirostat_eta, mirostat_tau);
mirostat, mirostat_eta, mirostat_tau, adaptive_target, adaptive_decay);

return std::string(result);
}
@@ -255,6 +255,9 @@ struct common_sampler * common_sampler_init(const struct llama_model * model, st
}

if (params.mirostat == 0) {

bool use_adaptive_p = false; // see below

for (const auto & cnstr : params.samplers) {
switch (cnstr) {
case COMMON_SAMPLER_TYPE_DRY:
@@ -264,43 +267,54 @@ struct common_sampler * common_sampler_init(const struct llama_model * model, st
for (const auto & str : params.dry_sequence_breakers) {
c_breakers.push_back(str.c_str());
}

samplers.push_back(llama_sampler_init_dry (vocab, llama_model_n_ctx_train(model), params.dry_multiplier, params.dry_base, params.dry_allowed_length, params.dry_penalty_last_n, c_breakers.data(), c_breakers.size()));
samplers.push_back(llama_sampler_init_dry(vocab, llama_model_n_ctx_train(model), params.dry_multiplier, params.dry_base, params.dry_allowed_length, params.dry_penalty_last_n, c_breakers.data(), c_breakers.size()));
}
break;
case COMMON_SAMPLER_TYPE_TOP_K:
samplers.push_back(llama_sampler_init_top_k (params.top_k));
samplers.push_back(llama_sampler_init_top_k(params.top_k));
break;
case COMMON_SAMPLER_TYPE_TOP_P:
samplers.push_back(llama_sampler_init_top_p (params.top_p, params.min_keep));
samplers.push_back(llama_sampler_init_top_p(params.top_p, params.min_keep));
break;
case COMMON_SAMPLER_TYPE_TOP_N_SIGMA:
samplers.push_back(llama_sampler_init_top_n_sigma(params.top_n_sigma));
break;
case COMMON_SAMPLER_TYPE_MIN_P:
samplers.push_back(llama_sampler_init_min_p (params.min_p, params.min_keep));
samplers.push_back(llama_sampler_init_min_p(params.min_p, params.min_keep));
break;
case COMMON_SAMPLER_TYPE_XTC:
samplers.push_back(llama_sampler_init_xtc (params.xtc_probability, params.xtc_threshold, params.min_keep, params.seed));
samplers.push_back(llama_sampler_init_xtc(params.xtc_probability, params.xtc_threshold, params.min_keep, params.seed));
break;
case COMMON_SAMPLER_TYPE_TYPICAL_P:
samplers.push_back(llama_sampler_init_typical (params.typ_p, params.min_keep));
samplers.push_back(llama_sampler_init_typical(params.typ_p, params.min_keep));
break;
case COMMON_SAMPLER_TYPE_TEMPERATURE:
samplers.push_back(llama_sampler_init_temp_ext (params.temp, params.dynatemp_range, params.dynatemp_exponent));
samplers.push_back(llama_sampler_init_temp_ext(params.temp, params.dynatemp_range, params.dynatemp_exponent));
break;
case COMMON_SAMPLER_TYPE_INFILL:
samplers.push_back(llama_sampler_init_infill (vocab));
samplers.push_back(llama_sampler_init_infill(vocab));
break;
case COMMON_SAMPLER_TYPE_PENALTIES:
samplers.push_back(llama_sampler_init_penalties (params.penalty_last_n, params.penalty_repeat, params.penalty_freq, params.penalty_present));
samplers.push_back(llama_sampler_init_penalties(params.penalty_last_n, params.penalty_repeat, params.penalty_freq, params.penalty_present));
break;
case COMMON_SAMPLER_TYPE_ADAPTIVE_P:
// the `adaptive-p` sampler is like `dist` and `mirostat` in that it selects
// a single token, so we will add `dist` at the end of the chain by default,
// unless the user specifically included `adaptive-p`. we set this flag here
// so we know to add the sampler at the very end.
use_adaptive_p = true;
break;
default:
GGML_ASSERT(false && "unknown sampler type");
}
}

samplers.push_back(llama_sampler_init_dist(params.seed));
if (use_adaptive_p) {
// only if user explicitly included adaptive-p sampler
samplers.push_back(llama_sampler_init_adaptive_p(params.adaptive_target, params.adaptive_decay, params.seed));
} else {
// default: sample from distribution
samplers.push_back(llama_sampler_init_dist(params.seed));
}
} else if (params.mirostat == 1) {
samplers.push_back(llama_sampler_init_temp(params.temp));
samplers.push_back(llama_sampler_init_mirostat(llama_vocab_n_tokens(vocab), params.seed, params.mirostat_tau, params.mirostat_eta, 100));
@@ -625,6 +639,7 @@ char common_sampler_type_to_chr(enum common_sampler_type cnstr) {
case COMMON_SAMPLER_TYPE_XTC: return 'x';
case COMMON_SAMPLER_TYPE_INFILL: return 'i';
case COMMON_SAMPLER_TYPE_PENALTIES: return 'e';
case COMMON_SAMPLER_TYPE_ADAPTIVE_P: return 'a';
default : return '?';
}
}
@@ -641,6 +656,7 @@ std::string common_sampler_type_to_str(enum common_sampler_type cnstr) {
case COMMON_SAMPLER_TYPE_XTC: return "xtc";
case COMMON_SAMPLER_TYPE_INFILL: return "infill";
case COMMON_SAMPLER_TYPE_PENALTIES: return "penalties";
case COMMON_SAMPLER_TYPE_ADAPTIVE_P: return "adaptive_p";
default : return "";
}
}
@@ -657,6 +673,7 @@ std::vector<common_sampler_type> common_sampler_types_from_names(const std::vect
{ "xtc", COMMON_SAMPLER_TYPE_XTC },
{ "infill", COMMON_SAMPLER_TYPE_INFILL },
{ "penalties", COMMON_SAMPLER_TYPE_PENALTIES },
{ "adaptive_p", COMMON_SAMPLER_TYPE_ADAPTIVE_P },
};

// since samplers names are written multiple ways
@@ -672,6 +689,7 @@ std::vector<common_sampler_type> common_sampler_types_from_names(const std::vect
{ "typ", COMMON_SAMPLER_TYPE_TYPICAL_P },
{ "min-p", COMMON_SAMPLER_TYPE_MIN_P },
{ "temp", COMMON_SAMPLER_TYPE_TEMPERATURE },
{ "adaptive-p", COMMON_SAMPLER_TYPE_ADAPTIVE_P },
};

std::vector<common_sampler_type> samplers;
@@ -708,6 +726,7 @@ std::vector<common_sampler_type> common_sampler_types_from_chars(const std::stri
{ common_sampler_type_to_chr(COMMON_SAMPLER_TYPE_XTC), COMMON_SAMPLER_TYPE_XTC },
{ common_sampler_type_to_chr(COMMON_SAMPLER_TYPE_INFILL), COMMON_SAMPLER_TYPE_INFILL },
{ common_sampler_type_to_chr(COMMON_SAMPLER_TYPE_PENALTIES), COMMON_SAMPLER_TYPE_PENALTIES },
{ common_sampler_type_to_chr(COMMON_SAMPLER_TYPE_ADAPTIVE_P), COMMON_SAMPLER_TYPE_ADAPTIVE_P },
};

std::vector<common_sampler_type> samplers;
27 changes: 27 additions & 0 deletions include/llama.h
@@ -1395,6 +1395,33 @@ extern "C" {
const char ** seq_breakers,
size_t num_breakers);

/// adaptive-p: select tokens near a configurable target probability over time.
///
/// the adaptive-p sampler transforms the token probability distribution to favor tokens
/// that fall near a user-configurable probability target.
///
/// internally, the sampler maintains an exponential moving average of the *ORIGINAL*
/// probabilities of selected tokens at each sampling step. it uses this EMA to compute an
/// adapted target probability at each sampling step, thus maintaining the desired target
/// probability over time.
///
/// adaptive-p selects a token ID rather than just mutating candidates, so it must be last
/// in the sampler chain (like mirostat, dist, greedy).
///
/// only mild truncation before this sampler is recommended. we suggest applying min-p
/// before adaptive-p as the only other active sampler in the chain.
///
/// @param target select tokens near this probability (valid range 0.0 to 1.0; negative = disabled)
/// @param decay      EMA decay for adaptation; history ≈ 1/(1-decay) tokens (valid range 0.0 to 0.99)
/// @param seed RNG seed
///
/// ref: https://github.com/ggml-org/llama.cpp/pull/17927
///
LLAMA_API struct llama_sampler * llama_sampler_init_adaptive_p(
float target,
float decay,
uint32_t seed);

LLAMA_API struct llama_sampler * llama_sampler_init_logit_bias(
int32_t n_vocab,
int32_t n_logit_bias,