
UPSTREAM PR #18056: common: fix --override-kv to support comma-separated values #576

Open

loci-dev wants to merge 1 commit into main from
upstream-PR18056-branch_ServeurpersoCom-fix/override-kv-csv

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18056

Make sure to read the contributing guidelines before submitting a PR

Two KV overrides working:

(root|~/llama.cpp.pascal) ./build/bin/llama-server --port 8081 \
  --model /var/www/ia/models/mradermacher/gemma-3-1b-it-i1-GGUF/gemma-3-1b-it.i1-Q6_K.gguf \
  --override-kv "tokenizer.ggml.add_bos_token=bool:false,tokenizer.ggml.add_eos_token=bool:false"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
... etc...
print_info: file size   = 958.64 MiB (8.04 BPW)
validate_override: Using metadata override ( bool) 'tokenizer.ggml.add_bos_token' = false
validate_override: Using metadata override ( bool) 'tokenizer.ggml.add_eos_token' = false
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
... etc...

Error and help message for an invalid type:

(root|~/llama.cpp.pascal) ./build/bin/llama-server --port 8081 \
  --model /var/www/ia/models/mradermacher/gemma-3-1b-it-i1-GGUF/gemma-3-1b-it.i1-Q6_K.gguf \
  --override-kv "tokenizer.ggml.add_bos_token=bool:false,tokenizer.ggml.add_eos_token=INVALID:blah"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
string_parse_kv_override: invalid type for KV override 'tokenizer.ggml.add_eos_token=INVALID:blah'
error while handling argument "--override-kv": error: Invalid type for KV override: tokenizer.ggml.add_eos_token=INVALID:blah


usage:
--override-kv KEY=TYPE:VALUE,...        advanced option to override model metadata by key. use comma-separated
                                        list of overrides.
                                        types: int, float, bool, str. example: --override-kv
                                        tokenizer.ggml.add_bos_token=bool:false,tokenizer.ggml.add_eos_token=bool:false


to show complete usage, run with -h
(root|~/llama.cpp.pascal)

Fixes #18040

@loci-review

loci-review bot commented Dec 15, 2025


Performance Analysis Summary: PR #576

Project: llama.cpp
Change: Enable comma-separated values for --override-kv argument
Modified File: common/arg.cpp (1 file, +7/-5 lines)


Analysis Result

This PR introduces a usability enhancement to the --override-kv command-line argument parser, adding support for comma-separated key-value overrides. The code change replaces a single string_parse_kv_override() call with a loop that splits the input string by commas and processes each override individually.

Performance Impact:

The observed performance changes are confined to initialization code executed during argument parsing at program startup. The modified lambda handler now invokes string_split<std::string>(value, ','), which creates temporary STL containers and iterators. This accounts for the small measured response-time increases in startup-path functions:

  • std::vector<minja::Value>::end(): +118 ns response time
  • tty_can_use_colors(): +122 ns response time

Inference Path Analysis:

No functions in the inference pipeline show performance changes. The tokenization and generation functions remain unaffected:

  • llama_decode: No change detected
  • llama_encode: No change detected
  • llama_tokenize: No change detected
  • llama_sampling_sample(): No change detected
  • ggml_mul_mat(): No change detected

Tokens per Second: No impact. The modified code executes only during startup argument parsing, not during inference execution.

Power Consumption:

  • llama-tts: +96 nJ (+0.037%)
  • llama-cvector-generator: +5 nJ (+0.002%)
  • Core inference libraries (libllama.so, libggml.so): No measurable change

The minimal power consumption increase reflects the one-time initialization overhead with no sustained runtime cost.

loci-dev force-pushed the main branch 26 times, most recently from 02b3f55 to ba4079a on December 17, 2025 21:09
loci-dev force-pushed the main branch 30 times, most recently from 37b9287 to eebd4bb on December 23, 2025 10:10