UPSTREAM PR #18322: server: (preset) add unsafe-allow-api-override #673

Open
loci-dev wants to merge 1 commit into main from upstream-PR18322-branch_ngxson-xsn/server_router_overrides

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18322

Ref discussion: ggml-org/llama.cpp#18261 (comment)

@ServeurpersoCom I think we need to add a test with an INI preset at some point

Example preset for this PR:

[THUDM/glm-edge-v-5b-gguf:Q4_K_M]
no-mmap = 0
temp = 123.000
autoload = 1
unsafe-allow-api-override = no-mmap,c
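
The `unsafe-allow-api-override` key above is a comma-separated whitelist of parameters the API may override. A minimal Python sketch of reading that whitelist from the preset (the helper name is hypothetical; the preset section and keys are taken from the example above):

```python
import configparser

PRESET_TEXT = """
[THUDM/glm-edge-v-5b-gguf:Q4_K_M]
no-mmap = 0
temp = 123.000
autoload = 1
unsafe-allow-api-override = no-mmap,c
"""

def read_override_whitelist(text, model):
    # Parse the INI preset and return the set of parameter names
    # that the preset explicitly allows API requests to override.
    cp = configparser.ConfigParser()
    cp.read_string(text)
    raw = cp.get(model, "unsafe-allow-api-override", fallback="")
    return {k.strip() for k in raw.split(",") if k.strip()}

print(read_override_whitelist(PRESET_TEXT, "THUDM/glm-edge-v-5b-gguf:Q4_K_M"))
# → {'no-mmap', 'c'}
```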

And API request:

{
        "model": "THUDM/glm-edge-v-5b-gguf:Q4_K_M",
        "overrides": {"c": "512"}
}

Returns:

{
    "success": true,
    "args": [
        "........../build/bin/llama-server",
        "--host",
        "127.0.0.1",
        "--mmap",
        "--port",
        "65054",
        "--temp",
        "123.000",
        "--alias",
        "THUDM/glm-edge-v-5b-gguf:Q4_K_M",
        "--ctx-size",
        "512",
        "--hf-repo",
        "THUDM/glm-edge-v-5b-gguf:Q4_K_M"
    ]
}
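
Note how the whitelisted short key `c` in the request expands to the `--ctx-size 512` pair in the returned args. A sketch of that expansion step (the flag table here is illustrative and covers only the keys in this example; llama-server's real option registry is far larger):

```python
# Illustrative mapping from override keys to llama-server CLI flags.
FLAG_NAMES = {"c": "--ctx-size", "temp": "--temp"}

def expand_overrides(overrides):
    # Turn an override map like {"c": "512"} into CLI argument pairs,
    # e.g. ["--ctx-size", "512"], appended to the launch command.
    args = []
    for key, value in overrides.items():
        args += [FLAG_NAMES[key], value]
    return args

print(expand_overrides({"c": "512"}))  # → ['--ctx-size', '512']
```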

@loci-review

loci-review bot commented Dec 23, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary

PR #673: Server Router API Override Feature

This PR introduces a security-controlled mechanism for dynamic parameter overrides via the /models/load API endpoint. The changes affect 7 files with modifications primarily in server router and preset management subsystems.

Key Findings

Performance-Critical Areas Impact:

The changes do not affect core inference functions. No modifications were made to llama_decode, llama_encode, llama_tokenize, or sampling operations. All performance changes occur in initialization and configuration paths outside the inference pipeline.

Tokens Per Second Impact:

Zero impact on inference throughput. The functions responsible for tokenization and inference remain unchanged. The reference model (smollm:135m on a 12th Gen Intel i7-1255U) shows a 7% tokens-per-second reduction when llama_decode is 2 ms slower; since this PR leaves llama_decode response time unaffected, it introduces no inference degradation.

Impacted Functions:

The function common_params_add_preset_options shows a response time increase of +21,691 ns in llama-tts and +20,680 ns in llama-cvector-generator. However, this function executes only during model initialization, not during inference, so the absolute overhead of roughly 21 microseconds is incurred once per model load operation.

New function common_preset_context::load_from_map adds 1-5 microseconds overhead when API overrides are provided, with zero overhead when the override map is empty (fast path).

Power Consumption Analysis:

Binary-level analysis shows llama-cvector-generator increased by 586 nJ (+0.23%) and llama-tts decreased by 164 nJ (-0.06%). These changes are within measurement noise and reflect initialization path modifications rather than inference loop changes. Core libraries (libggml-base.so, libggml-cpu.so, libllama.so) show zero power consumption change, confirming inference paths remain unaffected.

Code Changes:

The implementation adds whitelist-based parameter override validation, refactors preset parsing logic into load_from_map for code reuse, and extends the /models/load endpoint to accept override parameters. The security model requires explicit whitelisting via unsafe-allow-api-override preset parameter, with type validation ensuring only string values are accepted.
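
A hedged sketch of the validation model described above, in Python rather than the PR's actual C++ (function and error names are illustrative): overrides are rejected unless whitelisted by the preset, only string values pass type validation, and an empty override map takes a fast path with no work done.

```python
def validate_overrides(overrides, whitelist):
    # Fast path: no overrides supplied, nothing to validate.
    if not overrides:
        return {}
    accepted = {}
    for key, value in overrides.items():
        # Security model: only parameters explicitly whitelisted via
        # unsafe-allow-api-override may be overridden through the API.
        if key not in whitelist:
            raise ValueError(f"override '{key}' is not allowed by the preset")
        # Type validation: only string values are accepted.
        if not isinstance(value, str):
            raise TypeError(f"override '{key}' must be a string")
        accepted[key] = value
    return accepted

print(validate_overrides({"c": "512"}, {"no-mmap", "c"}))  # → {'c': '512'}
```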

@loci-dev loci-dev force-pushed the main branch 27 times, most recently from 5bb9d21 to 1946e3d on December 28, 2025 01:38
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 76fc6ba to 945c525 on January 2, 2026 11:08