
UPSTREAM PR #17911: cli: enable jinja by default #515

Open
loci-dev wants to merge 1 commit into main from
upstream-PR17911-branch_ngxson-xsn/cli_jinja_default

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#17911

Enables Jinja by default for: server and CLI.

Remains disabled by default for: mtmd-cli and llama-completion.

@loci-review

loci-review bot commented Dec 10, 2025

Explore the complete analysis inside the Version Insights

Pull Request #515 Performance Review

PR Title: cli: enable jinja by default
Change Scope: 4 files modified (8 additions, 9 deletions)

Summary

This PR changes the default value of use_jinja from false to true in the common_params structure, enabling Jinja template processing by default for chat operations across the CLI and server tools. The change removes example-specific initialization logic from common_params_parser_init() and adds explicit overrides in completion.cpp and mtmd-cli.cpp so that those tools keep their existing behavior (Jinja disabled). The modifications affect template instantiation paths in STL containers and JSON parsing operations, resulting in micro-level performance variations in non-critical utility functions.

Analysis

Code Changes:

  • common/common.h: Changed bool use_jinja = false to bool use_jinja = true (line 467)
  • common/arg.cpp: Removed 6 lines of example-specific initialization logic that set params.use_jinja = true for LLAMA_EXAMPLE_SERVER
  • tools/completion/completion.cpp: Added explicit params.use_jinja = false before parameter parsing
  • tools/mtmd/mtmd-cli.cpp: Added explicit params.use_jinja = false before parameter parsing

Performance Impact:

The observed performance variations occur in STL template instantiations and JSON operations within the llama-tts and llama-cvector-generator binaries. These functions are not part of the inference pipeline and do not affect tokenization or model execution:

  • Vector iterator operations show 60-226% throughput changes with absolute deltas of 24-135 ns
  • JSON operations show 27-174% throughput changes with absolute deltas of 80-121 ns
  • All affected functions are utility operations for parameter handling, file management, and HTTP client operations

Inference Impact:

No functions in the core inference pipeline are affected. The following critical functions show zero performance change:

  • llama_decode - unchanged
  • llama_encode - unchanged
  • llama_tokenize - unchanged
  • ggml_mul_mat - unchanged
  • llama_graph_compute - unchanged

Tokens per second impact: None. The performance variations are isolated to initialization and parameter parsing code paths that execute once at startup, not during token generation.

Power Consumption:

  • llama-tts: +0.094% (+239 nJ)
  • llama-cvector-generator: -0.064% (-160 nJ)
  • All inference libraries (libllama.so, libggml-base.so, libggml-cpu.so): 0% change

The power consumption changes are within measurement noise and reflect the cumulative effect of STL template instantiation differences during parameter initialization, not runtime inference operations.

loci-dev force-pushed the main branch 27 times, most recently from c05b224 to e70bc15 on December 14, 2025 08:10
loci-dev force-pushed the main branch 30 times, most recently from 81e654d to c785ce2 on December 18, 2025 13:19