
UPSTREAM PR #18236: cli: buffering info log, only show if model load failed #644

Closed
loci-dev wants to merge 3 commits into main from upstream-PR18236-branch_ngxson-xsn/cli_buffered_logs

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18236

A QoL change that allows more useful logs when model loading fails.

Currently, the CLI uses ERROR as the default level, which makes debugging quite tricky in some cases (sometimes INFO and WARN lines are also needed).

This PR introduces a "buffering" API to common_log that still records log lines, but then either drops or flushes them depending on whether model loading succeeds or fails.

The behavior:

  • If the default level (ERROR) is used, ERROR, WARN, and INFO log lines are buffered. They are only shown if model loading fails
  • Otherwise, if another log level is used, buffering is skipped (the loading animation is also skipped) --> real-time logging as normal

CC @JohannesGaessler this can be useful for debugging problems with -fit. Although I know the DEBUG level is preferred, it can be quite difficult to copy-paste if the log is too verbose. WDYT?


Demo loading a broken GGUF:

$ llama-cli -m ../models/oai_random/model.gguf

Loading model...  
srv    load_model: loading model '../models/oai_random/model.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'openai-moe'
llama_model_load_from_file_impl: failed to load model
llama_params_fit: failed to fit params to free device memory: failed to load model
llama_params_fit: fitting params to free memory took 0.13 seconds
llama_model_load_from_file_impl: using device Metal (Apple M3 Max) (unknown id) - 27647 MiB free
llama_model_loader: loaded meta data with 31 key-value pairs and 231 tensors from ../models/oai_random/model.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = openai-moe
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                         general.size_label str              = 4x56M

[truncated]

llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 199999
llama_model_loader: - kv  29:            tokenizer.ggml.unknown_token_id u32              = 199999
llama_model_loader: - kv  30:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - type  f32:  145 tensors
llama_model_loader: - type  f16:   86 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 120.47 MiB (16.02 BPW) 
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'openai-moe'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '../models/oai_random/model.gguf'
srv    load_model: failed to load model, '../models/oai_random/model.gguf'

Failed to load the model, see logs above

@loci-dev loci-dev force-pushed the main branch 4 times, most recently from 7ceec3c to c8dcfe6 Compare December 21, 2025 10:08
@loci-dev loci-dev force-pushed the main branch 7 times, most recently from 26a6f0f to cf53bc9 Compare December 22, 2025 14:09
@DajanaV DajanaV closed this Dec 22, 2025
@DajanaV DajanaV deleted the upstream-PR18236-branch_ngxson-xsn/cli_buffered_logs branch December 22, 2025 14:35