UPSTREAM PR #20087: Hybrid model cache: add --checkpoint-every-nb (#1222)
**Overview**

Analysis of 112,748 functions across 15 binaries reveals minimal performance impact from adding the checkpoint management functionality. Modified: 180 functions (0.16%); new: 6; removed: 0; unchanged: 112,562 (99.84%). All changes are confined to command-line argument parsing infrastructure, with no modifications to inference hot paths.

**Power Consumption Changes**
**Function Analysis**

Lambda #16 (the `--dry-allowed-length` handler) in `llama-cvector-generator` and `llama-tts` shows response-time increases of 13,120% and 13,081% respectively (14.5 ns → 1,923 ns and 14.5 ns → 1,918 ns). Its source code is unchanged; the regression stems from overhead changes in the argument-parsing infrastructure. It executes once during startup.

Lambda #34 (the `--main-gpu` handler) shows 4,831% and 4,715% response-time increases (111 ns → 5,473 ns and 113 ns → 5,450 ns), likewise due to increased overhead in the argument-parsing infrastructure.

Lambda #18's regressions (7,758% and 7,739%) are false positives caused by lambda position renumbering: adding the new option shifts the indices of the lambdas registered after it.

Other analyzed functions (lambdas #35, #57, #60) show 435–673% increases from compiler optimization differences affecting inlining decisions.

All changes occur in one-time initialization code, with cumulative overhead of roughly 8.4 microseconds per application launch, negligible compared to model loading time (seconds).

**Additional Findings**

Zero impact on inference operations: no changes to matrix operations, attention mechanisms, KV cache, quantization kernels, or GPU backends (CUDA, Metal, HIP).

🔎 Full breakdown: Loci Inspector
`llama-tts`: I need a flamegraph from before and after to understand.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
…ort form `-ctxcp` for `--ctx-checkpoints`)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
No summary available at this time. Visit Loci Inspector to review detailed analysis.
Note
Source pull request: ggml-org/llama.cpp#20087
Add an option to create checkpoints after processing every n batches during prompt processing. Hopefully solves #19794, #19298, #18497, and similar.

Usage:

```
llama-server -m model.gguf --checkpoint-every-nb 3
```

creates a checkpoint every 3 batches.