UPSTREAM PR #18169: presets: refactor, allow cascade presets from different sources (#614)
Conversation
Performance Analysis Summary (bot): PR #614 implements a preset configuration system refactoring for the llama-server multi-model router. The changes introduce a unified preset loading mechanism with cascading configuration support, affecting 6 files with 347 additions and 260 deletions. Performance analysis reveals no material impact on inference operations.
Mirrored from ggml-org/llama.cpp#18169
Alternative to ggml-org/llama.cpp#17959
Fix ggml-org/llama.cpp#17948
Before this PR, the logic for loading models from different sources (cache / local / custom INI) was quite messy and did not allow an INI preset to take precedence over other sources.
With this PR, we unify the method for loading server models and presets:
- `preset.cpp` is responsible for collecting all model sources (cache / local) and generating a base preset for each known GGUF
- `preset.cpp` then loads the INI file and parses the global section (`[*]`)
- it is up to the consumer (`server-models.cpp`) to decide how to cascade these presets

The current cascading rule can be found in the server's docs:
1. Arguments passed to `llama-server` via CLI (highest priority)
2. Model-specific INI section (e.g. `[ggml-org/MY-MODEL...]`)
3. Global INI section (`[*]`)
4. Base preset generated for the model source (lowest priority)
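A minimal INI file exercising levels 2 and 3 of the cascade might look like the sketch below; the section and key names are illustrative assumptions, not taken from the actual docs.

```ini
; Global section: applies to every model unless overridden.
[*]
ctx-size = 8192

; Model-specific section: overrides [*] and the base preset for this model only.
[ggml-org/MY-MODEL]
temp = 0.7
```

Any flag passed on the `llama-server` command line would still override both sections.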