
bug: llama.cpp extension only supports provider-level settings, missing per-model configuration in v0.6.6 #5818


Description

@louis-jan

In v0.6.6, the llama.cpp extension only allows configuration at the provider level and lacks per-model settings, limiting user flexibility for model-specific optimizations.

Current Behavior

  • Settings like context size and chat template can only be configured at the provider level
  • All models using llama.cpp provider inherit the same configuration
  • No way to customize settings for individual models
  • Users must create separate providers for different model configurations

Expected Behavior

As a user, I should be able to:

  • Set context size per individual model
  • Configure chat templates specific to each model
  • Override provider-level settings on a per-model basis
  • Have fine-grained control over model-specific parameters

Use Cases

  1. Different Context Sizes: Some models work better with 4K context, others with 32K+
  2. Model-Specific Templates: Different models require different chat templates (ChatML, Alpaca, etc.)
  3. Performance Tuning: Adjust parameters like n_gpu_layers, n_threads per model
  4. Memory Management: Set different n_ctx values based on available RAM per model
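
For illustration, the use cases above could be expressed as a per-model override map. This is only a sketch: the LlamaCppSettings interface and its field names are assumptions modeled on the llama.cpp parameters mentioned in this issue (n_ctx, n_gpu_layers, n_threads, chat template), not the extension's actual types, and the model identifiers are hypothetical.

// Hypothetical shape of per-model overrides (illustrative only).
interface LlamaCppSettings {
  n_ctx?: number;          // context window size
  n_gpu_layers?: number;   // layers offloaded to the GPU
  n_threads?: number;      // CPU threads used for inference
  chat_template?: string;  // e.g. "chatml", "alpaca"
}

// Two models sharing the same llama.cpp provider but tuned differently.
const perModelSettings: Record<string, Partial<LlamaCppSettings>> = {
  'large-instruct-model': { n_ctx: 32768, chat_template: 'chatml' },
  'small-cpu-model':      { n_ctx: 4096, n_gpu_layers: 0, n_threads: 4 },
};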

Proposed Solution

Modify the load() function to accept an optional parameter that overrides the provider-level settings:

// Current
load(model: Model): Promise<void>

// Proposed
load(model: Model, overrideSettings?: Partial<LlamaCppSettings>): Promise<void>

This would allow:

  • Model-specific settings to override provider defaults
  • Backward compatibility with existing provider-level configuration
  • Flexible per-model customization without breaking existing setups
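
A minimal sketch of how the proposed signature could merge per-model overrides with provider defaults is shown below. The helpers getProviderSettings() and spawnLlamaCppProcess() are placeholders for whatever the extension actually uses to read its configuration and start the server; they are assumptions, not real APIs.

// Proposed load(): per-model values win, everything else falls back to provider defaults.
async function load(
  model: Model,
  overrideSettings?: Partial<LlamaCppSettings>
): Promise<void> {
  const providerDefaults = getProviderSettings(); // existing provider-level configuration
  const effective: LlamaCppSettings = { ...providerDefaults, ...overrideSettings };
  await spawnLlamaCppProcess(model, effective);   // placeholder for the actual launch path
}

// Callers that pass no overrides behave exactly as today, preserving backward compatibility:
// await load(model);
// await load(model, { n_ctx: 8192, chat_template: 'chatml' });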
