Description
The Jan v0.6.6 llama.cpp extension only allows configuration at the provider level and lacks per-model settings, limiting user flexibility for model-specific optimizations.
Current Behavior
- Settings like context size and chat template can only be configured at the provider level
- All models using llama.cpp provider inherit the same configuration
- No way to customize settings for individual models
- Users must create separate providers for different model configurations
Expected Behavior
As a user, I should be able to:
- Set context size per individual model
- Configure chat templates specific to each model
- Override provider-level settings on a per-model basis
- Have fine-grained control over model-specific parameters
Use Cases
- Different Context Sizes: Some models work better with 4K context, others with 32K+
- Model-Specific Templates: Different models require different chat templates (ChatML, Alpaca, etc.)
- Performance Tuning: Adjust parameters like n_gpu_layers and n_threads per model
- Memory Management: Set different n_ctx values based on available RAM per model (see the sketch after this list)
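A rough sketch of what such per-model overrides could look like. The LlamaCppSettings shape, the model ids, and the default values below are hypothetical and chosen for illustration; field names follow llama.cpp conventions (n_ctx, n_gpu_layers, n_threads) rather than the extension's actual types:

```typescript
// Hypothetical settings shape for illustration; field names follow
// llama.cpp conventions and may not match the extension's real types.
interface LlamaCppSettings {
  n_ctx: number;          // context window size
  n_gpu_layers: number;   // layers offloaded to the GPU
  n_threads: number;      // CPU threads used for inference
  chat_template?: string; // e.g. "chatml", "alpaca"
}

// Provider-level defaults plus per-model overrides keyed by model id
// (model ids and values are made up for the example).
const providerDefaults: LlamaCppSettings = {
  n_ctx: 4096,
  n_gpu_layers: 0,
  n_threads: 4,
};

const perModelOverrides: Record<string, Partial<LlamaCppSettings>> = {
  "qwen2.5-32b": { n_ctx: 32768, n_gpu_layers: 40 },
  "llama-3.2-3b": { chat_template: "chatml" },
};
```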
Proposed Solution
Modify the load() function to accept an optional parameter with overriding settings:

```typescript
// Current
load(model: Model): Promise<void>

// Proposed
load(model: Model, overrideSettings?: Partial<LlamaCppSettings>): Promise<void>
```

This would allow:
- Model-specific settings to override provider defaults
- Backward compatibility with existing provider-level configuration
- Flexible per-model customization without breaking existing setups
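As a rough illustration of the backward-compatibility point above, a minimal sketch of how the proposed signature could merge per-model overrides with provider defaults. It reuses the hypothetical LlamaCppSettings, providerDefaults, and perModelOverrides from the earlier sketch; the Model type and the load() internals are placeholders, not the extension's actual implementation:

```typescript
// Hypothetical Model type for illustration only.
interface Model {
  id: string;
  path: string;
}

// Sketch: per-model overrides win over provider defaults. With no
// overrides passed, behavior matches the current provider-level
// configuration, so existing callers keep working unchanged.
async function load(
  model: Model,
  overrideSettings?: Partial<LlamaCppSettings>,
): Promise<void> {
  const effective: LlamaCppSettings = {
    ...providerDefaults,
    ...overrideSettings,
  };
  // The extension would hand `effective` to its llama.cpp bindings here.
  console.log(`Loading ${model.id} with n_ctx=${effective.n_ctx}`);
}

// Usage: existing calls are unaffected; new calls can override per model.
await load({ id: "llama-3.2-3b", path: "/models/llama-3.2-3b.gguf" });
await load(
  { id: "qwen2.5-32b", path: "/models/qwen2.5-32b.gguf" },
  perModelOverrides["qwen2.5-32b"],
);
```

Because the second argument is optional and spread after the defaults, provider-level configuration remains the single source of truth unless a model explicitly overrides a field.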