feat: support per-model overrides in llama.cpp load() #5820
Conversation
Extend the `load()` method in the llama.cpp extension to accept an optional `overrideSettings` argument, allowing fine-grained per-model configuration. This lets users override provider-level settings such as `ctx_size`, `chat_template`, and `n_gpu_layers` when loading a specific model.

Fixes: #5818 (Feature Request - Jan v0.6.6)

Use cases enabled:

- Different context sizes per model (e.g., 4K vs 32K)
- Model-specific chat templates (ChatML, Alpaca, etc.)
- Performance tuning (threads, GPU layers)
- Better memory management per deployment

Maintains full backward compatibility with the existing provider config.
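For concreteness, here is a minimal TypeScript sketch of what the extended signature and merge might look like. This is illustrative, not the actual Jan source: `LlamacppConfig`, `providerConfig`, `loadedModels`, and `spawnLlamaServer` are hypothetical names standing in for the extension's internals.

```typescript
// Hypothetical stand-ins for the extension's internal config and state.
interface LlamacppConfig {
  ctx_size?: number
  chat_template?: string
  n_gpu_layers?: number
  n_threads?: number
  [key: string]: unknown
}

declare const providerConfig: LlamacppConfig // provider-level settings
declare const loadedModels: Set<string>      // models already running
declare function spawnLlamaServer(
  modelId: string,
  config: LlamacppConfig
): Promise<void>

async function load(
  modelId: string,
  overrideSettings?: Partial<LlamacppConfig>
): Promise<void> {
  if (loadedModels.has(modelId)) {
    throw new Error('Model already loaded!')
  }
  // Shallow merge: per-model overrides win over provider-level defaults.
  const config: LlamacppConfig = { ...providerConfig, ...overrideSettings }
  await spawnLlamaServer(modelId, config)
  loadedModels.add(modelId)
}

// One model gets a 32K context and full GPU offload...
await load('qwen2.5-7b-instruct', { ctx_size: 32768, n_gpu_layers: 99 })
// ...while another keeps the provider defaults.
await load('tinyllama-1.1b')
```

Because `overrideSettings` is optional and merged last, existing call sites that pass only a model ID behave exactly as before.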
Important
Looks good to me! 👍
Reviewed everything up to 41d2d8d in 49 seconds.

- Reviewed 23 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 2 draft comments (shown below)
1. extensions/llamacpp-extension/src/index.ts:762
- Draft comment: Add a JSDoc comment for `overrideSettings` to explain its purpose and expected properties.
- Reason this comment was not posted: confidence changes required: 0% <= threshold 50%
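One way such a JSDoc block might read (hypothetical wording; the listed keys follow the PR description):

```typescript
/**
 * Load a model with the llama.cpp backend.
 *
 * @param modelId - ID of the model to load.
 * @param overrideSettings - Optional per-model overrides, shallow-merged
 *   over the provider-level config. Keys mirror the provider settings,
 *   e.g. `ctx_size`, `chat_template`, `n_gpu_layers`, `n_threads`.
 */
```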
2. extensions/llamacpp-extension/src/index.ts:777
- Draft comment: Shallow merge of config and overrideSettings works now; ensure this is sufficient if nested objects are added in the future.
- Reason this comment was not posted: confidence changes required: 0% <= threshold 50%
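To see why the reviewer flags nesting: a shallow spread replaces nested objects wholesale rather than merging them. The `sampling` key below is hypothetical (the current llama.cpp settings are flat); a recursive merge would only become necessary if nested settings were introduced later.

```typescript
const provider = { ctx_size: 4096, sampling: { temp: 0.7, top_p: 0.9 } }
const override = { sampling: { temp: 0.2 } }

// Shallow merge: the whole `sampling` object is replaced, so `top_p` is lost.
const shallow = { ...provider, ...override }
// shallow.sampling => { temp: 0.2 }

// A recursive merge would preserve sibling keys inside nested objects:
function deepMerge(
  base: Record<string, unknown>,
  patch: Record<string, unknown>
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...base }
  for (const [key, value] of Object.entries(patch)) {
    const prev = out[key]
    out[key] =
      value !== null && typeof value === 'object' && !Array.isArray(value) &&
      prev !== null && typeof prev === 'object' && !Array.isArray(prev)
        ? deepMerge(
            prev as Record<string, unknown>,
            value as Record<string, unknown>
          )
        : value
  }
  return out
}

const deep = deepMerge(provider, override)
// deep.sampling => { temp: 0.2, top_p: 0.9 }
```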
louis-jan left a comment
LGTM
Barecheck - Code coverage report: Total 35%. Coverage diff: 0.00% ▴ ✅ All code changes are covered.
Important
Looks good to me! 👍
Reviewed a0d5c24 in 1 minute and 39 seconds.

- Reviewed 14 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 2 draft comments (shown below)
1. extensions/llamacpp-extension/src/index.ts:760
- Draft comment: Changing the order of parameters in `load()` may break existing calls that use positional arguments. For example, a call like `load('model', true)` now treats `true` as `overrideSettings`. Consider using an options object or an overload to preserve backward compatibility.
- Reason this comment was not posted: decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 30% vs. threshold = 50%). The concern about API compatibility is valid: reordering could silently break positional callers, with boolean values misinterpreted as settings objects. However, the reordering may be an intentional breaking change, there is no evidence of actual callers using positional arguments, and the author may have already planned for the impact. Absent strong evidence of a real problem, the reviewer erred on the side of trusting the author.
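A sketch of the hazard and of the options-object alternative the reviewer suggests. The `isEmbedding` flag is hypothetical; the PR does not name the parameter that was reordered.

```typescript
// If the previous signature had a trailing boolean, e.g.
//   load(modelId: string, isEmbedding?: boolean)
// then inserting overrideSettings in front of it means an old positional call
//   load('model', true)
// now passes `true` where a settings object is expected.

// An options object removes the ordering hazard and stays extensible:
interface LoadOptions {
  overrideSettings?: Record<string, unknown> // per-model config overrides
  isEmbedding?: boolean                      // hypothetical original flag
}

async function load(modelId: string, opts: LoadOptions = {}): Promise<void> {
  const { overrideSettings = {}, isEmbedding = false } = opts
  // ...merge overrideSettings over the provider config and start the model...
}

// Call sites then name their intent explicitly:
await load('model', { isEmbedding: true })
await load('model', { overrideSettings: { ctx_size: 32768 } })
```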
2. extensions/llamacpp-extension/src/index.ts:767
- Draft comment: Typographical note: the error message at line 767 uses double exclamation marks ('Model already loaded!!'). Consider a single exclamation mark for a more conventional tone.
- Reason this comment was not posted: the comment was not on a location in the diff, so it can't be submitted as a review comment.