feat: Add support for overriding tensor buffer type #6062
Merged
Conversation
Important
Looks good to me! 👍
Reviewed everything up to 9745451 in 1 minute and 22 seconds.
- Reviewed 38 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 4 draft comments; view those below
1. extensions/llamacpp-extension/src/index.ts:42
- Draft comment: Consider providing a default value (e.g. an empty string) for `override_tensor_buffer_t` to avoid unintended falsy checks.
- Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%
2. extensions/llamacpp-extension/src/index.ts:1269
- Draft comment: Validate that `override_tensor_buffer_t` is non-empty and, if applicable, a valid regex before pushing the `--override-tensors` argument.
- Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%
3. extensions/llamacpp-extension/src/index.ts:1340
- Draft comment: Updated error logging now refers to 'model' instead of 'llama-server'. Ensure consistency with frontend error handling, if any.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
4. extensions/llamacpp-extension/src/index.ts:1269
- Draft comment: Typographical suggestion: Consider revising the comment wording on this line. Instead of "This is an expert level settings and should only be used by people who knows what they are doing.", you might change it to "This is an expert-level setting and should only be used by people who know what they are doing."
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%). While the comment is about a changed line and points out real typos, typos in comments are generally not important enough to warrant a PR comment: the meaning is still clear, the issue doesn't affect functionality, and this kind of feedback is better handled through general code review or style guidelines.
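Draft comments 1 and 2 above suggest defaulting and validating the regex before pushing the `--override-tensors` argument. A minimal sketch of that guard, assuming a hypothetical helper name (the PR's actual code in `performLoad()` may differ):

```typescript
// Sketch of the suggested validation; the helper name is an
// assumption, not the extension's real API.
function overrideTensorArgs(value: string | undefined): string[] {
  const pattern = value?.trim()
  // Empty or missing value: emit no flag at all, avoiding the
  // unintended-falsy-check concern from draft comment 1.
  if (!pattern) return []
  // Reject syntactically invalid regexes early (draft comment 2)
  // rather than passing garbage through to llama-server.
  try {
    new RegExp(pattern)
  } catch {
    throw new Error(`Invalid override_tensor_buffer_t regex: ${pattern}`)
  }
  return ['--override-tensors', pattern]
}
```

With this shape, callers can unconditionally spread the result into the argument list, since an unset option contributes nothing.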
Barecheck code coverage report — Total: 33.19%; coverage diff: 0.00% ▴. No uncovered files and lines.
Minh141120 approved these changes on Aug 7, 2025.
Describe Your Changes
This commit introduces a new configuration option, `override_tensor_buffer_t`, which allows users to specify a regex for matching tensor names whose buffer type should be overridden. This is an advanced setting primarily useful for optimizing the performance of large models, particularly Mixture of Experts (MoE) models. By overriding the tensor buffer type, users can keep critical parts of the model, such as the attention layers, on the GPU while offloading other parts, such as the expert feed-forward networks, to the CPU. This can lead to significant speed improvements for massive models.

Additionally, this change refines the error message shown when a model fails to load: the previous message "Failed to load llama-server" has been updated to the more accurate "Failed to load model".
Fixes Issues
Self Checklist
Important

Adds `override_tensor_buffer_t` option to `index.ts` for tensor buffer type override and refines the model loading error message.
- Adds `override_tensor_buffer_t` option in `LlamacppConfig` in `index.ts` to specify a regex for tensor names whose buffer type should be overridden.
- Applies the option in `performLoad()` in `index.ts`.

This description was created by Ellipsis for 9745451.