feat: Add support for llamacpp MoE offloading setting #6748

qnixsynapse · 2025-10-06T14:16:35Z

Describe Your Changes

Introduces the n_cpu_moe configuration setting for the llamacpp provider. This allows users to specify the number of Mixture of Experts (MoE) layers whose weights should be offloaded to the CPU via the --n-cpu-moe flag in llama.cpp.

This is useful for running large MoE models because it balances resource usage: for example, attention can stay on the GPU while the expert FFNs are offloaded to the CPU.
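As a rough illustration of how such a setting might be turned into a command-line argument, here is a minimal sketch. The `LlamacppConfig` interface and the `buildMoeArgs` helper are hypothetical names invented for this example and are not taken from the extension's code; only the `--n-cpu-moe` flag itself comes from the description above.

```typescript
// Hypothetical config shape; the extension's real type is not shown in this PR.
interface LlamacppConfig {
  // Number of MoE layers whose expert weights should stay on the CPU.
  n_cpu_moe?: number
}

// Hypothetical helper: returns the extra arguments for the llama.cpp server.
// The flag is emitted only for a positive integer, so an unset or zero value
// leaves the command line unchanged.
function buildMoeArgs(cfg: LlamacppConfig): string[] {
  const args: string[] = []
  const n = cfg.n_cpu_moe
  if (typeof n === 'number' && Number.isInteger(n) && n > 0) {
    args.push('--n-cpu-moe', String(n))
  }
  return args
}
```

Skipping the flag for missing or non-positive values is a design choice of this sketch, made so that existing configurations behave as before.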

The changes include:

- Updating the llamacpp-extension to accept and pass the --n-cpu-moe argument.
- Adding the input field to the Model Settings UI (ModelSetting.tsx).
- Including model setting migration logic and bumping the store version to 4.
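The migration item in the list above could look roughly like the following sketch. Everything here is an assumption made for illustration: the state shapes, the `migrate` function, and the default value of `0` are hypothetical and do not reflect the actual code in useModelProvider.ts. Only the target store version (4) and the setting name come from the description.

```typescript
// Hypothetical, simplified state shapes for illustration only.
type ModelSettings = Record<string, unknown>
interface Model { id: string; settings?: ModelSettings }
interface Provider { provider: string; models: Model[] }
interface PersistedState { providers: Provider[] }

const STORE_VERSION = 4

// Hypothetical migration: for states persisted before version 4, give every
// llamacpp model an n_cpu_moe entry unless it already has one.
function migrate(state: PersistedState, fromVersion: number): PersistedState {
  if (fromVersion >= STORE_VERSION) return state
  return {
    ...state,
    providers: state.providers.map((p) =>
      p.provider !== 'llamacpp'
        ? p
        : {
            ...p,
            models: p.models.map((m) => ({
              ...m,
              // Assumed default of 0; existing values win because they are spread last.
              settings: { n_cpu_moe: 0, ...(m.settings ?? {}) },
            })),
          }
    ),
  }
}
```

The function returns a new object instead of mutating its input, which keeps the old persisted state intact if the migration needs to be inspected or retried.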

TODOs

- Verify migration
- Implement boolean toggle cpu-moe

Fixes Issues

Closes feat: expose the --cpu-moe and --n-cpu-moe llama.cpp flags in GUI #6695

Self Checklist

- Added relevant comments, esp in complex areas
- Updated docs (for bug fixes / features)
- Created issues for follow-up changes or refactoring needed

github-actions · 2025-10-07T02:47:26Z

Barecheck - Code coverage report

Total: 30.43%

Your code coverage diff: 0.02% ▴

Uncovered files and lines

File	Lines
web-app/src/containers/ModelSetting.tsx	1-3, 5, 13-18, 26-33, 35, 38-40, 42-48, 50, 53, 55-58, 61, 64, 66-68, 70-75, 77-94, 96-104, 106-111, 114-125, 127-130, 133, 135-150, 153, 155, 157, 160, 163-165, 168-177, 179, 181-191, 193-197, 200-212, 215, 217, 219, 222, 225-227, 231-239, 241-251, 253-260, 262-274, 277-287, 290-296, 298-300, 302, 304, 306-308, 310, 312-314, 316-324, 326-344, 346-349, 351
web-app/src/hooks/useModelProvider.ts	33-38, 69, 86, 93, 115-126, 143-154, 164-166, 168, 170-172, 175-178, 180-181, 183, 185-187, 189-202, 209-214, 220, 235-236, 238-246, 249-251, 254-257, 260-267, 269-277, 279-290, 292-293, 295-303, 306-308, 310-321, 323-324, 326-329, 332-350, 352-364, 366-371, 374-395, 397-398, 400-402, 404-411, 413-424, 426-427


qnixsynapse requested a review from urmauur October 6, 2025 14:16

github-project-automation bot added this to Jan Oct 6, 2025

github-actions bot assigned qnixsynapse Oct 6, 2025

urmauur approved these changes Oct 7, 2025


github-roushan mentioned this pull request Oct 7, 2025

feat(llamacpp): add MoE CPU offload optimization settings #6705

Closed

3 tasks

qnixsynapse added 3 commits October 7, 2025 20:13

remove unused import

bc4bf94

feat: add cpu-moe boolean flag

beb050f

urmauur force-pushed the feat/6695 branch from 71be071 to beb050f Compare October 7, 2025 13:13

urmauur added 3 commits October 7, 2025 20:14

chore: remove unused migration cont_batching

62be2f7

chore: fix migration delete old key and add new one

f8eb195

chore: fix migration

31441ac

qnixsynapse merged commit 706dad2 into dev Oct 7, 2025
20 checks passed

qnixsynapse deleted the feat/6695 branch October 7, 2025 14:08

github-project-automation bot moved this to QA in Jan Oct 7, 2025

github-actions bot added this to the v0.7.2 milestone Oct 7, 2025
