
Conversation

@qnixsynapse (Contributor) commented Oct 6, 2025

Describe Your Changes

Introduces the n_cpu_moe configuration setting for the llamacpp provider. This lets users specify the number of Mixture of Experts (MoE) layers whose expert weights are kept on the CPU, passed to llama.cpp via its --n-cpu-moe flag.

This is useful for running large MoE models on limited VRAM: resource usage can be balanced by, for example, keeping attention on the GPU and offloading the expert FFNs to the CPU.
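For context, a minimal sketch of how such a setting might translate into llama.cpp server arguments is shown below; `LlamacppConfig` and `buildServerArgs` are illustrative names, not the extension's actual identifiers (only the `-ngl` and `--n-cpu-moe` flags themselves come from llama.cpp):

```ts
// Illustrative only: map per-model settings onto llama.cpp server flags.
interface LlamacppConfig {
  n_gpu_layers?: number // layers offloaded to the GPU (-ngl)
  n_cpu_moe?: number    // layers whose MoE expert weights stay on the CPU
}

function buildServerArgs(cfg: LlamacppConfig): string[] {
  const args: string[] = []
  if (cfg.n_gpu_layers !== undefined) {
    args.push('-ngl', String(cfg.n_gpu_layers))
  }
  // Pass --n-cpu-moe only for a positive value, so models configured
  // before this setting existed behave exactly as before.
  if (cfg.n_cpu_moe !== undefined && cfg.n_cpu_moe > 0) {
    args.push('--n-cpu-moe', String(cfg.n_cpu_moe))
  }
  return args
}

// buildServerArgs({ n_gpu_layers: 99, n_cpu_moe: 20 })
// -> ['-ngl', '99', '--n-cpu-moe', '20']
```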

The changes include:

  • Updating the llamacpp-extension to accept and pass the --n-cpu-moe argument.

  • Adding the input field to the Model Settings UI (ModelSetting.tsx); a rough sketch follows this list.

  • Including model setting migration logic and bumping the store version to 4; a migration sketch follows the screenshot below.
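As a rough illustration of the UI change, the new field could be a controlled number input along these lines; the component shape and prop names are assumptions, not the actual ModelSetting.tsx code:

```tsx
import React from 'react'

// Hypothetical controlled input for the setting; names are illustrative.
type NCpuMoeInputProps = {
  value: number | undefined
  onChange: (value: number) => void
}

export function NCpuMoeInput({ value, onChange }: NCpuMoeInputProps) {
  return (
    <label>
      MoE layers on CPU (--n-cpu-moe)
      <input
        type="number"
        min={0}
        value={value ?? 0}
        onChange={(e) => onChange(Number(e.target.value))}
      />
    </label>
  )
}
```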

[Screenshot From 2025-10-06 19-40-29]
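For the migration item, here is a sketch of the version-gated idea, assuming a simple versioned store shape; the PR's actual migration code may be structured differently:

```ts
// Hypothetical versioned-store migration; the store shape and the
// default value of 0 are assumptions for illustration.
const STORE_VERSION = 4

interface ModelSettingsStore {
  version: number
  settings: Record<string, unknown>
}

function migrate(store: ModelSettingsStore): ModelSettingsStore {
  if (store.version < STORE_VERSION) {
    // Older stores predate n_cpu_moe; defaulting to 0 leaves expert
    // placement to the usual GPU-offload settings, preserving behavior.
    store.settings.n_cpu_moe ??= 0
    store.version = STORE_VERSION
  }
  return store
}
```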

TODOs

  • Verify migration
  • Implement a boolean toggle for the --cpu-moe flag

Fixes Issues

Self Checklist

  • Added relevant comments, especially in complex areas
  • Updated docs (for bug fixes / features)
  • Created issues for follow-up changes or refactoring needed

@qnixsynapse qnixsynapse merged commit 706dad2 into dev Oct 7, 2025
20 checks passed
@qnixsynapse qnixsynapse deleted the feat/6695 branch October 7, 2025 14:08
@github-project-automation github-project-automation bot moved this to QA in Jan Oct 7, 2025
@github-actions github-actions bot added this to the v0.7.2 milestone Oct 7, 2025


Development

Successfully merging this pull request may close these issues.

feat: expose the --cpu-moe and --n-cpu-moe llama.cpp flags in GUI
