> **Note**: Co-written with gptoss 120B. I'm testing whether a bot that helps with llama-swap's YAML config could be created. :)

**Short answer:** no — a model may belong to only one group.

**Why the restriction exists**

A group is the unit that llama‑swap uses to decide loading and unloading policy (for example, whether its members swap with each other, whether it runs exclusively, and whether it stays persistent).
If the same model appeared in two different groups, llama‑swap would have to guess which group's policy should apply when you request that model or when another group is loaded. To keep the behaviour deterministic, the validator rejects duplicate membership.

You could try something like this instead. It keeps the embedding model loaded and "persistent", so swapping between the two Qwen3 30B models won't affect it:

```yaml
# -------------------------------------------------
# 1️⃣ Embedding model – never gets unloaded
# -------------------------------------------------
groups:
  embedding:
    # The model lives here forever (other groups can’t unload it)
    persistent: true
    # These two flags are optional – they just make the group
    # a polite neighbour that never tries to unload anyone else.
    swap: false
    exclusive: false
    members:
      - Qwen3-Embedding-0.6B-F16

# -------------------------------------------------
# 2️⃣ All other models – fall back to the *default* group
# -------------------------------------------------
models:
  Qwen3-30B-A3B-Instruct-2507-GGUF-UD-Q4_K_XL:
    cmd: llama-server --port ${PORT} -m Qwen3-30B-A3B-Instruct-2507.gguf
  Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL:
    cmd: llama-server --port ${PORT} -m Qwen3-Coder-30B-A3B-Instruct.gguf
  Qwen3-Embedding-0.6B-F16:
    cmd: ...

# -------------------------------------------------
# 3️⃣ (Optional) start the embedding model at startup
# -------------------------------------------------
hooks:
  on_startup:
    preload:
      - Qwen3-Embedding-0.6B-F16
```
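For contrast, here is a sketch of the kind of layout the validator rejects. It reuses the model names from above, but the group names (`coding`, `general`) are made up for illustration:

```yaml
# REJECTED: the validator errors out because the same model
# appears in two different groups, so llama-swap could not
# decide which group's policy applies to it.
groups:
  coding:
    members:
      - Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL
      - Qwen3-Embedding-0.6B-F16   # duplicate membership here …
  general:
    members:
      - Qwen3-30B-A3B-Instruct-2507-GGUF-UD-Q4_K_XL
      - Qwen3-Embedding-0.6B-F16   # … and here
```

Moving the shared model into its own `persistent: true` group, as shown above, gets the "never unload it" behaviour without the ambiguity.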
Is it possible to have the same model listed in multiple groups?
I now get an error saying a model is used in multiple groups.
Shouldn't it be able to work?
For example I tried this config:
It would be nice to be able to pair multiple models with the embedding model, for example, so that it isn't unloaded every time.