Feature proposal: Allow definition of variables, and include them in /v1/models #264

Description

@bjodah

I would like to make a humble suggestion for an additional feature in llama-swap. When I'm
configuring front ends, I would like to access metadata about the different models provided by my
llama-swap instance, preferably in the response from /v1/models.

Take the context window, for example: I'd like to specify it in exactly one place and have that
information used both for the command-line options and in the response from /v1/models. I could
envision a syntax loosely along the following lines:

models:

  llamacpp-mistral-small-3.2-24b-2506:
    macros:
      - context_len=24000
      - n_concurrent=2
    cmd: |
      llama-server
        --port ${PORT}
        --hf-repo bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF:Q6_K_L
        --jinja
        --ctx-size ${context_len}
        --cache-type-k q8_0
        --cache-type-v q5_1
        --parallel ${n_concurrent}
        --flash-attn
        --temp 0.15
    metadata:
      meta:
        context_window: ${context_len}
        concurrency: ${n_concurrent}
        mime-types:
          - "image/jpeg"
          - "image/png"
    proxy: http://127.0.0.1:${PORT}
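
To make the intent concrete: assuming plain textual substitution of the model-scoped macros
(hypothetical semantics, since this syntax does not exist yet), the cmd above would expand to
something like the following, with ${PORT} filled in by llama-swap as usual (9001 is just a
placeholder):

llama-server \
  --port 9001 \
  --hf-repo bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF:Q6_K_L \
  --jinja \
  --ctx-size 24000 \
  --cache-type-k q8_0 \
  --cache-type-v q5_1 \
  --parallel 2 \
  --flash-attn \
  --temp 0.15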

Here, macros could be scoped to the respective model (and not only global, as I believe is the
case today), and metadata would be a new keyword in llama-swap's YAML parser. The response from
/v1/models could then perhaps look like:

{
  "object": "list",
  "data": [
    {
      "id": "llamacpp-mistral-small-3.2-24b-2506",
      "object": "model",
      "created": 1686935002,
      "owned_by": "llama-swap"
      "meta": {
        "context_window": 24000,
        "concurrency": 2,
        "mime-types": [
            "image/jpeg",
            "image/png"
        ]
      }
    }
  ]
}
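
With a response like this, a front end could discover the relevant settings for every configured
model in a single request, without triggering any model loads. A quick sketch using jq (the
filter is only illustrative; 8686 is the port of the llama-swap instance used below):

$ curl -s http://localhost:8686/v1/models | jq '.data[] | {id: .id, ctx: .meta.context_window, mime: .meta["mime-types"]}'
{
  "id": "llamacpp-mistral-small-3.2-24b-2506",
  "ctx": 24000,
  "mime": [
    "image/jpeg",
    "image/png"
  ]
}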

There is some precedent for adding extra fields to the response of /v1/models; consider, e.g.,
what llama.cpp does:

$ curl -s -X GET http://localhost:8686/upstream/llamacpp-Qwen3-Coder-30B-A3B-it/v1/models | jq
{
  "models": [
    {
      "name": "/root/.cache/llama.cpp/unsloth_Qwen3-Coder-30B-A3B-Instruct-GGUF_Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf",
      "model": "/root/.cache/llama.cpp/unsloth_Qwen3-Coder-30B-A3B-Instruct-GGUF_Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf",
      "modified_at": "",
      "size": "",
      "digest": "",
      "type": "model",
      "description": "",
      "tags": [
        ""
      ],
      "capabilities": [
        "completion"
      ],
      "parameters": "",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "",
        "families": [
          ""
        ],
        "parameter_size": "",
        "quantization_level": ""
      }
    }
  ],
  "object": "list",
  "data": [
    {
      "id": "/root/.cache/llama.cpp/unsloth_Qwen3-Coder-30B-A3B-Instruct-GGUF_Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf",
      "object": "model",
      "created": 1755869917,
      "owned_by": "llamacpp",
      "meta": {
        "vocab_type": 2,
        "n_vocab": 151936,
        "n_ctx_train": 262144,
        "n_embd": 2048,
        "n_params": 30532122624,
        "size": 17659361280
      }
    }
  ]
}
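
The pieces that are available today can already be pulled out of this upstream response, e.g. the
training context length:

$ curl -s http://localhost:8686/upstream/llamacpp-Qwen3-Coder-30B-A3B-it/v1/models | jq '.data[0].meta.n_ctx_train'
262144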

In this case, not all of the information I need to configure the front end is available. In any
case, it would be infeasible for me to use the /upstream/ path, since that would mean loading
each and every model in my llama-swap config.
