
arg: clarify auto kvu/np being set on server #17997

Merged
ngxson merged 3 commits into ggml-org:master from ngxson:xsn/server_clarify_kvu_np
Dec 16, 2025

@ngxson ngxson commented Dec 13, 2025

Fix #17989

Related discussion: #16736 (comment)

| Argument | Explanation |
| --- | --- |
| `--kv-unified, -kvu` | use single unified KV buffer shared across all sequences (default: enabled if number of slots is auto) (env: `LLAMA_ARG_KV_UNIFIED`) |
| `-np, --parallel N` | number of server slots (default: -1, -1 = auto) (env: `LLAMA_ARG_N_PARALLEL`) |
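
The interaction between the two defaults above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `SlotConfig`, `resolve_slots`, and `kAutoSlots` are hypothetical names, the value 4 for the auto slot count is taken from the linked issue title rather than from the help text, and rejecting values below 1 is an assumption (the commit list only says "use invalid_argument").

```cpp
#include <stdexcept>
#include <string>

// Hypothetical types and names; not taken from the llama.cpp source.
struct SlotConfig {
    int  n_parallel;  // resolved number of server slots
    bool kv_unified;  // true if one KV buffer is shared across all sequences
};

// Assumed slot count for "auto"; the linked issue reports 4 slots.
constexpr int kAutoSlots = 4;

// n_parallel_arg: value passed to -np / --parallel (-1 = auto, the default)
// kvu_explicit:   true if the user passed --kv-unified explicitly
SlotConfig resolve_slots(int n_parallel_arg, bool kvu_explicit) {
    if (n_parallel_arg == -1) {
        // Auto slot count: the unified KV buffer is enabled by default.
        return {kAutoSlots, true};
    }
    if (n_parallel_arg < 1) {
        // Assumption: any other value below 1 is rejected.
        throw std::invalid_argument(
            "invalid number of slots: " + std::to_string(n_parallel_arg));
    }
    // Explicit slot count: the unified KV buffer is used only if requested.
    return {n_parallel_arg, kvu_explicit};
}
```

Under these assumptions, `resolve_slots(-1, false)` yields 4 slots with a unified KV buffer, while `resolve_slots(1, false)` yields exactly 1 slot without one, which matches the behavior the reworded help text describes.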

@ggerganov ggerganov left a comment

Thanks for taking care of this 👍

@ngxson ngxson merged commit 7b1db3d into ggml-org:master Dec 16, 2025
78 checks passed
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
* arg: clarify auto kvu/np being set on server

* improve docs

* use invalid_argument
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
* arg: clarify auto kvu/np being set on server

* improve docs

* use invalid_argument
Development

Successfully merging this pull request may close these issues.

Misc. bug: "--parallel 1" initializes 4 slots, while docs say default is 1

2 participants