Replies: 1 comment
-
... and the answer should be obvious by now: the parameter needs to be of type 'Json', not 'String'! Works great.
-
Has anyone been able to enable dynamic (i.e., per-request) setting of the `reasoning_effort` flag for gpt-oss models when using a remote llama-server provider?
I have added
as a new Assistant parameter. The request does include the parameter:
but it seems to be ignored by the provider ("Thought: Explain simply").
However, when I add the same as a Custom JSON config in the llama-server UI client, like so:
the request appears to be similar:
and the model here thinks really hard (about 250 tokens versus 3 above).
The main difference seems to be that Jan sends the value as a quoted string, `"{\"reasoning_effort\":\"high\"}"`, while the llama-server UI sends it as an actual JSON object (unquoted): `{"reasoning_effort":"high"}`.
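A minimal Python sketch of that difference, assuming the request body is assembled with standard JSON serialization (the field names and body shape here are illustrative, not Jan's actual internals): a parameter stored as a String serializes as a quoted string the provider cannot parse, whereas a parameter stored as parsed JSON merges into the body as a real key/value pair.

```python
import json

raw = '{"reasoning_effort":"high"}'

# String-typed parameter: the raw text is embedded as one quoted string value,
# so the provider never sees a top-level "reasoning_effort" key.
body_string = {"model": "gpt-oss", "extra": raw}
print(json.dumps(body_string))
# → {"model": "gpt-oss", "extra": "{\"reasoning_effort\":\"high\"}"}

# Json-typed parameter: the text is parsed first, and its keys merge into
# the request body as a proper JSON field the provider can act on.
body_json = {"model": "gpt-oss", **json.loads(raw)}
print(json.dumps(body_json))
# → {"model": "gpt-oss", "reasoning_effort": "high"}
```

This is why switching the Assistant parameter's type from 'String' to 'Json' (as noted in the reply above) makes llama-server honor the flag.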