Description
Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA RTX PRO 6000 Blackwell Workstation Edition, compute capability 12.0, VMM: yes
version: 6097 (9515c61)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
dual EPYC 9355, blackwell 6000 pro
Models
gpt-oss-120b-F16.gguf
Problem description & steps to reproduce
Setting reasoning_effort to "high" via --chat-template-kwargs seems to have no effect; the rendered system prompt still reports medium reasoning.
Startup command:
./build/bin/llama-server \
--model /data/gpt-oss-120b-GGUF/gpt-oss-120b-F16.gguf \
--alias gpt-oss-120b-F16 \
--no-webui \
--numa numactl \
--threads 32 \
--ctx-size 131072 \
--n-gpu-layers 37 \
-ot "exps.*\.blk.*\.ffn_.*=CUDA0" \
--no-op-offload \
-ub 4096 -b 4096 \
--seed 3407 \
--temp 0.6 \
--top-p 1.0 \
--log-colors \
--flash-attn \
--host 0.0.0.0 \
--jinja \
--chat-template-kwargs '{"reasoning_effort": "high"}' \
--port 11434
First Bad Commit
No response
Relevant log output
The rendered system prompt in the output says:
Knowledge cutoff: 2024-06
Current date: 2025-08-06
Reasoning: medium