
Eval bug: Autoparser misplaces non-thinking content with NVIDIA-Nemotron-Nano-9B-v2 #20325

@EZForever

Description


Name and Version

Compiled from the current master branch (c96f608), since the model won't load without #20270.

C:\llama.cpp-master\build\bin\Debug>llama-cli.exe --version
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz)
load_backend: failed to find ggml_backend_init in C:\llama.cpp-master\build\bin\Debug\ggml-cpu.dll
version: 0 (unknown)
built with MSVC 19.44.35223.0 for x64

Operating systems

Windows, Linux

GGML backends

CPU, CUDA, Vulkan

Hardware

I don't think this issue is related to hardware; I've tested on three different machines with different backends, and they all have the same issue.

Models

NVIDIA-Nemotron-Nano-9B-v2 IQ2_M from bartowski: (https://huggingface.co/bartowski/nvidia_NVIDIA-Nemotron-Nano-9B-v2-GGUF/blob/main/nvidia_NVIDIA-Nemotron-Nano-9B-v2-IQ2_M.gguf). I tried both the model's built-in chat template and the one in the llama.cpp repo; both have exactly the same issue.

I originally discovered this issue while trying to implement /no_think with a custom chat template for Qwen3.5-35B-A3B. The modified chat template is here; it worked correctly before the autoparser PR.

Problem description & steps to reproduce

NVIDIA-Nemotron-Nano-9B-v2 supports both thinking and non-thinking modes in a single model, and supports switching between them in-conversation via a chat template trick, as documented in the model card. This worked correctly before; since the new autoparser PR, however, after switching into non-thinking mode with /no_think, the model correctly skips thinking, but its output is no longer treated as normal content as it should be: it is treated as thinking content instead.
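For context, the mode-switching trick can be sketched roughly as below. This is an illustrative Python rendering of the template logic, not the actual Nemotron Jinja template; the role markers are generic placeholders rather than Nemotron's real special tokens.

```python
# Toy sketch of in-conversation thinking-mode switching: the template tracks
# /think and /no_think markers in user messages and decides whether the
# assistant turn opens inside a reasoning block.

def render_prompt(messages: list[dict]) -> str:
    thinking = True  # reasoning enabled by default
    parts = []
    for msg in messages:
        if msg["role"] == "user":
            if "/no_think" in msg["content"]:
                thinking = False
            elif "/think" in msg["content"]:
                thinking = True
        parts.append(f"[{msg['role']}]\n{msg['content']}\n")
    parts.append("[assistant]\n")
    # With thinking off, the template pre-closes the block, so the model's
    # first generated tokens are ordinary content, not reasoning.
    parts.append("<think>\n" if thinking else "<think></think>\n")
    return "".join(parts)
```

The key point is that in non-thinking mode the reasoning block is already closed in the prompt, so a parser must not assume the reply begins with reasoning.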

To reproduce this issue, run llama-cli or llama-server with this model and include /no_think in your message: the former shows a [Start thinking] line, and the latter puts the entire model output into a reasoning block.

I'm not at all sure how this autoparser works (I tried playing with llama-debug-template-parser and llama-template-analysis but got no meaningful insights), but my guess is that the new parser simply assumes the model begins generation with thinking content whenever it believes the chat template supports thinking. That holds for most models, but it irrecoverably breaks any attempt to implement in-conversation thinking-mode switching.
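If that guess is right, the difference between the old and new behavior would look roughly like this. This is a toy Python sketch, not llama.cpp's actual parser code, and the tag names assume the usual `<think>…</think>` convention:

```python
# Two toy reasoning parsers, illustrating the suspected regression.

REASONING_START = "<think>"
REASONING_END = "</think>"

def parse_assume_thinking(output: str) -> dict:
    """Unconditionally treats everything before </think> as reasoning
    (the suspected autoparser behavior for thinking-capable templates)."""
    if REASONING_END in output:
        reasoning, _, content = output.partition(REASONING_END)
        return {"reasoning": reasoning.strip(), "content": content.strip()}
    # No closing tag at all: the whole reply lands in the reasoning field.
    return {"reasoning": output.strip(), "content": ""}

def parse_check_opening_tag(output: str) -> dict:
    """Only enters reasoning mode when the reply actually opens with <think>
    (matching the pre-autoparser behavior described in this report)."""
    if output.lstrip().startswith(REASONING_START):
        body = output.lstrip()[len(REASONING_START):]
        reasoning, _, content = body.partition(REASONING_END)
        return {"reasoning": reasoning.strip(), "content": content.strip()}
    return {"reasoning": "", "content": output.strip()}
```

On a /no_think reply such as "Hello! How can I assist you today?", the first parser files everything under reasoning (matching the [Start thinking] symptom above), while the second returns it as normal content.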

First Bad Commit

Build b8227 (the autoparser PR) has the issue, while the previous build, b8226, does not. Thus I'm fairly confident the issue was introduced by the autoparser.

Relevant log output

Logs

Commit c96f608 (the model "starts thinking" even though it isn't):

C:\llama.cpp-master\build\bin\Debug>llama-cli.exe -m "C:\nvidia_NVIDIA-Nemotron-Nano-9B-v2-IQ2_M.gguf" --no-repack --ctx-size 4096 -fit off --chat-template-file "C:\NVIDIA-Nemotron-Nano-v2.jinja"
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz)
load_backend: failed to find ggml_backend_init in C:\llama.cpp-master\build\bin\Debug\ggml-cpu.dll

Loading model...


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b0-unknown
model      : nvidia_NVIDIA-Nemotron-Nano-9B-v2-IQ2_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> /no_think hello

[Start thinking]
Hello! How can I assist you today?


[ Prompt: 0.3 t/s | Generation: 0.3 t/s ]

>

Build b8226 (output content is normal):

C:\llama-b8226-bin-win-cpu-x64>llama-cli.exe -m "C:\nvidia_NVIDIA-Nemotron-Nano-9B-v2-IQ2_M.gguf" --no-repack --ctx-size 4096 -fit off --chat-template-file "C:\NVIDIA-Nemotron-Nano-v2.jinja"
load_backend: loaded RPC backend from C:\llama-b8226-bin-win-cpu-x64\ggml-rpc.dll
load_backend: loaded CPU backend from C:\llama-b8226-bin-win-cpu-x64\ggml-cpu-haswell.dll

Loading model...


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8226-34df42f7b
model      : nvidia_NVIDIA-Nemotron-Nano-9B-v2-IQ2_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> /no_think hello

Hello! How can I assist you today? 😊


[ Prompt: 5.0 t/s | Generation: 4.1 t/s ]

>

Metadata


Labels

bug: Something isn't working
chat parser: Issues related to the chat parser and chat templates
regression: A regression introduced in a new build (something that was previously working correctly)
