Prerequisites
Feature Description
When the reasoning format is deepseek, the reasoning part (the text between <think> and </think>) is placed in message.reasoning_content. Would it be possible to apply the grammar / JSON schema enforcement only after </think>?
Motivation
The model should be free to reason, but strict about the answer format. Choosing the deepseek reasoning format already signals that users care less about the reasoning itself and mainly want the final answer returned separately.
For example, say I need the model to return its answer as JSON. If the model can reason freely for a while before it has to commit to the JSON structure, instead of being forced to put the answer straight into JSON, the answer quality may be better.
Possible Implementation
A. Update the grammar root to allow a thinking section wrapped in <think> and </think> when the reasoning format is deepseek
or
B. An ugly way: let the model generate freely until it emits </think>, then apply the grammar. (This is the workaround I'm currently using.)
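A minimal sketch of option A, assuming the constraint is already available as GBNF text (a JSON schema would first have to be converted to GBNF). The helper `wrap_with_think` and the rule name `answer-root` are hypothetical, not part of llama.cpp: the helper renames the user's `root` rule and adds a new `root` that accepts a think block before the answer. The opening `<think>` is optional because some chat templates may already put it in the prompt. The think body is simplified to `[^<]*`, so this sketch assumes the reasoning never contains a `<` character; a real implementation would need a rule that excludes only the exact `</think>` sequence.

```python
import re


# Hypothetical helper (not part of llama.cpp): prepend an optional think block to a GBNF grammar.
def wrap_with_think(grammar: str, answer_rule: str = "answer-root") -> str:
    # Rename the original `root ::=` definition so it becomes the answer rule.
    # Assumes `root` is defined exactly once, at the start of a line, and is not
    # referenced by other rules; also assumes `answer_rule` is not already taken.
    root_def = re.compile(r"^root(\s*::=)", flags=re.MULTILINE)
    renamed, count = root_def.subn(answer_rule + r"\1", grammar)
    if count != 1:
        raise ValueError("expected exactly one `root ::=` definition")
    # New root: optional "<think>", reasoning text, "</think>", whitespace, then the answer.
    # Simplification: the reasoning may not contain '<', so "</think>" is unambiguous.
    header = (
        f"root ::= think [ \\t\\n]* {answer_rule}\n"
        'think ::= "<think>"? [^<]* "</think>"\n'
    )
    return header + renamed
```

For example, wrapping `root ::= "{" ws "}"` yields a grammar whose first line is `root ::= think [ \t\n]* answer-root`, with the original rule kept as `answer-root`. The resulting string could then be passed wherever a GBNF grammar is accepted; whether the server would still split the think part into reasoning_content in that case is exactly the open question of this request.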
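The workaround in option B can be sketched as two requests to a raw completion endpoint. This is a sketch under several assumptions: `prompt` already has the chat template applied and ends where the assistant turn starts; llama-server's `/completion` endpoint (default port 8080) accepts `prompt`, `stop` and `grammar` and returns the generated text in `content`; and the stop string is not echoed back in that text. The names `two_phase_generate` and `llama_server_complete` are hypothetical. The backend is passed in as a function so it can be swapped or faked.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Tuple

# A completion backend takes a request body and returns the generated text.
Complete = Callable[[Dict[str, Any]], str]


def llama_server_complete(body: Dict[str, Any], url: str = "http://localhost:8080/completion") -> str:
    # POST the body as JSON to the (assumed) /completion endpoint and return its "content" field.
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["content"]


def two_phase_generate(
    prompt: str, grammar: str, complete: Complete = llama_server_complete
) -> Tuple[str, str]:
    # Phase 1: unconstrained generation that stops at the end of the reasoning.
    # Assumes the stop string itself is not included in the returned text.
    reasoning = complete({"prompt": prompt, "stop": ["</think>"]})
    # Phase 2: re-send prompt + reasoning + closing tag; only this part is grammar-constrained.
    answer = complete({"prompt": prompt + reasoning + "</think>\n", "grammar": grammar})
    return reasoning, answer
```

The sketch does not detect phase 1 ending on a token limit instead of </think>, and it re-sends the whole prefix in phase 2, which is the main cost of this approach; prompt caching on the server side may reduce it.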