Add reasoning parser support by waybarrios · Pull Request #33 · waybarrios/vllm-mlx

waybarrios · 2026-02-03T01:40:58Z

Add --reasoning-parser flag support, following the vLLM-style architecture in which reasoning parsing and tool parsing are separate systems.

Features

Reasoning Parsers

Two built-in parsers for extracting <think> content:

Parser	Models	Description
`qwen3`	Qwen3 series	Requires both `<think>` and `</think>` tags
`deepseek_r1`	DeepSeek-R1	Handles implicit `<think>` tag (just `</think>`)
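
As a rough illustration of the difference between the two rows, here is a minimal sketch of non-streaming extraction. The function names `extract_explicit` and `extract_implicit` are hypothetical and this is not the parsers' actual code; it only mirrors the behavior the table describes.

```python
from __future__ import annotations

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def extract_explicit(text: str) -> tuple[str | None, str | None]:
    """Hypothetical qwen3-style rule: both tags must be present."""
    start = text.find(THINK_OPEN)
    end = text.find(THINK_CLOSE)
    if start == -1 or end == -1 or end < start:
        # No complete <think>...</think> pair: treat everything as content.
        return None, text
    reasoning = text[start + len(THINK_OPEN):end].strip()
    content = (text[:start] + text[end + len(THINK_CLOSE):]).strip()
    return reasoning or None, content or None


def extract_implicit(text: str) -> tuple[str | None, str | None]:
    """Hypothetical deepseek_r1-style rule: the opening tag may be missing."""
    before, sep, after = text.partition(THINK_CLOSE)
    if not sep:
        # No closing tag at all: nothing to split off.
        return None, text
    # Drop the opening tag if the model did emit one.
    reasoning = before.replace(THINK_OPEN, "", 1).strip()
    return reasoning or None, after.strip() or None


print(extract_explicit("<think>17*23</think>391"))
print(extract_implicit("17*23</think>391"))
```

Both functions return a `(reasoning, content)` tuple, matching the shape of `extract_reasoning` in the base class shown under Implementation.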

Think Tag Stripping in Tool Parsers

Prevents parsing failures when models produce <think> tags with tool calls (fixes Ring-Mini-Linear-2.0 + hermes issue mentioned by @TomLucidor).

Usage

Server:

vllm-mlx serve mlx-community/Qwen3-8B-4bit --reasoning-parser qwen3

Client:

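from openai import OpenAI

# Assumption: the server above is reachable at localhost:8000; adjust base_url as needed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "What is 17 × 23?"}]
)
print("Thinking:", response.choices[0].message.reasoning)
print("Answer:", response.choices[0].message.content)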

Implementation

Reasoning Parser Base Class:

from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any

# DeltaMessage is the project's streaming delta type; its import is omitted in this excerpt.

class ReasoningParser(ABC):
    def __init__(self, tokenizer: Any | None = None):
        self.tokenizer = tokenizer

    @abstractmethod
    def extract_reasoning(self, model_output: str) -> tuple[str | None, str | None]:
        """Returns (reasoning, content) tuple."""

    @abstractmethod
    def extract_reasoning_streaming(self, previous_text, current_text, delta_text) -> DeltaMessage | None:
        """For streaming responses."""

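To make the streaming signature more concrete, here is a simplified sketch of how one streamed chunk might be classified. It is not the PR's implementation: `Delta` is a stand-in for `DeltaMessage` (whose definition is not shown here), `stream_step` and `run_stream` are hypothetical names, and the sketch assumes each tag arrives whole inside a single chunk. It follows the implicit think mode discussed later in this thread, where text counts as reasoning until `</think>` appears.

```python
from __future__ import annotations

from dataclasses import dataclass

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


@dataclass
class Delta:
    """Stand-in for DeltaMessage; the real class is not shown in this thread."""
    reasoning: str | None = None
    content: str | None = None


def stream_step(previous_text: str, current_text: str, delta_text: str) -> Delta | None:
    """Classify one chunk. current_text is unused here and kept for signature parity."""
    if THINK_CLOSE in previous_text:
        # Reasoning already finished: everything new is answer content.
        return Delta(content=delta_text) if delta_text else None
    if THINK_CLOSE in delta_text:
        # The chunk contains the boundary: split it at the closing tag.
        reasoning, _, content = delta_text.partition(THINK_CLOSE)
        reasoning = reasoning.replace(THINK_OPEN, "")
        if not reasoning and not content:
            return None
        return Delta(reasoning=reasoning or None, content=content or None)
    # Still inside the (possibly implicit) thinking phase.
    reasoning = delta_text.replace(THINK_OPEN, "")
    return Delta(reasoning=reasoning) if reasoning else None


def run_stream(chunks: list[str]) -> tuple[str, str]:
    """Feed chunks through stream_step and join the reasoning and content parts."""
    previous, reasoning, content = "", [], []
    for chunk in chunks:
        current = previous + chunk
        delta = stream_step(previous, current, chunk)
        if delta is not None:
            reasoning.append(delta.reasoning or "")
            content.append(delta.content or "")
        previous = current
    return "".join(reasoning), "".join(content)


print(run_stream(["<think>", "17 x 23 ", "= 391", "</think>", "The answer is 391."]))
```

Note that with this rule a stream containing no tags at all ends up entirely in the reasoning part. A production parser would also need to handle tags split across chunk boundaries, which this sketch ignores.
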
Think Tag Stripping:

# In abstract_tool_parser.py
THINK_TAG_PATTERN = re.compile(r"<think>.*?</think>", re.DOTALL)

@staticmethod
def strip_think_tags(text: str) -> str:
    return THINK_TAG_PATTERN.sub("", text).strip()

# Used in hermes/qwen parsers before parsing tool calls
cleaned_text = self.strip_think_tags(model_output)
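
For reference, the pattern quoted above can be exercised on its own. The snippet below is a standalone sketch: the sample model output and the `get_weather` tool name are made up for illustration, and the function is written as a plain module-level function instead of a static method.

```python
import re

THINK_TAG_PATTERN = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think_tags(text: str) -> str:
    """Remove every <think>...</think> block, then trim surrounding whitespace."""
    return THINK_TAG_PATTERN.sub("", text).strip()


# Made-up model output: a reasoning block followed by a hermes-style tool call.
raw = (
    "<think>\nThe user wants the weather, so I should call a tool.\n</think>\n"
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
)
cleaned = strip_think_tags(raw)
print(cleaned)

# The pattern needs both tags, so output with only a closing tag is left untouched.
print(strip_think_tags("reasoning</think>answer"))
```

`re.DOTALL` matters here because reasoning blocks usually span several lines, and the non-greedy `.*?` keeps two separate think blocks from being merged into one match.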

Files Changed

vllm_mlx/reasoning/ - New module (5 files, 529 lines)
vllm_mlx/tool_parsers/abstract_tool_parser.py - Added strip_think_tags
vllm_mlx/tool_parsers/hermes_tool_parser.py - Uses strip_think_tags
vllm_mlx/tool_parsers/qwen_tool_parser.py - Uses strip_think_tags
docs/guides/reasoning.md - Documentation
tests/test_reasoning_parser.py - 59 tests
tests/test_tool_parsers.py - 4 new tests for think tag stripping

Tests

============================= 120 passed in 0.91s ==============================

Closes #26

TomLucidor · 2026-02-03T02:13:32Z

Could you test this with OpenCode and see if the thoughts are hidden? (Using OpenCode as an example tool for integration testing seems like a good fit, since it uses the OpenAI-compatible API format.)

% vllm-mlx serve mlx-community/Ring-mini-linear-2.0-6bit --port 8000 --enable-auto-tool-choice --tool-call-parser hermes --reasoning-parser deepseek_r1
usage: vllm-mlx [-h] {serve,bench,bench-detok} ...
vllm-mlx: error: unrecognized arguments: --reasoning-parser deepseek_r1

If testing small models is a necessity, please use 4B models like https://huggingface.co/mlx-community/Jan-v3-4B-base-instruct-4bit or https://huggingface.co/nightmedia/Qwen3-4B-Engineer14-qx86-hi-mlx

Side note: -h needs an update

Addendum: how many reasoning parsers are there in vLLM? https://huggingface.co/Ex0bit/GLM-4.7-Flash-PRISM#vllm https://docs.vllm.ai/en/v0.9.0/api/vllm/reasoning/index.html#vllm.reasoning.DeepSeekR1ReasoningParser

TomLucidor · 2026-02-03T04:33:10Z

@waybarrios please add the parameters to cli.py because they are still missing from the branch.

- Add --reasoning-parser CLI flag (qwen3, deepseek-r1)
- Extract <think>...</think> content into reasoning_content field
- Support implicit think mode when <think> is injected in prompt
- Strip think tags in tool parsers to prevent parsing failures
- Remove broken vllm-mlx-serve entry point

Fixes #26

waybarrios · 2026-02-04T04:59:45Z

Sorry for the mess in this PR, I will clean up all the code @TomLucidor. But it seems to be working when using OpenCode.

I think the issue with the tool function has already been resolved, and it is now working properly to ensure compatibility with OpenCode.

I will keep this PR as a draft until more experiments are done.

waybarrios · 2026-02-04T05:06:54Z

I am doing several tests calling tools like websearch on OpenCode.

TomLucidor · 2026-02-04T05:24:37Z

If you have the resources, could you check mlx-community/Ring-mini-linear-2.0-4bit or mlx-community/Ring-mini-linear-2.0-6bit? The reasoning part completely disappears now with OpenCode + OMOC (assume we are using the Sisyphus agent as default). Could be a prompting issue from OMOC or maybe something else is happening. Is the Qwen3 reasoning parser too strict?

- Add computed_field to serialize reasoning_content in API responses for backwards compatibility
- Add fallback patterns in Hermes tool parser for malformed tool_call tags
- Update tests to reflect implicit think mode support where only </think> appears in output

waybarrios · 2026-02-08T02:54:22Z

Overall, the reasoning parser implementation looks solid: clean architecture with the abstract base class, good test coverage (120 tests!), and nice handling of the implicit think mode for agents like OpenCode. The strip_think_tags integration with the existing tool parsers is a thoughtful addition too.

Found 1 thing worth fixing before merge:

CLI help text references the wrong field name

In cli.py, the --reasoning-parser help text says tags are extracted into the reasoning_content field, but the actual primary field in the API model is reasoning. The reasoning_content is just a computed alias for backwards compatibility. The docs in docs/guides/reasoning.md already use the correct field name, so this is just the help text being out of sync.

vllm-mlx/vllm_mlx/cli.py

Lines 558 to 563 in 9c6b28b

    help=(
        "Enable reasoning content extraction with specified parser. "
        "Extracts <think>...</think> tags into reasoning_content field. "
        f"Options: {', '.join(reasoning_choices)}."
    ),
)

Quick fix -- change "reasoning_content field" to "reasoning field" here:

vllm-mlx/vllm_mlx/api/models.py

Lines 183 to 191 in 9c6b28b

    content: str | None = None
    reasoning: str | None = None  # Reasoning/thinking content (when --reasoning-parser is used)
    tool_calls: list[ToolCall] | None = None

    @computed_field
    @property
    def reasoning_content(self) -> str | None:
        """Alias for reasoning field. Serialized for backwards compatibility with clients expecting reasoning_content."""
        return self.reasoning

A couple of things I noticed that aren't blockers but worth keeping in mind:

Streaming vs non-streaming behavior with non-thinking models: The non-streaming extract_reasoning (line 87 of think_parser.py) returns content when no tags are found, but the streaming version (line 140) treats everything as reasoning until </think> appears. The comments explain this is intentional for implicit think mode, which makes sense -- just something to document clearly so users know --reasoning-parser should only be used with thinking models.
Nested JSON in hermes fallback patterns: The TOOL_CALL_LENIENT_PATTERN and RAW_JSON_TOOL_PATTERN use [^}]* which won't match nested objects. Since these are fallback patterns that only kick in when primary parsing fails, it's low impact, but could be improved later with balanced brace matching.
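
To illustrate the nested-JSON point, here is a small sketch. The `LENIENT` regex below is a hypothetical stand-in written in the spirit of the comment above, not the actual `TOOL_CALL_LENIENT_PATTERN` or `RAW_JSON_TOOL_PATTERN`. As one possible alternative to hand-written balanced brace matching, it uses the standard library's `json.JSONDecoder.raw_decode`, which parses the first complete JSON value starting at a given index.

```python
import json
import re

# Hypothetical lenient pattern using [^}]*; NOT the PR's actual regex.
LENIENT = re.compile(r"\{[^}]*\}")

flat = '{"name": "search", "arguments": "cats"}'
nested = '{"name": "search", "arguments": {"query": "cats"}}'

# On flat JSON the pattern captures the whole object.
print(LENIENT.search(flat).group(0) == flat)

# On nested JSON it stops at the first closing brace, so the capture is truncated.
print(LENIENT.search(nested).group(0))


def first_json_object(text: str):
    """Decode the first JSON object found in text, nested braces included."""
    start = text.find("{")
    if start == -1:
        return None
    try:
        obj, _end = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return None
    return obj


print(first_json_object("call: " + nested + " done"))
```

Letting the JSON decoder find the end of the object avoids counting braces by hand and also copes with braces that appear inside string values.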

Nice work on this feature! It addresses #26 well and follows the existing codebase patterns nicely.


TomLucidor mentioned this pull request Feb 3, 2026

Testing using OpenCode #34

Open

waybarrios force-pushed the feature/reasoning-support branch 2 times, most recently from 637cb38 to 4151384 Compare February 4, 2026 04:14

waybarrios marked this pull request as draft February 4, 2026 04:31

waybarrios added the WORK IN PROGRESS and enhancement labels Feb 4, 2026

waybarrios force-pushed the feature/reasoning-support branch from 4151384 to 24bfa63 Compare February 4, 2026 04:47

waybarrios mentioned this pull request Feb 4, 2026

Reasoning flag support #26

Closed

waybarrios self-assigned this Feb 4, 2026

Fix black formatting in models.py and hermes_tool_parser.py

38efd0a

waybarrios force-pushed the feature/reasoning-support branch from b3ba9ab to 38efd0a Compare February 8, 2026 03:08

waybarrios marked this pull request as ready for review February 8, 2026 03:09

Merge remote-tracking branch 'origin/main' into feature/reasoning-sup…

e79ebe8

…port

waybarrios merged commit e146bb6 into main Feb 8, 2026
3 checks passed

waybarrios merged 4 commits into main from feature/reasoning-support

2 participants
