Add reasoning parser support#33

Merged
waybarrios merged 4 commits into main from feature/reasoning-support
Feb 8, 2026

@waybarrios
Owner

Add support for a --reasoning-parser flag, following the vLLM-style architecture in which reasoning parsing and tool parsing are separate systems.

Features

Reasoning Parsers

Two built-in parsers for extracting <think> content:

| Parser | Models | Description |
|---|---|---|
| qwen3 | Qwen3 series | Requires both <think> and </think> tags |
| deepseek_r1 | DeepSeek-R1 | Handles implicit <think> tag (just </think>) |
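
The difference between the two parsers can be illustrated with a minimal sketch. The function names `extract_explicit` and `extract_implicit` are hypothetical and not taken from this PR; the sketch only assumes the tag behavior described in the table above.

```python
from __future__ import annotations

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def extract_explicit(text: str) -> tuple[str | None, str | None]:
    """qwen3-style: reasoning is extracted only if both tags are present."""
    start = text.find(THINK_OPEN)
    end = text.find(THINK_CLOSE)
    if start == -1 or end == -1 or end < start:
        return None, text  # no complete tag pair: everything is content
    reasoning = text[start + len(THINK_OPEN):end].strip()
    content = (text[:start] + text[end + len(THINK_CLOSE):]).strip()
    return reasoning or None, content or None


def extract_implicit(text: str) -> tuple[str | None, str | None]:
    """deepseek_r1-style: the opening tag may be missing from the output."""
    end = text.find(THINK_CLOSE)
    if end == -1:
        return None, text  # no closing tag: everything is content
    # Drop the opening tag if the model did emit it.
    reasoning = text[:end].replace(THINK_OPEN, "", 1).strip()
    content = text[end + len(THINK_CLOSE):].strip()
    return reasoning or None, content or None
```

Both functions return a `(reasoning, content)` tuple. Given output that contains only `</think>`, the explicit variant leaves the text untouched, while the implicit variant treats everything before the closing tag as reasoning.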

Think Tag Stripping in Tool Parsers

Prevents parsing failures when models produce <think> tags with tool calls (fixes Ring-Mini-Linear-2.0 + hermes issue mentioned by @TomLucidor).

Usage

Server:

vllm-mlx serve mlx-community/Qwen3-8B-4bit --reasoning-parser qwen3

Client:

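from openai import OpenAI

# Assumed endpoint: the server started above, listening locally on port 8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "What is 17 × 23?"}]
)
print("Thinking:", response.choices[0].message.reasoning)
print("Answer:", response.choices[0].message.content)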

Implementation

Reasoning Parser Base Class:

from abc import ABC, abstractmethod
from typing import Any

# DeltaMessage is the project's streaming message type; its import is omitted here.

class ReasoningParser(ABC):
    def __init__(self, tokenizer: Any | None = None):
        self.tokenizer = tokenizer

    @abstractmethod
    def extract_reasoning(self, model_output: str) -> tuple[str | None, str | None]:
        """Returns (reasoning, content) tuple."""

    @abstractmethod
    def extract_reasoning_streaming(self, previous_text, current_text, delta_text) -> DeltaMessage | None:
        """For streaming responses."""
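
A rough sketch of how a streaming extractor with this three-argument signature might classify each delta. It assumes the behavior described later in this thread, where everything is treated as reasoning until `</think>` appears. The names `streaming_delta` and `run_stream` are hypothetical, a plain dict stands in for `DeltaMessage`, and each tag is assumed to arrive inside a single delta rather than split across two.

```python
from __future__ import annotations

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def streaming_delta(previous_text: str, current_text: str, delta_text: str) -> dict | None:
    """Classify one streamed delta as reasoning and/or content."""
    if THINK_CLOSE in previous_text:
        # Reasoning already finished: the whole delta is answer content.
        return {"content": delta_text} if delta_text else None

    close_idx = current_text.find(THINK_CLOSE)
    if close_idx == -1:
        # No closing tag yet: treat the delta as reasoning (implicit think mode).
        reasoning = delta_text.replace(THINK_OPEN, "")
        return {"reasoning": reasoning} if reasoning else None

    # The closing tag appears in this delta: split the new text around it.
    reasoning = current_text[len(previous_text):close_idx].replace(THINK_OPEN, "")
    content = current_text[close_idx + len(THINK_CLOSE):]
    message = {}
    if reasoning:
        message["reasoning"] = reasoning
    if content:
        message["content"] = content
    return message or None


def run_stream(deltas: list[str]) -> tuple[str, str]:
    """Feed deltas through streaming_delta and join the two channels."""
    previous = ""
    reasoning_parts, content_parts = [], []
    for delta in deltas:
        current = previous + delta
        message = streaming_delta(previous, current, delta)
        if message:
            reasoning_parts.append(message.get("reasoning", ""))
            content_parts.append(message.get("content", ""))
        previous = current
    return "".join(reasoning_parts), "".join(content_parts)
```

Under these assumptions, a stream that never emits `</think>` ends up entirely in the reasoning channel, which is why such a parser would only suit thinking models.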

Think Tag Stripping:

# In abstract_tool_parser.py
import re

THINK_TAG_PATTERN = re.compile(r"<think>.*?</think>", re.DOTALL)

# Defined as a static method on the abstract tool parser class
@staticmethod
def strip_think_tags(text: str) -> str:
    return THINK_TAG_PATTERN.sub("", text).strip()

# Used in hermes/qwen parsers before parsing tool calls
cleaned_text = self.strip_think_tags(model_output)
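
To illustrate why stripping matters, here is a small self-contained sketch. `TOOL_CALL_PATTERN` and `parse_tool_calls` are hypothetical simplifications of a Hermes-style parser, not the code in this PR; only `THINK_TAG_PATTERN` and `strip_think_tags` mirror the snippet above.

```python
import json
import re

THINK_TAG_PATTERN = re.compile(r"<think>.*?</think>", re.DOTALL)
# Hypothetical, simplified pattern for a Hermes-style tool call.
TOOL_CALL_PATTERN = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def strip_think_tags(text: str) -> str:
    """Remove complete <think>...</think> blocks from the text."""
    return THINK_TAG_PATTERN.sub("", text).strip()


def parse_tool_calls(model_output: str) -> list[dict]:
    """Strip reasoning first, then decode each tool call body as JSON."""
    cleaned = strip_think_tags(model_output)
    return [json.loads(body) for body in TOOL_CALL_PATTERN.findall(cleaned)]


# The reasoning block mentions <tool_call> itself, which would confuse
# the tool-call pattern if it were applied to the raw output.
output = (
    "<think>I should call <tool_call> the weather tool.</think>\n"
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
)
calls = parse_tool_calls(output)
```

Applied to the raw output, the simplified tool-call pattern would start matching at the `<tool_call>` mentioned inside the reasoning and capture text that is not valid JSON; after stripping, only the real call remains.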

Files Changed

  • vllm_mlx/reasoning/ - New module (5 files, 529 lines)
  • vllm_mlx/tool_parsers/abstract_tool_parser.py - Added strip_think_tags
  • vllm_mlx/tool_parsers/hermes_tool_parser.py - Uses strip_think_tags
  • vllm_mlx/tool_parsers/qwen_tool_parser.py - Uses strip_think_tags
  • docs/guides/reasoning.md - Documentation
  • tests/test_reasoning_parser.py - 59 tests
  • tests/test_tool_parsers.py - 4 new tests for think tag stripping

Tests

============================= 120 passed in 0.91s ==============================

Closes #26

@TomLucidor

TomLucidor commented Feb 3, 2026

Could you test this with OpenCode and see if the thoughts are hidden? (Using OpenCode as an example tool for integration testing seems like a good choice, since it uses the OpenAI-compatible API format.)

% vllm-mlx serve mlx-community/Ring-mini-linear-2.0-6bit --port 8000 --enable-auto-tool-choice --tool-call-parser hermes --reasoning-parser deepseek_r1
usage: vllm-mlx [-h] {serve,bench,bench-detok} ...
vllm-mlx: error: unrecognized arguments: --reasoning-parser deepseek_r1

If testing small models is a necessity, please use 4B models like https://huggingface.co/mlx-community/Jan-v3-4B-base-instruct-4bit or https://huggingface.co/nightmedia/Qwen3-4B-Engineer14-qx86-hi-mlx

Side note: -h needs an update

Addendum: how many reasoning parsers are there in vLLM? https://huggingface.co/Ex0bit/GLM-4.7-Flash-PRISM#vllm https://docs.vllm.ai/en/v0.9.0/api/vllm/reasoning/index.html#vllm.reasoning.DeepSeekR1ReasoningParser

@TomLucidor

@waybarrios please add the parameters to cli.py, because they are still missing from the branch.

@waybarrios waybarrios force-pushed the feature/reasoning-support branch 2 times, most recently from 637cb38 to 4151384 Compare February 4, 2026 04:14
@waybarrios waybarrios marked this pull request as draft February 4, 2026 04:31
@waybarrios waybarrios added the WORK IN PROGRESS and enhancement labels Feb 4, 2026
- Add --reasoning-parser CLI flag (qwen3, deepseek-r1)
- Extract <think>...</think> content into reasoning_content field
- Support implicit think mode when <think> is injected in prompt
- Strip think tags in tool parsers to prevent parsing failures
- Remove broken vllm-mlx-serve entry point

Fixes #26
@waybarrios waybarrios force-pushed the feature/reasoning-support branch from 4151384 to 24bfa63 Compare February 4, 2026 04:47
@waybarrios
Owner Author

[Screenshot 2026-02-03 at 11:51:09 PM]

Sorry for the mess in this PR, @TomLucidor; I will clean up all the code. It does seem to be working when using OpenCode, though.

[Screenshot 2026-02-03 at 11:55:21 PM]
[Screenshot 2026-02-03 at 11:56:47 PM]

I think the issue with the tool function has already been resolved, and it now works properly, ensuring compatibility with OpenCode.

I will keep this PR as a draft until more experiments are done.

@waybarrios waybarrios mentioned this pull request Feb 4, 2026
@waybarrios waybarrios self-assigned this Feb 4, 2026
@waybarrios
Owner Author

[Screenshot 2026-02-04 at 12:06:12 AM]

I am running several tests that call tools like websearch in OpenCode.

@TomLucidor

TomLucidor commented Feb 4, 2026

If you have the resources, could you check mlx-community/Ring-mini-linear-2.0-4bit or mlx-community/Ring-mini-linear-2.0-6bit? The reasoning part completely disappears now with OpenCode + OMOC (assume we are using the Sisyphus agent as default). Could be a prompting issue from OMOC or maybe something else is happening. Is the Qwen3 reasoning parser too strict?

- Add computed_field to serialize reasoning_content in API responses for backwards compatibility
- Add fallback patterns in Hermes tool parser for malformed tool_call tags
- Update tests to reflect implicit think mode support where only </think> appears in output
@waybarrios
Owner Author

waybarrios commented Feb 8, 2026

Overall, the reasoning parser implementation looks solid: a clean architecture built on the abstract base class, good test coverage (120 tests!), and nice handling of the implicit think mode for agents like OpenCode. The strip_think_tags integration with the existing tool parsers is a thoughtful addition too.

Found 1 thing worth fixing before merge:

CLI help text references the wrong field name

In cli.py, the --reasoning-parser help text says tags are extracted into the reasoning_content field, but the actual primary field in the API model is reasoning. The reasoning_content is just a computed alias for backwards compatibility. The docs in docs/guides/reasoning.md already use the correct field name, so this is just the help text being out of sync.

vllm-mlx/vllm_mlx/cli.py

Lines 558 to 563 in 9c6b28b

    help=(
        "Enable reasoning content extraction with specified parser. "
        "Extracts <think>...</think> tags into reasoning_content field. "
        f"Options: {', '.join(reasoning_choices)}."
    ),
)

Quick fix -- change "reasoning_content field" to "reasoning field" here:

content: str | None = None
reasoning: str | None = None  # Reasoning/thinking content (when --reasoning-parser is used)
tool_calls: list[ToolCall] | None = None

@computed_field
@property
def reasoning_content(self) -> str | None:
    """Alias for reasoning field. Serialized for backwards compatibility with clients expecting reasoning_content."""
    return self.reasoning
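
On the client side, the two field names can be handled with a small helper. This is a sketch only: `get_reasoning` is a hypothetical name, and it assumes the message has already been decoded into a plain dict.

```python
from __future__ import annotations

from typing import Any, Mapping


def get_reasoning(message: Mapping[str, Any]) -> str | None:
    """Prefer the primary `reasoning` field; fall back to the alias."""
    value = message.get("reasoning")
    if value is None:
        value = message.get("reasoning_content")
    return value


# Message as serialized with both the primary field and the alias.
both_fields = {
    "content": "391",
    "reasoning": "17 * 23 = 391",
    "reasoning_content": "17 * 23 = 391",
}
# Message from a server that only emits the older field name.
alias_only = {"content": "391", "reasoning_content": "17 * 23 = 391"}
# Message produced without a reasoning parser.
no_reasoning = {"content": "391"}
```

A helper like this lets a client read reasoning text regardless of which field name the server emits.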

A couple of things I noticed that aren't blockers but worth keeping in mind:

  • Streaming vs non-streaming behavior with non-thinking models: The non-streaming extract_reasoning (line 87 of think_parser.py) returns content when no tags are found, but the streaming version (line 140) treats everything as reasoning until </think> appears. The comments explain this is intentional for implicit think mode, which makes sense -- just something to document clearly so users know --reasoning-parser should only be used with thinking models.

  • Nested JSON in hermes fallback patterns: The TOOL_CALL_LENIENT_PATTERN and RAW_JSON_TOOL_PATTERN use [^}]* which won't match nested objects. Since these are fallback patterns that only kick in when primary parsing fails, it's low impact, but could be improved later with balanced brace matching.
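
The nested-object limitation and one possible balanced-brace approach can be sketched as follows. `FLAT_PATTERN` is a hypothetical simplification that keeps only the `[^}]*` idea, not the actual fallback patterns, and `extract_json_objects` is an illustrative helper rather than code from this PR.

```python
import json
import re

# Hypothetical simplified pattern: stops at the first closing brace.
FLAT_PATTERN = re.compile(r"\{[^}]*\}")


def extract_json_objects(text: str) -> list[dict]:
    """Find top-level {...} spans by counting braces, ignoring braces in strings."""
    objects = []
    depth = 0
    start = None
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            # Inside a JSON string: only look for the closing quote.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"' and depth > 0:
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                try:
                    objects.append(json.loads(text[start:i + 1]))
                except json.JSONDecodeError:
                    pass  # balanced braces, but not valid JSON: skip it
    return objects


text = 'call: {"name": "search", "arguments": {"query": "mlx"}} done'
flat_match = FLAT_PATTERN.search(text).group(0)
nested = extract_json_objects(text)
```

Here `flat_match` ends at the inner closing brace and is therefore not valid JSON, while the brace counter recovers the whole object, including the nested `arguments`.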

Nice work on this feature! It addresses #26 well and follows the existing codebase patterns nicely.

@waybarrios waybarrios force-pushed the feature/reasoning-support branch from b3ba9ab to 38efd0a Compare February 8, 2026 03:08
@waybarrios waybarrios marked this pull request as ready for review February 8, 2026 03:09
@waybarrios waybarrios merged commit e146bb6 into main Feb 8, 2026
3 checks passed

Development

Successfully merging this pull request may close these issues.

Reasoning flag support

2 participants