Conversation
Could you test this with OpenCode and see if the thoughts are hidden? (OpenCode seems like a good tool for an integration test since it uses the OpenAI-compatible API format.) If testing small models is a necessity, please use 4B models like https://huggingface.co/mlx-community/Jan-v3-4B-base-instruct-4bit or https://huggingface.co/nightmedia/Qwen3-4B-Engineer14-qx86-hi-mlx

Side note / addendum: how many reasoning parsers are there in vLLM? https://huggingface.co/Ex0bit/GLM-4.7-Flash-PRISM#vllm https://docs.vllm.ai/en/v0.9.0/api/vllm/reasoning/index.html#vllm.reasoning.DeepSeekR1ReasoningParser

@waybarrios please add the parameters to
Force-pushed from 637cb38 to 4151384
- Add --reasoning-parser CLI flag (qwen3, deepseek-r1)
- Extract <think>...</think> content into reasoning_content field
- Support implicit think mode when <think> is injected in prompt
- Strip think tags in tool parsers to prevent parsing failures
- Remove broken vllm-mlx-serve entry point

Fixes #26
Force-pushed from 4151384 to 24bfa63
Sorry for the mess in this PR, I will clean up all the code @TomLucidor. But it seems to be working when using OpenCode.
I think the issue with the tool function has already been resolved; it now works properly and is compatible with OpenCode.
I will keep this PR as a draft until more experiments are done.
If you have the resources, could you check
- Add computed_field to serialize reasoning_content in API responses for backwards compatibility
- Add fallback patterns in Hermes tool parser for malformed tool_call tags
- Update tests to reflect implicit think mode support where only </think> appears in output
Overall the reasoning parser implementation looks solid: clean architecture with the abstract base class, good test coverage (120 tests!), and nice handling of the implicit think mode for agents like OpenCode.

Found 1 thing worth fixing before merge: the CLI help text references the wrong field name (Lines 558 to 563 in 9c6b28b). Quick fix -- change vllm-mlx/vllm_mlx/api/models.py (Lines 183 to 191 in 9c6b28b).

A couple of things I noticed that aren't blockers but worth keeping in mind:

Nice work on this feature! It addresses #26 well and follows the existing codebase patterns nicely.
Force-pushed from b3ba9ab to 38efd0a
Add `--reasoning-parser` flag support, following vLLM-style architecture where reasoning and tool parsing are separate systems.

Features

Reasoning Parsers
Two built-in parsers for extracting `<think>` content:
- `qwen3`: expects both `<think>` and `</think>` tags
- `deepseek_r1`: no opening `<think>` tag (just `</think>`)

Think Tag Stripping in Tool Parsers
Prevents parsing failures when models produce `<think>` tags alongside tool calls (fixes the Ring-Mini-Linear-2.0 + hermes issue mentioned by @TomLucidor).

Usage
Server:
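The original command block did not survive the copy; the following is a hypothetical invocation only -- the entry point and port are assumptions (the PR removes the broken `vllm-mlx-serve` entry point, so the real command may differ), while the `--reasoning-parser` flag and the model link come from this PR and thread:

```shell
# Hypothetical: serve one of the 4B models mentioned above with the new flag.
python -m vllm_mlx.server \
  --model mlx-community/Jan-v3-4B-base-instruct-4bit \
  --reasoning-parser qwen3 \
  --port 8000
```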
Client:
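The client snippet was also lost in the copy. As a sketch, here is how a client would read the `reasoning_content` field this PR adds next to `content` in the OpenAI-compatible response; it is shown against a canned response dict so it runs without a live server (the surrounding request/response shape follows the standard chat-completions format, not code from this PR):

```python
import json

# Canned response in the shape an OpenAI-compatible server returns;
# `reasoning_content` alongside `content` is what this PR adds.
raw = json.dumps({
    "choices": [{
        "message": {
            "role": "assistant",
            "reasoning_content": "The user greeted me; respond briefly.",
            "content": "Hello!",
        }
    }]
})

message = json.loads(raw)["choices"][0]["message"]
print(message.get("reasoning_content"))  # hidden chain-of-thought
print(message["content"])                # user-visible reply
```

With a real server the same fields appear on `response.choices[0].message`; clients that ignore unknown fields (like OpenCode) simply never render the reasoning, which is what keeps the thoughts hidden.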
Implementation
Reasoning Parser Base Class:
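The code block that followed this label was lost in the copy. A minimal sketch of what the abstract base class plus the two built-in parsers might look like, based on the behavior described above (class and method names here are assumptions, not necessarily the PR's actual API):

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class ReasoningParser(ABC):
    """Splits model output into reasoning text and the visible reply."""

    @abstractmethod
    def extract(self, text: str) -> tuple[str | None, str]:
        """Return (reasoning_content, content)."""


class Qwen3ReasoningParser(ReasoningParser):
    # Expects both <think> and </think> in the model output.
    def extract(self, text: str) -> tuple[str | None, str]:
        start, end = text.find("<think>"), text.find("</think>")
        if start == -1 or end == -1:
            return None, text
        reasoning = text[start + len("<think>"):end].strip()
        content = (text[:start] + text[end + len("</think>"):]).strip()
        return reasoning, content


class DeepSeekR1ReasoningParser(ReasoningParser):
    # <think> is injected into the prompt, so the output may
    # contain only the closing </think> tag (implicit think mode).
    def extract(self, text: str) -> tuple[str | None, str]:
        end = text.find("</think>")
        if end == -1:
            return None, text
        reasoning = text[:end].replace("<think>", "").strip()
        return reasoning, text[end + len("</think>"):].strip()
```

For example, `Qwen3ReasoningParser().extract("<think>plan</think>hello")` would yield `("plan", "hello")`, while the DeepSeek-R1 variant accepts the same input without the opening tag.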
Think Tag Stripping:
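Again the embedded snippet is missing; the following is a plausible sketch of the `strip_think_tags` helper named in the Files Changed list below (the regex and exact semantics are assumptions): it removes a complete `<think>...</think>` block, or a dangling `</think>` from implicit think mode, before the tool parser looks for tool-call markup.

```python
import re

# Matches a complete reasoning block; DOTALL lets it span newlines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think_tags(text: str) -> str:
    """Remove reasoning markup so tool-call parsing is not derailed."""
    text = _THINK_BLOCK.sub("", text)
    # Implicit think mode: only the closing tag appears in the output.
    if "</think>" in text:
        text = text.split("</think>", 1)[1]
    return text.strip()
```

Both tool parsers would call this before scanning for `<tool_call>` tags, which is what prevents the Ring-Mini-Linear-2.0 + hermes failure described above.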
Files Changed
- vllm_mlx/reasoning/: new module (5 files, 529 lines)
- vllm_mlx/tool_parsers/abstract_tool_parser.py: added strip_think_tags
- vllm_mlx/tool_parsers/hermes_tool_parser.py: uses strip_think_tags
- vllm_mlx/tool_parsers/qwen_tool_parser.py: uses strip_think_tags
- docs/guides/reasoning.md: documentation
- tests/test_reasoning_parser.py: 59 tests
- tests/test_tool_parsers.py: 4 new tests for think tag stripping

Tests
Closes #26