fix(openai_compat): strip leaked Thinking/Final tags from content#1237
fix(openai_compat): strip leaked Thinking/Final tags from content#1237horsley wants to merge 1 commit intosipeed:mainfrom
Conversation
nikolasdehor
left a comment
There was a problem hiding this comment.
Well-implemented defensive stripping of leaked thinking/reasoning tags from LLM provider responses.
Review notes:
-
Three-phase approach -- quick regex check -> strip
<final>wrappers -> remove thinking blocks. ThequickReasoningTagREearly-exit is a good performance optimization for the common case where no tags are present. -
Escaped unicode handling -- converting
\u003c/\u003e(and uppercase variants) before regex matching handles providers that double-escape angle brackets in JSON payloads. This is a known issue with some API implementations. -
Unclosed thinking block handling -- when
inThinkingis still true at the end of parsing, the trailing content is dropped. This is the correct behavior: if a thinking block was opened but never closed, everything after it is likely reasoning content. -
Regex patterns --
(?i)for case insensitivity is appropriate. The patterns handle variants like<think>,<thinking>,<thought>,<reasoning>, and<final>with optional whitespace and attributes. -
Test coverage -- three focused tests covering:
- Normal thinking+final tags
- Escaped unicode bracket forms
- Unclosed thinking blocks
-
Placement in the pipeline -- applying this in
parseResponse(before returning to the caller) is the right layer. This ensures all consumers of the OpenAI-compatible provider get sanitized content regardless of which model/vendor is behind the API.
LGTM.
| return &LLMResponse{ | ||
| Content: choice.Message.Content, | ||
| Content: stripThinkingAndFinalTags(choice.Message.Content), | ||
| ReasoningContent: choice.Message.ReasoningContent, |
There was a problem hiding this comment.
It may be better to judge whether it is a think mode by whether these contents exist or not. If the content is judged by regular expression, it may misjudge. It is not a think model, but the original text has these keywords in the communication process.
| body := strings.NewReader(`{ | ||
| "choices": [{ | ||
| "message": { | ||
| "content": "<think>internal reasoning</think><final>The answer is 2</final>" |
There was a problem hiding this comment.
Non-keyword-starting sentences should be added for testing.
|
|
||
| // stripThinkingAndFinalTags removes leaked reasoning blocks while preserving | ||
| // user-facing answer text (e.g. <final>answer</final> -> answer). | ||
| func stripThinkingAndFinalTags(content string) string { |
There was a problem hiding this comment.
This is not a dependent function., it can be placed in the tool class directory as a string tool.
| } | ||
| } | ||
|
|
||
| func TestParseResponse_StripsThinkingAndFinalTags(t *testing.T) { |
There was a problem hiding this comment.
You can put all three tests in one test function, there is no need to split them into so many. A functional test function should be single-point.
|
@horsley Hi! This PR has had no activity for over 2 weeks, so I'm closing it for now to keep things tidy. If it's still relevant, feel free to reopen it anytime and we'll pick it back up. |
📝 Description
This PR fixes a leak path in
openai_compatresponse parsing where provider-emitted reasoning wrappers (e.g.<think>...</think>) can surface in user-facingcontent.Changes:
parseResponseto strip thinking/reasoning blocks<final>wrappers while preserving final answer text\\u003c/\\u003e)🗣️ Type of Change
🤖 AI Code Generation
🔗 Related Issue
Fixes #1235
📚 Technical Context (Skip for Docs)
message.contentdirectly inopenai_compat, so malformed/legacy provider payloads could expose reasoning tags.🧪 Test Environment
📸 Evidence (Optional)
Click to view Logs/Screenshots
make check # => PASS ok github.com/sipeed/picoclaw/pkg/providers/openai_compat 0.044s☑️ Checklist