feat: add extended thinking support for Anthropic models #1076
yinwm merged 3 commits into sipeed:main
Conversation
yinwm
left a comment
Review: Extended Thinking Support for Anthropic Models
Thanks for this well-structured PR! The implementation is clean and well-tested. However, I have some architectural concerns that should be addressed before merging.
✅ What's Done Well
- Clean architecture flow: Config → AgentInstance → loop.go → provider.go
- Comprehensive tests: 22 test cases covering edge cases
- API constraint handling: Temperature clearing and budget_tokens clamping
- Backward compatible: default is `off`, no breaking changes
⚠️ Issues to Address
1. Configuration Should Be Per-Model, Not Global
Current design:
AgentDefaults.ThinkingLevel (global)
↓
All agents inherit this config
↓
Only Anthropic provider recognizes it
↓
Other providers silently ignore it ❌
Problem: thinking_level in AgentDefaults applies to all models, but only Anthropic supports it. If a user configures thinking_level: high globally and uses OpenAI/Qwen models, the setting is silently ignored.
Suggestion: Move ThinkingLevel to ModelConfig level instead:
```go
// config/config.go
type ModelConfig struct {
    // ... existing fields
    ThinkingLevel string `json:"thinking_level,omitempty"`
}
```
This way, each model can have its own thinking configuration appropriate for its provider.
2. Silent Configuration Ignoring
When thinking_level is set but the provider doesn't support it, there's no warning. Users will be confused why their configuration has no effect.
Suggestion: Add a warning when the configuration is ignored:
```go
// loop.go
if agent.ThinkingLevel != ThinkingOff {
    if !supportsThinking(agent.Provider) {
        log.Printf("WARN: thinking_level=%s is set but provider %s does not support it",
            agent.ThinkingLevel, agent.Provider.Name())
    } else {
        llmOpts["thinking_level"] = string(agent.ThinkingLevel)
    }
}
```
3. Temperature Cleared Silently
When thinking is enabled, temperature is cleared (Anthropic API requirement), but users aren't notified:
```go
// provider.go
params.Temperature = anthropic.MessageNewParams{}.Temperature
```
Suggestion: Add a warning when temperature is being cleared:
```go
if params.Temperature.Valid() {
    log.Printf("WARN: temperature is cleared because thinking is enabled (level=%s)", level)
}
```
4. Budget vs MaxTokens Relationship Undocumented
The mapping from levels to token budgets isn't documented in config:
- `low` = 4,096 tokens
- `medium` = 10,000 tokens
- `high` = 32,000 tokens
- `xhigh` = 128,000 tokens
Users may not realize that thinking_level: xhigh (128K) could consume most of their output budget if max_tokens is small.
Suggestion:
- Document the token budgets in config example
- Add a warning when budget would exceed 80% of max_tokens
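The level-to-budget mapping and the suggested clamp plus 80% check can be sketched as follows. Function names here are assumptions, not the repo's actual identifiers, and the budgets are the initial commit's values (a later commit adjusts medium to 16K and xhigh to 64K):

```go
package main

import "fmt"

// levelToBudget maps a thinking level to its token budget
// (initial-commit values; names and values are a sketch).
func levelToBudget(level string) int {
	switch level {
	case "low":
		return 4096
	case "medium":
		return 10000
	case "high":
		return 32000
	case "xhigh":
		return 128000
	}
	return 0
}

// clampBudget clamps budget to maxTokens-1 (the Anthropic API constraint
// noted in the PR) and reports whether the clamped budget exceeds 80% of
// maxTokens, i.e. when the suggested warning should fire.
func clampBudget(budget, maxTokens int) (clamped int, warn bool) {
	if budget > maxTokens-1 {
		budget = maxTokens - 1
	}
	return budget, budget*5 > maxTokens*4 // budget > 0.8 * maxTokens
}

func main() {
	// xhigh (128K) against a small max_tokens gets clamped and warned.
	b, warn := clampBudget(levelToBudget("xhigh"), 8192)
	fmt.Println(b, warn) // 8191 true
}
```

With a generous `max_tokens`, the same check passes silently, which is exactly the asymmetry the review wants surfaced to users.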
5. Missing Test for parseResponse Thinking Block
Tests cover applyThinkingConfig but not the parseResponse logic:
```go
case "thinking":
    tb := block.AsThinking()
    reasoning.WriteString(tb.Thinking)
```
Suggestion: Add a test case verifying thinking blocks are correctly parsed into the Reasoning field.
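Such a test could be sketched as follows, using hypothetical stand-in types since the provider's real block type comes from the Anthropic SDK:

```go
package main

import (
	"fmt"
	"strings"
)

// contentBlock is a minimal stand-in for the SDK's response block type.
type contentBlock struct {
	Type     string
	Thinking string
	Text     string
}

// parseReasoning mirrors the parseResponse branch under review: thinking
// blocks accumulate into the reasoning string, text blocks into the answer.
func parseReasoning(blocks []contentBlock) (reasoning, text string) {
	var r, t strings.Builder
	for _, b := range blocks {
		switch b.Type {
		case "thinking":
			r.WriteString(b.Thinking)
		case "text":
			t.WriteString(b.Text)
		}
	}
	return r.String(), t.String()
}

func main() {
	reasoning, text := parseReasoning([]contentBlock{
		{Type: "thinking", Thinking: "step 1"},
		{Type: "text", Text: "answer"},
	})
	fmt.Println(reasoning, text) // step 1 answer
}
```

A real test would build an SDK response fixture instead, but the assertion is the same: thinking content lands in `Reasoning`, not in the visible text.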
🔍 Architectural Context
Different providers have completely different thinking mechanisms:
| Provider | Control Method | Output Format |
|---|---|---|
| Anthropic | `thinking.budget_tokens` param | `thinking` block |
| OpenAI compat (Qwen/DeepSeek) | `chat_template_kwargs` (not supported yet) | `reasoning_content` field |
| Gemini | `thought_signature` | Special field |
The current PR only addresses Anthropic, which is fine, but the global configuration approach creates confusion for multi-provider setups.
📋 Summary
| Issue | Priority | Suggested Fix |
|---|---|---|
| Global vs per-model config | High | Move to ModelConfig |
| Silent ignoring of config | Medium | Add warning log |
| Silent temperature clearing | Medium | Add warning log |
| Budget documentation | Low | Add to config example |
| Missing parseResponse test | Low | Add test case |
Recommendation: Request changes to address the per-model configuration issue and add appropriate warnings. The core implementation is solid, just needs better UX for configuration handling.
📊 Architecture Analysis & Provider Comparison
Following up on my review, here's a detailed analysis of how different providers handle thinking options.
Options Processing by Provider
Current parameters passed via the `options` map:
| Provider | max_tokens | temperature | prompt_cache_key | thinking_level |
|---|---|---|---|---|
| anthropic | ✅ | ✅ | ❌ | ✅ (this PR) |
| openai_compat | ✅ | ✅ | ✅ | ❌ silently ignored |
| codex | ✅ | ✅ | ✅ | ❌ silently ignored |
| antigravity | ✅ | ✅ | ❌ | ❌ silently ignored |
Why OpenAI Compat Provider Ignores thinking_level
Looking at the code in openai_compat/provider.go:
```go
func (p *Provider) Chat(..., options map[string]any) (*LLMResponse, error) {
    requestBody := map[string]any{
        "model":    model,
        "messages": serializeMessages(messages),
    }
    // Only these 3-4 parameters are recognized:
    if maxTokens, ok := asInt(options["max_tokens"]); ok { /* handled */ }
    if temperature, ok := asFloat(options["temperature"]); ok { /* handled */ }
    if cacheKey, ok := options["prompt_cache_key"].(string); ok { /* handled */ }
    // ❌ No code reads options["thinking_level"]
    // ❌ Other options are silently ignored
}
```
The parameter is ignored simply because there's no code to read it.
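A capability check before injecting the option would close this gap. The following sketch uses an interface shaped like the `ThinkingCapable` idea discussed in this PR; the interface, method, and provider type names are assumptions, not the repo's actual definitions:

```go
package main

import "fmt"

// ThinkingCapable is a sketch of a capability-detection interface:
// providers that support extended thinking opt in by implementing it.
type ThinkingCapable interface {
	SupportsThinking() bool
}

type anthropicProvider struct{}

func (anthropicProvider) SupportsThinking() bool { return true }

// openaiCompatProvider deliberately does not implement ThinkingCapable.
type openaiCompatProvider struct{}

// supportsThinking reports whether a provider opts into thinking support,
// so loop.go can warn instead of silently dropping the option.
func supportsThinking(p any) bool {
	tc, ok := p.(ThinkingCapable)
	return ok && tc.SupportsThinking()
}

func main() {
	fmt.Println(supportsThinking(anthropicProvider{}))    // true
	fmt.Println(supportsThinking(openaiCompatProvider{})) // false
}
```

The type assertion means new providers gain thinking support simply by implementing the interface, with no central registry to update.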
Data Flow Architecture
┌─────────────────────────────────────────────────────────────────┐
│ AgentDefaults (Global) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌────────────┐ │
│ │ max_tokens │ │ temperature │ │thinking_level│ │ ... │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └────────────┘ │
└─────────┼───────────────┼───────────────┼───────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ loop.go │
│ llmOpts := map[string]any{...} │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────┴─────────────────────┐
▼ ▼ ▼
┌──────────┐ ┌────────────┐ ┌───────────┐
│ Anthropic│ │openai_compat│ │ Others │
└────┬─────┘ └─────┬──────┘ └─────┬─────┘
│ │ │
▼ ▼ ▼
┌──────────┐ ┌────────────┐ ┌───────────┐
│Recognizes│ │ Ignores │ │ Ignores │
│ ALL │ │thinking_ │ │thinking_ │
│ params │ │ level │ │ level │
└──────────┘ └────────────┘ └───────────┘
Different Thinking Mechanisms by Provider
This is the key architectural issue: different providers have completely different thinking implementations.
| Provider | Client Controllable? | Control Parameter | Output Format | Supported in PR? |
|---|---|---|---|---|
| Anthropic | ✅ Yes | `thinking.budget_tokens` | `thinking` block | ✅ Yes |
| OpenAI native | ❌ No | N/A | N/A | N/A |
| Qwen (llama.cpp) | ✅ Yes | `chat_template_kwargs` | `reasoning_content` | ❌ No |
| DeepSeek | Model internal | N/A | `reasoning_content` | ❌ No |
| Gemini | ✅ Yes | `thought_signature` | Special field | ❌ No |
Semantic Differences
- Anthropic: `budget_tokens` controls the thinking "budget" (low=4K, medium=10K, high=32K, xhigh=128K)
- Qwen/DeepSeek: `enable_thinking` is a boolean switch, no budget concept
- Gemini: uses `thought_signature` for thinking continuity
The thinking_level abstraction only makes sense for Anthropic.
Recommended Fix: Per-Model Configuration
Instead of global AgentDefaults.ThinkingLevel:
```json
{
  "model_list": [
    {
      "model_name": "claude-sonnet",
      "provider": "anthropic",
      "thinking_level": "high"
    },
    {
      "model_name": "gpt-4",
      "provider": "openai"
      // No thinking_level - not applicable
    },
    {
      "model_name": "qwen-3.5",
      "provider": "openai_compat",
      "extra_body": {
        "chat_template_kwargs": {"enable_thinking": true}
      }
    }
  ]
}
```
This approach:
- ✅ Each model uses its provider's native mechanism
- ✅ No silent configuration ignoring
- ✅ Clear to users what applies where
- ✅ Follows YAGNI principle (no forced abstraction)
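For the `extra_body` route, the openai_compat provider would need to fold those keys into the outgoing request body. A minimal sketch of that merge (the `mergeExtraBody` helper is hypothetical, not existing code in the repo):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergeExtraBody copies per-model extra_body keys into the request body,
// letting explicitly set request fields win over extra_body defaults.
func mergeExtraBody(body, extra map[string]any) map[string]any {
	for k, v := range extra {
		if _, exists := body[k]; !exists {
			body[k] = v
		}
	}
	return body
}

func main() {
	body := map[string]any{
		"model":    "qwen-3.5",
		"messages": []any{},
	}
	extra := map[string]any{
		"chat_template_kwargs": map[string]any{"enable_thinking": true},
	}
	out, _ := json.Marshal(mergeExtraBody(body, extra))
	fmt.Println(string(out))
}
```

This keeps the provider generic: it forwards whatever the model config specifies, so Qwen's `enable_thinking` rides along without the provider knowing anything about thinking semantics.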
Summary
| Issue | Root Cause | Fix |
|---|---|---|
| `thinking_level` ignored by non-Anthropic | No code to read it in other providers | Per-model config or provider check |
| Different thinking mechanisms | Each provider has its own API | Don't force a unified abstraction |
| User confusion | Silent ignoring + global config | Warnings + per-model config |
Support configurable thinking levels (off/low/medium/high/xhigh/adaptive) via the `agents.defaults.thinking_level` config field.
- "adaptive": uses Anthropic's adaptive thinking API (Claude 4.6+)
- "low/medium/high/xhigh": uses budget_tokens (all thinking-capable models)
- "off": disables thinking (default)

API constraints handled:
- Temperature cleared when thinking is enabled
- budget_tokens clamped to max_tokens-1
- Thinking response blocks parsed into Reasoning field

Relates to sipeed#645, sipeed#966
- Add ThinkingCapable interface for provider capability detection
- Warn when thinking_level is set but provider doesn't support it
- Warn when temperature is cleared due to thinking enabled
- Adjust budget values per Anthropic best practices (medium=16K, xhigh=64K)
- Add budget clamp warning and 80% threshold warning
- Add parseResponse thinking block tests
- Add thinking_level field to config.example.json
2ad075c to 6ed8f8c
yinwm
left a comment
Approving based on the updates. The author addressed the key issues: provider capability check, warning logs, and test coverage. The global config vs per-model can be addressed in a follow-up PR.
yinwm
left a comment
Reconsidering: Global Config is Still a Problem
I initially approved this, but on reflection, I think the global config is the core issue and should be fixed in this PR.
The Problem
AgentDefaults.ThinkingLevel (global)
↓
All providers inherit this
↓
Anthropic: works ✅
OpenAI/Qwen/DeepSeek: warns every time ⚠️
If a user has a mixed setup (Anthropic + other providers), they will see repeated warnings every time a non-Anthropic model is used.
Why Warning Logs Don't Solve It
Adding warnings makes the problem visible, but doesn't solve it. The user still:
- Can't configure `thinking_level` for just Anthropic models
- Will see noise in logs for every non-Anthropic request
- Has no way to opt out per-model
The Right Fix: Per-Model Config
```json
{
  "model_list": [
    {
      "model_name": "claude-sonnet",
      "provider": "anthropic",
      "thinking_level": "high"  // ← Per-model, not global
    },
    {
      "model_name": "gpt-4",
      "provider": "openai"
      // No thinking_level - this model doesn't support it
    }
  ]
}
```
This requires moving ThinkingLevel from AgentDefaults to ModelConfig.
Why This Matters
- `thinking_level` is a provider-specific feature (like `max_tokens_field`)
- Global config for provider-specific features creates friction
- Per-model config gives users control without log noise
Recommendation: Please move ThinkingLevel to ModelConfig level before merging.
📌 Clarification on Review Status
To avoid confusion, let me clarify my final position.
Status: Request Changes (not Approve). My earlier "Approving based on the updates" comment was an error; I reconsidered, and the current review status is correct.
Core Issue: Global Config is Wrong. Warnings make the problem visible, but they don't solve it.
Required fix: Move `ThinkingLevel` from `AgentDefaults` to `ModelConfig`.
Sorry for the back-and-forth - wanted to be clear about what's needed.
Thinking is a model-level capability, not a global agent property. Per-model config avoids silent ignoring on non-Anthropic providers and eliminates spurious warning logs in multi-provider setups. Addresses PR sipeed#1076 review feedback from @yinwm.
Done. Moved `ThinkingLevel` from `AgentDefaults` to `ModelConfig`. Changes:
yinwm
left a comment
✅ Approved!
Thank you for addressing the core architectural concern!
Key Fix: Per-Model Config
```go
// pkg/agent/instance.go
var thinkingLevelStr string
if mc, err := cfg.GetModelConfig(model); err == nil {
    thinkingLevelStr = mc.ThinkingLevel // ← From ModelConfig, not global!
}
```
Now thinking_level is configured per-model:
```json
{
  "model_list": [
    {
      "model_name": "claude-sonnet",
      "provider": "anthropic",
      "thinking_level": "high"  // ← Per-model
    },
    {
      "model_name": "gpt-4",
      "provider": "openai"
      // No thinking_level needed
    }
  ]
}
```
What's Good
| Item | Status |
|---|---|
| Per-model config | ✅ Fixed |
| `ThinkingCapable` interface | ✅ |
| Warning logs | ✅ |
| Temperature clearing log | ✅ |
| Budget > 80% warning | ✅ |
| Test coverage | ✅ |
| Config example | ✅ |
Summary
This is the right design - thinking_level is now a provider-specific feature configured at the model level, not a global setting that would cause warnings for non-Anthropic providers.
Great work! 🎉
@larrykoo711 Great addition bringing extended thinking support with the configurable `thinking_level` field. We have the PicoClaw Dev Group on Discord for contributors. If you'd like to join, send an email to
📝 Description
Add configurable extended thinking support for Anthropic models. Users can control thinking behavior via the `agents.defaults.thinking_level` config field or the `PICOCLAW_AGENTS_DEFAULTS_THINKING_LEVEL` environment variable.

Supported levels:
- `adaptive`: uses Anthropic's adaptive thinking API with `output_config.effort=high` (Claude 4.6+)
- `low`/`medium`/`high`/`xhigh`: uses `budget_tokens` (4096/10000/32000/128000) for all thinking-capable models
- `off` (default): disables thinking

API constraints are handled automatically:
- Temperature is cleared when thinking is enabled
- `budget_tokens` is clamped to `max_tokens - 1` to prevent API rejection
- Thinking response blocks (`"thinking"` content type) are parsed into `LLMResponse.Reasoning`

🗣️ Type of Change
🤖 AI Code Generation
🔗 Related Issue
Fixes #645 — reasoning content is no longer silently dropped; `"thinking"` blocks are parsed into `LLMResponse.Reasoning`.
Relates to #966 — `budget_tokens` clamping to `max_tokens - 1` prevents the scenario where the thinking budget consumes all available tokens.
📚 Technical Context
The `LLMResponse.Reasoning` field exists in the `protocol` types but is never populated by the Anthropic provider. This PR adds the full pipeline: config → agent → provider → response parsing.
Files Changed (7 files, +313/-16)
- `pkg/agent/thinking.go` — `ThinkingLevel` type, constants, case-insensitive parser
- `pkg/agent/thinking_test.go`
- `pkg/providers/anthropic/thinking_test.go`
- `pkg/config/config.go` — `ThinkingLevel` field on `AgentDefaults`
- `pkg/agent/instance.go`
- `pkg/agent/loop.go` — `llmOpts`, conditionally inject `thinking_level`
- `pkg/providers/anthropic/provider.go` — `applyThinkingConfig`, `levelToBudget`, thinking block parsing
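The case-insensitive parser in `pkg/agent/thinking.go` might look roughly like this sketch; the type name, constants, and fallback behavior are inferred from the PR description, not copied from the repo:

```go
package main

import (
	"fmt"
	"strings"
)

// ThinkingLevel mirrors the type described for pkg/agent/thinking.go.
type ThinkingLevel string

const (
	ThinkingOff      ThinkingLevel = "off"
	ThinkingLow      ThinkingLevel = "low"
	ThinkingMedium   ThinkingLevel = "medium"
	ThinkingHigh     ThinkingLevel = "high"
	ThinkingXHigh    ThinkingLevel = "xhigh"
	ThinkingAdaptive ThinkingLevel = "adaptive"
)

// ParseThinkingLevel parses a config value case-insensitively;
// unrecognized values fall back to the safe default, off.
func ParseThinkingLevel(s string) ThinkingLevel {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "low":
		return ThinkingLow
	case "medium":
		return ThinkingMedium
	case "high":
		return ThinkingHigh
	case "xhigh":
		return ThinkingXHigh
	case "adaptive":
		return ThinkingAdaptive
	}
	return ThinkingOff
}

func main() {
	fmt.Println(ParseThinkingLevel("HIGH"))    // high
	fmt.Println(ParseThinkingLevel("unknown")) // off
}
```

Falling back to `off` on unknown input matches the PR's backward-compatible default, so a typo in config disables thinking rather than erroring out.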
📸 Evidence (Optional)
Click to view test output
☑️ Checklist
`make check` passes locally (deps + fmt + vet + test).