
feat: add extended thinking support for Anthropic models#1076

Merged
yinwm merged 3 commits into sipeed:main from larrykoo711:feature/thinking-support
Mar 5, 2026
Conversation

@larrykoo711
Contributor

📝 Description

Add configurable extended thinking support for Anthropic models. Users can control thinking behavior via the agents.defaults.thinking_level config field or PICOCLAW_AGENTS_DEFAULTS_THINKING_LEVEL environment variable.

Supported levels:

  • adaptive: Uses Anthropic's adaptive thinking API with output_config.effort=high (Claude 4.6+)
  • low/medium/high/xhigh: Uses budget_tokens (4096/10000/32000/128000) for all thinking-capable models
  • off (default): Disables thinking

API constraints are handled automatically:

  • Temperature is cleared when thinking is enabled (Anthropic API requirement)
  • budget_tokens is clamped to max_tokens - 1 to prevent API rejection
  • Thinking response blocks ("thinking" content type) are parsed into LLMResponse.Reasoning

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

Fixes #645 — Reasoning content is no longer silently dropped; "thinking" blocks are parsed into LLMResponse.Reasoning.

Relates to #966 — budget_tokens clamping to max_tokens - 1 prevents the scenario where the thinking budget consumes all available tokens.

📚 Technical Context

  • Reference URL: https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking
  • Reasoning: PicoClaw currently has no thinking/reasoning support for Anthropic models. The LLMResponse.Reasoning field exists in the protocol types but is never populated by the Anthropic provider. This PR adds the full pipeline: config → agent → provider → response parsing.

Architecture

Config (thinking_level)
  → AgentInstance.ThinkingLevel (parsed via parseThinkingLevel)
    → loop.go injects into llmOpts map
      → buildParams() calls applyThinkingConfig()
        → API params: thinking config + temperature cleared + budget clamped
          → parseResponse() extracts "thinking" blocks → Reasoning field

Files Changed (7 files, +313/-16)

| File | Change |
| --- | --- |
| pkg/agent/thinking.go | New — ThinkingLevel type, constants, case-insensitive parser |
| pkg/agent/thinking_test.go | New — 14 test cases including case/whitespace tolerance |
| pkg/providers/anthropic/thinking_test.go | New — 8 tests: adaptive, budget levels, clamp, integration |
| pkg/config/config.go | +1 line: ThinkingLevel field on AgentDefaults |
| pkg/agent/instance.go | +4 lines: parse and store thinking level |
| pkg/agent/loop.go | Refactored: extract llmOpts, conditionally inject thinking_level |
| pkg/providers/anthropic/provider.go | +61 lines: applyThinkingConfig, levelToBudget, thinking block parsing |

🧪 Test Environment

  • Hardware: MacBook Pro (Apple M4 Max)
  • OS: macOS 15.4
  • Model/Provider: Anthropic Claude Sonnet 4.6 (unit tests with SDK mocks)
  • Channels: N/A (provider-level change, no channel integration needed)

📸 Evidence (Optional)

Click to view test output
=== RUN   TestParseThinkingLevel
=== RUN   TestParseThinkingLevel/off
=== RUN   TestParseThinkingLevel/empty
=== RUN   TestParseThinkingLevel/low
...
=== RUN   TestParseThinkingLevel/upper_HIGH
=== RUN   TestParseThinkingLevel/both_spaces
--- PASS: TestParseThinkingLevel (0.00s)

=== RUN   TestApplyThinkingConfig_Adaptive
--- PASS: TestApplyThinkingConfig_Adaptive (0.00s)
=== RUN   TestApplyThinkingConfig_BudgetLevels
--- PASS: TestApplyThinkingConfig_BudgetLevels (0.00s)
=== RUN   TestApplyThinkingConfig_BudgetClamp
--- PASS: TestApplyThinkingConfig_BudgetClamp (0.00s)
=== RUN   TestBuildParams_ThinkingClearsTemperature
--- PASS: TestBuildParams_ThinkingClearsTemperature (0.00s)
=== RUN   TestBuildParams_NoThinkingKeepsTemperature
--- PASS: TestBuildParams_NoThinkingKeepsTemperature (0.00s)

make check: PASS (0 issues)

☑️ Checklist

  • My code/docs follow the style of this project.
  • I have performed a self-review of my own changes.
  • I have updated the documentation accordingly.
  • make check passes locally (deps + fmt + vet + test).
  • All new code is covered by tests (22 new test cases).
  • AI involvement disclosed above.

Collaborator

@yinwm yinwm left a comment


Review: Extended Thinking Support for Anthropic Models

Thanks for this well-structured PR! The implementation is clean and well-tested. However, I have some architectural concerns that should be addressed before merging.


✅ What's Done Well

  1. Clean architecture flow: Config → AgentInstance → loop.go → provider.go
  2. Comprehensive tests: 22 test cases covering edge cases
  3. API constraint handling: Temperature clearing and budget_tokens clamping
  4. Backward compatible: Default is off, no breaking changes

⚠️ Issues to Address

1. Configuration Should Be Per-Model, Not Global

Current design:

AgentDefaults.ThinkingLevel (global)
         ↓
All agents inherit this config
         ↓
Only Anthropic provider recognizes it
         ↓
Other providers silently ignore it ❌

Problem: thinking_level in AgentDefaults applies to all models, but only Anthropic supports it. If a user configures thinking_level: high globally and uses OpenAI/Qwen models, the setting is silently ignored.

Suggestion: Move ThinkingLevel to ModelConfig level instead:

// config/config.go
type ModelConfig struct {
    // ... existing fields
    ThinkingLevel string `json:"thinking_level,omitempty"`
}

This way, each model can have its own thinking configuration appropriate for its provider.


2. Silent Configuration Ignoring

When thinking_level is set but the provider doesn't support it, there's no warning. Users will be confused why their configuration has no effect.

Suggestion: Add a warning when the configuration is ignored:

// loop.go
if agent.ThinkingLevel != ThinkingOff {
    if !supportsThinking(agent.Provider) {
        log.Printf("WARN: thinking_level=%s is set but provider %s does not support it",
                   agent.ThinkingLevel, agent.Provider.Name())
    } else {
        llmOpts["thinking_level"] = string(agent.ThinkingLevel)
    }
}

3. Temperature Cleared Silently

When thinking is enabled, temperature is cleared (Anthropic API requirement), but users aren't notified:

// provider.go
params.Temperature = anthropic.MessageNewParams{}.Temperature

Suggestion: Add a warning when temperature is being cleared:

if params.Temperature.Valid() {
    log.Printf("WARN: temperature is cleared because thinking is enabled (level=%s)", level)
}

4. Budget vs MaxTokens Relationship Undocumented

The mapping from levels to token budgets isn't documented in config:

  • low = 4,096 tokens
  • medium = 10,000 tokens
  • high = 32,000 tokens
  • xhigh = 128,000 tokens

Users may not realize that thinking_level: xhigh (128K) could consume most of their output budget if max_tokens is small.

Suggestion:

  • Document the token budgets in config example
  • Add a warning when budget would exceed 80% of max_tokens

5. Missing Test for parseResponse Thinking Block

Tests cover applyThinkingConfig but not the parseResponse logic:

case "thinking":
    tb := block.AsThinking()
    reasoning.WriteString(tb.Thinking)

Suggestion: Add a test case verifying thinking blocks are correctly parsed into the Reasoning field.


🔍 Architectural Context

Different providers have completely different thinking mechanisms:

| Provider | Control Method | Output Format |
| --- | --- | --- |
| Anthropic | thinking.budget_tokens param | thinking block |
| OpenAI compat (Qwen/DeepSeek) | chat_template_kwargs (not supported yet) | reasoning_content field |
| Gemini | thought_signature | Special field |

The current PR only addresses Anthropic, which is fine, but the global configuration approach creates confusion for multi-provider setups.


📋 Summary

| Issue | Priority | Suggested Fix |
| --- | --- | --- |
| Global vs per-model config | High | Move to ModelConfig |
| Silent ignoring of config | Medium | Add warning log |
| Silent temperature clearing | Medium | Add warning log |
| Budget documentation | Low | Add to config example |
| Missing parseResponse test | Low | Add test case |

Recommendation: Request changes to address the per-model configuration issue and add appropriate warnings. The core implementation is solid, just needs better UX for configuration handling.

@yinwm
Collaborator

yinwm commented Mar 4, 2026

📊 Architecture Analysis & Provider Comparison

Following up on my review, here's a detailed analysis of how different providers handle the thinking_level parameter.


Options Processing by Provider

Current Parameters Passed via options

// loop.go
llmOpts := map[string]any{
    "max_tokens":       agent.MaxTokens,
    "temperature":      agent.Temperature,
    "prompt_cache_key": agent.ID,
}
// PR adds:
llmOpts["thinking_level"] = string(agent.ThinkingLevel)

Which Provider Recognizes What

| Provider | max_tokens | temperature | prompt_cache_key | thinking_level |
| --- | --- | --- | --- | --- |
| anthropic | ✅ | ✅ | ✅ | ✅ (this PR) |
| openai_compat | ✅ | ✅ | ✅ | silently ignored |
| codex | ✅ | ✅ | ✅ | silently ignored |
| antigravity | ✅ | ✅ | ✅ | silently ignored |

Why OpenAI Compat Provider Ignores thinking_level

Looking at the code in openai_compat/provider.go:

func (p *Provider) Chat(..., options map[string]any) (*LLMResponse, error) {
    requestBody := map[string]any{
        "model":    model,
        "messages": serializeMessages(messages),
    }
    
    // Only these 3-4 parameters are recognized:
    if maxTokens, ok := asInt(options["max_tokens"]); ok { /* handled */ }
    if temperature, ok := asFloat(options["temperature"]); ok { /* handled */ }
    if cacheKey, ok := options["prompt_cache_key"].(string); ok { /* handled */ }
    
    // ❌ No code reads options["thinking_level"]
    // ❌ Other options are silently ignored
}

The parameter is ignored simply because there's no code to read it.


Data Flow Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      AgentDefaults (Global)                      │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌────────────┐ │
│  │ max_tokens  │ │ temperature │ │thinking_level│ │   ...      │ │
│  └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └────────────┘ │
└─────────┼───────────────┼───────────────┼───────────────────────┘
          │               │               │
          ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────┐
│                         loop.go                                 │
│                llmOpts := map[string]any{...}                   │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
        ┌─────────────────────┴─────────────────────┐
        ▼                     ▼                     ▼
   ┌──────────┐        ┌────────────┐        ┌───────────┐
   │ Anthropic│        │openai_compat│        │  Others   │
   └────┬─────┘        └─────┬──────┘        └─────┬─────┘
        │                    │                     │
        ▼                    ▼                     ▼
   ┌──────────┐        ┌────────────┐        ┌───────────┐
   │Recognizes│        │  Ignores   │        │  Ignores  │
   │   ALL    │        │thinking_   │        │thinking_  │
   │  params  │        │  level     │        │  level    │
   └──────────┘        └────────────┘        └───────────┘

Different Thinking Mechanisms by Provider

This is the key architectural issue: different providers have completely different thinking implementations.

| Provider | Client Controllable? | Control Parameter | Output Format | Supported in PR? |
| --- | --- | --- | --- | --- |
| Anthropic | ✅ Yes | thinking.budget_tokens | thinking block | ✅ Yes |
| OpenAI native | ❌ No | N/A | N/A | N/A |
| Qwen (llama.cpp) | ⚠️ Partial | chat_template_kwargs | reasoning_content | ❌ No |
| DeepSeek | ⚠️ Partial | Model internal | reasoning_content | ❌ No |
| Gemini | ✅ Yes | thought_signature | Special field | ⚠️ Partial |

Semantic Differences

  • Anthropic: budget_tokens controls thinking "budget" (low=4K, medium=10K, high=32K, xhigh=128K)
  • Qwen/DeepSeek: enable_thinking is a boolean switch, no budget concept
  • Gemini: Uses thought_signature for thinking continuity

The thinking_level abstraction only makes sense for Anthropic.


Recommended Fix: Per-Model Configuration

Instead of global AgentDefaults.ThinkingLevel:

{
  "model_list": [
    {
      "model_name": "claude-sonnet",
      "provider": "anthropic",
      "thinking_level": "high"
    },
    {
      "model_name": "gpt-4",
      "provider": "openai"
      // No thinking_level - not applicable
    },
    {
      "model_name": "qwen-3.5",
      "provider": "openai_compat",
      "extra_body": {
        "chat_template_kwargs": {"enable_thinking": true}
      }
    }
  ]
}

This approach:

  • ✅ Each model uses its provider's native mechanism
  • ✅ No silent configuration ignoring
  • ✅ Clear to users what applies where
  • ✅ Follows YAGNI principle (no forced abstraction)

Summary

| Issue | Root Cause | Fix |
| --- | --- | --- |
| thinking_level ignored by non-Anthropic | No code to read it in other providers | Per-model config or provider check |
| Different thinking mechanisms | Each provider has its own API | Don't force a unified abstraction |
| User confusion | Silent ignoring + global config | Warnings + per-model config |

Support configurable thinking levels (off/low/medium/high/xhigh/adaptive)
via `agents.defaults.thinking_level` config field.

- "adaptive": uses Anthropic's adaptive thinking API (Claude 4.6+)
- "low/medium/high/xhigh": uses budget_tokens (all thinking-capable models)
- "off": disables thinking (default)

API constraints handled:
- Temperature cleared when thinking is enabled
- budget_tokens clamped to max_tokens-1
- Thinking response blocks parsed into Reasoning field

Relates to sipeed#645, sipeed#966
- Add ThinkingCapable interface for provider capability detection
- Warn when thinking_level is set but provider doesn't support it
- Warn when temperature is cleared due to thinking enabled
- Adjust budget values per Anthropic best practices (medium=16K, xhigh=64K)
- Add budget clamp warning and 80% threshold warning
- Add parseResponse thinking block tests
- Add thinking_level field to config.example.json
@larrykoo711 larrykoo711 force-pushed the feature/thinking-support branch from 2ad075c to 6ed8f8c on March 4, 2026 at 12:36
Collaborator

@yinwm yinwm left a comment


Approving based on the updates. The author addressed the key issues: provider capability check, warning logs, and test coverage. The global config vs per-model can be addressed in a follow-up PR.

Collaborator

@yinwm yinwm left a comment


Reconsidering: Global Config is Still a Problem

I initially approved this, but on reflection, I think the global config is the core issue that should be fixed in this PR.

The Problem

AgentDefaults.ThinkingLevel (global)
         ↓
All providers inherit this
         ↓
Anthropic: works ✅
OpenAI/Qwen/DeepSeek: warns every time ⚠️

If a user has a mixed setup (Anthropic + other providers), they will see repeated warnings every time a non-Anthropic model is used.

Why Warning Logs Don't Solve It

Adding warnings makes the problem visible, but doesn't solve it. The user still:

  1. Can't configure thinking_level for just Anthropic models
  2. Will see noise in logs for every non-Anthropic request
  3. Has no way to opt-out per-model

The Right Fix: Per-Model Config

{
  "model_list": [
    {
      "model_name": "claude-sonnet",
      "provider": "anthropic",
      "thinking_level": "high"  // ← Per-model, not global
    },
    {
      "model_name": "gpt-4",
      "provider": "openai"
      // No thinking_level - this model doesn't support it
    }
  ]
}

This requires moving ThinkingLevel from AgentDefaults to ModelConfig.

Why This Matters

  • thinking_level is a provider-specific feature (like max_tokens_field)
  • Global config for provider-specific features creates friction
  • Per-model config gives users control without log noise

Recommendation: Please move ThinkingLevel to ModelConfig level before merging.

@yinwm
Collaborator

yinwm commented Mar 4, 2026

📌 Clarification on Review Status

To avoid confusion, let me clarify my final position:

Status: Request Changes (not Approve)

My earlier "Approving based on the updates" comment was an error; I reconsidered, and the current review status is correct.

Core Issue: Global Config is Wrong

The thinking_level config is in AgentDefaults (global), but only Anthropic supports it. This is a fundamental design issue that should be fixed in this PR, not deferred.

Why warnings don't solve it:

  • User with Anthropic + OpenAI models gets repeated warnings
  • No way to configure thinking_level for just Anthropic models
  • Warnings = noise, not a solution

Required fix: Move ThinkingLevel from AgentDefaults to ModelConfig.

What I'm Asking

  1. Move config to per-model level
  2. Keep the ThinkingCapable interface check (good addition)
  3. Keep warning logs (good addition)

Sorry for the back-and-forth - wanted to be clear about what's needed.

Thinking is a model-level capability, not a global agent property.
Per-model config avoids silent ignoring on non-Anthropic providers
and eliminates spurious warning logs in multi-provider setups.

Addresses PR sipeed#1076 review feedback from @yinwm.
@larrykoo711
Contributor Author

larrykoo711 commented Mar 4, 2026

Done. Moved ThinkingLevel from AgentDefaults to ModelConfig in commit 4ea4446.

Changes:

  • pkg/config/config.go — Added ThinkingLevel field to ModelConfig, removed from AgentDefaults
  • pkg/agent/instance.go — Now reads from cfg.GetModelConfig(model) instead of defaults
  • config/config.example.json — Moved field from agents.defaults to the anthropic model entry

ThinkingCapable interface check and warning logs remain unchanged. Build passes, all tests green.

Collaborator

@yinwm yinwm left a comment


✅ Approved!

Thank you for addressing the core architectural concern!

Key Fix: Per-Model Config

// pkg/agent/instance.go
var thinkingLevelStr string
if mc, err := cfg.GetModelConfig(model); err == nil {
    thinkingLevelStr = mc.ThinkingLevel  // ← From ModelConfig, not global!
}

Now thinking_level is configured per-model:

{
  "model_list": [
    {
      "model_name": "claude-sonnet",
      "provider": "anthropic",
      "thinking_level": "high"  // ← Per-model
    },
    {
      "model_name": "gpt-4",
      "provider": "openai"
      // No thinking_level needed
    }
  ]
}

What's Good

| Item | Status |
| --- | --- |
| Per-model config | ✅ Fixed |
| ThinkingCapable interface | ✅ |
| Warning logs | ✅ |
| Temperature clearing log | ✅ |
| Budget > 80% warning | ✅ |
| Test coverage | ✅ |
| Config example | ✅ |

Summary

This is the right design - thinking_level is now a provider-specific feature configured at the model level, not a global setting that would cause warnings for non-Anthropic providers.

Great work! 🎉

@yinwm yinwm merged commit 204038e into sipeed:main Mar 5, 2026
2 checks passed
@larrykoo711 larrykoo711 deleted the feature/thinking-support branch March 5, 2026 02:30
hyperwd pushed a commit to hyperwd/picoclaw that referenced this pull request Mar 5, 2026
keithy added a commit to keithy/picoclaw that referenced this pull request Mar 5, 2026
@Orgmar
Copy link
Contributor

Orgmar commented Mar 6, 2026

@larrykoo711 Great addition bringing extended thinking support with the configurable thinking_level. The tiered approach from adaptive to specific budget levels gives users nice flexibility for balancing quality and cost.

We have the PicoClaw Dev Group on Discord for contributors. If you'd like to join, send an email to [email protected] with the subject [Join PicoClaw Dev Group] larrykoo711 and we'll send you the invite!

fishtrees pushed a commit to fishtrees/picoclaw that referenced this pull request Mar 12, 2026


Successfully merging this pull request may close these issues.

[Feature] Reasoning Content Is Silently Dropped Instead of Being Exposed or Routed

3 participants