Skip to content

feat(routing): intelligent model routing based on structural complexity scoring#994

Merged
mengzhuo merged 8 commits intosipeed:mainfrom
is-Xiaoen:feat/model-routing
Mar 6, 2026
Merged

feat(routing): intelligent model routing based on structural complexity scoring#994
mengzhuo merged 8 commits intosipeed:mainfrom
is-Xiaoen:feat/model-routing

Conversation

@is-Xiaoen
Copy link
Contributor

Closes / relates to #295.


What this does

Adds a lightweight model routing layer that dispatches each incoming message to either a light model (fast, cheap) or the configured primary model, based entirely on structural signals extracted from the message and its session context. No keyword matching, no NLP, no external calls — just properties of the message's shape.

The routing decision is made once at the start of each conversation turn and stays fixed for all tool-follow-up iterations within that turn, so a multi-step tool chain never switches models mid-way.

Why structural scoring

As raised in the issue discussion, any approach that looks at natural-language content (keywords, patterns) breaks for non-English users. Structural features sidestep this entirely:

Feature Signal
Token estimate (rune_count / 3) Verbosity proxy, CJK-safe
Fenced code blocks (``` pairs) Coding/technical task
Recent tool call density (last 6 msgs) Active agentic workflow
Conversation depth Accumulated complexity
Attachments (data URI / media ext) Multi-modal → always heavy

The scorer is exposed behind a Classifier interface, so a future ML-based implementation only needs to implement Score(Features) float64 — feature extraction stays unchanged.

Routing logic

score >= threshold  →  primary model  (default: claude-sonnet-4-6)
score <  threshold  →  light model    (e.g. gemini-flash)

Default threshold is 0.35. At this value:

  • "hi" → 0.00 → light ✓
  • message with a code block → 0.40 → heavy ✓
  • message > 200 tokens → 0.35 → heavy ✓
  • image attachment → 1.00 → heavy ✓

Config

Fully opt-in. Existing configs are completely unaffected.

{
  "agents": {
    "defaults": {
      "model": "claude-sonnet-4-6",
      "routing": {
        "enabled": true,
        "light_model": "gemini-flash",
        "threshold": 0.35
      }
    }
  }
}

light_model references a model_name in model_list, so it works with any provider. If the name isn't found at startup, routing is silently disabled and the primary model is used — no crash, no config error.

Files changed

File Change
pkg/routing/features.go Features struct + ExtractFeatures()
pkg/routing/classifier.go Classifier interface + RuleClassifier
pkg/routing/router.go Router.SelectModel()
pkg/routing/router_test.go 34 tests, all passing
pkg/config/config.go RoutingConfig added to AgentDefaults
pkg/agent/instance.go Pre-resolve Router + LightCandidates at agent creation
pkg/agent/loop.go selectCandidates() helper + active candidate wiring

773 lines added, 33 modified, 0 new dependencies.

What's intentionally left for v2

  • ML-based classifier (the Classifier interface supports a drop-in swap)
  • Per-agent routing override (currently applies to defaults only)
  • Quality-based escalation (detect when the light model's answer is insufficient and retry with heavy)
  • /model command override to force a specific tier for one turn

Introduce RoutingConfig with three fields:
  - enabled: activates per-turn model routing
  - light_model: references a model_name in model_list
  - threshold: complexity score cutoff in [0,1]

When routing.enabled is true and the incoming message scores below
threshold, the agent switches to light_model for that turn. Absent or
disabled config leaves existing behaviour completely unchanged.

Example:
  "agents": {
    "defaults": {
      "model": "claude-sonnet-4-6",
      "routing": {
        "enabled": true,
        "light_model": "gemini-flash",
        "threshold": 0.35
      }
    }
  }
Add three new files to pkg/routing/:

features.go — ExtractFeatures(msg, history) → Features
  Computes five structural dimensions with zero keyword matching:
  - TokenEstimate: rune_count/3 (CJK-safe token proxy)
  - CodeBlockCount: ``` pairs in the message
  - RecentToolCalls: tool call count in the last 6 history entries
  - ConversationDepth: total messages in session
  - HasAttachments: data URIs or media file extensions

classifier.go — Classifier interface + RuleClassifier
  RuleClassifier uses a weighted sum that is capped at 1.0:
    code block      → +0.40  (triggers heavy model alone at 0.35 threshold)
    token > 200     → +0.35  (triggers heavy model alone)
    tool calls > 3  → +0.25
    token 50-200    → +0.15
    conversation depth > 10 → +0.10
    attachment      → 1.00 (hard gate, always heavy)

router.go — Router wraps config + Classifier
  Router.SelectModel(msg, history, primaryModel) returns either the
  configured light_model or the primary model depending on whether
  the complexity score clears the threshold. Threshold defaults to
  0.35 when zero/negative to prevent misconfiguration.

router_test.go — 34 tests covering all branches and edge cases
instance.go:
  - Add Router *routing.Router and LightCandidates []FallbackCandidate
    to AgentInstance.
  - At agent creation, when routing.enabled and light_model resolves
    successfully in model_list, pre-build the Router and resolve the
    light model candidates once. If the light model isn't in model_list,
    log a warning and disable routing for that agent gracefully.

loop.go:
  - Add selectCandidates(agent, userMsg, history) helper.
    It calls Router.SelectModel and returns either agent.Candidates /
    agent.Model (primary tier) or agent.LightCandidates / light_model
    (light tier). Returns primary unchanged when routing is disabled.
  - In runLLMIteration, resolve (activeCandidates, activeModel) once
    before entering the tool-iteration loop. The model tier is sticky
    for the entire turn so a multi-step tool chain doesn't switch
    models mid-way.
  - Replace hard-coded agent.Candidates / agent.Model references in
    callLLM and the debug log with the resolved active values.

The fallback chain and retry logic are untouched. When light_model
returns an error the fallback chain handles escalation normally.
- classifier.go: s/honour/honor/ (American English per misspell)
- router.go: break SelectModel signature across lines (golines)
- router_test.go: break long Message literal (golines)
- router_test.go: replace CJK string literal with rune slice so
  gosmopolitan does not flag the source file; behaviour is identical
Copy link

@nikolasdehor nikolasdehor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent PR. This is a well-designed, language-agnostic model routing system with solid engineering decisions throughout. Here is a detailed review:

Architecture:

  • Clean separation: Features (extraction) -> Classifier (scoring) -> Router (decision). The Classifier interface makes future ML-based implementations a drop-in swap.
  • Structural signals only (token count, code blocks, tool density, conversation depth, attachments) -- no keyword matching, works across all languages including CJK.
  • Per-turn sticky routing (selectCandidates called once per turn) prevents model switching mid-tool-chain. This is the right design.

Code quality:

  • 34 tests covering all edge cases: zero features, hard gates, boundary conditions, custom thresholds, custom classifiers.
  • The newWithClassifier testing hook is a good pattern.
  • Token estimation using rune count / 3 is a reasonable CJK-safe approximation.
  • Score capped at 1.0 to honor the [0,1] contract even when multiple signals fire.
  • Graceful degradation: if light_model is not found in model_list, routing is silently disabled.
  • Fully opt-in: existing configs are unaffected.

Minor observations (none blocking):

  1. hasAttachments checks for file extensions like .jpg/.png in message text. This could false-positive on messages like 'rename file.jpg to file.png' where no actual attachment exists. Since this is conservative (routes to heavy model), it is the right error direction.
  2. The token estimation of rune_count/3 slightly over-estimates for pure ASCII text (tokens are ~4 chars for English). This is intentional -- over-estimating routes to heavy model, which is safer.
  3. ConversationDepth uses len(history) which includes tool results and system messages. This is fine since it is a proxy for accumulated complexity, not a strict turn count.

The PR description is also one of the best I have seen -- clear problem statement, design rationale, scoring table, and explicit v2 deferral list. Ship it.

Main reformatted the fallback.Execute call to multi-line (golines);
our branch renamed agent.Candidates → activeCandidates for routing.
Kept both: multi-line formatting + routing variable.
@CLAassistant
Copy link

CLAassistant commented Mar 5, 2026

CLA assistant check
All committers have signed the CLA.

Upstream added ThinkingLevel, SummarizeMessageThreshold,
SummarizeTokenPercent, MaxMediaSize, and maybeSummarize.
Our branch added Router, LightCandidates, and selectCandidates.
Both sets of changes are kept. Dead updateToolContexts removed
(upstream deleted it; no callers exist).
Copy link
Collaborator

@mengzhuo mengzhuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please drop all .md files

Copy link
Contributor Author

@is-Xiaoen is-Xiaoen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mengzhuo ,Done — removed all accidentally committed files. Sorry about that.

Copy link
Contributor

@mingmxren mingmxren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review focusing on two issues: CJK token estimation bias and routing observability.

// estimateTokens returns a conservative token count proxy.
// Using rune count / 3 rather than / 4 because CJK characters each map to
// roughly one token, while ASCII words average ~1.3 chars/token. Dividing
// by 3 is a safe middle ground that slightly over-estimates for Latin text
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CJK token estimation is significantly biased — undermines language-agnostic claim

rune_count / 3 underestimates CJK text by ~3x:

Text type Actual ratio Estimate (/3) Effect
ASCII ~0.25 tokens/char ~0.33 tokens/char Slight overestimate → conservative (OK)
CJK ~1 token/rune ~0.33 tokens/rune 3x underestimate → biased toward light model

Concrete example: A 200-character Chinese message ≈ 200 real tokens, but estimateTokens returns 66 — falling into the "medium" bucket (score 0.15) instead of "long" (score 0.35). This means complex CJK messages get routed to the light model when they shouldn't.

This directly contradicts the stated design goal:

any approach that looks at natural-language content breaks for non-English users. Structural features sidestep this entirely.

The token estimation is a structural feature, but the divisor is calibrated for Latin text only.

Suggestion: Detect the CJK rune ratio and adjust accordingly, e.g.:

func estimateTokens(msg string) int {
    total := utf8.RuneCountInString(msg)
    if total == 0 {
        return 0
    }
    cjk := 0
    for _, r := range msg {
        if r >= 0x2E80 && r <= 0x9FFF || r >= 0xF900 && r <= 0xFAFF {
            cjk++
        }
    }
    // CJK runes ≈ 1 token each; non-CJK runes ≈ 0.25 tokens each
    return cjk + (total-cjk)/4
}

Or at minimum, use a smaller divisor (e.g., /2) as a compromise.

The test TestExtractFeatures_TokenEstimate_CJK currently asserts 9 runes / 3 = 3, which passes but validates the wrong behavior. A 9-character CJK message should estimate closer to 9 tokens, not 3.

return agent.Candidates, agent.Model
}

logger.InfoCF("agent", "Model routing: light model selected",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Routing decision log is missing the actual score

The log entry includes agent_id, light_model, and threshold but not the computed score. When debugging why a particular message was routed to the light (or heavy) model, the score is the single most important piece of information.

Also, there is no log line at all when the primary model is selected. For routing observability, it would be helpful to log both paths — at least at debug level for the primary-model case.

Suggestion: Have SelectModel return the score as a third value:

func (r *Router) SelectModel(
    msg string,
    history []providers.Message,
    primaryModel string,
) (model string, usedLight bool, score float64) {
    features := ExtractFeatures(msg, history)
    score = r.classifier.Score(features)
    if score < r.cfg.Threshold {
        return r.cfg.LightModel, true, score
    }
    return primaryModel, false, score
}

Then include it in the log:

logger.InfoCF("agent", "Model routing: light model selected",
    map[string]any{
        "agent_id":    agent.ID,
        "light_model": agent.Router.LightModel(),
        "threshold":   agent.Router.Threshold(),
        "score":       score,  // <-- critical for debugging
    })

This also avoids the current issue where selectCandidates discards the first return value from SelectModel (line 1195).

…lity

1. CJK token estimation: replace flat rune_count/3 with script-aware
   counting — CJK runes (U+2E80–U+9FFF, U+F900–U+FAFF, U+AC00–U+D7AF)
   count as 1 token each, non-CJK runes at /4. This fixes a 3x
   underestimate for Chinese/Japanese/Korean text that could incorrectly
   route complex CJK messages to the light model.

2. Routing observability: SelectModel now returns the computed score as
   a third value. selectCandidates logs the score on both paths — Info
   level for light model selection, Debug level for primary model
   selection.

3. Added tests: TestExtractFeatures_TokenEstimate_Mixed (CJK+ASCII mix),
   TestRouter_SelectModel_ReturnsScore.

Addresses review feedback from @mingmxren.
@is-Xiaoen
Copy link
Contributor Author

@mingmxren Thanks for the thorough review — both points are solid and now addressed in b84adac:

CJK token estimation: Replaced the flat rune_count/3 with script-aware counting. CJK runes (U+2E80–U+9FFF, U+F900–U+FAFF, U+AC00–U+D7AF) now count as 1 token each, non-CJK at /4. A 200-char Chinese message correctly estimates to ~200 tokens instead of 66. Added TestExtractFeatures_TokenEstimate_Mixed for CJK+ASCII mix coverage.

Routing observability: SelectModel now returns the computed score as a third value. selectCandidates logs the score on both paths — Info for light, Debug for primary. No more discarded return values.

Copy link
Collaborator

@yinwm yinwm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This is a well-designed and high-quality PR. The architecture is clean, test coverage is comprehensive, and documentation is thorough.

What I verified:

  • CJK range coverage: The range 0x2E80-0x9FFF correctly covers Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF) - I initially misanalyzed this, but the implementation is correct.
  • Token estimation logic: CJK runes count as ~1 token, non-CJK as ~0.25 tokens - reasonable approximation.
  • Routing decision stickiness: Correctly maintains model selection throughout a turn to avoid mid-chain switches.
  • Graceful degradation: Light model resolution failure disables routing without crashing.

Minor suggestions (non-blocking):

  1. Consider adding validation for threshold > 1.0 in addition to threshold <= 0
  2. hasAttachments could have false positives (e.g., mentioning "photo.jpg" without an actual attachment), but this is acceptable as a conservative approach

Test Coverage

  • 34 test cases covering all branches and edge cases
  • CJK mixed text estimation tests added

LGTM ✅

Copy link
Collaborator

@yinwm yinwm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mengzhuo mengzhuo merged commit 9b1e73d into sipeed:main Mar 6, 2026
4 checks passed
dj-oyu pushed a commit to dj-oyu/picoclaw that referenced this pull request Mar 8, 2026
feat(routing): intelligent model routing based on structural complexity scoring
fishtrees pushed a commit to fishtrees/picoclaw that referenced this pull request Mar 12, 2026
feat(routing): intelligent model routing based on structural complexity scoring
dj-oyu pushed a commit to dj-oyu/picoclaw that referenced this pull request Mar 14, 2026
feat(routing): intelligent model routing based on structural complexity scoring
@ashleydb ashleydb mentioned this pull request Mar 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants