feat(routing): intelligent model routing based on structural complexity scoring by is-Xiaoen · Pull Request #994 · sipeed/picoclaw

is-Xiaoen · 2026-03-02T14:45:33Z

Closes / relates to #295.

What this does

Adds a lightweight model routing layer that dispatches each incoming message to either a light model (fast, cheap) or the configured primary model, based entirely on structural signals extracted from the message and its session context. No keyword matching, no NLP, no external calls — just properties of the message's shape.

The routing decision is made once at the start of each conversation turn and stays fixed for all tool-follow-up iterations within that turn, so a multi-step tool chain never switches models mid-way.

Why structural scoring

As raised in the issue discussion, any approach that looks at natural-language content (keywords, patterns) breaks for non-English users. Structural features sidestep this entirely:

Feature	Signal
Token estimate (`rune_count / 3`)	Verbosity proxy, CJK-safe
Fenced code blocks (``` pairs)	Coding/technical task
Recent tool call density (last 6 msgs)	Active agentic workflow
Conversation depth	Accumulated complexity
Attachments (data URI / media ext)	Multi-modal → always heavy

The scorer is exposed behind a Classifier interface, so a future ML-based implementation only needs to implement Score(Features) float64 — feature extraction stays unchanged.

Routing logic

score >= threshold  →  primary model  (default: claude-sonnet-4-6)
score <  threshold  →  light model    (e.g. gemini-flash)

Default threshold is 0.35. At this value:

"hi" → 0.00 → light ✓
message with a code block → 0.40 → heavy ✓
message > 200 tokens → 0.35 → heavy ✓
image attachment → 1.00 → heavy ✓

Config

Fully opt-in. Existing configs are completely unaffected.

{
  "agents": {
    "defaults": {
      "model": "claude-sonnet-4-6",
      "routing": {
        "enabled": true,
        "light_model": "gemini-flash",
        "threshold": 0.35
      }
    }
  }
}

light_model references a model_name in model_list, so it works with any provider. If the name isn't found at startup, routing is silently disabled and the primary model is used — no crash, no config error.

Files changed

File	Change
`pkg/routing/features.go`	`Features` struct + `ExtractFeatures()`
`pkg/routing/classifier.go`	`Classifier` interface + `RuleClassifier`
`pkg/routing/router.go`	`Router.SelectModel()`
`pkg/routing/router_test.go`	34 tests, all passing
`pkg/config/config.go`	`RoutingConfig` added to `AgentDefaults`
`pkg/agent/instance.go`	Pre-resolve `Router` + `LightCandidates` at agent creation
`pkg/agent/loop.go`	`selectCandidates()` helper + active candidate wiring

773 lines added, 33 modified, 0 new dependencies.

What's intentionally left for v2

ML-based classifier (the Classifier interface supports a drop-in swap)
Per-agent routing override (currently applies to defaults only)
Quality-based escalation (detect when the light model's answer is insufficient and retry with heavy)
/model command override to force a specific tier for one turn

Introduce RoutingConfig with three fields: - enabled: activates per-turn model routing - light_model: references a model_name in model_list - threshold: complexity score cutoff in [0,1] When routing.enabled is true and the incoming message scores below threshold, the agent switches to light_model for that turn. Absent or disabled config leaves existing behaviour completely unchanged. Example: "agents": { "defaults": { "model": "claude-sonnet-4-6", "routing": { "enabled": true, "light_model": "gemini-flash", "threshold": 0.35 } } }

Add three new files to pkg/routing/: features.go — ExtractFeatures(msg, history) → Features Computes five structural dimensions with zero keyword matching: - TokenEstimate: rune_count/3 (CJK-safe token proxy) - CodeBlockCount: ``` pairs in the message - RecentToolCalls: tool call count in the last 6 history entries - ConversationDepth: total messages in session - HasAttachments: data URIs or media file extensions classifier.go — Classifier interface + RuleClassifier RuleClassifier uses a weighted sum that is capped at 1.0: code block → +0.40 (triggers heavy model alone at 0.35 threshold) token > 200 → +0.35 (triggers heavy model alone) tool calls > 3 → +0.25 token 50-200 → +0.15 conversation depth > 10 → +0.10 attachment → 1.00 (hard gate, always heavy) router.go — Router wraps config + Classifier Router.SelectModel(msg, history, primaryModel) returns either the configured light_model or the primary model depending on whether the complexity score clears the threshold. Threshold defaults to 0.35 when zero/negative to prevent misconfiguration. router_test.go — 34 tests covering all branches and edge cases

instance.go: - Add Router *routing.Router and LightCandidates []FallbackCandidate to AgentInstance. - At agent creation, when routing.enabled and light_model resolves successfully in model_list, pre-build the Router and resolve the light model candidates once. If the light model isn't in model_list, log a warning and disable routing for that agent gracefully. loop.go: - Add selectCandidates(agent, userMsg, history) helper. It calls Router.SelectModel and returns either agent.Candidates / agent.Model (primary tier) or agent.LightCandidates / light_model (light tier). Returns primary unchanged when routing is disabled. - In runLLMIteration, resolve (activeCandidates, activeModel) once before entering the tool-iteration loop. The model tier is sticky for the entire turn so a multi-step tool chain doesn't switch models mid-way. - Replace hard-coded agent.Candidates / agent.Model references in callLLM and the debug log with the resolved active values. The fallback chain and retry logic are untouched. When light_model returns an error the fallback chain handles escalation normally.

- classifier.go: s/honour/honor/ (American English per misspell) - router.go: break SelectModel signature across lines (golines) - router_test.go: break long Message literal (golines) - router_test.go: replace CJK string literal with rune slice so gosmopolitan does not flag the source file; behaviour is identical

nikolasdehor

Excellent PR. This is a well-designed, language-agnostic model routing system with solid engineering decisions throughout. Here is a detailed review:

Architecture:

Clean separation: Features (extraction) -> Classifier (scoring) -> Router (decision). The Classifier interface makes future ML-based implementations a drop-in swap.
Structural signals only (token count, code blocks, tool density, conversation depth, attachments) -- no keyword matching, works across all languages including CJK.
Per-turn sticky routing (selectCandidates called once per turn) prevents model switching mid-tool-chain. This is the right design.

Code quality:

34 tests covering all edge cases: zero features, hard gates, boundary conditions, custom thresholds, custom classifiers.
The newWithClassifier testing hook is a good pattern.
Token estimation using rune count / 3 is a reasonable CJK-safe approximation.
Score capped at 1.0 to honor the [0,1] contract even when multiple signals fire.
Graceful degradation: if light_model is not found in model_list, routing is silently disabled.
Fully opt-in: existing configs are unaffected.

Minor observations (none blocking):

hasAttachments checks for file extensions like .jpg/.png in message text. This could false-positive on messages like 'rename file.jpg to file.png' where no actual attachment exists. Since this is conservative (routes to heavy model), it is the right error direction.
The token estimation of rune_count/3 slightly over-estimates for pure ASCII text (tokens are ~4 chars for English). This is intentional -- over-estimating routes to heavy model, which is safer.
ConversationDepth uses len(history) which includes tool results and system messages. This is fine since it is a proxy for accumulated complexity, not a strict turn count.

The PR description is also one of the best I have seen -- clear problem statement, design rationale, scoring table, and explicit v2 deferral list. Ship it.

Main reformatted the fallback.Execute call to multi-line (golines); our branch renamed agent.Candidates → activeCandidates for routing. Kept both: multi-line formatting + routing variable.

CLAassistant · 2026-03-05T15:00:38Z

All committers have signed the CLA.

Upstream added ThinkingLevel, SummarizeMessageThreshold, SummarizeTokenPercent, MaxMediaSize, and maybeSummarize. Our branch added Router, LightCandidates, and selectCandidates. Both sets of changes are kept. Dead updateToolContexts removed (upstream deleted it; no callers exist).

mengzhuo

Please drop all .md files

is-Xiaoen

@mengzhuo ，Done — removed all accidentally committed files. Sorry about that.

mingmxren

Review focusing on two issues: CJK token estimation bias and routing observability.

mingmxren · 2026-03-06T04:57:39Z

pkg/routing/features.go

+// estimateTokens returns a conservative token count proxy.
+// Using rune count / 3 rather than / 4 because CJK characters each map to
+// roughly one token, while ASCII words average ~1.3 chars/token. Dividing
+// by 3 is a safe middle ground that slightly over-estimates for Latin text


CJK token estimation is significantly biased — undermines language-agnostic claim

rune_count / 3 underestimates CJK text by ~3x:

Text type Actual ratio Estimate (/3) Effect

ASCII ~0.25 tokens/char ~0.33 tokens/char Slight overestimate → conservative (OK)

CJK ~1 token/rune ~0.33 tokens/rune 3x underestimate → biased toward light model

Concrete example: A 200-character Chinese message ≈ 200 real tokens, but estimateTokens returns 66 — falling into the "medium" bucket (score 0.15) instead of "long" (score 0.35). This means complex CJK messages get routed to the light model when they shouldn't.

This directly contradicts the stated design goal:

any approach that looks at natural-language content breaks for non-English users. Structural features sidestep this entirely.

The token estimation is a structural feature, but the divisor is calibrated for Latin text only.

Suggestion: Detect the CJK rune ratio and adjust accordingly, e.g.:

func estimateTokens(msg string) int { total := utf8.RuneCountInString(msg) if total == 0 { return 0 } cjk := 0 for _, r := range msg { if r >= 0x2E80 && r <= 0x9FFF || r >= 0xF900 && r <= 0xFAFF { cjk++ } } // CJK runes ≈ 1 token each; non-CJK runes ≈ 0.25 tokens each return cjk + (total-cjk)/4 }

Or at minimum, use a smaller divisor (e.g., /2) as a compromise.

The test TestExtractFeatures_TokenEstimate_CJK currently asserts 9 runes / 3 = 3, which passes but validates the wrong behavior. A 9-character CJK message should estimate closer to 9 tokens, not 3.

mingmxren · 2026-03-06T04:57:39Z

pkg/agent/loop.go

+		return agent.Candidates, agent.Model
+	}
+
+	logger.InfoCF("agent", "Model routing: light model selected",


Routing decision log is missing the actual score

The log entry includes agent_id, light_model, and threshold but not the computed score. When debugging why a particular message was routed to the light (or heavy) model, the score is the single most important piece of information.

Also, there is no log line at all when the primary model is selected. For routing observability, it would be helpful to log both paths — at least at debug level for the primary-model case.

Suggestion: Have SelectModel return the score as a third value:

func (r *Router) SelectModel( msg string, history []providers.Message, primaryModel string, ) (model string, usedLight bool, score float64) { features := ExtractFeatures(msg, history) score = r.classifier.Score(features) if score < r.cfg.Threshold { return r.cfg.LightModel, true, score } return primaryModel, false, score }

Then include it in the log:

logger.InfoCF("agent", "Model routing: light model selected", map[string]any{ "agent_id": agent.ID, "light_model": agent.Router.LightModel(), "threshold": agent.Router.Threshold(), "score": score, // <-- critical for debugging })

This also avoids the current issue where selectCandidates discards the first return value from SelectModel (line 1195).

@mingmxren

…lity 1. CJK token estimation: replace flat rune_count/3 with script-aware counting — CJK runes (U+2E80–U+9FFF, U+F900–U+FAFF, U+AC00–U+D7AF) count as 1 token each, non-CJK runes at /4. This fixes a 3x underestimate for Chinese/Japanese/Korean text that could incorrectly route complex CJK messages to the light model. 2. Routing observability: SelectModel now returns the computed score as a third value. selectCandidates logs the score on both paths — Info level for light model selection, Debug level for primary model selection. 3. Added tests: TestExtractFeatures_TokenEstimate_Mixed (CJK+ASCII mix), TestRouter_SelectModel_ReturnsScore. Addresses review feedback from @mingmxren.

is-Xiaoen · 2026-03-06T05:15:20Z

@mingmxren Thanks for the thorough review — both points are solid and now addressed in b84adac:

CJK token estimation: Replaced the flat rune_count/3 with script-aware counting. CJK runes (U+2E80–U+9FFF, U+F900–U+FAFF, U+AC00–U+D7AF) now count as 1 token each, non-CJK at /4. A 200-char Chinese message correctly estimates to ~200 tokens instead of 66. Added TestExtractFeatures_TokenEstimate_Mixed for CJK+ASCII mix coverage.

Routing observability: SelectModel now returns the computed score as a third value. selectCandidates logs the score on both paths — Info for light, Debug for primary. No more discarded return values.

yinwm

Review Summary

This is a well-designed and high-quality PR. The architecture is clean, test coverage is comprehensive, and documentation is thorough.

What I verified:

CJK range coverage: The range 0x2E80-0x9FFF correctly covers Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF) - I initially misanalyzed this, but the implementation is correct.
Token estimation logic: CJK runes count as ~1 token, non-CJK as ~0.25 tokens - reasonable approximation.
Routing decision stickiness: Correctly maintains model selection throughout a turn to avoid mid-chain switches.
Graceful degradation: Light model resolution failure disables routing without crashing.

Minor suggestions (non-blocking):

Consider adding validation for threshold > 1.0 in addition to threshold <= 0
hasAttachments could have false positives (e.g., mentioning "photo.jpg" without an actual attachment), but this is acceptable as a conservative approach

Test Coverage

34 test cases covering all branches and edge cases
CJK mixed text estimation tests added

LGTM ✅

yinwm

LGTM

feat(routing): intelligent model routing based on structural complexity scoring

is-Xiaoen added 4 commits March 2, 2026 22:40

sipeed-bot bot added type: enhancement New feature or request domain: agent domain: config labels Mar 3, 2026

nikolasdehor approved these changes Mar 3, 2026

View reviewed changes

merge: resolve conflict with main in loop.go

1a922c9

Main reformatted the fallback.Execute call to multi-line (golines); our branch renamed agent.Candidates → activeCandidates for routing. Kept both: multi-line formatting + routing variable.

This was referenced Mar 3, 2026

🦞 OpenClaw 生态日报 2026-03-03 duanyytop/agents-radar#51

Closed

🦞 OpenClaw 生态日报 2026-03-03 duanyytop/agents-radar#56

Closed

🦞 OpenClaw 生态日报 2026-03-03 duanyytop/agents-radar#61

Closed

mengzhuo requested changes Mar 6, 2026

View reviewed changes

chore: remove accidentally committed local files

04ddb6b

is-Xiaoen commented Mar 6, 2026

View reviewed changes

mingmxren reviewed Mar 6, 2026

View reviewed changes

yinwm approved these changes Mar 6, 2026

View reviewed changes

mengzhuo approved these changes Mar 6, 2026

View reviewed changes

mengzhuo merged commit 9b1e73d into sipeed:main Mar 6, 2026
4 checks passed

dj-oyu pushed a commit to dj-oyu/picoclaw that referenced this pull request Mar 8, 2026

Merge pull request sipeed#994 from is-Xiaoen/feat/model-routing

7478537

feat(routing): intelligent model routing based on structural complexity scoring

fishtrees pushed a commit to fishtrees/picoclaw that referenced this pull request Mar 12, 2026

Merge pull request sipeed#994 from is-Xiaoen/feat/model-routing

7fc7b79

feat(routing): intelligent model routing based on structural complexity scoring

dj-oyu pushed a commit to dj-oyu/picoclaw that referenced this pull request Mar 14, 2026

Merge pull request sipeed#994 from is-Xiaoen/feat/model-routing

23b15a1

feat(routing): intelligent model routing based on structural complexity scoring

ashleydb mentioned this pull request Mar 22, 2026

[BUG] #1898

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(routing): intelligent model routing based on structural complexity scoring#994

feat(routing): intelligent model routing based on structural complexity scoring#994
mengzhuo merged 8 commits intosipeed:mainfrom
is-Xiaoen:feat/model-routing

is-Xiaoen commented Mar 2, 2026

Uh oh!

nikolasdehor left a comment

Uh oh!

CLAassistant commented Mar 5, 2026 •

edited

Loading

Uh oh!

mengzhuo left a comment

Uh oh!

is-Xiaoen left a comment •

edited

Loading

Uh oh!

mingmxren left a comment

Uh oh!

mingmxren Mar 6, 2026

Uh oh!

mingmxren Mar 6, 2026

Uh oh!

is-Xiaoen commented Mar 6, 2026

Uh oh!

yinwm left a comment

Uh oh!

yinwm left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Text type	Actual ratio	Estimate (`/3`)	Effect
ASCII	~0.25 tokens/char	~0.33 tokens/char	Slight overestimate → conservative (OK)
CJK	~1 token/rune	~0.33 tokens/rune	3x underestimate → biased toward light model

Conversation

is-Xiaoen commented Mar 2, 2026

What this does

Why structural scoring

Routing logic

Config

Files changed

What's intentionally left for v2

Uh oh!

nikolasdehor left a comment

Choose a reason for hiding this comment

Uh oh!

CLAassistant commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mengzhuo left a comment

Choose a reason for hiding this comment

Uh oh!

is-Xiaoen left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mingmxren left a comment

Choose a reason for hiding this comment

Uh oh!

mingmxren Mar 6, 2026

Choose a reason for hiding this comment

CJK token estimation is significantly biased — undermines language-agnostic claim

Uh oh!

mingmxren Mar 6, 2026

Choose a reason for hiding this comment

Routing decision log is missing the actual score

Uh oh!

is-Xiaoen commented Mar 6, 2026

Uh oh!

yinwm left a comment

Choose a reason for hiding this comment

Review Summary

What I verified:

Minor suggestions (non-blocking):

Test Coverage

Uh oh!

yinwm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

CLAassistant commented Mar 5, 2026 •

edited

Loading

is-Xiaoen left a comment •

edited

Loading