feat(routing): intelligent model routing based on structural complexity scoring #994
Introduce RoutingConfig with three fields:
- enabled: activates per-turn model routing
- light_model: references a model_name in model_list
- threshold: complexity score cutoff in [0,1]
When routing.enabled is true and the incoming message scores below
threshold, the agent switches to light_model for that turn. Absent or
disabled config leaves existing behaviour completely unchanged.
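The three fields map naturally onto a small Go struct. A minimal sketch of how `RoutingConfig` might look, decoding the example below — the JSON tags and the `parseRouting` helper are assumptions for illustration, not the PR's actual code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RoutingConfig mirrors the three fields described above.
// Tags are assumed to match the JSON example in this PR.
type RoutingConfig struct {
	Enabled    bool    `json:"enabled"`
	LightModel string  `json:"light_model"`
	Threshold  float64 `json:"threshold"`
}

// parseRouting is a hypothetical helper used here only to demo decoding.
func parseRouting(raw string) (RoutingConfig, error) {
	var cfg RoutingConfig
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

func main() {
	cfg, err := parseRouting(`{"enabled": true, "light_model": "gemini-flash", "threshold": 0.35}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Enabled, cfg.LightModel, cfg.Threshold) // true gemini-flash 0.35
}
```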
Example:

```json
{
  "agents": {
    "defaults": {
      "model": "claude-sonnet-4-6",
      "routing": {
        "enabled": true,
        "light_model": "gemini-flash",
        "threshold": 0.35
      }
    }
  }
}
```
Add three new files to pkg/routing/:
features.go — ExtractFeatures(msg, history) → Features
Computes five structural dimensions with zero keyword matching:
- TokenEstimate: rune_count/3 (CJK-safe token proxy)
- CodeBlockCount: ``` pairs in the message
- RecentToolCalls: tool call count in the last 6 history entries
- ConversationDepth: total messages in session
- HasAttachments: data URIs or media file extensions
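Two of these extractors are simple enough to sketch inline. Names follow the description; the real `ExtractFeatures` also consults history, which is omitted here, and the `/3` divisor shown is the v1 rule described above (a later review refines it for CJK text):

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// estimateTokens is the v1 proxy described above: rune count / 3.
func estimateTokens(msg string) int {
	return utf8.RuneCountInString(msg) / 3
}

// countCodeBlocks counts ``` pairs; an unmatched trailing fence is ignored.
func countCodeBlocks(msg string) int {
	return strings.Count(msg, "```") / 2
}

func main() {
	msg := "run this:\n```\nls -la\n```"
	fmt.Println(estimateTokens(msg), countCodeBlocks(msg))
}
```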
classifier.go — Classifier interface + RuleClassifier
RuleClassifier uses a weighted sum that is capped at 1.0:
code block → +0.40 (triggers heavy model alone at 0.35 threshold)
token > 200 → +0.35 (triggers heavy model alone)
tool calls > 3 → +0.25
token 50-200 → +0.15
conversation depth > 10 → +0.10
attachment → 1.00 (hard gate, always heavy)
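The table above translates to a short weighted sum. A sketch of how `RuleClassifier`'s scoring might be written — bucket boundaries (e.g. whether 50 and 200 are inclusive) are assumptions, and `scoreFeatures` is a stand-in name for the interface method:

```go
package main

import "fmt"

// Features holds the five structural dimensions from features.go.
type Features struct {
	TokenEstimate     int
	CodeBlockCount    int
	RecentToolCalls   int
	ConversationDepth int
	HasAttachments    bool
}

// scoreFeatures sketches the weighted sum from the table, capped at 1.0.
func scoreFeatures(f Features) float64 {
	if f.HasAttachments {
		return 1.0 // hard gate: attachments always go to the heavy model
	}
	s := 0.0
	if f.CodeBlockCount > 0 {
		s += 0.40
	}
	switch {
	case f.TokenEstimate > 200:
		s += 0.35
	case f.TokenEstimate >= 50:
		s += 0.15
	}
	if f.RecentToolCalls > 3 {
		s += 0.25
	}
	if f.ConversationDepth > 10 {
		s += 0.10
	}
	if s > 1.0 {
		s = 1.0 // honor the [0,1] contract when multiple signals fire
	}
	return s
}

func main() {
	fmt.Println(scoreFeatures(Features{}))                     // trivial message
	fmt.Println(scoreFeatures(Features{CodeBlockCount: 1}))    // clears 0.35 alone
	fmt.Println(scoreFeatures(Features{HasAttachments: true})) // hard gate
}
```

With the default 0.35 threshold, a single code block or a long message is enough on its own to keep the heavy model, matching the table.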
router.go — Router wraps config + Classifier
Router.SelectModel(msg, history, primaryModel) returns either the
configured light_model or the primary model depending on whether
the complexity score clears the threshold. Threshold defaults to
0.35 when zero/negative to prevent misconfiguration.
router_test.go — 34 tests covering all branches and edge cases
instance.go:
- Add Router *routing.Router and LightCandidates []FallbackCandidate
to AgentInstance.
- At agent creation, when routing.enabled and light_model resolves
successfully in model_list, pre-build the Router and resolve the
light model candidates once. If the light model isn't in model_list,
log a warning and disable routing for that agent gracefully.
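The startup resolution described above can be sketched as follows — all identifiers here are illustrative, not the PR's actual code. The light model is looked up once in `model_list`; a miss logs a warning and leaves routing off for that agent:

```go
package main

import (
	"fmt"
	"log"
)

// resolveLight mimics the one-time lookup: return the pre-built candidate
// list for lightModel, or ok=false so routing is disabled for this agent.
func resolveLight(modelList map[string][]string, lightModel string) ([]string, bool) {
	cands, ok := modelList[lightModel]
	return cands, ok
}

func main() {
	modelList := map[string][]string{
		"gemini-flash": {"gemini-flash"}, // illustrative entry
	}
	if cands, ok := resolveLight(modelList, "gemini-flash"); ok {
		fmt.Println("routing enabled; light candidates:", cands)
	}
	if _, ok := resolveLight(modelList, "typo-model"); !ok {
		// graceful degradation: warn and keep using the primary model only
		log.Println("routing: light_model not in model_list; routing disabled")
	}
}
```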
loop.go:
- Add selectCandidates(agent, userMsg, history) helper.
It calls Router.SelectModel and returns either agent.Candidates /
agent.Model (primary tier) or agent.LightCandidates / light_model
(light tier). Returns primary unchanged when routing is disabled.
- In runLLMIteration, resolve (activeCandidates, activeModel) once
before entering the tool-iteration loop. The model tier is sticky
for the entire turn so a multi-step tool chain doesn't switch
models mid-way.
- Replace hard-coded agent.Candidates / agent.Model references in
callLLM and the debug log with the resolved active values.
The fallback chain and retry logic are untouched. When light_model
returns an error the fallback chain handles escalation normally.
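The per-turn sticky behavior can be sketched as follows. This is simplified: the real helper derives the score via `Router.SelectModel` from the message and history, while here it is passed in directly, and the struct is a stand-in for `AgentInstance`:

```go
package main

import "fmt"

// agent is a minimal stand-in for AgentInstance (field names from the PR).
type agent struct {
	Model           string
	Candidates      []string
	LightModel      string
	LightCandidates []string
	RoutingEnabled  bool
}

// selectCandidates picks the model tier once per turn.
func selectCandidates(a *agent, score, threshold float64) ([]string, string) {
	if !a.RoutingEnabled {
		return a.Candidates, a.Model // routing disabled: behavior unchanged
	}
	if score < threshold {
		return a.LightCandidates, a.LightModel
	}
	return a.Candidates, a.Model
}

func main() {
	a := &agent{
		Model: "claude-sonnet-4-6", Candidates: []string{"claude-sonnet-4-6"},
		LightModel: "gemini-flash", LightCandidates: []string{"gemini-flash"},
		RoutingEnabled: true,
	}
	// Resolved once before the tool-iteration loop, so the tier is sticky:
	_, activeModel := selectCandidates(a, 0.15, 0.35)
	for i := 0; i < 3; i++ { // simulated multi-step tool chain
		fmt.Println("iteration", i, "uses", activeModel) // never switches mid-turn
	}
}
```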
- classifier.go: s/honour/honor/ (American English per misspell)
- router.go: break SelectModel signature across lines (golines)
- router_test.go: break long Message literal (golines)
- router_test.go: replace CJK string literal with rune slice so gosmopolitan does not flag the source file; behaviour is identical
nikolasdehor
left a comment
Excellent PR. This is a well-designed, language-agnostic model routing system with solid engineering decisions throughout. Here is a detailed review:
Architecture:
- Clean separation: Features (extraction) -> Classifier (scoring) -> Router (decision). The Classifier interface makes future ML-based implementations a drop-in swap.
- Structural signals only (token count, code blocks, tool density, conversation depth, attachments) -- no keyword matching, works across all languages including CJK.
- Per-turn sticky routing (selectCandidates called once per turn) prevents model switching mid-tool-chain. This is the right design.
Code quality:
- 34 tests covering all edge cases: zero features, hard gates, boundary conditions, custom thresholds, custom classifiers.
- The newWithClassifier testing hook is a good pattern.
- Token estimation using rune count / 3 is a reasonable CJK-safe approximation.
- Score capped at 1.0 to honor the [0,1] contract even when multiple signals fire.
- Graceful degradation: if light_model is not found in model_list, routing is silently disabled.
- Fully opt-in: existing configs are unaffected.
Minor observations (none blocking):
- hasAttachments checks for file extensions like .jpg/.png in message text. This could false-positive on messages like 'rename file.jpg to file.png' where no actual attachment exists. Since this is conservative (routes to heavy model), it is the right error direction.
- The token estimation of rune_count/3 slightly over-estimates for pure ASCII text (tokens are ~4 chars for English). This is intentional -- over-estimating routes to heavy model, which is safer.
- ConversationDepth uses len(history) which includes tool results and system messages. This is fine since it is a proxy for accumulated complexity, not a strict turn count.
The PR description is also one of the best I have seen -- clear problem statement, design rationale, scoring table, and explicit v2 deferral list. Ship it.
Main reformatted the fallback.Execute call to multi-line (golines); our branch renamed agent.Candidates → activeCandidates for routing. Kept both: multi-line formatting + routing variable.
Upstream added ThinkingLevel, SummarizeMessageThreshold, SummarizeTokenPercent, MaxMediaSize, and maybeSummarize. Our branch added Router, LightCandidates, and selectCandidates. Both sets of changes are kept. Dead updateToolContexts removed (upstream deleted it; no callers exist).
mengzhuo
left a comment
Please drop all .md files
mingmxren
left a comment
Review focusing on two issues: CJK token estimation bias and routing observability.
pkg/routing/features.go
```go
// estimateTokens returns a conservative token count proxy.
// Using rune count / 3 rather than / 4 because CJK characters each map to
// roughly one token, while ASCII words average ~1.3 chars/token. Dividing
// by 3 is a safe middle ground that slightly over-estimates for Latin text
```
CJK token estimation is significantly biased — undermines language-agnostic claim
rune_count / 3 underestimates CJK text by ~3x:
| Text type | Actual ratio | Estimate (/3) | Effect |
|---|---|---|---|
| ASCII | ~0.25 tokens/char | ~0.33 tokens/char | Slight overestimate → conservative (OK) |
| CJK | ~1 token/rune | ~0.33 tokens/rune | 3x underestimate → biased toward light model |
Concrete example: A 200-character Chinese message ≈ 200 real tokens, but estimateTokens returns 66 — falling into the "medium" bucket (score 0.15) instead of "long" (score 0.35). This means complex CJK messages get routed to the light model when they shouldn't.
This directly contradicts the stated design goal:
any approach that looks at natural-language content breaks for non-English users. Structural features sidestep this entirely.
The token estimation is a structural feature, but the divisor is calibrated for Latin text only.
Suggestion: Detect the CJK rune ratio and adjust accordingly, e.g.:

```go
// requires "unicode/utf8"
func estimateTokens(msg string) int {
	total := utf8.RuneCountInString(msg)
	if total == 0 {
		return 0
	}
	cjk := 0
	for _, r := range msg {
		if (r >= 0x2E80 && r <= 0x9FFF) || (r >= 0xF900 && r <= 0xFAFF) {
			cjk++
		}
	}
	// CJK runes ≈ 1 token each; non-CJK runes ≈ 0.25 tokens each
	return cjk + (total-cjk)/4
}
```

Or at minimum, use a smaller divisor (e.g., /2) as a compromise.
The test TestExtractFeatures_TokenEstimate_CJK currently asserts 9 runes / 3 = 3, which passes but validates the wrong behavior. A 9-character CJK message should estimate closer to 9 tokens, not 3.
```go
		return agent.Candidates, agent.Model
	}

	logger.InfoCF("agent", "Model routing: light model selected",
```
Routing decision log is missing the actual score
The log entry includes agent_id, light_model, and threshold but not the computed score. When debugging why a particular message was routed to the light (or heavy) model, the score is the single most important piece of information.
Also, there is no log line at all when the primary model is selected. For routing observability, it would be helpful to log both paths — at least at debug level for the primary-model case.
Suggestion: Have SelectModel return the score as a third value:
```go
func (r *Router) SelectModel(
	msg string,
	history []providers.Message,
	primaryModel string,
) (model string, usedLight bool, score float64) {
	features := ExtractFeatures(msg, history)
	score = r.classifier.Score(features)
	if score < r.cfg.Threshold {
		return r.cfg.LightModel, true, score
	}
	return primaryModel, false, score
}
```

Then include it in the log:
```go
logger.InfoCF("agent", "Model routing: light model selected",
	map[string]any{
		"agent_id":    agent.ID,
		"light_model": agent.Router.LightModel(),
		"threshold":   agent.Router.Threshold(),
		"score":       score, // <-- critical for debugging
	})
```

This also avoids the current issue where selectCandidates discards the first return value from SelectModel (line 1195).
…lity

1. CJK token estimation: replace flat rune_count/3 with script-aware counting — CJK runes (U+2E80–U+9FFF, U+F900–U+FAFF, U+AC00–U+D7AF) count as 1 token each, non-CJK runes at /4. This fixes a 3x underestimate for Chinese/Japanese/Korean text that could incorrectly route complex CJK messages to the light model.
2. Routing observability: SelectModel now returns the computed score as a third value. selectCandidates logs the score on both paths — Info level for light model selection, Debug level for primary model selection.
3. Added tests: TestExtractFeatures_TokenEstimate_Mixed (CJK+ASCII mix), TestRouter_SelectModel_ReturnsScore.

Addresses review feedback from @mingmxren.
@mingmxren Thanks for the thorough review — both points are solid and now addressed in b84adac:
- CJK token estimation: Replaced the flat …
- Routing observability: …
yinwm
left a comment
There was a problem hiding this comment.
Review Summary
This is a well-designed and high-quality PR. The architecture is clean, test coverage is comprehensive, and documentation is thorough.
What I verified:
- CJK range coverage: The range `0x2E80-0x9FFF` correctly covers Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF). I initially misanalyzed this, but the implementation is correct.
- Token estimation logic: CJK runes count as ~1 token, non-CJK as ~0.25 tokens — a reasonable approximation.
- Routing decision stickiness: Correctly maintains model selection throughout a turn to avoid mid-chain switches.
- Graceful degradation: Light model resolution failure disables routing without crashing.
Minor suggestions (non-blocking):
- Consider adding validation for `threshold > 1.0` in addition to `threshold <= 0`
- `hasAttachments` could have false positives (e.g., mentioning "photo.jpg" without an actual attachment), but this is acceptable as a conservative approach
Test Coverage
- 34 test cases covering all branches and edge cases
- CJK mixed text estimation tests added
LGTM ✅
feat(routing): intelligent model routing based on structural complexity scoring
Closes / relates to #295.
What this does
Adds a lightweight model routing layer that dispatches each incoming message to either a light model (fast, cheap) or the configured primary model, based entirely on structural signals extracted from the message and its session context. No keyword matching, no NLP, no external calls — just properties of the message's shape.
The routing decision is made once at the start of each conversation turn and stays fixed for all tool-follow-up iterations within that turn, so a multi-step tool chain never switches models mid-way.
Why structural scoring
As raised in the issue discussion, any approach that looks at natural-language content (keywords, patterns) breaks for non-English users. Structural features sidestep this entirely:

- token estimate (`rune_count / 3`)
- code blocks (`` ``` `` pairs)
- recent tool calls
- conversation depth
- attachments

The scorer is exposed behind a `Classifier` interface, so a future ML-based implementation only needs to implement `Score(Features) float64` — feature extraction stays unchanged.

Routing logic

Default threshold is 0.35. At this value:

- `"hi"` → 0.00 → light ✓

Config
Fully opt-in. Existing configs are completely unaffected.

```json
{
  "agents": {
    "defaults": {
      "model": "claude-sonnet-4-6",
      "routing": {
        "enabled": true,
        "light_model": "gemini-flash",
        "threshold": 0.35
      }
    }
  }
}
```

`light_model` references a `model_name` in `model_list`, so it works with any provider. If the name isn't found at startup, routing is silently disabled and the primary model is used — no crash, no config error.

Files changed
| File | Contents |
|---|---|
| pkg/routing/features.go | `Features` struct + `ExtractFeatures()` |
| pkg/routing/classifier.go | `Classifier` interface + `RuleClassifier` |
| pkg/routing/router.go | `Router.SelectModel()` |
| pkg/routing/router_test.go | 34 tests |
| pkg/config/config.go | `RoutingConfig` added to `AgentDefaults` |
| pkg/agent/instance.go | `Router` + `LightCandidates` at agent creation |
| pkg/agent/loop.go | `selectCandidates()` helper + active candidate wiring |

773 lines added, 33 modified, 0 new dependencies.
What's intentionally left for v2
- ML-based classifier (the `Classifier` interface supports a drop-in swap)
- Per-agent routing config (currently `defaults` only)
- `/model` command override to force a specific tier for one turn