feat(routing): intelligent model routing based on structural complexity scoring #994
Merged
8 commits
c5a21b2 feat(config): add RoutingConfig to AgentDefaults (is-Xiaoen)
1943c3e feat(routing): add language-agnostic model complexity scorer (is-Xiaoen)
02e8192 feat(agent): wire model routing into the agent loop (is-Xiaoen)
09e68cb fix(routing): resolve golines, gosmopolitan and misspell lint failures (is-Xiaoen)
1a922c9 merge: resolve conflict with main in loop.go (is-Xiaoen)
e433bb8 merge: resolve conflicts with upstream/main (is-Xiaoen)
04ddb6b chore: remove accidentally committed local files (is-Xiaoen)
b84adac fix(routing): address review feedback on CJK estimation and observabi… (is-Xiaoen)
@@ -0,0 +1,80 @@

```go
package routing

// Classifier evaluates a feature set and returns a complexity score in [0, 1].
// A higher score indicates a more complex task that benefits from a heavy model.
// The score is compared against the configured threshold: score >= threshold selects
// the primary (heavy) model; score < threshold selects the light model.
//
// Classifier is an interface so that future implementations (ML-based, embedding-based,
// or any other approach) can be swapped in without changing routing infrastructure.
type Classifier interface {
	Score(f Features) float64
}

// RuleClassifier is the v1 implementation.
// It uses a weighted sum of structural signals with no external dependencies,
// no API calls, and sub-microsecond latency. The raw sum is capped at 1.0 so
// that the returned score always falls within the [0, 1] contract.
//
// Individual weights (multiple signals can fire simultaneously):
//
//	token > 200 (≈600 chars): 0.35 - very long prompts are almost always complex
//	token 50-200:             0.15 - medium length; may or may not be complex
//	code block present:       0.40 - coding tasks need the heavy model
//	tool calls > 3 (recent):  0.25 - dense tool usage signals an agentic workflow
//	tool calls 1-3 (recent):  0.10 - some tool activity
//	conversation depth > 10:  0.10 - long sessions carry implicit complexity
//	attachments present:      1.00 - hard gate; multi-modal always needs heavy model
//
// Default threshold is 0.35, so:
//   - Pure greetings / trivial Q&A: 0.00 → light ✓
//   - Medium prose message (50–200 tokens): 0.15 → light ✓
//   - Message with code block: 0.40 → heavy ✓
//   - Long message (>200 tokens): 0.35 → heavy ✓
//   - Active tool session + medium message: 0.25 → light (acceptable)
//   - Any message with an image/audio attachment: 1.00 → heavy ✓
type RuleClassifier struct{}

// Score computes the complexity score for the given feature set.
// The returned value is in [0, 1]. Attachments short-circuit to 1.0.
func (c *RuleClassifier) Score(f Features) float64 {
	// Hard gate: multi-modal inputs always require the heavy model.
	if f.HasAttachments {
		return 1.0
	}

	var score float64

	// Token estimate: primary verbosity signal
	switch {
	case f.TokenEstimate > 200:
		score += 0.35
	case f.TokenEstimate > 50:
		score += 0.15
	}

	// Fenced code blocks: strongest indicator of a coding/technical task
	if f.CodeBlockCount > 0 {
		score += 0.40
	}

	// Recent tool call density: indicates an ongoing agentic workflow
	switch {
	case f.RecentToolCalls > 3:
		score += 0.25
	case f.RecentToolCalls > 0:
		score += 0.10
	}

	// Conversation depth: accumulated context implies compound task
	if f.ConversationDepth > 10 {
		score += 0.10
	}

	// Cap at 1.0 to honor the [0, 1] contract even when multiple signals fire
	// simultaneously (e.g., long message + code block + tool chain + deep
	// conversation = 1.10 raw).
	if score > 1.0 {
		score = 1.0
	}
	return score
}
```
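The threshold examples in the `RuleClassifier` doc comment can be checked with a small standalone program. This is a minimal sketch, not part of the PR: the `Features` struct is not shown in this diff, so its field types here are assumptions inferred from how `Score` reads them, and `route` is a hypothetical helper that applies the documented rule (score >= threshold selects the heavy model).

```go
package main

import "fmt"

// Features mirrors only the fields that Score reads in the diff.
// The field types are assumptions; the real struct is defined elsewhere in the PR.
type Features struct {
	TokenEstimate     int
	CodeBlockCount    int
	RecentToolCalls   int
	ConversationDepth int
	HasAttachments    bool
}

// RuleClassifier and Score are copied from the diff so the sketch compiles standalone.
type RuleClassifier struct{}

func (c *RuleClassifier) Score(f Features) float64 {
	if f.HasAttachments {
		return 1.0
	}
	var score float64
	switch {
	case f.TokenEstimate > 200:
		score += 0.35
	case f.TokenEstimate > 50:
		score += 0.15
	}
	if f.CodeBlockCount > 0 {
		score += 0.40
	}
	switch {
	case f.RecentToolCalls > 3:
		score += 0.25
	case f.RecentToolCalls > 0:
		score += 0.10
	}
	if f.ConversationDepth > 10 {
		score += 0.10
	}
	if score > 1.0 {
		score = 1.0
	}
	return score
}

// route is a hypothetical helper applying the documented rule:
// score >= threshold selects the heavy model, otherwise the light one.
func route(score, threshold float64) string {
	if score >= threshold {
		return "heavy"
	}
	return "light"
}

func main() {
	const threshold = 0.35 // default threshold stated in the doc comment
	c := &RuleClassifier{}
	cases := []struct {
		name string
		f    Features
	}{
		{"greeting", Features{TokenEstimate: 5}},
		{"medium prose", Features{TokenEstimate: 120}},
		{"code block", Features{TokenEstimate: 20, CodeBlockCount: 1}},
		{"long message", Features{TokenEstimate: 300}},
		{"tools + medium", Features{TokenEstimate: 120, RecentToolCalls: 2}},
		{"attachment", Features{HasAttachments: true}},
	}
	for _, tc := range cases {
		s := c.Score(tc.f)
		fmt.Printf("%-15s score=%.2f -> %s\n", tc.name, s, route(s, threshold))
	}
}
```

Because the long-message weight equals the default threshold, a long message with no other signals lands exactly on the boundary and is routed to the heavy model only because the comparison is `>=`.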
Routing decision log is missing the actual score
The log entry includes `agent_id`, `light_model`, and `threshold` but not the computed score. When debugging why a particular message was routed to the light (or heavy) model, the score is the single most important piece of information.

Also, there is no log line at all when the primary model is selected. For routing observability, it would be helpful to log both paths, at least at debug level for the primary-model case.

Suggestion: Have `SelectModel` return the score as a third value, then include it in the log.

This also avoids the current issue where `selectCandidates` discards the first return value from `SelectModel` (line 1195).
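The reviewer's code snippets did not survive in this copy of the comment, so the following is only a rough sketch of what the suggestion could look like. Everything in it is an assumption: the `Router` type, its fields, the `SelectModel` signature, and the use of the standard `log/slog` package are stand-ins, not the PR's actual code or logger.

```go
package main

import "log/slog"

// Router is a hypothetical stand-in for the PR's routing type; its fields and
// the scoring callback are assumptions made so the sketch is self-contained.
type Router struct {
	LightModel string
	Threshold  float64
	ScoreFn    func(msg string) float64
}

// SelectModel returns the chosen model, whether the light model was used,
// and the computed score as a third value, as the review suggests.
func (r *Router) SelectModel(msg, primaryModel string) (model string, usedLight bool, score float64) {
	score = r.ScoreFn(msg)
	if score < r.Threshold {
		return r.LightModel, true, score
	}
	return primaryModel, false, score
}

// logDecision logs both routing paths with the score attached:
// the light path at info level and the primary path at debug level.
func logDecision(agentID, model string, usedLight bool, score, threshold float64) {
	attrs := []any{
		"agent_id", agentID,
		"model", model,
		"score", score,
		"threshold", threshold,
	}
	if usedLight {
		slog.Info("routing: light model selected", attrs...)
		return
	}
	slog.Debug("routing: primary model selected", attrs...)
}

func main() {
	r := &Router{
		LightModel: "light-model", // placeholder model names
		Threshold:  0.35,
		ScoreFn: func(msg string) float64 {
			// Toy scorer for the demo only: treat long messages as complex.
			if len(msg) > 600 {
				return 0.35
			}
			return 0
		},
	}
	model, usedLight, score := r.SelectModel("hi", "primary-model")
	logDecision("agent-1", model, usedLight, score, r.Threshold)
}
```

Returning the score from `SelectModel` keeps the caller from having to recompute it for logging, and gives a caller such as `selectCandidates` a reason to consume every return value.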