feat: context management for context overflow #24
AliRehman1279 wants to merge 48 commits into `main` from `ali/context-management`
Conversation
Deploying with

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! (View logs) | ai | 69b9b6f | Commit Preview URL / Branch Preview URL | Dec 02 2025, 06:19 PM |
mujahidkay left a comment:

I would like the v5 bump PR to merge first, which will require some modifications in `ymax/route.ts` and `context-manager.ts` (at least).
Socket: 👍 No dependency changes detected in this pull request.
…into ali/context-management
Muneeb147 left a comment:

@AliRehman1279 As discussed, let's remove truncation of content, as it might give us misinformation (especially in the finance world). I think showing an error will be much better than showing misinformation (or partial information).

The summarisation from the LLM sounds right.

Thoughts? @toliaqat @mujahidkay
app/api/ymax/route.ts
Outdated
```ts
// console.log("messages", messages);
// console.log(
//   "parts",
//   messages.map((m) => m.parts.map((p) => p)),
// );
```
Let's keep it, as it helps with debugging. I remember a previous case where this log helped on a CF worker.
Reverted the commented-out logs.
app/api/ymax/route.ts
Outdated
```ts
maxTokens: 70_000,
keepRecentMessages: 8,
```
Constants added.
app/api/ymax/route.ts
Outdated
```ts
  totalTokens: usage?.totalTokens,
});

if (usage?.totalTokens && usage.totalTokens > 80_000) {
```
Constants added.
app/api/ymax/route.ts
Outdated
```ts
const wrappedTools: Record<string, any> = {};
for (const [toolName, tool] of Object.entries(mcptools)) {
  const originalTool = tool as any;

  wrappedTools[toolName] = {
    ...originalTool,
    execute: wrapToolExecution(
      toolName,
      async (args: any, options: any) =>
        originalTool.execute(args, options),
    ),
  };
}

tools = { ...tools, ...wrappedTools };
```
Let's try to avoid `any` whenever possible. Proper typing helps a lot.
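As a sketch of what a typed version could look like: `ToolLike` and this `wrapToolExecution` body are hypothetical stand-ins (the real tool type comes from the SDK, and the real wrapper may do more than timing), but the loop itself no longer needs `any`:

```typescript
// Hypothetical minimal shape of a tool; the real type comes from the SDK.
type ToolExecute<Args, Result> = (args: Args, options?: unknown) => Promise<Result>;

interface ToolLike<Args = unknown, Result = unknown> {
  description?: string;
  execute: ToolExecute<Args, Result>;
}

// Illustrative wrapper that logs call duration around a tool execution.
function wrapToolExecution<Args, Result>(
  toolName: string,
  execute: ToolExecute<Args, Result>,
): ToolExecute<Args, Result> {
  return async (args, options) => {
    const start = Date.now();
    try {
      return await execute(args, options);
    } finally {
      console.log(`[tool] ${toolName} took ${Date.now() - start}ms`);
    }
  };
}

function wrapTools(
  mcptools: Record<string, ToolLike>,
): Record<string, ToolLike> {
  const wrapped: Record<string, ToolLike> = {};
  for (const [toolName, tool] of Object.entries(mcptools)) {
    wrapped[toolName] = {
      ...tool,
      execute: wrapToolExecution(toolName, tool.execute),
    };
  }
  return wrapped;
}
```

Since `execute` keeps its original signature, the spread plus override preserves the tool shape without any casts.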
lib/context-manager.ts
Outdated
```ts
import { TOKEN_CONFIG } from './token-config';

export function estimateTokens(content: string | ModelMessage[]): number {
  if (typeof content === 'string') return Math.ceil(content.length / 3.5);
```
Magic number? What does 3.5 represent?
lib/context-manager.ts
Outdated
```ts
totalChars += msgStr.length;

// extra weight for tool invocations (they have significant overhead)
if ((msg as any).toolInvocations?.length) {
```
Let's avoid `any` (applicable to all other instances in this PR).
lib/context-manager.ts
Outdated
```ts
    totalChars += toolCount * 50; // Approximate overhead per tool call
  }
}

return Math.ceil(totalChars / 3.5);
```
These numbers should be consts with names that show what they are for. For example, as the comment suggests: `totalChars += toolCount * TOOL_CALL_OVERHEAD`. Same for the 3.5.
```ts
export const DEFAULT_CONTEXT_CONFIG: Required<
  Omit<ContextManagerConfig, 'contextEditConfig' | 'systemPrompt'>
> = {
```
A bit confused here: `ContextManagerConfig` doesn't even have a `contextEditConfig` property that this is supposedly omitting.
Also, I'm not sure why all properties of `ContextManagerConfig` are optional. Do they need to be? If not, let's forgo the optional `?`, and that way this type simplifies to:
```diff
-export const DEFAULT_CONTEXT_CONFIG: Required<
-  Omit<ContextManagerConfig, 'contextEditConfig' | 'systemPrompt'>
-> = {
+export const DEFAULT_CONTEXT_CONFIG: Omit<
+  ContextManagerConfig, 'systemPrompt'
+> = {
```
lib/token-config.ts
Outdated
```ts
CONTEXT_WARNING_THRESHOLD: 0.9,
HIGH_USAGE_THRESHOLD: 80_000,
KEEP_RECENT_MESSAGES: 8,
SUMMARY_MAX_OUTPUT_TOKENS: 2000,
```
This also needs to be adjusted: 150_000 to 180_000 tokens' worth of content needs around 10-20k tokens for effective summarization; otherwise we risk losing important information.
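As a sketch, the adjusted config might look like this. `MAX_CONTEXT_TOKENS: 200_000` is an assumption based on the numbers visible in this thread, and 15_000 is just the midpoint of the suggested 10-20k range, not a final value:

```typescript
// Sketch of token-config.ts with the summary budget raised per the review.
export const TOKEN_CONFIG = {
  MAX_CONTEXT_TOKENS: 200_000,       // assumed model context window
  CONTEXT_WARNING_THRESHOLD: 0.9,    // warn at 90% usage
  HIGH_USAGE_THRESHOLD: 80_000,
  KEEP_RECENT_MESSAGES: 8,
  // Was 2000; summarizing 150k-180k tokens of history needs ~10-20k
  // output tokens to avoid dropping important information.
  SUMMARY_MAX_OUTPUT_TOKENS: 15_000,
} as const;
```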
lib/token-config.ts
Outdated
```ts
return (
  Math.round((currentTokens / TOKEN_CONFIG.MAX_CONTEXT_TOKENS) * 1000) / 10
);
```
Handy, but thoughts on `((currentTokens / TOKEN_CONFIG.MAX_CONTEXT_TOKENS) * 100).toFixed(2)`? We are only using it for logging purposes, so the number-to-string change doesn't matter.
lib/tool-result-manager.ts
Outdated
```ts
maxChars?: number;
bypassThreshold?: number;
sliceRatioHead?: number;
sliceRatioMiddle?: number;
sliceRatioTail?: number;
```
Same question: why are all of them optional?
Truncation removed; no longer applicable.
lib/tool-result-manager.ts
Outdated
```ts
// Reserve space for conversation + prompts + responses (~100k tokens)
// Tool results should stay under ~100k tokens (≈300k chars) to be safe
```
I think a 150/50 split makes more sense, or a 130/70 split.
Removed truncation.
lib/tool-result-manager.ts
Outdated
```ts
const DEFAULT_CONFIG: Required<ToolResultConfig> = {
  maxChars: 200_000, // Truncate to this size if content exceeds threshold (~65k tokens)
  bypassThreshold: 300_000, // Only activate truncation for responses > 300k chars (~100k tokens)
  sliceRatioHead: 0.5, // Prioritize beginning (schema, initial data)
  sliceRatioMiddle: 0.1, // Small middle sample for pattern detection
  sliceRatioTail: 0.4, // End often has summary/totals
};
```
Okay, I haven't taken a deeper look into the rest of this file, but my opinion is that we can simply stop catering to tool results that exceed a given threshold. Information dynamically truncated in this manner (missing or partially incorrect) from external data sources is worse than no information at all.
If a tool call result exceeds X tokens, we can just have it return an error message.
I'll update it to intercept and return only a meaningful error message for the LLM.
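A minimal sketch of that agreed direction, assuming a 300k-character threshold (~100k tokens). The guard name, threshold, and error wording are illustrative, not the actual implementation:

```typescript
// Instead of truncating an oversized tool result, intercept it and return
// a short, structured error the LLM can act on (e.g. retry with pagination).
const MAX_TOOL_RESULT_CHARS = 300_000; // ~100k tokens at ~3 chars/token

function guardToolResult(toolName: string, result: string): string {
  if (result.length <= MAX_TOOL_RESULT_CHARS) return result;
  // Placeholder error instead of partial (possibly misleading) data.
  return JSON.stringify({
    error: "TOOL_RESULT_TOO_LARGE",
    tool: toolName,
    message:
      `Result of ${result.length} characters exceeds the ` +
      `${MAX_TOOL_RESULT_CHARS}-character limit and was dropped. ` +
      `Retry with a narrower query or pagination.`,
  });
}
```

Returning a structured error keeps the conversation intact while avoiding the misinformation risk of dynamic truncation.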
Muneeb147 left a comment:
As we have some new commits addressing the review comments, I'll do a final review pass.

This PR implements context management to handle context-window overflow, with two layers of defense:

Tool result manager (handles midstream overflow):
- Prevents individual tool responses from overwhelming the context; it is only triggered by unexpectedly bloated tool responses that can cause immediate context overflow.
- The tool result manager now returns a placeholder error message instead of truncating.

Context manager:
- Context management using direct API summarization
- Accounts for the system prompt, message history, and overhead
- Triggers summarization before hitting limits
- Preserves tool call/result pairs
- Prevents splitting between assistant messages and their tool responses
- Avoids broken conversation context that causes unpredictable behavior
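The pair-preservation rule described above can be sketched like this, assuming a simplified message shape; `splitForSummarization` is a hypothetical helper, not the actual `context-manager.ts` code:

```typescript
interface Msg {
  role: "user" | "assistant" | "tool";
  content: string;
}

// When keeping the last N messages and summarizing the rest, walk the cut
// point backwards so a tool result is never separated from the assistant
// message that issued its call.
function splitForSummarization(messages: Msg[], keepRecent: number): {
  toSummarize: Msg[];
  toKeep: Msg[];
} {
  let cut = Math.max(0, messages.length - keepRecent);
  // Never start the kept window on a tool result: move the cut back until
  // the window begins with a user or assistant message.
  while (cut > 0 && messages[cut].role === "tool") cut--;
  return { toSummarize: messages.slice(0, cut), toKeep: messages.slice(cut) };
}
```

This is what keeps an assistant message and its tool responses on the same side of the summarization boundary, avoiding the broken conversation context noted above.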