
feat: context management for context overflow #24

Open

AliRehman1279 wants to merge 48 commits into main from ali/context-management

Conversation

@AliRehman1279 (Contributor) commented Nov 17, 2025

This PR implements context management to handle context window overflow. Two layers of defense:

  1. Tool result manager (handles midstream overflow):
    Prevents individual tool responses from overwhelming the context. It is only triggered by unexpectedly bloated tool responses that would otherwise cause immediate context overflow.
    The tool result manager now returns a placeholder error message instead of truncating.

  2. Context Manager:
    Context management using direct API summarization.

    1. Token Budget Management
      Accounts for system prompt, message history, and overhead
      Triggers summarization before hitting limits
    2. Split Points
      Preserves tool call/result pairs
      Prevents splitting between assistant messages and their tool responses
      Avoids broken conversation context that causes unpredictable behavior
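The token-budget trigger in layer 2 can be sketched roughly as below; the constant names and values are illustrative assumptions, not the PR's actual code:

```typescript
// Rough sketch of the layer-2 trigger: summarize before the message history
// exhausts the model's context window. All names and values are placeholders.
const MAX_CONTEXT_TOKENS = 150_000;    // assumed model context limit
const SYSTEM_PROMPT_TOKENS = 2_000;    // assumed system prompt cost
const RESPONSE_RESERVE_TOKENS = 8_000; // headroom reserved for the reply

function shouldSummarize(messageTokens: number): boolean {
  const budget =
    MAX_CONTEXT_TOKENS - SYSTEM_PROMPT_TOKENS - RESPONSE_RESERVE_TOKENS;
  return messageTokens > budget;
}
```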

cloudflare-workers-and-pages bot commented Nov 17, 2025

Deploying with Cloudflare Workers

✅ Deployment successful! Commit 69b9b6f, updated Dec 02 2025, 06:19 PM (UTC).

@mujahidkay (Collaborator) left a comment

I would like the v5 bump PR to merge first, which will require some modification in ymax/route.ts and context-manager.ts (at least).

@AliRehman1279 AliRehman1279 changed the title Ali/context management Feat: Context Management Nov 17, 2025
@AliRehman1279 AliRehman1279 changed the title Feat: Context Management feat: context management Nov 17, 2025
socket-security bot commented Nov 19, 2025

👍 No dependency changes detected in this pull request.

@Muneeb147 (Contributor) left a comment

@AliRehman1279 As discussed, let's remove truncation of content, as it might give us misinformation (especially in the finance world). I think showing an error will be much better than showing misinformation (or partial information).

The summarisation from the LLM sounds right.

Thoughts? @toliaqat @mujahidkay

@AliRehman1279 AliRehman1279 changed the title feat: context management feat: context management for context overflow Nov 27, 2025
Comment on lines +273 to +277
// console.log("messages", messages);
// console.log(
// "parts",
// messages.map((m) => m.parts.map((p) => p)),
// );
Contributor

Let's keep it as it helps for debugging. I remember a previous case where this log helped on CF worker.

Contributor Author

Reverted the commented-out logs.

Comment on lines +423 to +424
maxTokens: 70_000,
keepRecentMessages: 8,
Contributor

Let's use constants.

Contributor Author

constants added

totalTokens: usage?.totalTokens,
});

if (usage?.totalTokens && usage.totalTokens > 80_000) {
Contributor

80_000 as constant?

Contributor Author

constants added

Comment on lines +253 to +267
const wrappedTools: Record<string, any> = {};
for (const [toolName, tool] of Object.entries(mcptools)) {
const originalTool = tool as any;

wrappedTools[toolName] = {
...originalTool,
execute: wrapToolExecution(
toolName,
async (args: any, options: any) =>
originalTool.execute(args, options),
),
};
}

tools = { ...tools, ...wrappedTools };
Collaborator

Let's try to avoid any whenever possible. Proper typing helps a lot.
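A typed version of the wrapper loop above could look something like this; `ToolLike` and the shape of `wrapToolExecution` are assumptions for illustration, not the PR's actual types:

```typescript
// Illustrative typed replacement for the `any`-based wrapper loop.
// ToolLike and wrapToolExecution's shape are assumed, not taken from the PR.
type ToolExecute<A, R> = (args: A, options?: unknown) => Promise<R>;

interface ToolLike<A = unknown, R = unknown> {
  description?: string;
  execute: ToolExecute<A, R>;
}

function wrapToolExecution<A, R>(
  toolName: string,
  execute: ToolExecute<A, R>,
): ToolExecute<A, R> {
  return async (args, options) => {
    // Interception point (e.g. result-size checks, logging) would go here.
    return execute(args, options);
  };
}

function wrapTools(
  mcptools: Record<string, ToolLike>,
): Record<string, ToolLike> {
  const wrapped: Record<string, ToolLike> = {};
  for (const [toolName, tool] of Object.entries(mcptools)) {
    wrapped[toolName] = {
      ...tool,
      execute: wrapToolExecution(toolName, tool.execute),
    };
  }
  return wrapped;
}
```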

import { TOKEN_CONFIG } from './token-config';

export function estimateTokens(content: string | ModelMessage[]): number {
if (typeof content === 'string') return Math.ceil(content.length / 3.5);
Collaborator

magic number? What does 3.5 represent?

totalChars += msgStr.length;

//extra weight for tool invocations (they have significant overhead)
if ((msg as any).toolInvocations?.length) {
Collaborator

let's avoid any (applicable to all other instances in this PR)

Comment on lines +17 to +21
totalChars += toolCount * 50; // Approximate overhead per tool call
}
}

return Math.ceil(totalChars / 3.5);
Collaborator

These numbers should be consts with names that show what they are for. For example, as the comment suggests:

totalChars += toolCount * TOOL_CALL_OVERHEAD

same for the 3.5 thing
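One way to apply that suggestion, as a sketch (the values come from the diff; the constant names and the `estimateTokens` signature here are made up):

```typescript
// Sketch of naming the magic numbers from the estimator above.
// Values are from the diff; the names and signature are illustrative.
const CHARS_PER_TOKEN = 3.5;           // rough chars-per-token average for English text
const TOOL_CALL_OVERHEAD_CHARS = 50;   // approximate serialization overhead per tool call

function estimateTokens(totalChars: number, toolCallCount = 0): number {
  const chars = totalChars + toolCallCount * TOOL_CALL_OVERHEAD_CHARS;
  return Math.ceil(chars / CHARS_PER_TOKEN);
}
```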

Comment on lines +31 to +33
export const DEFAULT_CONTEXT_CONFIG: Required<
Omit<ContextManagerConfig, 'contextEditConfig' | 'systemPrompt'>
> = {
Collaborator

a bit confused here. ContextManagerConfig doesn't even have a contextEditConfig property that this is supposedly omitting.

Also, I'm not sure why all properties of ContextManagerConfig are optional. Do they need to be? If not, let's forgo the optional ? and that way this type simplifies to:

Suggested change
export const DEFAULT_CONTEXT_CONFIG: Required<
Omit<ContextManagerConfig, 'contextEditConfig' | 'systemPrompt'>
> = {
export const DEFAULT_CONTEXT_CONFIG: Omit<
  ContextManagerConfig,
  'systemPrompt'
> = {

CONTEXT_WARNING_THRESHOLD: 0.9,
HIGH_USAGE_THRESHOLD: 80_000,
KEEP_RECENT_MESSAGES: 8,
SUMMARY_MAX_OUTPUT_TOKENS: 2000,
Collaborator

This also needs to be adjusted. Summarizing 150_000 to 180_000 tokens' worth of history needs around 10-20k output tokens to be effective; otherwise we risk losing important information.

Comment on lines +17 to +19
return (
Math.round((currentTokens / TOKEN_CONFIG.MAX_CONTEXT_TOKENS) * 1000) / 10
);
Collaborator

Handy, but thoughts on:

((currentTokens / TOKEN_CONFIG.MAX_CONTEXT_TOKENS) * 100).toFixed(2);

We are only using it for logging purposes so the number -> string change doesn't matter.
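For comparison, the two options side by side (the `TOKEN_CONFIG` value is an assumption for illustration):

```typescript
// Side-by-side of the two usage-percentage formats discussed above.
// TOKEN_CONFIG's value is assumed, not taken from the PR.
const TOKEN_CONFIG = { MAX_CONTEXT_TOKENS: 150_000 };
const currentTokens = 123_456;

// Original: rounds to one decimal place, stays a number.
const asNumber =
  Math.round((currentTokens / TOKEN_CONFIG.MAX_CONTEXT_TOKENS) * 1000) / 10;

// Suggested: two decimal places, returns a string (fine for logging).
const asString =
  ((currentTokens / TOKEN_CONFIG.MAX_CONTEXT_TOKENS) * 100).toFixed(2);
```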

Comment on lines +16 to +20
maxChars?: number;
bypassThreshold?: number;
sliceRatioHead?: number;
sliceRatioMiddle?: number;
sliceRatioTail?: number;
Collaborator

same question, why are all of them optional?

Contributor Author

Truncation removed, so this is no longer applicable.

Comment on lines +25 to +26
// Reserve space for conversation + prompts + responses (~100k tokens)
// Tool results should stay under ~100k tokens (≈300k chars) to be safe
Collaborator

I think a 150-50 split makes more sense or a 130-70 split

Contributor Author

removed truncation

Comment on lines +28 to +34
const DEFAULT_CONFIG: Required<ToolResultConfig> = {
maxChars: 200_000, // Truncate to this size if content exceeds threshold (~65k tokens)
bypassThreshold: 300_000, // Only activate truncation for responses > 300k chars (~100k tokens)
sliceRatioHead: 0.5, // Prioritize beginning (schema, initial data)
sliceRatioMiddle: 0.1, // Small middle sample for pattern detection
sliceRatioTail: 0.4, // End often has summary/totals
};
Collaborator

Okay, I've not taken a deeper look into the rest of this file, but my opinion is that we can just stop catering to tool results that exceed a given threshold. Dynamically truncated (missing, partially incorrect) information from external data sources is worse than having no information at all.

If a tool call result exceeds X tokens, we can just have it return an error message.
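A minimal sketch of that error-instead-of-truncation guard; the threshold value and the message wording are placeholders, not the PR's final choices:

```typescript
// Sketch: reject oversized tool results outright instead of truncating them.
// The threshold and message are placeholder assumptions.
const MAX_TOOL_RESULT_CHARS = 300_000; // roughly 100k tokens at ~3 chars/token

function guardToolResult(toolName: string, result: string): string {
  if (result.length > MAX_TOOL_RESULT_CHARS) {
    return (
      `Error: result from "${toolName}" exceeded the context budget ` +
      `(${result.length} chars). Re-run the tool with a narrower query.`
    );
  }
  return result;
}
```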

Contributor Author

I'll update it to intercept and return only a meaningful error message for the LLM.

Contributor

I agree as well

@agoric-labs agoric-labs deleted a comment from AliRehmanOTO Dec 2, 2025
@agoric-labs agoric-labs deleted a comment from AliRehmanOTO Dec 2, 2025
@agoric-labs agoric-labs deleted a comment from AliRehmanOTO Dec 2, 2025
@Muneeb147 (Contributor) left a comment

As we have some new commits addressing review comments, I'll do a final review pass.
