
Add X-Max-Output-Tokens header for output truncation #1234

Open
Ratnaditya-J wants to merge 1 commit into jina-ai:main from Ratnaditya-J:feat/max-output-tokens


@Ratnaditya-J

Closes #1228

This adds a new X-Max-Output-Tokens header (and corresponding maxOutputTokens body parameter) that lets callers limit how many tokens the response content contains. The content gets truncated to fit within the specified limit before being returned.

This is different from X-Token-Budget, which rejects the entire request if the cost exceeds the budget. X-Max-Output-Tokens returns partial content instead, which is what the issue author asked for: retrieving just the first N tokens of a page without triggering budget errors or receiving the full response.
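From the caller's side, opting into truncation should only require setting the new header. The sketch below builds such a request. The helper name `buildTruncatedRequest`, the `ReaderRequest` type, the default base URL, and the client-side validation are all assumptions made for illustration; this PR does not define them, and it does not say how the server treats invalid values.

```typescript
// Hypothetical request description; not a type from this PR.
interface ReaderRequest {
  url: string;
  headers: Record<string, string>;
}

// Build a request that asks for at most `maxOutputTokens` tokens of content.
// `baseUrl` is an assumed deployment address; point it at your own instance.
function buildTruncatedRequest(
  targetUrl: string,
  maxOutputTokens: number,
  baseUrl: string = "https://r.jina.ai/",
): ReaderRequest {
  // Client-side sanity check (an assumption, not behavior described in the PR).
  if (!Number.isInteger(maxOutputTokens) || maxOutputTokens <= 0) {
    throw new RangeError("maxOutputTokens must be a positive integer");
  }
  return {
    url: baseUrl + targetUrl,
    // HTTP header values are strings, so the limit is serialized here.
    headers: { "X-Max-Output-Tokens": String(maxOutputTokens) },
  };
}
```

The result can be passed to any HTTP client, for example `fetch(req.url, { headers: req.headers })`. Swapping the header for X-Token-Budget would instead cause an over-budget request to be rejected rather than truncated.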

How it works:

  • New header X-Max-Output-Tokens parsed in CrawlerOptions.from()
  • Value passed through threadLocal to SnapshotFormatter
  • truncateToTokenLimit() estimates a cut point from the character-to-token ratio measured with the existing countGPTToken function, then verifies the result and adjusts the cut to stay within the limit
  • Applied to both the early return path (text/html/lm modes) and the main markdown/content path
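
The estimate-then-verify step described in the list above can be sketched roughly as follows. This is not the code from the PR: the body of `truncateToTokenLimit` here is a reconstruction from the description, and `countWords` is a stand-in counter (one token per whitespace-separated word) used only so the example runs on its own. The PR uses the project's existing countGPTToken function instead.

```typescript
type TokenCounter = (text: string) => number;

// Stand-in token counter (assumption): counts whitespace-separated words.
const countWords: TokenCounter = (text) => {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
};

function truncateToTokenLimit(
  text: string,
  maxTokens: number,
  countTokens: TokenCounter = countWords,
): string {
  if (maxTokens <= 0) return "";
  const total = countTokens(text);
  if (total <= maxTokens) return text; // already within the limit

  // Estimate the cut point from the average number of characters per token.
  const charsPerToken = text.length / total;
  let cut = Math.min(text.length, Math.floor(charsPerToken * maxTokens));

  // Verify the estimate and shrink the prefix while it still overshoots.
  while (cut > 0 && countTokens(text.slice(0, cut)) > maxTokens) {
    const over = countTokens(text.slice(0, cut)) - maxTokens;
    cut -= Math.max(1, Math.floor(over * charsPerToken));
  }
  return text.slice(0, Math.max(0, cut));
}
```

Estimating first keeps the number of tokenizer calls small on long pages, since only the correction loop re-counts. This sketch only ever shrinks the prefix, so it may return slightly fewer tokens than the limit allows, and slicing by UTF-16 code units can split a surrogate pair at the boundary; whether the PR handles either case is not stated in the description.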

Adds a new parameter that truncates the extracted content to fit within
a specified token budget. Unlike X-Token-Budget (which rejects requests
that exceed the budget), this parameter returns partial content up to
the limit, useful for workflows that only need the initial portion of
a page.

Development

Successfully merging this pull request may close these issues.

Feature request: parameter max_output_tokens
