Add X-Max-Output-Tokens header for output truncation by Ratnaditya-J · Pull Request #1234 · jina-ai/reader

Ratnaditya-J · 2026-02-28T04:44:53Z

This adds a new X-Max-Output-Tokens header (and corresponding maxOutputTokens body parameter) that lets callers limit how many tokens the response content contains. The content gets truncated to fit within the specified limit before being returned.

This is different from X-Token-Budget, which rejects the entire request if the cost exceeds the budget. X-Max-Output-Tokens returns partial content instead, which is what the issue author was asking for -- retrieving just the first N tokens of a page without hitting budget errors or getting the full response.
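As a rough illustration of how a caller might opt in, the sketch below builds the request URL and headers for a Reader call that caps the output. The endpoint constant, the `buildReaderRequest` helper, and the `ReaderRequest` shape are assumptions made for this example and are not part of the PR. Only the header name comes from the description above.

```typescript
// Assumed public Reader endpoint; replace with your own deployment if it differs.
const READER_ENDPOINT = 'https://r.jina.ai/';

// Hypothetical shape used only for this example.
interface ReaderRequest {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper: attaches X-Max-Output-Tokens when a limit is given.
function buildReaderRequest(targetUrl: string, maxOutputTokens?: number): ReaderRequest {
  const headers: Record<string, string> = {};
  if (maxOutputTokens !== undefined) {
    if (!Number.isInteger(maxOutputTokens) || maxOutputTokens <= 0) {
      throw new RangeError('maxOutputTokens must be a positive integer');
    }
    // Header values are strings, so the number is serialized here.
    headers['X-Max-Output-Tokens'] = String(maxOutputTokens);
  }
  return { url: READER_ENDPOINT + targetUrl, headers };
}
```

The result can be passed to `fetch(req.url, { headers: req.headers })`. With this PR applied, the response would be expected to contain at most the requested number of tokens rather than a budget error.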

How it works:

New header X-Max-Output-Tokens parsed in CrawlerOptions.from()
Value passed through threadLocal to SnapshotFormatter
truncateToTokenLimit() uses the existing countGPTToken function to estimate a cut point proportional to the char/token ratio, then verifies and adjusts to stay within the limit
Applied to both the early return path (text/html/lm modes) and the main markdown/content path
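The estimate-then-verify step can be sketched roughly as follows. This is not the code from the PR: `countTokens` is a hypothetical whitespace-based stand-in for `countGPTToken` (whose signature is not shown here), and the 10% shrink step is an arbitrary choice made for the illustration.

```typescript
// Hypothetical stand-in for countGPTToken: counts whitespace-separated words.
function countTokens(text: string): number {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Sketch of truncation by proportional estimate followed by verification.
function truncateToTokenLimit(
  text: string,
  maxTokens: number,
  count: (s: string) => number = countTokens,
): string {
  if (maxTokens <= 0) return '';

  const total = count(text);
  if (total <= maxTokens) return text; // already within the limit

  // Estimate the cut point from the average number of characters per token.
  const charsPerToken = text.length / total;
  let cut = Math.min(text.length, Math.floor(charsPerToken * maxTokens));

  // Verify the estimate and shrink it until the prefix fits.
  // The 10% step is an assumption for this sketch; cut reaches 0 eventually.
  while (cut > 0 && count(text.slice(0, cut)) > maxTokens) {
    cut = Math.floor(cut * 0.9);
  }
  return text.slice(0, cut);
}
```

Passing the counter as a parameter keeps the sketch testable without the real tokenizer. Any function that maps a string to a token count can be substituted.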

Adds a new parameter that truncates the extracted content to fit within a specified token budget. Unlike X-Token-Budget (which rejects requests that exceed the budget), this parameter returns partial content up to the limit, useful for workflows that only need the initial portion of a page.

Ratnaditya-J wants to merge 1 commit into jina-ai:main from Ratnaditya-J:feat/max-output-tokens
