fix(compaction): estimate context usage after compaction and show 0.1% precision #1269
Pull request overview
This PR improves context usage reporting immediately after context compaction by estimating token usage when exact usage isn’t available, and by displaying context usage percentages with one-decimal precision in the web UI.
Changes:
- Introduces `CompactionResult` (messages plus optional token usage) and adds `estimated_token_count` for post-compaction token estimation.
- Updates `KimiSoul.compact_context()` to update `Context.token_count` right after compaction using the estimate.
- Adjusts the web UI to show context usage with 0.1% precision (e.g., `12.3%`).
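The one-decimal display described above can be sketched in Python (the function name and values here are illustrative; the actual UI code is TypeScript in `chat.tsx`):

```python
def format_usage(token_count: int, max_context_size: int) -> str:
    """Format context usage as a percentage with 0.1% precision."""
    percent = token_count / max_context_size * 100
    return f"{percent:.1f}%"

# Illustrative values only:
print(format_usage(12_345, 100_000))  # → 12.3%
print(format_usage(0, 100_000))       # → 0.0%
```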
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `web/src/features/chat/components/prompt-toolbar/toolbar-context.tsx` | Displays context percentage with one decimal place. |
| `web/src/features/chat/chat.tsx` | Computes usagePercent with 0.1% precision. |
| `tests/core/test_simple_compaction.py` | Adds unit tests for `CompactionResult.estimated_token_count` behavior. |
| `src/kimi_cli/soul/kimisoul.py` | Uses `CompactionResult` and updates context token count immediately after compaction. |
| `src/kimi_cli/soul/compaction.py` | Adds `CompactionResult` and token estimation logic; changes compaction return type accordingly. |
```python
"""Estimate tokens from message text content using a character-based heuristic."""
total_chars = 0
for msg in messages:
    for part in msg.content:
        if isinstance(part, TextPart):
            total_chars += len(part.text)
# ~4 chars per token for English; somewhat underestimates for CJK text,
# but this is a temporary estimate that gets corrected on the next LLM call.
return total_chars // 4
```
`_estimate_text_tokens` currently only counts `TextPart` in `message.content` and ignores other token-bearing fields like `Message.tool_calls` (function names/arguments) and any non-text content that still consumes tokens (e.g., images). Since `Context.token_count` is used to decide when to compact (`token_count + reserved >= max_context_size`), this underestimation can prevent compaction and lead to provider context-limit errors. Consider extending the estimator to include tool call names/arguments (and optionally apply a fallback cost for non-text parts) so the estimate is biased high rather than low.
Suggested change:

```python
"""Estimate tokens from message content using a character-based heuristic.

This includes:
- Textual content parts (TextPart, ThinkPart) counted by character length.
- Tool call payloads, approximated from their string representation.
- A conservative fallback token cost for any non-text content parts.
"""
total_chars = 0
extra_tokens = 0
for msg in messages:
    # Count textual content parts.
    for part in getattr(msg, "content", []) or []:
        if isinstance(part, TextPart):
            total_chars += len(part.text)
        elif isinstance(part, ThinkPart):
            total_chars += len(part.text)
        else:
            # Non-text parts (e.g., images, custom structures) still consume tokens
            # at the provider. Assign a small conservative cost so we bias high.
            extra_tokens += 32
    # Roughly account for tool call names/arguments, which are serialized as text.
    for tool_call in getattr(msg, "tool_calls", []) or []:
        # Use repr() to capture both the function name and arguments textually.
        total_chars += len(repr(getattr(tool_call, "function", tool_call)))
# ~4 chars per token for English; somewhat underestimates for CJK text,
# but this is a temporary estimate that gets corrected on the next LLM call.
# Add extra_tokens so that non-text parts are not underestimated to zero.
return total_chars // 4 + extra_tokens
```
```python
# ~4 chars per token for English; somewhat underestimates for CJK text,
# but this is a temporary estimate that gets corrected on the next LLM call.
return total_chars // 4
```
The estimator uses floor division (`total_chars // 4`), which can return 0 for short-but-non-empty text (and will always round down). Because this value is written into `Context.token_count` and drives both UI context usage and compaction triggering, rounding down is risky. Consider using `math.ceil(total_chars / 4)` (and possibly `max(1, …)` when `total_chars > 0`) so the estimate is not systematically under-reporting.
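A minimal sketch of the ceiling-based variant the comment suggests (a standalone function for illustration; the real estimator iterates over messages):

```python
import math

def estimate_tokens(total_chars: int) -> int:
    # Round up so short-but-non-empty text never estimates to zero tokens.
    return math.ceil(total_chars / 4)

print(estimate_tokens(3))  # → 1 (floor division would give 0)
print(estimate_tokens(0))  # → 0
print(estimate_tokens(9))  # → 3
```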
```python
await self._context.clear()
await self._checkpoint()
await self._context.append_message(compaction_result.messages)

# Estimate token count so context_usage is not reported as 0%
await self._context.update_token_count(compaction_result.estimated_token_count)
```
`update_token_count` is fed from `compaction_result.estimated_token_count`, but this method calls `_checkpoint()` between `clear()` and `append_message()`. When `_checkpoint_with_user_message` is enabled, `_checkpoint()` appends a CHECKPOINT … user message into the context history; that message's tokens are not included in `estimated_token_count`, so `Context.token_count` becomes inconsistent with `Context.history`. Consider estimating from the full post-compaction history (or adding the checkpoint message cost) before updating the token count.
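A toy illustration (all names hypothetical, plain strings standing in for messages) of why estimating from the full post-compaction history matters once a checkpoint message is appended:

```python
def estimate_tokens(messages: list[str]) -> int:
    # Same ~4 chars/token heuristic as the PR, over plain strings for brevity.
    return sum(len(m) for m in messages) // 4

compacted = ["Summary of the earlier conversation." * 5]
checkpoint = "CHECKPOINT: conversation state saved."

# Estimating only from the compacted messages misses the checkpoint's tokens,
# so Context.token_count would undercount the actual history:
partial = estimate_tokens(compacted)
full = estimate_tokens(compacted + [checkpoint])
assert full > partial
```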
```python
# Estimate token count so context_usage is not reported as 0%
await self._context.update_token_count(compaction_result.estimated_token_count)
```
This change updates `Context.token_count` during `compact_context()`, which also affects the compaction trigger logic (`token_count + reserved >= max_context_size`). There doesn't appear to be a test exercising `compact_context()` end-to-end to ensure token counts are updated as expected (including the checkpoint message case). Consider adding a unit/integration test around `KimiSoul.compact_context()` to prevent regressions in context usage reporting and compaction behavior.
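As a rough sketch of the shape such a test could take, here is a self-contained toy: the `Context` and `compact_context` stand-ins below are hypothetical, not the project's real classes, and only illustrate asserting that `token_count` is non-zero right after compaction.

```python
import asyncio

class Context:
    """Minimal hypothetical stand-in for the project's Context."""
    def __init__(self) -> None:
        self.history: list[str] = ["x" * 400] * 10
        self.token_count = 0

    async def clear(self) -> None:
        self.history = []

    async def append_message(self, messages: list[str]) -> None:
        self.history.extend(messages)

    async def update_token_count(self, count: int) -> None:
        self.token_count = count

async def compact_context(ctx: Context) -> None:
    summary = "summary of the prior conversation"
    await ctx.clear()
    await ctx.append_message([summary])
    # Estimate so context usage is not reported as 0% until the next LLM call.
    await ctx.update_token_count(len(summary) // 4)

ctx = Context()
asyncio.run(compact_context(ctx))
assert ctx.token_count > 0, "usage should not read 0% right after compaction"
```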
This PR fixes context usage reporting right after compaction.

- `SimpleCompaction` now returns a `CompactionResult` with compacted messages plus optional token usage.
- `CompactionResult.estimated_token_count` uses `usage.output` for the generated summary when available.
- `KimiSoul` now updates the context token count immediately after compaction, preventing a temporary `0%` usage display.
- The web UI shows context usage with one-decimal precision (e.g., `12.3%`) for clearer feedback.

Tests: adds unit tests for `CompactionResult.estimated_token_count` behavior.
Checklist
- Run `make gen-changelog` to update the changelog.
- Run `make gen-docs` to update the user documentation.