
fix(compaction): estimate context usage after compaction and show 0.1% precision #1269

Merged: RealKai42 merged 3 commits into main from kaiyi/fix-compact-pct-display on Feb 27, 2026

fix(compaction): estimate context usage after compaction and show 0.1% precision#1269
RealKai42 merged 3 commits intomainfrom
kaiyi/fix-compact-pct-display

Conversation

@RealKai42 RealKai42 (Collaborator) commented Feb 27, 2026

This PR fixes context usage reporting right after compaction.

  • SimpleCompaction now returns a CompactionResult with compacted messages plus optional token usage.
  • Added CompactionResult.estimated_token_count:
    • Uses exact usage.output for the generated summary when available.
    • Estimates preserved/all message tokens from text length when exact usage is unavailable.
    • Ignores non-text parts (e.g., think content).
  • KimiSoul now updates context token count immediately after compaction, preventing temporary 0% usage display.
  • Web UI context percentage now shows one decimal place (e.g., 12.3%) for clearer feedback.

Tests:

  • Added unit tests for token estimation behavior:
    • with usage
    • without usage
    • non-text part exclusion
    • empty messages

Checklist

  • I have read the CONTRIBUTING document.
  • I have linked the related issue, if any.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have run make gen-changelog to update the changelog.
  • I have run make gen-docs to update the user documentation.


Copilot AI review requested due to automatic review settings February 27, 2026 07:10
Contributor

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.


Contributor

Copilot AI left a comment


Pull request overview

This PR improves context usage reporting immediately after context compaction by estimating token usage when exact usage isn’t available, and by displaying context usage percentages with one-decimal precision in the web UI.

Changes:

  • Introduces CompactionResult (messages + optional TokenUsage) and adds estimated_token_count for post-compaction token estimation.
  • Updates KimiSoul.compact_context() to update Context.token_count right after compaction using the estimate.
  • Adjusts the web UI to show context usage with 0.1% precision (e.g., 12.3%).
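
The actual UI change is in TypeScript (`chat.tsx` and `toolbar-context.tsx`, not shown in this conversation); the following Python one-liner is only an illustration of the 0.1%-precision formatting described, with hypothetical parameter names.

```python
# Illustrative only: mirrors the described one-decimal context-usage display.
def usage_percent(token_count: int, max_context_size: int) -> str:
    pct = token_count / max_context_size * 100
    return f"{pct:.1f}%"  # e.g., "12.3%" instead of "12%"
```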

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Summary per file:

  • web/src/features/chat/components/prompt-toolbar/toolbar-context.tsx: Displays context percentage with one decimal place.
  • web/src/features/chat/chat.tsx: Computes usagePercent with 0.1% precision.
  • tests/core/test_simple_compaction.py: Adds unit tests for CompactionResult.estimated_token_count behavior.
  • src/kimi_cli/soul/kimisoul.py: Uses CompactionResult and updates context token count immediately after compaction.
  • src/kimi_cli/soul/compaction.py: Adds CompactionResult and token estimation logic; changes compaction return type accordingly.


Comment on lines +45 to +53

```python
    """Estimate tokens from message text content using a character-based heuristic."""
    total_chars = 0
    for msg in messages:
        for part in msg.content:
            if isinstance(part, TextPart):
                total_chars += len(part.text)
    # ~4 chars per token for English; somewhat underestimates for CJK text,
    # but this is a temporary estimate that gets corrected on the next LLM call.
    return total_chars // 4
```

Copilot AI Feb 27, 2026


_estimate_text_tokens currently only counts TextPart in message.content and ignores other token-bearing fields like Message.tool_calls (function names/arguments) and any non-text content that still consumes tokens (e.g., images). Since Context.token_count is used to decide when to compact (token_count + reserved >= max_context_size), this underestimation can prevent compaction and lead to provider context-limit errors. Consider extending the estimator to include tool call names/arguments (and optionally apply a fallback cost for non-text parts) so the estimate is biased high rather than low.

Suggested change:

```python
    """Estimate tokens from message content using a character-based heuristic.

    This includes:
    - Textual content parts (TextPart, ThinkPart) counted by character length.
    - Tool call payloads, approximated from their string representation.
    - A conservative fallback token cost for any non-text content parts.
    """
    total_chars = 0
    extra_tokens = 0
    for msg in messages:
        # Count textual content parts.
        for part in getattr(msg, "content", []) or []:
            if isinstance(part, TextPart):
                total_chars += len(part.text)
            elif isinstance(part, ThinkPart):
                total_chars += len(part.text)
            else:
                # Non-text parts (e.g., images, custom structures) still consume tokens
                # at the provider. Assign a small conservative cost so we bias high.
                extra_tokens += 32
        # Roughly account for tool call names/arguments, which are serialized as text.
        for tool_call in getattr(msg, "tool_calls", []) or []:
            # Use repr() to capture both the function name and arguments textually.
            total_chars += len(repr(getattr(tool_call, "function", tool_call)))
    # ~4 chars per token for English; somewhat underestimates for CJK text,
    # but this is a temporary estimate that gets corrected on the next LLM call.
    # Add extra_tokens so that non-text parts are not underestimated to zero.
    return total_chars // 4 + extra_tokens
```

Comment on lines +51 to +53

```python
    # ~4 chars per token for English; somewhat underestimates for CJK text,
    # but this is a temporary estimate that gets corrected on the next LLM call.
    return total_chars // 4
```

Copilot AI Feb 27, 2026


The estimator uses floor division (total_chars // 4), which can return 0 for short-but-non-empty text (and will always round down). Because this value is written into Context.token_count and drives both UI context usage and compaction triggering, rounding down is risky. Consider using math.ceil(total_chars / 4) (and possibly max(1, …) when total_chars > 0) so the estimate is not systematically under-reporting.
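
The reviewer's point can be shown in two lines; `math.ceil` is standard-library Python, and `total_chars = 3` stands for any short-but-non-empty text.

```python
import math

total_chars = 3  # short but non-empty text
assert total_chars // 4 == 0            # floor division under-reports to zero
assert math.ceil(total_chars / 4) == 1  # ceiling keeps the estimate biased high
```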

Comment on lines 579 to +585

```diff
 await self._context.clear()
 await self._checkpoint()
-await self._context.append_message(compacted_messages)
+await self._context.append_message(compaction_result.messages)
+
+# Estimate token count so context_usage is not reported as 0%
+await self._context.update_token_count(compaction_result.estimated_token_count)
```


Copilot AI Feb 27, 2026


update_token_count is fed from compaction_result.estimated_token_count, but this method calls _checkpoint() between clear() and append_message(). When _checkpoint_with_user_message is enabled, _checkpoint() appends a CHECKPOINT … user message into the context history; that message’s tokens are not included in estimated_token_count, so Context.token_count becomes inconsistent with Context.history. Consider estimating from the full post-compaction history (or adding the checkpoint message cost) before updating the token count.
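
A runnable toy of the fix the reviewer suggests: estimate from the full post-compaction history so a checkpoint message appended by `_checkpoint()` is counted. Every class here is a simplified stand-in, not the real `kimi_cli` API; only the ordering and the "estimate from history" idea are taken from the comment above.

```python
import asyncio


def estimate_text_tokens(messages: list[str]) -> int:
    # Same ~4 chars/token heuristic used by the PR's estimator.
    return sum(len(m) for m in messages) // 4


class FakeContext:
    """Stand-in for Context: holds a history and a token count."""

    def __init__(self) -> None:
        self.history: list[str] = []
        self.token_count = 0

    async def append_message(self, messages: list[str]) -> None:
        self.history.extend(messages)

    async def update_token_count(self, count: int) -> None:
        self.token_count = count


async def compact(ctx: FakeContext, compacted: list[str]) -> None:
    ctx.history.clear()
    ctx.history.append("CHECKPOINT")  # what _checkpoint() may append
    await ctx.append_message(compacted)
    # Estimating from the full history keeps token_count consistent with it,
    # including the checkpoint message's contribution.
    await ctx.update_token_count(estimate_text_tokens(ctx.history))


ctx = FakeContext()
asyncio.run(compact(ctx, ["summary " * 10]))
print(ctx.token_count)  # 90 chars total -> 22 tokens
```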

Comment on lines +583 to +585

```python
# Estimate token count so context_usage is not reported as 0%
await self._context.update_token_count(compaction_result.estimated_token_count)
```


Copilot AI Feb 27, 2026


This change updates Context.token_count during compact_context(), which also affects the compaction trigger logic (token_count + reserved >= max_context_size). There doesn’t appear to be a test exercising compact_context() end-to-end to ensure token counts are updated as expected (including the checkpoint message case). Consider adding a unit/integration test around KimiSoul.compact_context() to prevent regressions in context usage reporting and compaction behavior.

@RealKai42 RealKai42 merged commit dc5f94b into main Feb 27, 2026
18 checks passed
@RealKai42 RealKai42 deleted the kaiyi/fix-compact-pct-display branch February 27, 2026 07:48
RealKai42 added a commit that referenced this pull request Feb 27, 2026