Token Usage #971
Replies: 2 comments
Had the same experience with multi-agent setups. V3's orchestration layer adds significant overhead because each agent pass includes the full conversation context plus coordination prompts. A few things that helped me get token usage under control:

1. Instrument first, optimize second. If you're using the Node.js SDK directly, burn0 can give you per-request cost breakdowns with a single import: it intercepts HTTP calls and logs exactly what each agent step costs. This helped me identify that one summarization step was re-sending the entire research context (~80k tokens) when it only needed the conclusions.

2. Context window management.

4. Check for retry loops. The weekly limit issue suggests your usage jumped 3-4x, which lines up with what I've seen when moving to more sophisticated orchestration patterns. The tokens-per-task metric is the one to watch.
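If you don't want to pull in a dependency for step 1, a minimal per-step tracker gets you most of the way. This is a generic sketch, not burn0's actual API; the `prompt_tokens` / `completion_tokens` field names follow the common chat-completions response shape and are an assumption about your SDK's responses:

```javascript
// Records the usage block from each API response under a named agent step,
// then totals tokens per step so outliers (e.g. a summarizer re-sending
// 80k tokens of context every turn) stand out immediately.
function createUsageTracker() {
  const steps = [];
  return {
    // Call once per API response: record("summarize", response.usage)
    record(step, usage) {
      steps.push({
        step,
        prompt: usage.prompt_tokens ?? 0,
        completion: usage.completion_tokens ?? 0,
      });
    },
    // Total tokens (prompt + completion) grouped by step name.
    byStep() {
      const totals = {};
      for (const s of steps) {
        totals[s.step] = (totals[s.step] ?? 0) + s.prompt + s.completion;
      }
      return totals;
    },
  };
}
```

Once the totals are in front of you, the tokens-per-task question from point 4 becomes a five-minute check instead of guesswork.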
We investigated this exact problem. Here's where your tokens are going:
The intelligence layer is the biggest culprit: it builds a 100 MB graph from 5,706 entries (only ~20 unique; the rest are duplicates), runs PageRank, then injects the same entry 5 times per message. Ironically, the README claims '30-50% token reduction', but that metric is fabricated.

Quick fixes to reduce token burn:
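Given the numbers above, the cheapest win is deduplicating entries before they ever reach the graph or the injection step. A sketch, assuming entries carry a text payload (`entry.text` and the normalization rule are illustrative assumptions, not the project's actual data model):

```javascript
// Collapses duplicate context entries by normalized text content.
// With ~5,700 entries of which only ~20 are unique, this shrinks both
// the graph build input and the per-message injection set.
function dedupeEntries(entries) {
  const seen = new Set();
  const unique = [];
  for (const entry of entries) {
    // Normalize so trivially different copies ("Fact A" vs "fact a ")
    // hash to the same key.
    const key = entry.text.trim().toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(entry);
    }
  }
  return unique;
}
```

Running this before injection also makes the "same entry 5 times per message" symptom impossible, since each unique entry survives exactly once.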
Full token cost analysis: Independent Audit
Since switching to V3 my token usage has increased significantly. I am burning through my weekly limit in two days. Is anyone else experiencing this? Any ideas how to improve it? At this point I have to stop using V3.