A high-performance plugin for ElizaOS that significantly reduces LLM costs through intelligent message batching, multi-agent coordination, and strategic planning.
- 🚀 Extreme Cost Reduction: Time-sliced batching reduces LLM calls by 20-40× across ALL rooms and agents simultaneously.
- 🕒 Time-Sliced Batching: Processes messages from multiple rooms in 50-500ms windows with a single LLM call.
- 🧠 Intelligent Planning: Assesses message complexity, knowledge requirements, and token budgets before generating responses.
- ⚡ Always-On Batching: All messages use time-sliced batching for optimal multi-agent coordination (critical messages can bypass for instant response).
- 🤖 Multi-Agent Coordination: Single planning call coordinates responses for all agents across all active rooms.
- 🎯 Smart Filtering: Topic relevance and flood detection prevent unnecessary processing.
- 🚨 Priority Routing: Critical messages bypass batching for immediate response (DMs, mentions, urgent keywords).
- 🔮 Predictive Pre-Warming: Learns daily patterns to predict and prepare for high-traffic periods.
- 🗂️ Semantic Clustering: Groups similar messages to reduce redundant processing.
- 💰 Budget Pooling: All agents share a global budget pool for optimal resource allocation.
- 📊 Rich Metrics: Detailed tracking of token usage, costs, latency, and I/O.
- 🛡️ Durable Queuing: Messages are persisted on enqueue so batches survive process restarts.
```bash
bun add @elizaos/plugin-autonomous
```

Add the plugin to your character configuration:

```json
{
  "plugins": ["@elizaos/plugin-autonomous"]
}
```

Configure behavior via `.env` or runtime settings:
| Setting | Description | Default |
|---|---|---|
| `AUTONOMOUS_BATCH_THRESHOLD` | Messages/sec globally for "high load" status (batching always enabled) | `2` |
| `AUTONOMOUS_BATCH_MAX_SIZE` | Min messages per time slice (efficiency gate) | `2` |
| `AUTONOMOUS_WORKER_CONCURRENCY` | Parallel time slice processors | `4` |
| `AUTONOMOUS_BUDGET_DAILY_USD` | Max daily spend (USD). Omit = unlimited | `undefined` |
| `AUTONOMOUS_DEADLINE_MS` | Max processing time per message (ms) | `undefined` |
| `AUTONOMOUS_BUDGET_MODE` | Action when over budget (`dynamic` \| `reject`) | `dynamic` |
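For example, a `.env` using these settings might look like this (the specific values are illustrative, not recommendations):

```env
AUTONOMOUS_BATCH_THRESHOLD=2
AUTONOMOUS_BATCH_MAX_SIZE=2
AUTONOMOUS_WORKER_CONCURRENCY=4
AUTONOMOUS_BUDGET_DAILY_USD=5
AUTONOMOUS_DEADLINE_MS=3000
AUTONOMOUS_BUDGET_MODE=dynamic
```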
Time Slice Behavior:
- Slice duration dynamically adjusts from 50ms (high load) to 500ms (low load)
- Minimum 2 messages required before processing (prevents inefficient singleton batches)
- Messages from ALL rooms are collected in each time slice
- Single LLM call processes all rooms simultaneously
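As a rough sketch of how the 50-500ms adaptation could work (the linear interpolation and the `HIGH_LOAD_RATE` cutoff are assumptions for illustration, not the plugin's exact formula):

```typescript
// Sketch: adaptive slice duration between the documented 50ms and
// 500ms bounds. The interpolation and cutoff are assumed.
const MIN_SLICE_MS = 50;   // high load
const MAX_SLICE_MS = 500;  // low load
const HIGH_LOAD_RATE = 10; // msg/s at which the floor is hit (assumed)

function sliceDurationMs(msgPerSec: number): number {
  const load = Math.min(msgPerSec / HIGH_LOAD_RATE, 1); // normalize to 0..1
  // Higher load -> shorter slices: lower latency while still batching.
  return Math.round(MAX_SLICE_MS - load * (MAX_SLICE_MS - MIN_SLICE_MS));
}

sliceDurationMs(0);  // 500 (quiet)
sliceDurationMs(10); // 50  (busy)
```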
Advanced Features:
- Priority routing enabled by default (`AUTONOMOUS_PRIORITY_ENABLED=true`)
- Semantic clustering tracks similar messages automatically
- Predictive learning builds hourly patterns for load anticipation
- `dynamic` (default): Automatically downgrades to cheaper/faster models when budget is tight.
- `reject`: Queues or rejects messages when budget is exceeded.
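A minimal sketch of the two modes (the `ModelTier` names and the downgrade thresholds are assumptions; only the downgrade-vs-reject behavior comes from the description above):

```typescript
type BudgetMode = 'dynamic' | 'reject';
type ModelTier = 'quality' | 'balanced' | 'fast';

// Sketch: choose a model tier from the remaining daily budget.
function pickTier(
  mode: BudgetMode,
  spentUsd: number,
  dailyUsd: number
): ModelTier | 'queue' {
  const remaining = 1 - spentUsd / dailyUsd;
  if (remaining <= 0) return mode === 'reject' ? 'queue' : 'fast';
  if (remaining < 0.1) return 'fast';     // tight: cheapest/fastest model
  if (remaining < 0.3) return 'balanced'; // getting close: mid tier
  return 'quality';
}
```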
All messages are processed through time-sliced batching for optimal multi-agent coordination. Even in "slow" rooms with a single message, batching allows multiple agents to coordinate their responses in a single LLM call, which is far more efficient than per-agent planning.
Exception: Critical priority messages (DMs, urgent mentions) can bypass batching for instant response via the "express lane."
The plugin monitors message velocity across all agents and rooms for metrics and logging:
- Low Load (< 2 msg/s): Quiet batching (no banner).
- High Load (≥ 2 msg/s): Shows colorful console output.
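A sliding-window counter is one simple way to compute such a rate (this buffer approach is an assumption, not the plugin's internal tracker):

```typescript
// Sketch: messages/sec over a 5-second sliding window.
const WINDOW_MS = 5_000;
const timestamps: number[] = [];

function recordMessage(now = Date.now()): void {
  timestamps.push(now);
}

function messagesPerSecond(now = Date.now()): number {
  // Drop timestamps that fell out of the window.
  while (timestamps.length && timestamps[0] < now - WINDOW_MS) timestamps.shift();
  return timestamps.length / (WINDOW_MS / 1_000);
}
// A result >= 2 corresponds to the "high load" banner below.
```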
When high load is detected, you'll see:
```
🔥 HIGH LOAD BATCHING 🔥
Rate: 3.2 msg/s (threshold: 2)
Queue Depth: 8 messages
Room Count: 2 active rooms
→ Queuing message abc123... for batch processing
```
Messages are collected into time slices (50-500ms windows) across all rooms simultaneously:
```
⏰ TIME SLICE READY ⏰
Slice ID: 42
Messages: 12
Rooms: 3
Agents: 8
Wait Time: 150ms
```
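Conceptually, the collector opens a slice on the first message and flushes everything that arrives within the window. A sketch (types and field names are assumptions; the real queue also persists messages and enforces the 2-message minimum):

```typescript
// Sketch: one shared time slice across all rooms.
interface QueuedMessage { roomId: string; text: string; }
interface TimeSlice { id: number; messages: QueuedMessage[]; }

let nextSliceId = 1;
let current: TimeSlice | null = null;

function enqueue(
  msg: QueuedMessage,
  sliceMs: number,
  flush: (slice: TimeSlice) => void
): void {
  if (!current) {
    current = { id: nextSliceId++, messages: [] };
    // One timer per slice: every message arriving in the window rides along.
    setTimeout(() => { const slice = current!; current = null; flush(slice); }, sliceMs);
  }
  current.messages.push(msg);
}
```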
The planner analyzes all messages from all rooms and all active agents in a single LLM call:
- Decides who should respond, ignore, or react
- Assigns complexity scores (0-100)
- Selects optimal pipelines (Fast/Balanced/Quality)
- Filters inactive agents automatically
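Based on that description, a single planning call likely yields one decision per (agent, message) pair, roughly shaped like this (field names are illustrative, not the plugin's schema):

```typescript
// Sketch of one planner decision for an (agent, message) pair.
interface PlanDecision {
  agentId: string;
  roomId: string;
  messageId: string;
  action: 'reply' | 'ignore' | 'react';
  complexity: number;                        // 0-100 score
  pipeline: 'fast' | 'balanced' | 'quality'; // execution pipeline
}
```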
Time slice processing shows beautiful ANSI art:
```
📦 TIME SLICE PROCESSING 📦
Slice ID: 42
Rooms: 3
Messages: 12
Agents: 8 (Alice, Bob, Charlie, ...)

📋 PLANNING COMPLETE 📋
Decisions: 24
Time: 234ms
Tokens: 1,456 in, 178 out
Actions: reply:12, ignore:10, react:2

🚀 EXECUTING PLANS 🚀
Executing 14 response(s)...

✅ TIME SLICE PROCESSED ✅
Time: 567ms
Slice ID: 42
```
Before adding agents to a time slice, the plugin applies several filters:
- Topic Relevance: Agents with 0% relevance (and not mentioned) are skipped
- Active Status: Only running agents are included in planning
- Flood Protection: During floods (20+ msg/5s), only agents with 30%+ relevance participate
- Priority Express Lane: Critical messages (DMs, @mentions, "urgent" keywords) bypass batching
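Put together, the gate might look like this (the thresholds mirror the list above; the function itself is illustrative):

```typescript
// Sketch: should this agent join the current time slice?
function includeAgent(agent: {
  running: boolean;
  relevance: number;    // 0..1 topic relevance
  mentioned: boolean;
  floodActive: boolean; // 20+ msg in 5s
}): boolean {
  if (!agent.running) return false;                            // Active Status
  if (agent.relevance === 0 && !agent.mentioned) return false; // Topic Relevance
  if (agent.floodActive && agent.relevance < 0.3) return false; // Flood Protection
  return true;
}
```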
Priority Scoring (0-100):
- Direct messages: +100
- Agent @mentions: +100
- Replies to agent: +80
- Urgent keywords: +80
- Voice messages: +100
Critical priority (≥80 score) → Immediate processing, skip batch
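A direct translation of that scoring into code (the cap at 100 and the additive combination are assumptions):

```typescript
// Sketch: score a message, then route critical ones to the express lane.
function priorityScore(msg: {
  isDM: boolean;
  mentionsAgent: boolean;
  repliesToAgent: boolean;
  hasUrgentKeyword: boolean;
  isVoice: boolean;
}): number {
  let score = 0;
  if (msg.isDM) score += 100;
  if (msg.mentionsAgent) score += 100;
  if (msg.repliesToAgent) score += 80;
  if (msg.hasUrgentKeyword) score += 80;
  if (msg.isVoice) score += 100;
  return Math.min(score, 100); // clamp to the 0-100 scale
}

const isCritical = (score: number): boolean => score >= 80; // skip the batch
```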
Messages are automatically clustered by semantic similarity:
- Lightweight word-overlap clustering (no LLM calls)
- Groups "hello", "hi", "hey" together for efficient processing
- 60% similarity threshold for clustering
- 60-second cluster lifetime
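Word-overlap similarity can be as simple as Jaccard over word sets (the exact metric, and any normalization that lets greetings like "hi"/"hello" match, are assumptions):

```typescript
// Sketch: Jaccard word-overlap similarity, no LLM involved.
function wordOverlap(a: string, b: string): number {
  const wa = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const wb = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  const shared = [...wa].filter((w) => wb.has(w)).length;
  const union = new Set([...wa, ...wb]).size;
  return union === 0 ? 0 : shared / union;
}
// Messages join a cluster at >= 0.6 similarity; clusters expire
// after 60 seconds (both values from the list above).
```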
The system learns your traffic patterns automatically:
- Tracks hourly message rates (0-23 hours)
- Builds confidence over time (100 samples = 100% confidence)
- Predicts load for upcoming hours
- Recommendations: `warm` (prepare), `normal`, `cool` (scale down)
Example: If your Discord is always busy 9am-5pm, the system learns this and can pre-optimize resources.
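A running mean per hour of day is enough to reproduce the behavior described (the in-memory storage and the `cool` cutoff are assumptions; the 100-sample confidence rule is from the list above):

```typescript
// Sketch: learn an hourly load profile and recommend an action.
const hourlyRate = new Array<number>(24).fill(0);
const hourlySamples = new Array<number>(24).fill(0);

function recordHour(hour: number, msgPerSec: number): void {
  hourlySamples[hour]++;
  // Incremental running mean: one number per hour instead of raw samples.
  hourlyRate[hour] += (msgPerSec - hourlyRate[hour]) / hourlySamples[hour];
}

function recommend(hour: number): 'warm' | 'normal' | 'cool' {
  const confidence = Math.min(hourlySamples[hour] / 100, 1); // 100 samples = 100%
  if (confidence < 0.25) return 'normal';    // not enough data yet (assumed cutoff)
  if (hourlyRate[hour] >= 2) return 'warm';  // expected high load
  if (hourlyRate[hour] < 0.1) return 'cool'; // expected quiet (assumed cutoff)
  return 'normal';
}
```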
The ResourceTracker service monitors:
- Token usage (Input/Output)
- Estimated cost
- Latency (E2E, LLM, DB)
- I/O operations
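The per-batch numbers could be represented roughly like this (field names are illustrative, not the ResourceTracker API):

```typescript
// Sketch: the metrics tracked for each processed batch.
interface BatchMetrics {
  tokensIn: number;
  tokensOut: number;
  estimatedCostUsd: number;
  latency: { e2eMs: number; llmMs: number; dbMs: number };
  ioOps: number; // database and network operations
}
```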
After each batch processing (when there are actual metrics to show), you'll see a loud efficiency report:
```
╔══════════════════════════════════════════════════════════════╗
║               🚀 AUTONOMOUS EFFICIENCY REPORT 🚀               ║
╠══════════════════════════════════════════════════════════════╣
║ 📊 Messages Served:        45                                 ║
║ 🧠 LLM Calls Made:         8                                  ║
║ 🎯 Messages per LLM Call:  5.63                               ║
║ 💰 Cost per Message (¢):   2.45                               ║
║ 📦 Batches Processed:      3                                  ║
╚══════════════════════════════════════════════════════════════╝
```
Note: The report only displays when there are messages or batches to report on - no empty reports!
Without plugin-autonomous:
- 8 agents × 10 messages × 5 rooms = 400 LLM calls
With per-room batching (v1):
- 5 rooms × 1 planning call = 5 LLM calls (80× reduction)
With time-sliced batching (v2):
- 1 time slice × 1 planning call = 1 LLM call (400× reduction!)
Scenario: 8 agents monitoring 3 busy Discord channels
Per-Message (No Batching):
- 50 messages arrive in 10 seconds
- 50 msg × 8 agents = 400 LLM calls
- Cost: ~$2.00
Time-Sliced Batching:
- Messages collected in 20 time slices (500ms each)
- 20 slices × 1 LLM call = 20 LLM calls
- Cost: ~$0.10
- Savings: 95% ($1.90)
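The arithmetic behind those numbers, assuming a flat ~$0.005 per LLM call purely for illustration:

```typescript
// Worked numbers from the scenario above (assumed flat cost per call).
const COST_PER_CALL_USD = 0.005;
const perMessageCalls = 50 * 8;    // 400 calls without batching
const slices = (10 * 1000) / 500;  // 10s of traffic / 500ms = 20 slices
const batchedCalls = slices;       // one planning call per slice
const savedUsd = (perMessageCalls - batchedCalls) * COST_PER_CALL_USD;
// savedUsd === 1.90 -> 95% of the original ~$2.00
```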
Time slices enable true multi-room coordination:
```
Time Slice #1 (500ms):
  Room A: 3 messages
  Room B: 2 messages
  Room C: 1 message

→ Single LLM call coordinates all 8 agents across all 3 rooms
→ 6 messages, 8 agents, 1 LLM call = 48 decisions
```
- ARCHITECTURE.md - Technical architecture and data flow
- EXAMPLES.md - Real-world usage patterns and examples
- CONFIG.md - Complete configuration reference
- DESIGN_DECISIONS.md - Architectural rationale
```bash
bun run build
bun run test
```