Advanced inference optimization plugin for OpenClaw. Provides intelligent model routing, cross-provider pricing, dynamic cost tracking, latency monitoring, prompt caching optimization, and multi-provider embedding service across 12+ models on Anthropic and OpenRouter.
0 vulnerabilities · 855 tests passing · Node 22+
- ClawRouter Integration — 15-dimension complexity scoring as primary classifier, with heuristic fallback
- Cross-Provider Support — Route across Anthropic and OpenRouter providers seamlessly
- Tier-Based Classification — Automatic model selection based on request complexity (simple/mid/complex/reasoning)
- Provider Resolution — Glob-pattern matching for intelligent provider selection (
openai/*→ openrouter) - Shadow Mode — Observe routing decisions without mutation (active mode blocked on OpenClaw hooks)
- Request Tracking — Input/output tokens, cache reads/writes, estimated savings per request
- Dynamic Pricing — Live pricing data from OpenRouter API with 6-hour TTL caching
- Latency Monitoring — Per-model P50/P95/avg latency tracking with circular buffer (100 samples default)
- Dashboard — Real-time web UI at
http://localhost:3333with dark theme
- Cache Breakpoint Injection — Optimizes Anthropic's prompt caching for maximum efficiency
- Smart Windowing — Maintains conversation context while minimizing redundant tokens
- 90% Cache Savings — Cache reads are significantly cheaper than regular input tokens
- Risk-Free Operation — All routing runs in observe-only mode, no request mutation
- Comprehensive Logging — Detailed routing decisions with cost projections and provider recommendations
- Full Pipeline Simulation — Complete shadow execution of classify → resolve → recommend → log
openclaw plugins install slimclawThis automatically downloads, installs, and registers the plugin. Skip to Configuration.
# Install to OpenClaw plugins directory
mkdir -p ~/.openclaw/plugins/slimclaw
cd ~/.openclaw/plugins/slimclaw
npm init -y
npm install slimclawThen register the plugin in ~/.openclaw/openclaw.json:
{
"plugins": {
"load": {
"paths": ["~/.openclaw/plugins/slimclaw"]
},
"entries": {
"slimclaw": { "enabled": true }
}
}
}git clone https://github.com/evansantos/slimclaw ~/.openclaw/plugins/slimclaw
cd ~/.openclaw/plugins/slimclaw
npm install && npm run buildThen register the plugin in ~/.openclaw/openclaw.json using the same configuration as Method 2.
Create a configuration file at ~/.openclaw/plugins/slimclaw/slimclaw.config.json.
Start with this basic setup to enable shadow mode:
{
"enabled": true,
"mode": "shadow"
}Shadow mode observes and logs routing decisions without modifying requests:
{
"enabled": true,
"mode": "shadow",
"routing": {
"enabled": true,
"shadowLogging": true
},
"dashboard": {
"enabled": true,
"port": 3333
}
}Configure which models to use for different complexity levels:
{
"enabled": true,
"mode": "shadow",
"routing": {
"enabled": true,
"shadowLogging": true,
"tiers": {
"simple": "anthropic/claude-3-haiku-20240307",
"mid": "anthropic/claude-sonnet-4-20250514",
"complex": "anthropic/claude-opus-4-20250514",
"reasoning": "anthropic/claude-opus-4-20250514"
}
},
"dashboard": {
"enabled": true,
"port": 3333
}
}Route different model families to their optimal providers:
{
"enabled": true,
"mode": "shadow",
"routing": {
"enabled": true,
"shadowLogging": true,
"tiers": {
"simple": "anthropic/claude-3-haiku-20240307",
"mid": "openai/gpt-4-turbo",
"complex": "anthropic/claude-opus-4-20250514",
"reasoning": "openai/o1-preview"
},
"tierProviders": {
"openai/*": "openrouter",
"anthropic/*": "anthropic",
"google/*": "openrouter"
},
"openRouterHeaders": {
"HTTP-Referer": "https://slimclaw.dev",
"X-Title": "SlimClaw"
}
},
"dashboard": {
"enabled": true,
"port": 3333
}
}Set daily/weekly spending limits with automatic tier downgrading:
{
"enabled": true,
"mode": "shadow",
"routing": {
"enabled": true,
"shadowLogging": true,
"reasoningBudget": 10000,
"allowDowngrade": true,
"tiers": {
"simple": "anthropic/claude-3-haiku-20240307",
"mid": "anthropic/claude-sonnet-4-20250514",
"complex": "anthropic/claude-opus-4-20250514",
"reasoning": "anthropic/claude-opus-4-20250514"
}
},
"dashboard": {
"enabled": true,
"port": 3333
}
}Automatically fetch live pricing from OpenRouter API:
{
"enabled": true,
"mode": "shadow",
"routing": {
"enabled": true,
"shadowLogging": true,
"tiers": {
"simple": "anthropic/claude-3-haiku-20240307",
"mid": "openai/gpt-4-turbo",
"complex": "anthropic/claude-opus-4-20250514",
"reasoning": "openai/o1-preview"
},
"tierProviders": {
"openai/*": "openrouter",
"anthropic/*": "anthropic"
},
"dynamicPricing": {
"enabled": true,
"ttlMs": 21600000,
"refreshIntervalMs": 21600000,
"timeoutMs": 10000,
"apiUrl": "https://openrouter.ai/api/v1/models"
}
},
"dashboard": {
"enabled": true,
"port": 3333
}
}After configuration, restart the OpenClaw gateway to load the plugin:
# Restart gateway to load plugin
openclaw gateway restart
# Check it loaded successfully
tail -5 ~/.openclaw/logs/gateway.log | grep SlimClaw
# Check the dashboard is running
curl http://localhost:3333/api/statsYou should see log entries indicating SlimClaw loaded successfully and the dashboard should respond with current metrics. Access the web UI at http://localhost:3333 to monitor routing decisions in real-time.
/slimclawShows current metrics: requests, tokens, cache hits, savings, and routing statistics.
When dashboard.enabled: true, access the web UI at:
http://localhost:3333
Or from your network: http://<your-ip>:3333
GET /metrics/optimizer— Current metrics summaryGET /metrics/history?period=hour&limit=24— Historical dataGET /metrics/raw— Raw metrics for debuggingGET /api/routing-stats— Routing decision statisticsGET /health— Health check
SlimClaw implements a comprehensive routing pipeline that operates in shadow mode:
- Classification → ClawRouter analyzes request complexity across 15 dimensions, assigns tier (simple/mid/complex/reasoning). Falls back to built-in heuristic classifier on failure (circuit breaker pattern).
- Provider Resolution → Map model patterns to providers using glob matching (
openai/*→ openrouter) - Shadow Recommendation → Generate complete routing decision without mutation
- Structured Logging → Output detailed recommendations with cost analysis
SlimClaw v0.2.0 operates exclusively in shadow mode due to OpenClaw's current hook limitations:
- Observes all requests and generates routing recommendations
- Logs what routing decisions would be made with cost projections
- Tracks dynamic pricing and latency for routing intelligence
- Cannot mutate requests until OpenClaw supports
historyMessagesmutation
Example Shadow Log:
[SlimClaw] 🔮 Shadow route: opus-4-6 → o4-mini (via openrouter)
Tier: reasoning (0.92) | Savings: 78% | $0.045/1k → $0.003/1k
| Complexity | Default Model | Use Cases |
|---|---|---|
simple |
Claude 3 Haiku | Basic Q&A, simple tasks, casual chat |
mid |
Claude 4 Sonnet | General development, analysis, writing |
complex |
Claude 4 Opus | Architecture, complex debugging, research |
reasoning |
Claude 4 Opus | Multi-step logic, planning, deep analysis |
import {
resolveModel,
buildShadowRecommendation,
makeRoutingDecision,
type ShadowRecommendation,
} from './routing/index.js';
// Generate routing decision
const decision = resolveModel(classification, routingConfig, context);
// Build complete shadow recommendation
const recommendation = buildShadowRecommendation(
runId,
actualModel,
decision,
tierProviders,
pricing,
);import { DynamicPricingCache, DEFAULT_DYNAMIC_PRICING_CONFIG } from './routing/dynamic-pricing.js';
const pricing = new DynamicPricingCache({
enabled: true,
openRouterApiUrl: 'https://openrouter.ai/api/v1/models',
cacheTtlMs: 21600000, // 6 hours
fetchTimeoutMs: 5000,
fallbackToHardcoded: true,
});
// Get live pricing (sync — falls back to hardcoded if cache miss)
const cost = pricing.getPricing('openai/gpt-4-turbo');
// { inputPer1k: 0.01, outputPer1k: 0.03 }
// Refresh cache from OpenRouter API (async)
await pricing.refresh();import {
LatencyTracker,
DEFAULT_LATENCY_TRACKER_CONFIG,
type LatencyStats,
} from './routing/latency-tracker.js';
const tracker = new LatencyTracker({
enabled: true,
windowSize: 100, // circular buffer per model
outlierThresholdMs: 60000,
});
// Record request latency (tokenCount optional — enables throughput calc)
tracker.recordLatency('anthropic/claude-sonnet-4-20250514', 2500, 150);
// Get statistics
const stats: LatencyStats | null = tracker.getLatencyStats('anthropic/claude-sonnet-4-20250514');
// { p50: 2100, p95: 4500, avg: 2300, min: 1800, max: 5200, count: 87, tokensPerSecond: 45.2 }import { resolveProvider, type ProviderResolution } from './routing/provider-resolver.js';
const resolution = resolveProvider('openai/gpt-4-turbo', {
'openai/*': 'openrouter',
'anthropic/*': 'anthropic',
});
// { provider: 'openrouter', matchedPattern: 'openai/*', confidence: 1.0 }- 618 tests passing across 43 test files
- 3 major phases complete:
- Phase 1: Cross-provider pricing (PR #24)
- Phase 2a: Shadow routing (PR #25)
- Phase 3a: Dynamic pricing + latency tracking (PR #29)
- 12+ models supported across Anthropic and OpenRouter providers
- Shadow mode only — active routing blocked on OpenClaw hook mutation support
- 0 vulnerabilities — clean security audit
| Provider | Models | Pricing | Status |
|---|---|---|---|
| Anthropic | Claude 3/4 series | ✅ Dynamic + Hardcoded | Production |
| OpenRouter | 12+ models (OpenAI, Google, etc.) | ✅ Live API pricing | Production |
SlimClaw includes a production-ready embedding router for multi-provider vector generation with intelligent caching, cost tracking, and configurable complexity-based model selection.
- Intelligent Routing — Classify text complexity (simple/mid/complex) and select optimal embedding model
- Smart Caching — SQLite-backed persistent cache with TTL and duplicate detection
- Cost Tracking — Per-model pricing with real-time metrics and savings calculation
- Retry Logic — Exponential backoff with configurable retries and timeout protection
- Configurable Thresholds — Tune complexity detection via config (default: 200/1000 chars)
- Dashboard Integration — Real-time metrics at
/api/embeddings/metrics
import { EmbeddingRouter } from 'slimclaw';
const router = new EmbeddingRouter({
providers: [
new AnthropicProvider({ apiKey: process.env.ANTHROPIC_API_KEY }),
new OpenRouterProvider({ apiKey: process.env.OPENROUTER_API_KEY }),
],
config: {
tiers: {
simple: 'openai/text-embedding-3-small',
mid: 'openai/text-embedding-3-large',
complex: 'cohere/cohere-embed-english-v3.0',
},
retry: {
maxRetries: 3,
baseDelayMs: 1000,
timeoutMs: 30000,
},
},
});
const result = await router.embed('Hello, world!');
console.log(result.embedding); // [0.123, 0.456, ...]1. Set up API keys — Create .env from .env.example:
ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENROUTER_API_KEY=your-openrouter-key-here2. Enable in config — Add to slimclaw.config.json or OpenClaw's openclaw.json under plugins.slimclaw:
{
"embeddings": {
"enabled": true,
"routing": {
"tiers": {
"simple": "voyage-3-lite",
"mid": "voyage-3",
"complex": "voyage-3-large"
},
"tierProviders": {
"voyage-3-lite": "openrouter",
"voyage-3": "openrouter",
"voyage-3-large": "openrouter"
}
},
"caching": {
"enabled": true,
"ttlMs": 3600000,
"maxSize": 1000
},
"metrics": {
"enabled": true,
"trackCosts": true
}
}
}Config Schema:
enabled(boolean, default:false) — Enable embedding routerrouting.tiers(object) — Map complexity tiers to modelsrouting.tierProviders(object) — Map models/patterns to providers (anthropic|openrouter)caching.enabled(boolean, default:true) — Enable SQLite cachecaching.ttlMs(number, default:3600000) — Cache TTL in milliseconds (1 hour)caching.maxSize(number, default:1000) — Max cache entriesmetrics.enabled(boolean, default:true) — Track metricsmetrics.trackCosts(boolean, default:true) — Calculate per-model costs
Access embedding metrics at: GET /api/embeddings/metrics
{
"totalRequests": 42,
"cacheHits": 38,
"cacheMisses": 4,
"totalCost": 0.15,
"averageDurationMs": 245,
"requestsByTier": {
"simple": 20,
"mid": 15,
"complex": 7
}
}- Exponential Backoff: 1s → 2s → 4s delays between attempts
- Smart Error Handling: Retries 5xx/timeouts, skips 4xx (client errors)
- Timeout Protection: Per-request 30s timeout prevents hanging
- Error Context: Logs include provider name, attempt count, original error
Embeddings tests included in main test suite:
npm test -- src/embeddings/Coverage: 57+ tests covering routing, caching, retry logic, and metrics.
- Active Mode — Waiting on OpenClaw hook mutation support (#20416)
- Budget Enforcement — Per-session/daily spend limits with automatic tier capping
- A/B Testing — Route percentages across providers to compare quality/cost tradeoffs
- Multi-Factor Routing — Combine cost + latency + quality signals for optimal model selection
- Embedding Pooling — Batch multiple embedding requests for efficiency
Note:
@blockrun/clawrouterpullsviemas a transitive dependency (~50 packages). This doesn't affect functionality but adds tonode_modulessize. See BlockRunAI/clawrouter#1 for tracking.
SlimClaw can operate as an active provider proxy, intercepting model requests and applying intelligent routing.
- Enable proxy in
slimclaw.config.json:
{
"enabled": true,
"proxy": {
"enabled": true,
"port": 3334
},
"routing": {
"enabled": true,
"tiers": {
"simple": "anthropic/claude-3-haiku-20240307",
"mid": "anthropic/claude-sonnet-4-20250514",
"complex": "anthropic/claude-opus-4-20250514"
}
}
}- Set your OpenClaw model to
slimclaw/auto:
{
"defaultModel": "slimclaw/auto"
}- OpenClaw sends request to
slimclaw/auto - SlimClaw classifies prompt complexity
- Routing pipeline selects optimal model for the tier
- Request forwards to real provider (OpenRouter)
- Streaming response pipes back to OpenClaw
| Feature | Status |
|---|---|
slimclaw/auto model |
✅ |
| OpenRouter forwarding | ✅ |
| Streaming responses | ✅ |
| Budget enforcement | ✅ |
| A/B testing | ✅ |
| Direct Anthropic API | ⏳ Phase 2 |
| Multiple virtual models | ⏳ Phase 2 |
PRs welcome! The project uses branch protection with CI + GitSniff review.
git clone https://github.com/evansantos/slimclaw
cd slimclaw
npm install
npm test- TypeScript — Full type safety with strict checking
- Testing — 618 tests with full coverage across routing pipeline
- ESLint — Enforced code style and quality
- Branch Protection — All changes require PR review
MIT