
feat: integrate external memory store via Zep REST client#1

Open
aryehlev wants to merge 8 commits into main from
claude/memory-router-integration-plan-M7Q0R

Conversation

@aryehlev
Owner

@aryehlev aryehlev commented Apr 20, 2026

Adds opt-in persistent memory that runs as a sidecar service rather than
in-process. Before each conversation turn the runtime recalls relevant
facts from the configured memory backend and injects them into the
system prompt; after the turn completes it forwards the user prompt and
assistant reply back to the backend for extraction.

  • New memory-client crate exposes a synchronous MemoryClient trait
    plus ZepMemoryClient (reqwest::blocking) and a NoopMemoryClient.
    Recall and ingest endpoints target Zep v2 /api/v2/sessions/{id}/....
  • ConversationRuntime::with_memory_client wires an optional client
    into run_turn; failures are logged and never abort the turn.
  • RuntimeFeatureConfig gains a memory block (enabled, baseUrl,
    apiKey, userId, recallLimit) with schema validation and CLI
    boot-time construction.
  • Tests: 7 unit tests in memory-client, 3 HTTP integration tests
    against a minimal in-process mock, 2 runtime tests covering prompt
    augmentation and error tolerance.
  • Drive-by: fix clippy uninlined_format_args in bash.rs that was
    blocking the workspace verification gate; rustfmt reflowed a few
    unrelated files in api/ and runtime/lane_events.rs.

Summary by CodeRabbit

  • New Features

    • Runtime router with external and eval‑driven model selection.
    • Optional conversation memory: per‑turn recall and ingest via an external memory client.
    • Eval CLI to replay sessions, record per‑turn routing decisions, metrics, and JSONL output.
    • Local router sidecar compose and optional efficiency hooks (RTK) plus Caveman skill.
  • Documentation

    • Router deployment and efficiency integration guides added.
  • Chores

    • Test additions and formatting cleanup.

claude added 4 commits April 20, 2026 16:23
Adds opt-in persistent memory that runs as a sidecar service rather than
in-process. Before each conversation turn the runtime recalls relevant
facts from the configured memory backend and injects them into the
system prompt; after the turn completes it forwards the user prompt and
assistant reply back to the backend for extraction.

- New `memory-client` crate exposes a synchronous `MemoryClient` trait
  plus `ZepMemoryClient` (reqwest::blocking) and a `NoopMemoryClient`.
  Recall and ingest endpoints target Zep v2 `/api/v2/sessions/{id}/...`.
- `ConversationRuntime::with_memory_client` wires an optional client
  into `run_turn`; failures are logged and never abort the turn.
- `RuntimeFeatureConfig` gains a `memory` block (`enabled`, `baseUrl`,
  `apiKey`, `userId`, `recallLimit`) with schema validation and CLI
  boot-time construction.
- Tests: 7 unit tests in memory-client, 3 HTTP integration tests
  against a minimal in-process mock, 2 runtime tests covering prompt
  augmentation and error tolerance.
- Drive-by: fix clippy `uninlined_format_args` in bash.rs that was
  blocking the workspace verification gate; rustfmt reflowed a few
  unrelated files in api/ and runtime/lane_events.rs.
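The trait surface described above can be sketched as follows. This is a hedged approximation: the PR only names `MemoryClient`, `NoopMemoryClient`, `MemoryMessage`, and `MemoryRole`; the exact method signatures and error type here are assumptions.

```rust
// Sketch of a synchronous memory-client surface. The real crate's error
// type is presumably richer than String; that part is an assumption.

#[derive(Debug, Clone, PartialEq)]
pub enum MemoryRole {
    User,
    Assistant,
}

#[derive(Debug, Clone)]
pub struct MemoryMessage {
    pub role: MemoryRole,
    pub content: String,
}

impl MemoryMessage {
    pub fn new(role: MemoryRole, content: String) -> Self {
        Self { role, content }
    }
}

pub trait MemoryClient {
    /// Fetch up to `limit` stored facts relevant to `query`.
    fn recall(&self, session_id: &str, query: &str, limit: usize) -> Result<Vec<String>, String>;
    /// Forward a turn's messages to the backend for fact extraction.
    fn ingest(&self, session_id: &str, messages: &[MemoryMessage]) -> Result<(), String>;
}

/// Used when memory is disabled: recalls nothing, ingests nowhere.
pub struct NoopMemoryClient;

impl MemoryClient for NoopMemoryClient {
    fn recall(&self, _: &str, _: &str, _: usize) -> Result<Vec<String>, String> {
        Ok(Vec::new())
    }
    fn ingest(&self, _: &str, _: &[MemoryMessage]) -> Result<(), String> {
        Ok(())
    }
}
```

A trait object behind an `Option` is what lets `run_turn` treat a missing or failing client identically: recall errors degrade to an empty snippet list.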
Adds an opt-in routing/caching path. When .claw.json contains a
`router` block with `enabled: true`, claw bypasses per-model provider
detection and sends every request to the configured OpenAI-compatible
proxy. The intended flow is:

    claw  ->  gptcache:8100  ->  routellm:6060  ->  upstream providers

RouteLLM picks between a strong and weak model per prompt using its
trained classifier; GPTCache serves semantic cache hits in front. Both
speak OpenAI-compat so claw only needs a base-URL swap.

Config keys (all strings, `enabled` bool):
  router.enabled, router.baseUrl, router.apiKey, router.model

Where `model` is the router specifier the proxy expects (for RouteLLM,
e.g. `router-mf-0.11593`). On each turn's MessageStart event we read
`message.model` from the response and emit:

    [router] user_model=opus routed_to=claude-haiku-4-5

to stderr, giving an immediate per-turn signal that pipes cleanly into
a cost/quality eval. RouteLLM ships its own MT-Bench/MMLU/GSM8K harness
which users run inside the sidecar container.

- runtime/config: new RouterConfig struct + parser, schema validation,
  re-export.
- rusty-claude-cli: RouterOverride + build_router_override, wired into
  AnthropicRuntimeClient::new; when set, constructs an
  OpenAiCompatClient with the overridden base_url and uses
  router.model as the request model. Falls back to existing
  detect_provider_kind path when disabled.
- deploy/router/: docker-compose.yml and README documenting the
  two-sidecar topology and the RouteLLM eval invocation.
- Tests: config parsing with a fully populated `router` block and
  default-disabled state.
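Taken together, the keys above would look like this in `.claw.json` (URL, API key, and router specifier are illustrative placeholders, not values from the PR):

```json
{
  "router": {
    "enabled": true,
    "baseUrl": "http://localhost:8100/v1",
    "apiKey": "sk-local-example",
    "model": "router-mf-0.11593"
  }
}
```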
Replays every user turn from a saved session through the configured
router proxy and emits per-turn JSONL plus a stderr summary. This is
the piece that closes the loop on routing evaluation: instead of
benchmarking the router against Chatbot Arena data, you benchmark it
against your own historical prompts and observe which upstream model
the router actually picks per turn.

Invocation:

    claw eval --session latest --output eval.jsonl
    claw eval --session abc123 --max-turns 50

Per-turn record shape:

    {"turn_index": 3, "user_input": "...",
     "requested_model": "router-mf-0.11593",
     "routed_model": "claude-haiku-4-5",
     "latency_ms": 284, "input_tokens": 1840, "output_tokens": 92,
     "ok": true}

Summary line on stderr includes turn counts, average latency,
total tokens, and the model distribution picked by the router so
cost/quality curves can be plotted from a single run.

- `CliAction::Eval` variant with `parse_eval_args` supporting
  --session, --output/-o, --max-turns, and --output-format, plus
  inline --flag=value syntax.
- `run_eval_command` loads the session via the existing
  SessionStore, builds cumulative history per turn so the router
  sees the real conversation depth, calls `send_message` through
  an OpenAiCompatClient pointed at `router.baseUrl`, and records
  `response.model` (the upstream pick) along with usage.
- Requires `router.enabled: true` — the command errors out
  otherwise so we never silently fall through to upstream providers.
- `EvalSummary` accumulator with full coverage: successes,
  failures, latency, tokens, model distribution.
- Tests: 11 new tests covering arg parsing (defaults, explicit
  flags, inline syntax, invalid --max-turns, unknown options,
  top-level subcommand routing), EvalSummary math, string
  helpers.
- deploy/router/README.md documents the new subcommand alongside
  RouteLLM's own eval harness.
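The `EvalSummary` math described above is a straightforward accumulator. A minimal sketch follows; the field and method names beyond `EvalSummary` itself are assumptions, not the crate's actual definitions.

```rust
use std::collections::BTreeMap;

// Hedged sketch of an EvalSummary-style accumulator: per-turn records
// fold into counts, latency, token totals, and a model distribution.
#[derive(Default)]
pub struct EvalSummary {
    pub successes: u64,
    pub failures: u64,
    pub total_latency_ms: u64,
    pub total_input_tokens: u64,
    pub total_output_tokens: u64,
    pub model_distribution: BTreeMap<String, u64>,
}

impl EvalSummary {
    pub fn record(
        &mut self,
        ok: bool,
        latency_ms: u64,
        input_tokens: u64,
        output_tokens: u64,
        routed_model: &str,
    ) {
        if ok {
            self.successes += 1;
        } else {
            self.failures += 1;
        }
        self.total_latency_ms += latency_ms;
        self.total_input_tokens += input_tokens;
        self.total_output_tokens += output_tokens;
        *self
            .model_distribution
            .entry(routed_model.to_string())
            .or_insert(0) += 1;
    }

    /// Mean latency across all turns, successful or not.
    pub fn average_latency_ms(&self) -> f64 {
        let turns = self.successes + self.failures;
        if turns == 0 {
            0.0
        } else {
            self.total_latency_ms as f64 / turns as f64
        }
    }
}
```

A `BTreeMap` keeps the model distribution in stable order, which makes the stderr summary line deterministic across runs.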
Adds a second routing mode where claw picks the upstream model itself
per turn from a scoreboard of prior outcomes, instead of delegating to
an external proxy. The scoreboard is a plain JSON file you can inspect;
every decision is explainable (warm_start / explore / exploit).

Request flow in `eval-driven` mode:

    claw (picks model from scoreboard) --> Anthropic / OpenAI / ...

A cache sidecar is still optional (add `baseUrl`/`apiKey` to the
`router` block and claw's chosen model flows through it), but
RouteLLM's classifier is no longer required.

Selection strategy (in new `eval-router` crate):
- Bucket prompts by length (short < 512 chars, medium < 6000, long ≥ 6000)
- Warm-start: any candidate with fewer than `minSamples` samples in
  the current bucket gets picked so the scoreboard builds a baseline
- Epsilon-greedy explore (`epsilonPercent` / 100 of the time)
- Otherwise pick the candidate with the highest Laplace-smoothed
  success rate; tiebreak by lower average latency
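The selection policy above can be sketched roughly as follows. The struct shapes are assumptions; the real crate also tracks latency for tiebreaks and drives exploration from its own RNG rather than a caller-supplied roll.

```rust
// Hedged sketch of warm-start / explore / exploit selection over
// length buckets. Not the eval-router crate's actual code.

#[derive(Debug, PartialEq)]
pub enum PromptBucket {
    Short,
    Medium,
    Long,
}

pub fn bucket_for(prompt: &str) -> PromptBucket {
    match prompt.chars().count() {
        n if n < 512 => PromptBucket::Short,
        n if n < 6000 => PromptBucket::Medium,
        _ => PromptBucket::Long,
    }
}

#[derive(Debug, Default, Clone)]
pub struct ModelStats {
    pub samples: u64,
    pub successes: u64,
}

impl ModelStats {
    /// Laplace-smoothed success rate: (successes + 1) / (samples + 2),
    /// so an unseen model scores 0.5 rather than 0 or NaN.
    pub fn smoothed_success_rate(&self) -> f64 {
        (self.successes as f64 + 1.0) / (self.samples as f64 + 2.0)
    }
}

/// Warm-start under-sampled candidates, explore when the caller's roll
/// says so, otherwise exploit the best smoothed success rate.
pub fn select<'a>(
    candidates: &'a [(&'a str, ModelStats)],
    min_samples: u64,
    explore_roll: Option<usize>, // Some(i) = explore candidate i; None = no exploration
) -> (&'a str, &'static str) {
    if let Some((name, _)) = candidates.iter().find(|(_, s)| s.samples < min_samples) {
        return (*name, "warm_start");
    }
    if let Some(i) = explore_roll {
        return (candidates[i % candidates.len()].0, "explore");
    }
    let best = candidates
        .iter()
        .max_by(|a, b| {
            a.1.smoothed_success_rate()
                .partial_cmp(&b.1.smoothed_success_rate())
                .unwrap()
        })
        .unwrap();
    (best.0, "exploit")
}
```

Returning the reason string alongside the pick is what makes every decision explainable in the `[router] ... reason=...` log line.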

After each turn, claw records `(success, latency_ms, input_tokens,
output_tokens)` back to the scoreboard and persists it atomically
(write-then-rename, so a crash mid-write can't corrupt the file).
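The write-then-rename step is the standard same-filesystem atomic-save pattern; a standalone sketch (not the crate's actual code):

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Persist `contents` atomically: write a sibling temp file, flush it to
/// disk, then rename it over the target. A crash mid-write leaves the old
/// file intact, because rename within one filesystem is atomic on POSIX.
pub fn save_atomic(path: &Path, contents: &str) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(contents.as_bytes())?;
        file.sync_all()?; // ensure bytes hit disk before the rename
    }
    fs::rename(&tmp, path)
}
```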

Config (`.claw.json` `router` block):
- `mode`: "external" (default) or "eval-driven"
- `candidates`: array of concrete model names (required when eval-driven)
- `epsilonPercent`: exploration rate 0-100 (default 10)
- `minSamples`: warm-start threshold (default 5)
- `scoreboardPath`: JSON file location (default ~/.local/share/claw/)

Each turn logs one line to stderr:

    [router] bucket=short selected=claude-haiku-4-5 reason=exploit

- `eval-router` crate: RouterScoreboard with load/save, PromptBucket,
  ModelStats (Laplace smoothing), Selection/SelectionReason, DefaultRng
  (splitmix64-based, no rand crate dep), RandomSource trait for
  deterministic tests. 12 unit tests.
- `runtime::RouterConfig` gains mode/candidates/epsilonPercent/
  minSamples/scoreboardPath with schema validation and parsing tests
  (3 new).
- `AnthropicRuntimeClient`: optional `EvalRouterState` consulted
  pre-turn to override the request model, and updated post-turn with
  the outcome. Falls back to `self.model` when router returns no
  selection.
- `build_eval_router` / `build_router_override` are now mutually
  exclusive on `router.mode`.
- CLI tests: prompt_bucket_for_request (3 cases), tokens_from_events
  (2 cases).
- `deploy/router/README.md` documents Mode A (external sidecar) and
  Mode B (eval-driven, in-process) side-by-side.
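The splitmix64 mixer behind a `DefaultRng`-style generator is small enough to show inline. These are the published splitmix64 constants (Steele, Lea, and Flood), not necessarily the crate's exact code:

```rust
/// Minimal splitmix64: advance the state by a fixed odd increment,
/// then bit-mix. No external rand dependency needed.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Roll in [0, 100): enough for an `epsilonPercent` exploration check.
    pub fn percent_roll(&mut self) -> u64 {
        self.next_u64() % 100
    }
}
```

Because output is a pure function of the seed, tests can fix a seed and get a reproducible explore/exploit sequence, which is presumably why the crate exposes a `RandomSource` trait for deterministic tests.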
@coderabbitai

coderabbitai Bot commented Apr 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds eval-driven routing and external synchronous memory: new crates eval-router and memory-client, config and validation for router/memory, runtime plumbing (per-turn recall, prompt augmentation, ingest, selection/recording), CLI eval replay tooling, deploy/docs, and accompanying tests.

Changes

Cohort / File(s) Summary
Test & formatting tweaks
rust/crates/api/src/providers/mod.rs, rust/crates/api/src/providers/openai_compat.rs, rust/crates/runtime/src/lane_events.rs, rust/crates/runtime/src/bash.rs
Whitespace, assertion reformatting, and a timestamp format tweak in tests/code; no behavioral changes.
Runtime config & validation
rust/crates/runtime/src/config.rs, rust/crates/runtime/src/config_validate.rs
Added memory and router config sections and accessors (MemoryConfig, RouterConfig, RouterMode), parsing helpers, defaults, and validation schema entries for nested memory.* and router.* keys (unknown-key/type errors for invalid values).
Conversation runtime (memory integration)
rust/crates/runtime/src/conversation.rs
Added optional synchronous MemoryClient + with_memory_client(...), per-turn recall to augment system prompt, and post-turn ingest of user+assistant messages; recall/ingest errors are logged and non-fatal.
Eval-driven router crate
rust/crates/eval-router/Cargo.toml, rust/crates/eval-router/src/lib.rs
New crate implementing RouterScoreboard, PromptBucket, ModelStats, selection logic (warm-start / explore / exploit), Laplace smoothing, decay, file atomic load/save, RNG trait, and tests. Public APIs for selection/recording exposed.
Memory client crate
rust/crates/memory-client/Cargo.toml, rust/crates/memory-client/src/lib.rs, rust/crates/memory-client/tests/zep_http.rs
New crate defining MemoryClient trait, MemoryMessage, NoopMemoryClient, Zep-backed ZepMemoryClient (blocking HTTP), error mapping/validation, and integration tests with an in-process HTTP mock.
CLI eval & runtime routing plumbing
rust/crates/rusty-claude-cli/src/main.rs, rust/crates/rusty-claude-cli/Cargo.toml
Added eval subcommand and replay tooling producing per-turn JSONL plus summary; routing/eval state and router override plumbing added; AnthropicRuntimeClient::new accepts optional router/eval-router and records routing outcomes; new router/memory deps wired in.
Runtime crate exports & deps
rust/crates/runtime/src/lib.rs, rust/crates/runtime/Cargo.toml
Re-exported MemoryConfig, RouterConfig, RouterMode; added memory-client as a path dependency.
Deploy: router infra & docs
deploy/router/README.md, deploy/router/docker-compose.yml
Added router documentation (external vs eval-driven modes) and Docker Compose manifest for sidecar stack (routellm / gptcache).
Deploy: efficiency integrations
deploy/efficiency/README.md, deploy/efficiency/caveman/README.md, deploy/efficiency/rtk/hook.sh, deploy/efficiency/rtk/settings.json.example
New efficiency docs, Caveman guidance, RTK PreToolUse hook script, and example settings file.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ConversationRuntime
    participant MemoryClient
    participant RouterScoreboard
    participant UpstreamProvider

    User->>ConversationRuntime: send_message(session_id, user_input)
    ConversationRuntime->>MemoryClient: recall(session_id, user_input, limit)
    MemoryClient-->>ConversationRuntime: memory_snippets (or error)
    ConversationRuntime->>RouterScoreboard: determine bucket & select(candidates, epsilon)
    RouterScoreboard-->>ConversationRuntime: Selection(model, reason)
    ConversationRuntime->>UpstreamProvider: ApiRequest(model=selected_model, system_prompt_with_memories)
    UpstreamProvider-->>ConversationRuntime: assistant_response + usage events
    ConversationRuntime->>RouterScoreboard: record_outcome(bucket, model, success, latency, tokens, cost)
    ConversationRuntime->>MemoryClient: ingest(session_id, user_input + assistant_text)
    MemoryClient-->>ConversationRuntime: ok (or error)
    ConversationRuntime-->>User: assistant_response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat: integrate external memory store via Zep REST client' directly corresponds to the primary feature in this changeset: adding opt-in persistent memory integration via a Zep-compatible REST client, including the new memory-client crate, ZepMemoryClient implementation, and ConversationRuntime integration.
Docstring Coverage ✅ Passed Docstring coverage is 93.51% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (1)
deploy/router/README.md (1)

9-11: Add language specifiers to fenced code blocks.

The ASCII diagrams and log output examples should have language specifiers for consistency and accessibility. Use text or plaintext for these blocks.

🔧 Proposed fix

Line 9:

-```
+```text
 claw --> gptcache:8100 --> routellm:6060 --> Anthropic / OpenAI / ...

Line 52:

-```
+```text
 claw (picks model from scoreboard) --> Anthropic / OpenAI / ...

Line 87:

-```
+```text
 [router] bucket=short selected=claude-haiku-4-5 reason=exploit

Line 172:

-```
+```text
 [router] user_model=opus routed_to=claude-haiku-4-5

Also applies to: 52-54, 87-89, 172-174

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/router/README.md` around lines 9 - 11, Update the fenced code blocks
in the README so they include a language specifier (use `text` or `plaintext`)
to improve consistency and accessibility; specifically add the specifier to the
diagram block containing "claw --> gptcache:8100 --> routellm:6060 --> Anthropic
/ OpenAI / ...", the diagram "claw (picks model from scoreboard) --> Anthropic /
OpenAI / ...", and the log example blocks beginning with "[router] bucket=short
selected=claude-haiku-4-5 reason=exploit" and "[router] user_model=opus
routed_to=claude-haiku-4-5" so each triple-backtick fence starts with ```text
(or ```plaintext) instead of just ``` .
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deploy/router/docker-compose.yml`:
- Around line 15-16: The docs reference a non-existent docker service
`gptcache-direct`; either add a cache-only service or update the instructions to
the actual service name. Create a new service definition (e.g.,
`gptcache-direct`) in the compose file mirroring the existing `gptcache` service
but remove the `depends_on: - routellm` and any RouteLLM-specific environment or
network ties so it can run standalone, and ensure volumes/ports match the
current `gptcache` setup; alternatively, change the README command to use the
real service name and remove the "cache only" claim. Also apply the same fix to
the duplicated doc text around the second occurrence (lines ~43-53) so both
instructions point to a valid service or describe how to run the cache-only
service.

In `@rust/crates/eval-router/src/lib.rs`:
- Around line 155-160: ScoreboardState currently derives Default which
initializes version to 0 (not SCOREBOARD_VERSION); replace the auto-derived
Default with an explicit impl Default for ScoreboardState that returns
ScoreboardState { version: default_version(), buckets: Default::default() } (or
otherwise set version to SCOREBOARD_VERSION via default_version) so new/empty
scoreboards start with the correct schema version; apply the same fix to the
analogous struct/impl at the other occurrence referenced (lines ~292-300) so
both code paths use default_version()/SCOREBOARD_VERSION instead of zero.

In `@rust/crates/memory-client/src/lib.rs`:
- Around line 215-221: The code interpolates raw session_id into the route
(e.g., the path in SearchRequest creation and the other block around lines
246–251); percent-encode the session_id as a path segment before formatting the
path string so characters like '/', '?', '#', or spaces cannot break the URL.
Locate the places building path like
format!("/api/v2/sessions/{session_id}/...") and replace the interpolated
session_id with an encoded version (use a path-segment percent-encoding helper
such as percent_encode or an equivalent utility) and ensure both occurrences
(the search path and the other session-related path) use the same
encoded_session_id variable before calling self.url(&path).
- Around line 234-239: The client currently trusts the backend and collects all
filtered entries from payload.results; enforce the recall cap locally by
limiting the iterator to recallLimit (or recall_limit) before collecting. Update
the chain that maps payload.results -> filter_map(|entry|
entry.fact.or(entry.content)) -> filter(|text| !text.trim().is_empty()) to
insert .take(recallLimit) (or .take(recall_limit) using the local variable name)
immediately before .collect() so the client-side result vector never exceeds the
configured recall limit while preserving order.

In `@rust/crates/runtime/src/conversation.rs`:
- Around line 548-556: The recalled snippets retrieved via
client.recall(&session_id, user_input, self.memory_recall_limit) are treated as
untrusted user/assistant content and must not be appended to system-level
prompts; change the code that builds and pushes the memory block (the block
currently built and pushed via prompt.push(...)) so it is added to the
conversation as a user-level or explicitly "untrusted retrieval" segment instead
of system_prompt—e.g., prepend a clear label like "Retrieved memories
(untrusted):" and push it into the regular user/assistant message stream or a
separate non-system input, and optionally sanitize/escape content to prevent
injection; update references where prompt.push is used to ensure system-level
instructions are never overridden by recalled snippets.
- Around line 576-585: The loop currently pushes every assistant text from
assistant_messages (using flatten_assistant_text) which causes transient tool
preambles to be stored; change it to only extract and push the final non-empty
assistant reply for the turn—e.g., find the last message in assistant_messages
for which flatten_assistant_text returns a non-empty String and call
MemoryMessage::new(MemoryRole::Assistant, text) once (instead of pushing each),
preserving the existing payload length guard logic; update the logic around
assistant_messages, flatten_assistant_text, and the payload push so only the
final assistant reply is persisted.

In `@rust/crates/rusty-claude-cli/src/main.rs`:
- Around line 6357-6360: The code currently creates a single "writer" from
output_path (Some -> File, None -> stdout) which causes JSON mode to emit JSONL
turns to stdout and then append a pretty JSON summary to the same stdout
(breaking JSONL/JSON consumers); change the logic so that when output_format ==
OutputFormat::Json and output_path is None you still write turn records to
stdout but emit the summary to stderr (use io::stderr() or eprintln! for the
summary), and keep the existing behavior when an explicit output_path is
provided; update both the writer creation site (where "let mut writer: Box<dyn
Write>" is set) and the summary-printing block (the code around the other
section referenced at 6414-6429) so the summary uses stderr while records use
the existing "writer".
- Around line 1330-1394: The parse_eval_args function treats --help as an
unknown option; update parse_eval_args to detect "--help" and "-h" (in the main
match loop) and short-circuit by returning Err("help".to_string()) (or the same
sentinel the rest of the CLI uses to trigger printing help) instead of treating
them as unknown flags, so claw eval --help flows into the existing help path;
then add a regression test that runs the eval subcommand with --help and asserts
the help output is produced (refer to parse_eval_args and the CliAction::Eval
return path when adding the test).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 62d949e6-6b50-4247-863a-6435caf08cb7

📥 Commits

Reviewing files that changed from the base of the PR and between 50e3fa3 and 379f165.

⛔ Files ignored due to path filters (1)
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • deploy/router/README.md
  • deploy/router/docker-compose.yml
  • rust/crates/api/src/providers/mod.rs
  • rust/crates/api/src/providers/openai_compat.rs
  • rust/crates/eval-router/Cargo.toml
  • rust/crates/eval-router/src/lib.rs
  • rust/crates/memory-client/Cargo.toml
  • rust/crates/memory-client/src/lib.rs
  • rust/crates/memory-client/tests/zep_http.rs
  • rust/crates/runtime/Cargo.toml
  • rust/crates/runtime/src/bash.rs
  • rust/crates/runtime/src/config.rs
  • rust/crates/runtime/src/config_validate.rs
  • rust/crates/runtime/src/conversation.rs
  • rust/crates/runtime/src/lane_events.rs
  • rust/crates/runtime/src/lib.rs
  • rust/crates/rusty-claude-cli/Cargo.toml
  • rust/crates/rusty-claude-cli/src/main.rs

Comment on lines +15 to +16
# To run gptcache alone (cache only, no RouteLLM):
# docker compose -f deploy/router/docker-compose.yml up -d gptcache-direct


⚠️ Potential issue | 🟡 Minor

The documented gptcache-direct path does not exist.

Line 16 tells users to start gptcache-direct, but this compose file never defines that service, and the only GPTCache service still depends on routellm. The advertised cache-only command will fail as written.

Also applies to: 43-53


Comment on lines +155 to +160
#[derive(Debug, Clone, PartialEq, Default, Serialize, Deserialize)]
struct ScoreboardState {
    #[serde(default = "default_version")]
    version: u32,
    #[serde(default)]
    buckets: BTreeMap<PromptBucket, BTreeMap<String, ModelStats>>,


⚠️ Potential issue | 🟠 Major

New scoreboards are initialized with version = 0.

ScoreboardState::default() comes from derive(Default), so the missing-file and empty-file paths create a scoreboard with the zero value for version, not SCOREBOARD_VERSION. The first save will therefore write stale schema metadata.

Proposed fix
-#[derive(Debug, Clone, PartialEq, Default, Serialize, Deserialize)]
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
 struct ScoreboardState {
     #[serde(default = "default_version")]
     version: u32,
     #[serde(default)]
     buckets: BTreeMap<PromptBucket, BTreeMap<String, ModelStats>>,
 }
 
 const fn default_version() -> u32 {
     SCOREBOARD_VERSION
 }
+
+impl Default for ScoreboardState {
+    fn default() -> Self {
+        Self {
+            version: SCOREBOARD_VERSION,
+            buckets: BTreeMap::new(),
+        }
+    }
+}

Also applies to: 292-300


Comment on lines +215 to +221
let path = format!("/api/v2/sessions/{session_id}/search");
let body = SearchRequest {
    text: query,
    limit,
    user_id: &self.config.user_id,
};
let request = self.apply_auth(self.http.post(self.url(&path)).json(&body));


⚠️ Potential issue | 🟠 Major

Percent-encode session_id before putting it in the path.

Line 215 and Line 246 interpolate the raw session id into the URL. If an id ever contains /, ?, #, or spaces, these requests hit the wrong endpoint or split one logical session across multiple paths. Encode it as a path segment before formatting the route.

Also applies to: 246-251


Comment on lines +234 to +239
Ok(payload
    .results
    .into_iter()
    .filter_map(|entry| entry.fact.or(entry.content))
    .filter(|text| !text.trim().is_empty())
    .collect())


🛠️ Refactor suggestion | 🟠 Major

Clamp recall results locally, not just in the request body.

This trusts the backend to honor limit. If Zep over-returns, the runtime will inject more memories than recallLimit allows and can bloat the system prompt. Enforce the cap client-side too.

Proposed fix
         Ok(payload
             .results
             .into_iter()
             .filter_map(|entry| entry.fact.or(entry.content))
             .filter(|text| !text.trim().is_empty())
+            .take(limit)
             .collect())
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Ok(payload
    .results
    .into_iter()
    .filter_map(|entry| entry.fact.or(entry.content))
    .filter(|text| !text.trim().is_empty())
    .collect())
Ok(payload
    .results
    .into_iter()
    .filter_map(|entry| entry.fact.or(entry.content))
    .filter(|text| !text.trim().is_empty())
    .take(limit)
    .collect())

Comment thread rust/crates/runtime/src/conversation.rs
Comment on lines +576 to +585
for message in assistant_messages {
if let Some(text) = flatten_assistant_text(message) {
if !text.is_empty() {
payload.push(MemoryMessage::new(MemoryRole::Assistant, text));
}
}
}
if payload.len() <= 1 {
return;
}

⚠️ Potential issue | 🟠 Major

Only ingest the final assistant reply for the turn.

This loop stores every assistant text message from the tool iteration chain. On tool turns, that will persist transient preambles like “Let me calculate that” alongside the real answer, which pollutes long-term memory and diverges from the PR objective of forwarding the assistant reply after the turn.

Suggested fix
-        for message in assistant_messages {
-            if let Some(text) = flatten_assistant_text(message) {
-                if !text.is_empty() {
-                    payload.push(MemoryMessage::new(MemoryRole::Assistant, text));
-                }
-            }
-        }
+        if let Some(text) = assistant_messages
+            .iter()
+            .rev()
+            .find_map(flatten_assistant_text)
+            .filter(|text| !text.trim().is_empty())
+        {
+            payload.push(MemoryMessage::new(MemoryRole::Assistant, text));
+        }

A regression test with a two-iteration tool turn would lock this down.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/runtime/src/conversation.rs` around lines 576 - 585, The loop
currently pushes every assistant text from assistant_messages (using
flatten_assistant_text) which causes transient tool preambles to be stored;
change it to only extract and push the final non-empty assistant reply for the
turn—e.g., find the last message in assistant_messages for which
flatten_assistant_text returns a non-empty String and call
MemoryMessage::new(MemoryRole::Assistant, text) once (instead of pushing each),
preserving the existing payload length guard logic; update the logic around
assistant_messages, flatten_assistant_text, and the payload push so only the
final assistant reply is persisted.

Comment on lines +1330 to +1394
fn parse_eval_args(args: &[String], output_format: CliOutputFormat) -> Result<CliAction, String> {
let mut session_reference = LATEST_SESSION_REFERENCE.to_string();
let mut output_path: Option<PathBuf> = None;
let mut max_turns: Option<usize> = None;
let mut index = 0;

while index < args.len() {
match args[index].as_str() {
"--session" => {
let value = args
.get(index + 1)
.ok_or_else(|| "missing value for --session".to_string())?;
session_reference.clone_from(value);
index += 2;
}
flag if flag.starts_with("--session=") => {
session_reference = flag[10..].to_string();
index += 1;
}
"--output" | "-o" => {
let value = args
.get(index + 1)
.ok_or_else(|| format!("missing value for {}", args[index]))?;
output_path = Some(PathBuf::from(value));
index += 2;
}
flag if flag.starts_with("--output=") => {
output_path = Some(PathBuf::from(&flag[9..]));
index += 1;
}
"--max-turns" => {
let value = args
.get(index + 1)
.ok_or_else(|| "missing value for --max-turns".to_string())?;
max_turns = Some(
value
.parse::<usize>()
.map_err(|_| format!("invalid value for --max-turns: {value}"))?,
);
index += 2;
}
flag if flag.starts_with("--max-turns=") => {
let raw = &flag[12..];
max_turns = Some(
raw.parse::<usize>()
.map_err(|_| format!("invalid value for --max-turns: {raw}"))?,
);
index += 1;
}
other if other.starts_with('-') => {
return Err(format!("unknown eval option: {other}"));
}
other => {
return Err(format!("unexpected eval argument: {other}"));
}
}
}

Ok(CliAction::Eval {
session_reference,
output_path,
output_format,
max_turns,
})
}

⚠️ Potential issue | 🟡 Minor

Preserve claw eval --help.

The new subcommand currently treats --help as an unknown eval option, so claw eval --help errors instead of showing help like the other top-level commands. Please wire eval into the existing help path and add a regression test for it.

Suggested minimal fix
 fn parse_eval_args(args: &[String], output_format: CliOutputFormat) -> Result<CliAction, String> {
     let mut session_reference = LATEST_SESSION_REFERENCE.to_string();
     let mut output_path: Option<PathBuf> = None;
     let mut max_turns: Option<usize> = None;
     let mut index = 0;

     while index < args.len() {
         match args[index].as_str() {
+            "--help" | "-h" => {
+                return Ok(CliAction::Help { output_format });
+            }
             "--session" => {
                 let value = args
                     .get(index + 1)
                     .ok_or_else(|| "missing value for --session".to_string())?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 1330 - 1394, The
parse_eval_args function treats --help as an unknown option; update
parse_eval_args to detect "--help" and "-h" (in the main match loop) and
short-circuit by returning Err("help".to_string()) (or the same sentinel the
rest of the CLI uses to trigger printing help) instead of treating them as
unknown flags, so claw eval --help flows into the existing help path; then add a
regression test that runs the eval subcommand with --help and asserts the help
output is produced (refer to parse_eval_args and the CliAction::Eval return path
when adding the test).

Comment on lines +6357 to +6360
let mut writer: Box<dyn Write> = match output_path {
Some(path) => Box::new(fs::File::create(path)?),
None => Box::new(io::stdout()),
};

⚠️ Potential issue | 🟠 Major

Keep eval stdout single-format in JSON mode.

When --output-format json is used without --output, this command writes JSONL turn records to stdout and then appends a pretty JSON summary to stdout as well. That makes the output neither valid JSONL nor a single JSON document, which will break downstream consumers.

Suggested minimal fix
-    if matches!(output_format, CliOutputFormat::Json) {
+    if matches!(output_format, CliOutputFormat::Json) && output_path.is_some() {
         println!(
             "{}",
             serde_json::to_string_pretty(&json!({
                 "kind": "eval",
                 "session_id": handle.id,

Also applies to: 6414-6429

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 6357 - 6360, The code
currently creates a single "writer" from output_path (Some -> File, None ->
stdout) which causes JSON mode to emit JSONL turns to stdout and then append a
pretty JSON summary to the same stdout (breaking JSONL/JSON consumers); change
the logic so that when output_format == OutputFormat::Json and output_path is
None you still write turn records to stdout but emit the summary to stderr (use
io::stderr() or eprintln! for the summary), and keep the existing behavior when
an explicit output_path is provided; update both the writer creation site (where
"let mut writer: Box<dyn Write>" is set) and the summary-printing block (the
code around the other section referenced at 6414-6429) so the summary uses
stderr while records use the existing "writer".
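A std-only Rust sketch of that routing, with illustrative names rather than the CLI's real functions: JSONL turn records go to the chosen writer, and the summary moves to stderr whenever the records are already occupying stdout.

```rust
use std::io::{self, Write};

// Hypothetical helpers sketching the suggested split; `write_records` and
// `summary_goes_to_stderr` are not the CLI's actual function names.
fn write_records(writer: &mut dyn Write, records: &[String]) -> io::Result<()> {
    for record in records {
        writeln!(writer, "{record}")?; // one JSON object per line (JSONL)
    }
    Ok(())
}

// With --output the records land in a file, so the summary may use stdout;
// without it the records occupy stdout and the summary must move to stderr.
fn summary_goes_to_stderr(has_output_path: bool) -> bool {
    !has_output_path
}

fn main() -> io::Result<()> {
    let records = vec![r#"{"turn":1,"ok":true}"#.to_string()];
    // No --output given: records stream to stdout, summary goes to stderr.
    write_records(&mut io::stdout().lock(), &records)?;
    if summary_goes_to_stderr(false) {
        writeln!(io::stderr().lock(), r#"{{"kind":"eval","turns":1}}"#)?;
    }
    Ok(())
}
```

Either destination keeps stdout parseable as pure JSONL; writing the summary into the same file as the records would instead require the footer to be one more JSONL line.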

…PTCache sidecars)

Collapses the three external sidecars that were backing memory, router,
and cache into one in-process crate. Same conceptual surface area —
different, simpler implementation.

What replaces what:

| Sidecar (before)         | Now                                    |
|--------------------------|----------------------------------------|
| Zep (memory)             | claw-state/memory: SQLite + FTS5       |
| RouteLLM (router)        | claw-state/scoreboard: SQLite rows     |
| GPTCache (cache)         | claw-state/cache: SQLite, exact-match  |

Honest gaps relative to the sidecars (to be closed in the follow-up commit):
- Memory: no LLM-based fact extraction yet; messages are stored verbatim
  and recalled by keyword.
- Memory + cache: no semantic (embedding) recall yet — exact / keyword
  only.
- Router: simple adaptive success-rate scoreboard instead of RouteLLM's
  pre-trained classifiers. Learns from your traffic; needs warm-up.

Follow-up commit adds a candle-backed EmbeddingProvider so memory
recall and cache both become paraphrase-aware.

New crate `claw-state`:
- `StateStore::open(path)` opens one SQLite connection behind a Mutex;
  scoreboard / cache / memory are scoped handles that share it.
- Atomic WAL-mode writes; schema versioning guards future migrations.
- 27 unit tests covering scoreboard selection (warm-start, explore,
  exploit, tie-breaking), exact cache TTL semantics, memory FTS5 recall
  with scope filters, RNG determinism, schema-version mismatch guard.

Config: `router` and `memory` blocks merged into a single `state` block:

    {
      "state": {
        "databasePath": "~/.local/share/claw/state.sqlite",
        "router": { "enabled": true, "candidates": [...] },
        "cache": { "enabled": true, "ttlSeconds": 3600 },
        "memory": { "enabled": true, "recallLimit": 7 }
      }
    }

Deletes:
- crates/eval-router (JSON scoreboard, replaced by SQLite)
- crates/memory-client (Zep REST client, replaced by FTS5 recall)
- deploy/router (docker-compose for GPTCache + RouteLLM — no longer needed)

Runtime now owns the MemoryProvider trait (moved from memory-client).
AnthropicRuntimeClient takes an optional EvalRouterState and
RequestCache, both thin Arc<StateStore> adapters — cache wiring in
stream() lands with the embedding provider in the follow-up.

`claw eval` rewritten to drive the in-process router: each session turn
is replayed through `ScoreboardHandle::select`, the chosen model is
called via the normal `ProviderClient`, and the outcome (success,
latency, tokens) is recorded back to the scoreboard. Useful for seeding
a fresh scoreboard from session history.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

♻️ Duplicate comments (2)
rust/crates/rusty-claude-cli/src/main.rs (2)

1338-1383: ⚠️ Potential issue | 🟡 Minor

claw eval --help still falls through to unknown eval option.

parse_eval_args() still rejects --help/-h, so claw eval --help errors instead of showing help. Please short-circuit help here and add the missing regression test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 1338 - 1383, The
argument parser in parse_eval_args (the while loop handling flags) does not
handle --help or -h, so passing "claw eval --help" falls through to the
unknown-flag branch; update parse_eval_args to detect "--help" and "-h" (before
the other flag checks/unknown branch) and short-circuit by returning the
appropriate help signal (e.g., a ShowHelp variant or an Err/Ok that the caller
uses to print help) and add a regression test (e.g., test_eval_help_shows_usage)
that invokes parse_eval_args or runs the eval subcommand with "--help"/"-h" and
asserts it returns the help result rather than an error.

6363-6366: ⚠️ Potential issue | 🟠 Major

--output-format json still mixes JSONL records and a pretty JSON summary on stdout.

When --output is omitted, turn records go to stdout through writer, and then the final summary object is also printed to stdout. That leaves stdout as neither valid JSONL nor a single JSON document.

Also applies to: 6466-6481

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 6363 - 6366, The final
summary object is being printed to stdout even when turn records are streamed to
stdout via the local "writer" (created from output_path), causing mixed JSONL +
pretty JSON; change the summary emission so it goes to the same destination as
the records: when output_path is Some(...) write the serialized summary to the
same writer (Box<dyn Write> named "writer"), otherwise emit the summary to
stderr (io::stderr()) instead of stdout. Locate the summary-printing code near
the block that creates "writer" (and in the code around lines referenced
6466-6481) and replace the direct stdout print with conditional logic that
writes to "writer" for Some(path) and to stderr for the None case.
🧹 Nitpick comments (4)
rust/crates/runtime/src/config_validate.rs (1)

568-603: Add unit coverage for the new state.* validation branches.

The new nested validation paths are not exercised in tests yet. Please add cases for unknown keys and wrong types under state, state.router, state.cache, and state.memory to prevent silent regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/runtime/src/config_validate.rs` around lines 568 - 603, The new
nested validation branches (the block that checks object.get("state") and its
nested router/cache/memory and calls validate_object_keys with STATE_FIELDS,
STATE_ROUTER_FIELDS, STATE_CACHE_FIELDS and STATE_MEMORY_FIELDS) lack unit test
coverage; add unit tests for validate_object_keys behavior that exercise each
branch: provide a base config with unknown keys under "state" and assert the
validator (result.merge path) reports unknown-key errors, then add separate
tests with invalid types for values under "state", "state.router",
"state.cache", and "state.memory" to assert type validation errors are produced;
use the same helper/test harness used elsewhere for config validation so tests
call the same validation entrypoint and check errors reference the correct path
strings ("state", "state.router", "state.cache", "state.memory") and expected
error messages.
rust/crates/claw-state/src/store.rs (1)

53-58: FTS5 content table requires manual sync triggers.

The FTS5 virtual table is declared with content='memory_facts', making it an "external content" FTS index that references the base table. However, SQLite doesn't keep an external-content index in sync automatically—inserts/updates/deletes to memory_facts won't propagate to memory_facts_fts without explicit triggers or manual INSERT/DELETE statements against the FTS table.

Looking at memory.rs, the insert and delete methods do manually update both tables, which is correct. This is worth documenting in the schema comment to prevent future maintenance issues.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/claw-state/src/store.rs` around lines 53 - 58, The FTS5 virtual
table declaration (memory_facts_fts) uses content='memory_facts' which requires
manual synchronization; update the schema comment near the CREATE VIRTUAL TABLE
statement to note that SQLite will not auto-sync memory_facts ->
memory_facts_fts and that inserts/updates/deletes must be mirrored (as done in
memory.rs insert and delete methods), mentioning the exact symbols
memory_facts_fts, memory_facts and the memory.rs insert/delete functions so
future maintainers know to keep them in sync or add triggers if behavior
changes.
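Assuming an `id` primary key and a single indexed `fact` column (both assumptions about the memory_facts schema), the trigger-based alternative the comment mentions would look roughly like this, expressed as DDL strings the way a Rust migration might embed them:

```rust
// Sketch of FTS5 external-content sync triggers. Column names (`id`, `fact`)
// are assumptions, not the crate's verified schema; claw-state currently
// mirrors writes by hand in memory.rs instead of installing triggers.
const FTS_SYNC_TRIGGERS: &str = r#"
CREATE TRIGGER memory_facts_ai AFTER INSERT ON memory_facts BEGIN
    INSERT INTO memory_facts_fts(rowid, fact) VALUES (new.id, new.fact);
END;
CREATE TRIGGER memory_facts_ad AFTER DELETE ON memory_facts BEGIN
    INSERT INTO memory_facts_fts(memory_facts_fts, rowid, fact)
    VALUES ('delete', old.id, old.fact);
END;
"#;

fn main() {
    // In the real crate this DDL would run during schema setup/migration.
    println!("{FTS_SYNC_TRIGGERS}");
}
```

Note the FTS5 'delete' command form in the second trigger: external-content tables cannot be cleaned up with a plain DELETE, the old row's values must be re-presented so the index entries can be removed.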
rust/crates/claw-state/src/scoreboard.rs (1)

229-287: Minor: stats are fetched twice for candidates in exploit path.

In select, stats() is called during warm-start check (line 246) and again during exploit scoring (line 269). For small candidate lists this is negligible, but caching could reduce DB round-trips.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/claw-state/src/scoreboard.rs` around lines 229 - 287, The select
method currently calls self.stats(...) twice for candidates; fix this by
fetching each candidate's ModelStats once and reusing it for both the warm-start
check and the exploit scoring: iterate candidates and build a Vec<(String,
ModelStats)> or HashMap<String, ModelStats> by calling self.stats(bucket,
candidate)? once (preserving the Result error propagation), then use that
collection to perform the min_samples warm-start check and later to sort/score
(replace the current separate scored population). Update references to scored
and the warm-start logic to use the precomputed stats so DB round-trips are
eliminated.
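A minimal sketch of that refactor, with `ModelStats`, `min_samples`, and the fetch closure standing in for the real claw-state types; each candidate's stats are fetched once and reused for both the warm-start check and exploit scoring:

```rust
use std::collections::HashMap;

// Stand-in for claw-state's per-model stats row (names are assumptions).
#[derive(Clone, Copy)]
struct ModelStats {
    samples: u64,
    success_rate: f64,
}

// One `fetch` call (one DB round-trip) per candidate, then both phases read
// from the same in-memory map.
fn select<'a>(
    candidates: &'a [&'a str],
    fetch: impl Fn(&str) -> ModelStats,
    min_samples: u64,
) -> Option<&'a str> {
    let stats: HashMap<&str, ModelStats> =
        candidates.iter().map(|c| (*c, fetch(*c))).collect();
    // Warm-start: an under-sampled candidate is explored first.
    if let Some(cold) = candidates
        .iter()
        .copied()
        .find(|c| stats[c].samples < min_samples)
    {
        return Some(cold);
    }
    // Exploit: highest success rate, reusing the already-fetched stats.
    candidates
        .iter()
        .copied()
        .max_by(|a, b| stats[a].success_rate.total_cmp(&stats[b].success_rate))
}

fn main() {
    let fetch = |_: &str| ModelStats { samples: 0, success_rate: 0.0 };
    println!("{:?}", select(&["sonnet"], fetch, 5));
}
```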
rust/crates/claw-state/src/memory.rs (1)

38-52: Silent fallback to Global on unknown scope prefix could mask corruption.

MemoryScope::decode silently returns Global for any unrecognized prefix. If corrupted data enters the database, this would be hard to diagnose. Consider returning an Option or at least logging a warning.

📝 Alternative: Return Option for explicit handling
-    fn decode(raw: &str) -> Self {
+    fn decode(raw: &str) -> Option<Self> {
         if raw == "global" {
-            Self::Global
+            Some(Self::Global)
         } else if let Some(rest) = raw.strip_prefix("workspace:") {
-            Self::Workspace {
+            Some(Self::Workspace {
                 hash: rest.to_string(),
-            }
+            })
         } else if let Some(rest) = raw.strip_prefix("session:") {
-            Self::Session {
+            Some(Self::Session {
                 id: rest.to_string(),
-            }
+            })
         } else {
-            Self::Global
+            None
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/claw-state/src/memory.rs` around lines 38 - 52,
MemoryScope::decode currently silently maps any unrecognized string to
MemoryScope::Global which can hide data corruption; change the signature to
return Option<MemoryScope> (fn decode(raw: &str) -> Option<Self>) and return
Some(MemoryScope::Global) only for the explicit "global" match,
Some(Workspace{...}) for "workspace:" prefix, Some(Session{...}) for "session:"
prefix, and return None for any unknown prefix; then update all call sites of
MemoryScope::decode to handle the None case (propagate an error, log a warning,
or skip/migrate the corrupted record) so unknown/corrupt values are surfaced
instead of silently becoming Global.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rust/crates/claw-state/src/cache.rs`:
- Around line 3-4: The module-level doc comment claims the cache key is
sha256(system_prompt + messages_json + model) but the implementation in
from_parts builds the hash as model, then system_prompt, then messages_json;
update either the doc comment or the from_parts function to make their ordering
consistent—preferably modify the doc comment to state sha256(model +
system_prompt + messages_json) (referencing from_parts) or change from_parts to
hash in the documented order (referencing from_parts) so the comment and
implementation match.
- Line 96: The project uses Option::is_none_or (seen in claw-state::cache.rs
row.filter(...).is_none_or), which requires Rust 1.82.0; add rust-version =
"1.82.0" to the [workspace.package] section of rust/Cargo.toml to declare the
MSRV so builds and tooling know the minimum supported Rust version.

In `@rust/crates/runtime/src/config_validate.rs`:
- Around line 295-298: FieldSpec declares "epsilonPercent" with
FieldType::Number but the current number matcher only accepts integers, so
update the validation logic that checks FieldType::Number to accept
decimal/floating-point values (e.g., parse/validate as f64 or allow
serde_json::Number with is_f64) instead of integer-only checks; locate the
number-matching function used by config validation (the matcher invoked for
FieldType::Number in config_validate.rs) and broaden its condition to treat JSON
numbers with a fractional component as valid so values like 2.5 pass for
epsilonPercent.
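The shape of the broadened matcher, using a hand-rolled stand-in for serde_json's integer/float split (serde_json::Number exposes the same distinction via is_i64/is_u64/is_f64):

```rust
// Stand-in for the JSON number the validator sees; this is a sketch, not the
// runtime's actual FieldType machinery.
enum JsonNumber {
    Int(i64),
    Float(f64),
}

fn matches_number(value: &JsonNumber) -> bool {
    match value {
        JsonNumber::Int(_) => true,
        // The integer-only matcher rejected this arm; epsilonPercent = 2.5
        // must validate, while NaN/inf stay invalid.
        JsonNumber::Float(f) => f.is_finite(),
    }
}

fn main() {
    assert!(matches_number(&JsonNumber::Float(2.5)));
    assert!(matches_number(&JsonNumber::Int(3)));
}
```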

In `@rust/crates/rusty-claude-cli/src/main.rs`:
- Around line 7117-7129: default_state_db_path currently only checks
XDG_DATA_HOME and HOME which fails on Windows; update the function
(default_state_db_path) to use a platform-aware data directory provider (e.g.,
the dirs or dirs_next crate) or check Windows-specific env vars (LOCALAPPDATA /
APPDATA) so you return a PathBuf under the OS-appropriate data directory (append
"claw/state.sqlite") instead of falling back to None; ensure the new logic
preserves existing XDG/HOME behavior but prefers dirs::data_local_dir() (or
equivalent) on Windows and as a general cross-platform fallback.
- Around line 373-382: The top-level CLI help doesn't mention the new Eval
subcommand (the Eval enum variant), so update the global help/usage text to
advertise "claw eval" and show a short usage line and example; locate where the
main clap command or help strings are defined (the top-level
help/about/long_about or usage in the Command builder) and add a brief usage
entry and one example invocation referencing "claw eval --session <SESSION>
--output <FILE>" (and any relevant flags like --max-turns) so the `claw --help`
output reflects the new Eval subcommand.
- Around line 6372-6394: The router selection (bundle.store.scoreboard().select)
and provider construction (ApiProviderClient::from_model) error branches
prematurely use continue and thus skip emitting the per-turn JSONL record and
calling summary.record_failure(); instead, on those Err paths call
summary.record_failure() and emit the same failure JSONL record that successful
turns produce (i.e., reuse the code/path that writes the turn record to the
output stream) before continuing, so the JSONL stream and footer counts reflect
the failed turn; update the Err arms for select and from_model to invoke
summary.record_failure() and the existing record-write routine rather than just
continuing.
- Around line 7500-7501: state.cache (RequestCache) is created but never used:
update the streaming request path to consult and populate the cache—inside the
stream() function (and the similar paths around the eval_router usage), check
state.cache for a cached response keyed by the incoming request before calling
the provider; if a hit, return the cached response immediately; if a miss, call
the provider as today then store the resulting response into state.cache (using
the RequestCache get/lookup and put/insert/save APIs your crate exposes) so
subsequent calls hit the cache. Ensure you handle
Option<EvalRouterState>/state.cache being None and preserve existing
error/stream semantics when reading from or writing to the cache.
- Around line 12600-12668: The three tests
prompt_bucket_picks_latest_user_message_length,
prompt_bucket_defaults_to_short_when_no_user_messages, and
prompt_bucket_counts_chars_across_multiple_text_blocks import PromptBucket from
the wrong crate (`eval_router`); change those imports to use
claw_state::PromptBucket so they match the production usage (update the use
statements in each test that currently read `use eval_router::PromptBucket;` to
`use claw_state::PromptBucket;`).
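For the default_state_db_path finding above, a std-only sketch of the env-var fallback chain (in practice dirs::data_local_dir() covers the same ground; the env lookup is injected here so the chain is testable):

```rust
use std::env;
use std::path::PathBuf;

// Hypothetical platform-aware variant: XDG_DATA_HOME first, then Windows'
// LOCALAPPDATA, then HOME/.local/share. Not the CLI's actual implementation.
fn default_state_db_path(get: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    let data_dir = get("XDG_DATA_HOME")
        .map(PathBuf::from)
        .or_else(|| get("LOCALAPPDATA").map(PathBuf::from))
        .or_else(|| get("HOME").map(|home| PathBuf::from(home).join(".local/share")))?;
    Some(data_dir.join("claw").join("state.sqlite"))
}

fn main() {
    let path = default_state_db_path(|key| env::var(key).ok());
    println!("{path:?}");
}
```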


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 96fecc97-d1cb-4324-8841-beba21a5f826

📥 Commits

Reviewing files that changed from the base of the PR and between 379f165 and 2d438a8.

⛔ Files ignored due to path filters (1)
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • rust/crates/claw-state/Cargo.toml
  • rust/crates/claw-state/src/cache.rs
  • rust/crates/claw-state/src/embedding.rs
  • rust/crates/claw-state/src/error.rs
  • rust/crates/claw-state/src/lib.rs
  • rust/crates/claw-state/src/memory.rs
  • rust/crates/claw-state/src/rng.rs
  • rust/crates/claw-state/src/scoreboard.rs
  • rust/crates/claw-state/src/store.rs
  • rust/crates/runtime/src/config.rs
  • rust/crates/runtime/src/config_validate.rs
  • rust/crates/runtime/src/conversation.rs
  • rust/crates/runtime/src/lib.rs
  • rust/crates/rusty-claude-cli/Cargo.toml
  • rust/crates/rusty-claude-cli/src/main.rs
✅ Files skipped from review due to trivial changes (1)
  • rust/crates/rusty-claude-cli/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (1)
  • rust/crates/runtime/src/lib.rs

Comment thread rust/crates/claw-state/src/cache.rs Outdated
Comment on lines +3 to +4
//! Cache key is `sha256(system_prompt + messages_json + model)` — any
//! byte-level change in the request misses. That's fine for the common

⚠️ Potential issue | 🟡 Minor

Doc comment doesn't match implementation order.

The comment says sha256(system_prompt + messages_json + model), but from_parts at lines 27-34 hashes in the order model, system_prompt, messages_json. The key is deterministic and collision-resistant either way; the documentation is simply misleading.

📝 Suggested fix
-//! Cache key is `sha256(system_prompt + messages_json + model)` — any
+//! Cache key is `sha256(model + system_prompt + messages_json)` — any
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
//! Cache key is `sha256(system_prompt + messages_json + model)` — any
//! byte-level change in the request misses. That's fine for the common
//! Cache key is `sha256(model + system_prompt + messages_json)` — any
//! byte-level change in the request misses. That's fine for the common
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/claw-state/src/cache.rs` around lines 3 - 4, The module-level doc
comment claims the cache key is sha256(system_prompt + messages_json + model)
but the implementation in from_parts builds the hash as model, then
system_prompt, then messages_json; update either the doc comment or the
from_parts function to make their ordering consistent—preferably modify the doc
comment to state sha256(model + system_prompt + messages_json) (referencing
from_parts) or change from_parts to hash in the documented order (referencing
from_parts) so the comment and implementation match.
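A toy illustration of why the ordering in the doc comment has to match `from_parts` (using `DefaultHasher` rather than sha256): feeding the same parts to a hasher in a different order yields a different key, so both orders are valid cache keys but they are not interchangeable.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash the key parts in a fixed sequence; the sequence itself is part of the
// key's definition, which is why doc comment and implementation must agree.
fn key_for(parts: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for part in parts {
        part.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    let documented = key_for(&["system_prompt", "messages_json", "model"]);
    let implemented = key_for(&["model", "system_prompt", "messages_json"]);
    println!("same key? {}", documented == implemented);
}
```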

Comment thread rust/crates/claw-state/src/cache.rs Outdated
},
)
.optional()?;
Ok(row.filter(|entry| entry.expires_at_ms.is_none_or(|expires| expires > now)))

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for rust-version in Cargo.toml files
fd -t f 'Cargo.toml' --exec grep -l 'rust-version' {} \; | head -5 | xargs -I{} sh -c 'echo "=== {} ===" && grep -E "rust-version|edition" {}'

# Check if there's a rust-toolchain file
fd -t f 'rust-toolchain' -o -t f 'rust-toolchain.toml' | head -1 | xargs -I{} sh -c 'echo "=== {} ===" && cat {}'

Repository: aryehlev/claw-code

Length of output: 178


🏁 Script executed:

#!/bin/bash
# Find all Cargo.toml files and check for rust-version
echo "=== Searching for rust-version in Cargo.toml files ==="
find . -name "Cargo.toml" -type f | while read file; do
  if grep -q "rust-version" "$file"; then
    echo "Found in: $file"
    grep "rust-version" "$file"
  fi
done

echo -e "\n=== Checking for rust-toolchain files ==="
find . -name "rust-toolchain*" -type f | head -5 | while read file; do
  echo "File: $file"
  cat "$file"
done

echo -e "\n=== Checking edition in main Cargo.toml ==="
if [ -f "Cargo.toml" ]; then
  grep -E "edition|rust-version" "Cargo.toml" | head -5
fi

echo -e "\n=== Checking edition in workspace crates ==="
find . -path "*/crates/*/Cargo.toml" -type f | while read file; do
  echo "=== $file ==="
  grep -E "edition|rust-version" "$file" 2>/dev/null || echo "No match"
done | head -20

Repository: aryehlev/claw-code

Length of output: 941


🏁 Script executed:

#!/bin/bash
echo "=== Root Cargo.toml ==="
cat ./Cargo.toml | head -50

echo -e "\n=== Full root Cargo.toml (if more than 50 lines) ==="
wc -l ./Cargo.toml

Repository: aryehlev/claw-code

Length of output: 245


🏁 Script executed:

#!/bin/bash
echo "=== Checking rust/Cargo.toml ==="
if [ -f "rust/Cargo.toml" ]; then
  cat rust/Cargo.toml
else
  echo "Not found"
fi

echo -e "\n=== Directory structure ==="
ls -la | head -20

Repository: aryehlev/claw-code

Length of output: 1723


🌐 Web query:

is_none_or rust stabilized version release

💡 Result:

Yes, Option::is_none_or in Rust was stabilized in version 1.82.0, released on October 17, 2024. It is now available in stable Rust without requiring a feature flag.


Declare MSRV explicitly to support Option::is_none_or.

is_none_or requires Rust 1.82.0 or later. The project lacks an explicit MSRV declaration in the workspace Cargo.toml. Add rust-version = "1.82.0" to rust/Cargo.toml under [workspace.package] to document this requirement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/claw-state/src/cache.rs` at line 96, The project uses
Option::is_none_or (seen in claw-state::cache.rs row.filter(...).is_none_or),
which requires Rust 1.82.0; add rust-version = "1.82.0" to the
[workspace.package] section of rust/Cargo.toml to declare the MSRV so builds and
tooling know the minimum supported Rust version.
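The suggested declaration would look like the fragment below (the `edition` value shown is an assumption about the workspace manifest, not taken from the repo):

```toml
[workspace.package]
edition = "2021"
rust-version = "1.82.0"  # MSRV: Option::is_none_or stabilized in 1.82.0
```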

Comment on lines +295 to +298
FieldSpec {
name: "epsilonPercent",
expected: FieldType::Number,
},

⚠️ Potential issue | 🟠 Major

epsilonPercent accepts only integers despite being declared as a number.

Line 296 introduces state.router.epsilonPercent as numeric, but the current number matcher is integer-only, so valid decimal configs (for example 2.5) will be rejected as wrong type.

Suggested fix
-            Self::Number => value.as_i64().is_some(),
+            Self::Number => value.as_i64().is_some() || value.as_f64().is_some(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/runtime/src/config_validate.rs` around lines 295 - 298, FieldSpec
declares "epsilonPercent" with FieldType::Number but the current number matcher
only accepts integers, so update the validation logic that checks
FieldType::Number to accept decimal/floating-point values (e.g., parse/validate
as f64 or allow serde_json::Number with is_f64) instead of integer-only checks;
locate the number-matching function used by config validation (the matcher
invoked for FieldType::Number in config_validate.rs) and broaden its condition
to treat JSON numbers with a fractional component as valid so values like 2.5
pass for epsilonPercent.

Comment on lines +373 to +382
/// Replay a recorded session's user turns through the configured router
/// (`.claw.json` `router` block) and emit one JSONL record per turn
/// capturing which upstream model the router picked, latency, and
/// token usage. Writes to stdout or to `output_path` if provided.
Eval {
session_reference: String,
output_path: Option<PathBuf>,
output_format: CliOutputFormat,
max_turns: Option<usize>,
},

⚠️ Potential issue | 🟡 Minor

claw --help still doesn't advertise the new eval command.

The parser now exposes claw eval, but the top-level help text is still missing a usage line and example for it, so the public CLI surface is stale.

Also applies to: 708-708

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 373 - 382, The
top-level CLI help doesn't mention the new Eval subcommand (the Eval enum
variant), so update the global help/usage text to advertise "claw eval" and show
a short usage line and example; locate where the main clap command or help
strings are defined (the top-level help/about/long_about or usage in the Command
builder) and add a brief usage entry and one example invocation referencing
"claw eval --session <SESSION> --output <FILE>" (and any relevant flags like
--max-turns) so the `claw --help` output reflects the new Eval subcommand.

Comment thread rust/crates/rusty-claude-cli/src/main.rs Outdated
Comment on lines +7117 to +7129
fn default_state_db_path() -> Option<PathBuf> {
env::var_os("XDG_DATA_HOME")
.map(|home| PathBuf::from(home).join("claw").join("state.sqlite"))
.or_else(|| {
env::var_os("HOME").map(|home| {
PathBuf::from(home)
.join(".local")
.join("share")
.join("claw")
.join("state.sqlite")
})
})
}

⚠️ Potential issue | 🟠 Major

The default state DB path is POSIX-only.

This only checks XDG_DATA_HOME and HOME. On Windows those are commonly absent, so state features fall back to in-memory storage unless state.databasePath is set manually.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 7117 - 7129,
default_state_db_path currently only checks XDG_DATA_HOME and HOME which fails
on Windows; update the function (default_state_db_path) to use a platform-aware
data directory provider (e.g., the dirs or dirs_next crate) or check
Windows-specific env vars (LOCALAPPDATA / APPDATA) so you return a PathBuf under
the OS-appropriate data directory (append "claw/state.sqlite") instead of
falling back to None; ensure the new logic preserves existing XDG/HOME behavior
but prefers dirs::data_local_dir() (or equivalent) on Windows and as a general
cross-platform fallback.
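A std-only sketch of the platform-aware fallback this prompt asks for (the `dirs` crate mentioned above wraps the same decision; precedence here is illustrative):

```rust
use std::env;
use std::path::PathBuf;

// Prefer the Windows app-data location on Windows, otherwise keep the
// existing XDG_DATA_HOME -> HOME/.local/share behavior.
fn default_state_db_path() -> Option<PathBuf> {
    let base = if cfg!(windows) {
        env::var_os("LOCALAPPDATA").map(PathBuf::from)
    } else {
        env::var_os("XDG_DATA_HOME").map(PathBuf::from).or_else(|| {
            env::var_os("HOME")
                .map(|home| PathBuf::from(home).join(".local").join("share"))
        })
    };
    base.map(|dir| dir.join("claw").join("state.sqlite"))
}

fn main() {
    println!("{:?}", default_state_db_path());
}
```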

Comment on lines +7500 to +7501
eval_router: Option<EvalRouterState>,
cache: Option<RequestCache>,

⚠️ Potential issue | 🟠 Major

state.cache is wired in but never actually used.

RequestCache gets built and stored on the client, but stream() never performs a cache lookup before the provider call and never writes the response back afterward. As written, enabling state.cache is a no-op.

Also applies to: 7520-7521, 7587-7678

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 7500 - 7501,
state.cache (RequestCache) is created but never used: update the streaming
request path to consult and populate the cache—inside the stream() function (and
the similar paths around the eval_router usage), check state.cache for a cached
response keyed by the incoming request before calling the provider; if a hit,
return the cached response immediately; if a miss, call the provider as today
then store the resulting response into state.cache (using the RequestCache
get/lookup and put/insert/save APIs your crate exposes) so subsequent calls hit
the cache. Ensure you handle Option<EvalRouterState>/state.cache being None and
preserve existing error/stream semantics when reading from or writing to the
cache.
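The lookup-before-call / store-after-call flow described above can be sketched as follows. The `RequestCache` methods and the `stream` signature are simplified stand-ins; the crate's real API will differ:

```rust
use std::collections::HashMap;

// Minimal in-memory stand-in for the real RequestCache.
struct RequestCache {
    entries: HashMap<String, String>,
}

impl RequestCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }
    fn get(&self, key: &str) -> Option<String> {
        self.entries.get(key).cloned()
    }
    fn put(&mut self, key: &str, value: &str) {
        self.entries.insert(key.to_string(), value.to_string());
    }
}

// `provider` stands in for the upstream model call.
fn stream(
    cache: Option<&mut RequestCache>,
    key: &str,
    provider: impl Fn() -> String,
) -> String {
    if let Some(cache) = cache {
        if let Some(hit) = cache.get(key) {
            return hit; // cache hit: skip the provider entirely
        }
        let response = provider();
        cache.put(key, &response); // miss: store for the next call
        return response;
    }
    provider() // state.cache disabled: behave exactly as today
}

fn main() {
    let mut cache = RequestCache::new();
    let first = stream(Some(&mut cache), "req-key", || "fresh".to_string());
    let second = stream(Some(&mut cache), "req-key", || "recomputed".to_string());
    println!("{first} / {second}");
}
```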

Comment on lines +12600 to +12668
fn prompt_bucket_picks_latest_user_message_length() {
use eval_router::PromptBucket;
use runtime::{ApiRequest, ContentBlock, ConversationMessage, MessageRole};

// Earlier user message is long, but the latest (what the router
// should key on) is short → expect Short.
let request = ApiRequest {
system_prompt: vec![],
messages: vec![
ConversationMessage {
role: MessageRole::User,
blocks: vec![ContentBlock::Text {
text: "a".repeat(10_000),
}],
usage: None,
},
ConversationMessage {
role: MessageRole::Assistant,
blocks: vec![ContentBlock::Text {
text: "ok".to_string(),
}],
usage: None,
},
ConversationMessage {
role: MessageRole::User,
blocks: vec![ContentBlock::Text {
text: "hi".to_string(),
}],
usage: None,
},
],
};
assert_eq!(prompt_bucket_for_request(&request), PromptBucket::Short);
}

#[test]
fn prompt_bucket_defaults_to_short_when_no_user_messages() {
use eval_router::PromptBucket;
use runtime::ApiRequest;

let request = ApiRequest {
system_prompt: vec![],
messages: vec![],
};
assert_eq!(prompt_bucket_for_request(&request), PromptBucket::Short);
}

#[test]
fn prompt_bucket_counts_chars_across_multiple_text_blocks() {
use eval_router::PromptBucket;
use runtime::{ApiRequest, ContentBlock, ConversationMessage, MessageRole};

let request = ApiRequest {
system_prompt: vec![],
messages: vec![ConversationMessage {
role: MessageRole::User,
blocks: vec![
ContentBlock::Text {
text: "a".repeat(400),
},
ContentBlock::Text {
text: "b".repeat(200),
},
],
usage: None,
}],
};
// 600 total chars → Medium bucket (>=512, <6000).
assert_eq!(prompt_bucket_for_request(&request), PromptBucket::Medium);

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Cargo manifests mentioning eval_router / claw_state =="
fd -a '^Cargo.toml$' . -x sh -c '
  printf "\n--- %s ---\n" "$1"
  rg -n '"'"'name\s*=\s*"(eval_router|eval-router|claw_state|claw-state)"|eval_router|claw_state'"'"' "$1" || true
' sh {}

echo
echo "== Rust references to PromptBucket =="
rg -n --type rust '\b(eval_router|claw_state)::PromptBucket\b'

Repository: aryehlev/claw-code

Length of output: 1103


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== PromptBucket definition (likely in claw_state) =="
rg -n --type rust 'enum PromptBucket|pub enum PromptBucket|pub(.*) PromptBucket' rust/crates/claw-state/ || echo "Not found in claw-state, checking elsewhere..."
rg -n --type rust 'enum PromptBucket|pub enum PromptBucket|pub(.*) PromptBucket' || true

echo
echo "== Production code imports in main.rs (non-test uses) =="
rg -n --type rust 'use.*PromptBucket' rust/crates/rusty-claude-cli/src/main.rs -B 5 -A 5 | head -50

echo
echo "== Any remaining eval_router references in Rust =="
rg -n --type rust '\beval_router\b' rust/crates/rusty-claude-cli/src/main.rs

Repository: aryehlev/claw-code

Length of output: 3173


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search for PromptBucket imports (production, not tests) =="
rg -n --type rust '^\s*use.*PromptBucket' rust/crates/rusty-claude-cli/src/main.rs | head -20

echo
echo "== Context around line 7598 (first production use of PromptBucket) =="
sed -n '1,7598p' rust/crates/rusty-claude-cli/src/main.rs | tail -100 | head -50

echo
echo "== All use statements at the top of main.rs =="
head -100 rust/crates/rusty-claude-cli/src/main.rs | rg --type rust '^\s*use\s'

Repository: aryehlev/claw-code

Length of output: 3446


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Full claw_state import block =="
sed -n '1,100p' rust/crates/rusty-claude-cli/src/main.rs | grep -A 20 'use claw_state:'

Repository: aryehlev/claw-code

Length of output: 1379


Update test imports from non-existent eval_router to claw_state.

Production code imports PromptBucket from claw_state (line 4-5 of main.rs), but these three test functions incorrectly import from eval_router, which does not exist as a crate. Update the imports at lines 12601, 12637, and 12649 to use claw_state::PromptBucket; instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 12600 - 12668, The
three tests prompt_bucket_picks_latest_user_message_length,
prompt_bucket_defaults_to_short_when_no_user_messages, and
prompt_bucket_counts_chars_across_multiple_text_blocks import PromptBucket from
the wrong crate (`eval_router`); change those imports to use
claw_state::PromptBucket so they match the production usage (update the use
statements in each test that currently read `use eval_router::PromptBucket;` to
`use claw_state::PromptBucket;`).

…eLLM + GPTCache sidecars)"

This reverts commit 2d438a8.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (9)
rust/crates/eval-router/src/lib.rs (1)

155-160: ⚠️ Potential issue | 🟠 Major

Initialize new scoreboards with SCOREBOARD_VERSION.

Lines 295 and 300 call ScoreboardState::default(), but the derived Default at Lines 155-160 sets version to 0, not default_version(). Missing or empty scoreboards therefore persist stale schema metadata on first save.

Proposed fix
-#[derive(Debug, Clone, PartialEq, Default, Serialize, Deserialize)]
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
 struct ScoreboardState {
     #[serde(default = "default_version")]
     version: u32,
     #[serde(default)]
     buckets: BTreeMap<PromptBucket, BTreeMap<String, ModelStats>>,
 }
 
 const fn default_version() -> u32 {
     SCOREBOARD_VERSION
 }
+
+impl Default for ScoreboardState {
+    fn default() -> Self {
+        Self {
+            version: SCOREBOARD_VERSION,
+            buckets: BTreeMap::new(),
+        }
+    }
+}

Also applies to: 292-300

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/eval-router/src/lib.rs` around lines 155 - 160, The derived
Default for ScoreboardState leaves version at 0 while serde uses
default_version(), so calls to ScoreboardState::default() (e.g., where new
scoreboards are created) produce a stale version; replace the derived Default
with a manual impl Default for ScoreboardState that returns ScoreboardState {
version: SCOREBOARD_VERSION, buckets: Default::default() } (keeping the existing
#[serde(default = "default_version")] attribute and the default_version
function) so any new scoreboard initialized via ScoreboardState::default() gets
SCOREBOARD_VERSION.
rust/crates/runtime/src/conversation.rs (2)

576-582: ⚠️ Potential issue | 🟠 Major

Only ingest the final assistant reply for the turn.

This loop stores every assistant text message, so tool-turn preambles get written into long-term memory alongside the real answer. That diverges from the PR objective of forwarding the assistant reply after the turn and will degrade recall quality over time.

Proposed fix
-        for message in assistant_messages {
-            if let Some(text) = flatten_assistant_text(message) {
-                if !text.is_empty() {
-                    payload.push(MemoryMessage::new(MemoryRole::Assistant, text));
-                }
-            }
-        }
+        if let Some(text) = assistant_messages
+            .iter()
+            .rev()
+            .find_map(flatten_assistant_text)
+            .filter(|text| !text.trim().is_empty())
+        {
+            payload.push(MemoryMessage::new(MemoryRole::Assistant, text));
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/runtime/src/conversation.rs` around lines 576 - 582, The loop
currently pushes every assistant text chunk (assistant_messages via
flatten_assistant_text) into payload; instead locate only the final assistant
reply for the turn and push a single MemoryMessage::new(MemoryRole::Assistant,
text). Concretely, replace the iteration over assistant_messages with logic that
finds the last non-empty flatten_assistant_text result (e.g., iterate in reverse
or take the last Some(text) where !text.is_empty()) and call
payload.push(MemoryMessage::new(MemoryRole::Assistant, text)) once so tool
preambles are not stored.
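The "last non-empty" selection described in this prompt can be sketched with message shapes simplified to `Option<String>` (the real `flatten_assistant_text` operates on full messages):

```rust
// Scan from the end of the turn and take the last non-empty assistant text,
// so tool-turn preambles earlier in the turn never reach the memory store.
fn final_reply(assistant_texts: &[Option<String>]) -> Option<String> {
    assistant_texts
        .iter()
        .rev()
        .find_map(|text| text.clone().filter(|t| !t.trim().is_empty()))
}

fn main() {
    let texts = vec![
        Some("running the search tool...".to_string()),
        Some("final answer".to_string()),
    ];
    println!("{:?}", final_reply(&texts));
}
```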

542-556: ⚠️ Potential issue | 🔴 Critical

Do not inject recalled memory as system-level instructions.

These snippets come from prior user/assistant content, so appending them to system_prompt gives untrusted text system precedence. That is a prompt-injection path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/runtime/src/conversation.rs` around lines 542 - 556, The current
system_prompt_with_memory function is appending recalled snippets (from
memory_client.recall using session.session_id) into self.system_prompt which
elevates untrusted user/assistant content to system-level — stop injecting
memory into system_prompt: remove the code that pushes the "Relevant
memories..." block into prompt and instead return the original system_prompt
unchanged and expose the memory block as user-level content (e.g., change
system_prompt_with_memory to return (system_prompt, Option<String>) or move
creation of the memory block into the caller so it can be appended to the user
input/messages rather than the system prompt).
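The safer shape suggested here, sketched with hypothetical names: the system prompt comes back unchanged and the recalled block is returned separately for the caller to attach as user-level context.

```rust
// Return the untouched system prompt plus an optional memory block; the caller
// decides where the (untrusted) recalled text goes in the message list.
fn system_prompt_with_memory(
    system_prompt: &str,
    recalled: &[String],
) -> (String, Option<String>) {
    if recalled.is_empty() {
        return (system_prompt.to_string(), None);
    }
    let memory_block = format!(
        "Relevant memories (untrusted, for reference only):\n- {}",
        recalled.join("\n- ")
    );
    (system_prompt.to_string(), Some(memory_block))
}

fn main() {
    let (prompt, memory) =
        system_prompt_with_memory("You are claw.", &["user prefers rust".to_string()]);
    println!("{prompt} / {memory:?}");
}
```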
rust/crates/rusty-claude-cli/src/main.rs (4)

371-380: ⚠️ Potential issue | 🟡 Minor

Advertise eval in the top-level help output.

The new subcommand is wired into the parser, but claw --help still doesn't show a usage line or example for it. That leaves the public CLI surface stale.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 371 - 380, The
top-level help doesn't advertise the new Eval subcommand; update the CLI help
text and examples to include it by adding a usage line and example for the Eval
variant (the Eval enum/variant in the command enum) where the top-level Clap/App
is constructed (e.g., in the function or code block that builds the main CLI
parser such as build_cli or the top-level Clap::command() setup). Add a short
description and at least one example invocation showing Eval with its key flags
(session_reference, optional output_path, output_format, max_turns) so that
`claw --help` prints the Eval usage and example.

7062-7078: ⚠️ Potential issue | 🟠 Major

Use a platform-aware default scoreboard path.

This only checks XDG_DATA_HOME and HOME, so eval-driven routing loses persistent scoreboard state on many Windows setups unless router.scoreboardPath is configured manually. Please fall back to the OS-local app-data directory instead of a POSIX-only path scheme.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 7062 - 7078, The
default_scoreboard_path function only checks XDG_DATA_HOME and HOME (POSIX) and
thus loses state on Windows; update it to be platform-aware by first attempting
the Windows app-data location (e.g., env::var_os("LOCALAPPDATA") or, better,
using a platform helper like dirs::data_local_dir() / dirs::data_dir()) and
falling back to the existing XDG/HOME logic; ensure the returned PathBuf still
ends with "claw/router-scoreboard.json" and reference the function name
default_scoreboard_path and the "router-scoreboard.json" filename when making
the change.

1336-1384: ⚠️ Potential issue | 🟡 Minor

claw eval --help still falls through as an unknown option.

parse_eval_args() does not recognize --help/-h, so the new subcommand errors instead of following the existing help path. Please short-circuit here and add a regression test for claw eval --help.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 1336 - 1384,
parse_eval_args() currently treats "--help" and "-h" as unknown options in the
args match, so calling "claw eval --help" errors; update the match in
parse_eval_args (the while loop handling flags like "--session", "--output",
"--max-turns") to explicitly handle "--help" and "-h" by short-circuiting and
returning the same help behavior used by the parent CLI (e.g., return an
Err/Signal that triggers the existing help path or set a flag that causes help
to be shown), and add a regression test (e.g., test_eval_help) that invokes the
eval subcommand with "--help" and asserts the CLI returns/prints help rather
than reporting an unknown option.
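A minimal sketch of the short-circuit for a hand-rolled flag loop (flag names mirror the ones mentioned above; the outcome enum is hypothetical):

```rust
#[derive(Debug, PartialEq)]
enum ParseOutcome {
    Run { session: Option<String> },
    ShowHelp,
    Unknown(String),
}

// Recognize --help/-h before treating anything else as an unknown option.
fn parse_eval_args(args: &[&str]) -> ParseOutcome {
    let mut session = None;
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        match *arg {
            "--help" | "-h" => return ParseOutcome::ShowHelp, // existing help path
            "--session" => session = iter.next().map(|s| s.to_string()),
            other => return ParseOutcome::Unknown(other.to_string()),
        }
    }
    ParseOutcome::Run { session }
}

fn main() {
    println!("{:?}", parse_eval_args(&["--help"]));
}
```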

6357-6360: ⚠️ Potential issue | 🟠 Major

Keep eval stdout single-format in JSON mode.

When --output-format json is used without --output, this writes JSONL turn records to stdout and then appends a pretty JSON summary to stdout as well. That breaks both JSONL and single-document JSON consumers. Emit the summary to stderr in that case, or only print it when records are going to a file.

Also applies to: 6414-6429

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 6357 - 6360, The
current logic writes JSONL records to `writer` (created from `output_path`) and
then always writes the pretty JSON summary to the same `writer`, which causes
mixed stdout output when `output_format` is JSON and `output_path` is None;
change the summary to go to stderr in that case. Concretely: keep creating
`writer` as you do (variable `writer`, matched on `output_path`), but when
emitting the final summary (the block around lines 6414-6429), detect if
`output_format` is JSON and `output_path` is None (or if `writer` is `stdout`)
and write the summary to `io::stderr()` (or use `eprintln!`) instead of
`writer`; otherwise continue to write the summary to `writer` (file). Ensure you
reference `output_format`, `output_path`, the `writer` variable and the
summary-emitting code path so the summary is redirected only when writing JSON
to stdout.
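The routing rule above reduces to a small decision table (names illustrative): the summary only moves off the record stream when JSONL records own stdout.

```rust
#[derive(Debug, PartialEq)]
enum SummaryDest {
    Stderr,
    Writer,
}

// JSON format with no output path means records stream to stdout, so the
// pretty summary must go to stderr to keep stdout machine-parseable.
fn summary_destination(json_format: bool, output_path_set: bool) -> SummaryDest {
    if json_format && !output_path_set {
        SummaryDest::Stderr
    } else {
        SummaryDest::Writer // records go to a file (or text mode); summary may follow
    }
}

fn main() {
    println!("{:?}", summary_destination(true, false));
}
```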
rust/crates/memory-client/src/lib.rs (2)

234-239: ⚠️ Potential issue | 🟠 Major

Enforce recall limit client-side with .take(limit).

The backend may return more results than requested. Without client-side enforcement, excess memories could bloat the system prompt beyond the configured recallLimit.

Proposed fix
         Ok(payload
             .results
             .into_iter()
             .filter_map(|entry| entry.fact.or(entry.content))
             .filter(|text| !text.trim().is_empty())
+            .take(limit)
             .collect())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/memory-client/src/lib.rs` around lines 234 - 239, The iterator
over payload.results can return more items than desired; enforce the client-side
recall limit by inserting a .take(limit_usize) into the iterator chain before
.collect(), e.g. between the .filter(...) and .collect() calls that follow
payload.results.into_iter(). Use the function's recall/limit parameter (convert
to usize if it's an integer type) as limit_usize so only up to that many
memories are collected.

215-215: ⚠️ Potential issue | 🟠 Major

Percent-encode session_id before interpolating into URL path.

If session_id contains reserved URL characters (/, ?, #, spaces), these requests will hit wrong endpoints or corrupt the path structure.

Proposed fix
+use percent_encoding::{utf8_percent_encode, NON_ALPHANUMERIC};
+
+fn encode_path_segment(s: &str) -> String {
+    utf8_percent_encode(s, NON_ALPHANUMERIC).to_string()
+}
+
 impl MemoryClient for ZepMemoryClient {
     fn recall(
         &mut self,
         session_id: &str,
         query: &str,
         limit: usize,
     ) -> Result<Vec<String>, MemoryError> {
         if query.trim().is_empty() || limit == 0 {
             return Ok(Vec::new());
         }
-        let path = format!("/api/v2/sessions/{session_id}/search");
+        let encoded_id = encode_path_segment(session_id);
+        let path = format!("/api/v2/sessions/{encoded_id}/search");

Apply the same encoding in ingest:

     fn ingest(&mut self, session_id: &str, messages: &[MemoryMessage]) -> Result<(), MemoryError> {
         if messages.is_empty() {
             return Ok(());
         }
-        let path = format!("/api/v2/sessions/{session_id}/messages");
+        let encoded_id = encode_path_segment(session_id);
+        let path = format!("/api/v2/sessions/{encoded_id}/messages");

Also applies to: 246-246

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/memory-client/src/lib.rs` at line 215, The session_id is
interpolated directly into URL paths (e.g., the format! call that builds
"/api/v2/sessions/{session_id}/search" and the similar build in the ingest
path), so percent-encode session_id before formatting the path to avoid reserved
characters breaking the URL; update the code that constructs these paths to call
a percent-encoding helper (or use the percent-encoding/url crate) to produce an
encoded_session_id and use that variable in format! for both the search path and
the ingest path occurrences.
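For illustration, a std-only version of the encoding helper (the `percent_encoding` crate shown in the proposed fix does the same with a vetted character set; this sketch keeps only RFC 3986 unreserved bytes):

```rust
// Percent-encode everything except unreserved characters so a session id with
// '/', '?', '#', or spaces stays inside a single URL path segment.
fn encode_path_segment(s: &str) -> String {
    let mut out = String::new();
    for byte in s.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }
    out
}

fn main() {
    println!(
        "/api/v2/sessions/{}/search",
        encode_path_segment("team/42 alpha")
    );
}
```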
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deploy/router/README.md`:
- Around line 9-11: The fenced code blocks like the one containing "claw -->
gptcache:8100 --> routellm:6060 --> Anthropic / OpenAI / ..." are unlabeled and
triggering markdownlint; add a language tag `text` to each triple-backtick fence
(e.g., replace ``` with ```text) for the blocks at the noted locations
(including the blocks referenced at lines 52-54 and 87-89) so the content
renders the same but satisfies the linter.
- Around line 168-174: Update the stderr example to match the documented router
log format used elsewhere: replace the conflicting line showing "[router]
user_model=... routed_to=..." with the canonical format "[router] bucket=...
selected=... reason=..." so the README consistently shows the same router
decision fields; ensure the example values mirror the earlier example's field
names (bucket, selected, reason) and wording.

In `@rust/crates/memory-client/src/lib.rs`:
- Around line 166-175: The apply_auth method currently always sets Authorization
to "Api-Key {key}"; change it to inspect self.config.api_key and, when
Some(api_key), use api_key.starts_with("z_") to choose the header value—use
"Api-Key {api_key}" for keys starting with "z_" and "Bearer {api_key}" for all
other non-empty keys; return the original builder when api_key is None. Target
the apply_auth function and the self.config.api_key check to implement this
conditional header selection.
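The conditional described here, as a small sketch; the `z_` prefix convention is taken from the review comment itself and is not verified against Zep's documentation:

```rust
// Choose the Authorization header value based on the key shape; None means
// "leave the request builder untouched".
fn auth_header_value(api_key: Option<&str>) -> Option<String> {
    match api_key {
        Some(key) if key.starts_with("z_") => Some(format!("Api-Key {key}")),
        Some(key) if !key.is_empty() => Some(format!("Bearer {key}")),
        _ => None,
    }
}

fn main() {
    println!("{:?}", auth_header_value(Some("z_abc")));
}
```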

---

Duplicate comments:
In `@rust/crates/eval-router/src/lib.rs`:
- Around line 155-160: The derived Default for ScoreboardState leaves version at
0 while serde uses default_version(), so calls to ScoreboardState::default()
(e.g., where new scoreboards are created) produce a stale version; replace the
derived Default with a manual impl Default for ScoreboardState that returns
ScoreboardState { version: SCOREBOARD_VERSION, buckets: Default::default() }
(keeping the existing #[serde(default = "default_version")] attribute and the
default_version function) so any new scoreboard initialized via
ScoreboardState::default() gets SCOREBOARD_VERSION.
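The suggested fix can be sketched as below; the field types and `SCOREBOARD_VERSION` value are stand-ins for the real eval-router definitions:

```rust
use std::collections::HashMap;

// Assumption: the real constant and bucket payload live in eval-router.
const SCOREBOARD_VERSION: u32 = 2;

struct ScoreboardState {
    version: u32, // carries #[serde(default = "default_version")] in the real crate
    buckets: HashMap<String, Vec<String>>, // placeholder bucket payload
}

// Manual Default so ScoreboardState::default() agrees with serde's default_version().
impl Default for ScoreboardState {
    fn default() -> Self {
        ScoreboardState {
            version: SCOREBOARD_VERSION,
            buckets: Default::default(),
        }
    }
}

fn main() {
    let fresh = ScoreboardState::default();
    assert_eq!(fresh.version, SCOREBOARD_VERSION); // no longer the derived 0
    assert!(fresh.buckets.is_empty());
    println!("ok");
}
```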

In `@rust/crates/memory-client/src/lib.rs`:
- Around line 234-239: The iterator over payload.results can return more items
than desired; enforce the client-side recall limit by inserting a
.take(limit_usize) into the iterator chain before .collect(), e.g. between the
.filter(...) and .collect() calls that follow payload.results.into_iter(). Use
the function's recall/limit parameter (convert to usize if it's an integer type)
as limit_usize so only up to that many memories are collected.
- Line 215: The session_id is interpolated directly into URL paths (e.g., the
format! call that builds "/api/v2/sessions/{session_id}/search" and the similar
build in the ingest path), so percent-encode session_id before formatting the
path to avoid reserved characters breaking the URL; update the code that
constructs these paths to call a percent-encoding helper (or use the
percent-encoding/url crate) to produce an encoded_session_id and use that
variable in format! for both the search path and the ingest path occurrences.
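Both memory-client findings can be sketched together. The hand-rolled encoder below only illustrates the idea; the actual fix would use the `percent-encoding` crate's `utf8_percent_encode` as the comment suggests:

```rust
// Minimal percent-encoder for a URL path segment (unreserved bytes pass through,
// everything else becomes %XX). Illustration only.
fn encode_path_segment(raw: &str) -> String {
    raw.bytes()
        .map(|b| match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                (b as char).to_string()
            }
            _ => format!("%{b:02X}"),
        })
        .collect()
}

fn main() {
    let session_id = "user a/b#1";
    let encoded_session_id = encode_path_segment(session_id);
    let path = format!("/api/v2/sessions/{encoded_session_id}/search");
    assert_eq!(path, "/api/v2/sessions/user%20a%2Fb%231/search");

    // Client-side recall limit: cap results with .take(limit) before collecting.
    let results = vec!["fact one", "", "fact two", "fact three"];
    let limit_usize = 2;
    let memories: Vec<&str> = results
        .into_iter()
        .filter(|m| !m.is_empty())
        .take(limit_usize)
        .collect();
    assert_eq!(memories, ["fact one", "fact two"]);
    println!("ok");
}
```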

In `@rust/crates/runtime/src/conversation.rs`:
- Around line 576-582: The loop currently pushes every assistant text chunk
(assistant_messages via flatten_assistant_text) into payload; instead locate
only the final assistant reply for the turn and push a single
MemoryMessage::new(MemoryRole::Assistant, text). Concretely, replace the
iteration over assistant_messages with logic that finds the last non-empty
flatten_assistant_text result (e.g., iterate in reverse or take the last
Some(text) where !text.is_empty()) and call
payload.push(MemoryMessage::new(MemoryRole::Assistant, text)) once so tool
preambles are not stored.
- Around line 542-556: The current system_prompt_with_memory function is
appending recalled snippets (from memory_client.recall using session.session_id)
into self.system_prompt which elevates untrusted user/assistant content to
system-level — stop injecting memory into system_prompt: remove the code that
pushes the "Relevant memories..." block into prompt and instead return the
original system_prompt unchanged and expose the memory block as user-level
content (e.g., change system_prompt_with_memory to return (system_prompt,
Option<String>) or move creation of the memory block into the caller so it can
be appended to the user input/messages rather than the system prompt).
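A sketch of both conversation.rs changes together, with `flatten_assistant_text` stubbed as plain strings: keep only the final non-empty assistant reply for ingest, and return recalled memories alongside an untouched system prompt rather than splicing them into it. Names mirror the comments but the signatures are illustrative:

```rust
// Final non-empty assistant chunk only, so tool preambles are not ingested.
fn final_assistant_text(chunks: &[&str]) -> Option<String> {
    chunks
        .iter()
        .rev()
        .map(|c| c.trim())
        .find(|c| !c.is_empty())
        .map(str::to_string)
}

// Return the system prompt unchanged plus an optional user-level memory block,
// instead of elevating recalled (untrusted) content to system level.
fn system_prompt_with_memory(
    system_prompt: &str,
    recalled: Vec<String>,
) -> (String, Option<String>) {
    let memory_block = (!recalled.is_empty())
        .then(|| format!("Relevant memories:\n- {}", recalled.join("\n- ")));
    (system_prompt.to_string(), memory_block)
}

fn main() {
    let chunks = ["Let me run the tests...", "", "All 12 tests pass."];
    assert_eq!(
        final_assistant_text(&chunks).as_deref(),
        Some("All 12 tests pass.")
    );

    let (prompt, memories) =
        system_prompt_with_memory("You are claw.", vec!["user prefers Rust".into()]);
    assert_eq!(prompt, "You are claw."); // system prompt never gains recalled text
    assert_eq!(memories.as_deref(), Some("Relevant memories:\n- user prefers Rust"));
    println!("ok");
}
```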

In `@rust/crates/rusty-claude-cli/src/main.rs`:
- Around line 371-380: The top-level help doesn't advertise the new Eval
subcommand; update the CLI help text and examples to include it by adding a
usage line and example for the Eval variant (the Eval enum/variant in the
command enum) where the top-level Clap/App is constructed (e.g., in the function
or code block that builds the main CLI parser such as build_cli or the top-level
Clap::command() setup). Add a short description and at least one example
invocation showing Eval with its key flags (session_reference, optional
output_path, output_format, max_turns) so that `claw --help` prints the Eval
usage and example.
- Around line 7062-7078: The default_scoreboard_path function only checks
XDG_DATA_HOME and HOME (POSIX) and thus loses state on Windows; update it to be
platform-aware by first attempting the Windows app-data location (e.g.,
env::var_os("LOCALAPPDATA") or, better, using a platform helper like
dirs::data_local_dir() / dirs::data_dir()) and falling back to the existing
XDG/HOME logic; ensure the returned PathBuf still ends with
"claw/router-scoreboard.json" and reference the function name
default_scoreboard_path and the "router-scoreboard.json" filename when making
the change.
- Around line 1336-1384: parse_eval_args() currently treats "--help" and "-h" as
unknown options in the args match, so calling "claw eval --help" errors; update
the match in parse_eval_args (the while loop handling flags like "--session",
"--output", "--max-turns") to explicitly handle "--help" and "-h" by
short-circuiting and returning the same help behavior used by the parent CLI
(e.g., return an Err/Signal that triggers the existing help path or set a flag
that causes help to be shown), and add a regression test (e.g., test_eval_help)
that invokes the eval subcommand with "--help" and asserts the CLI
returns/prints help rather than reporting an unknown option.
- Around line 6357-6360: The current logic writes JSONL records to `writer`
(created from `output_path`) and then always writes the pretty JSON summary to
the same `writer`, which causes mixed stdout output when `output_format` is JSON
and `output_path` is None; change the summary to go to stderr in that case.
Concretely: keep creating `writer` as you do (variable `writer`, matched on
`output_path`), but when emitting the final summary (the block around lines
6414-6429), detect if `output_format` is JSON and `output_path` is None (or if
`writer` is `stdout`) and write the summary to `io::stderr()` (or use
`eprintln!`) instead of `writer`; otherwise continue to write the summary to
`writer` (file). Ensure you reference `output_format`, `output_path`, the
`writer` variable and the summary-emitting code path so the summary is
redirected only when writing JSON to stdout.
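Of these main.rs findings, the `--help` short-circuit is the one with a concrete regression-test shape. A sketch, with the caveat that claw's real `parse_eval_args` returns its own argument types rather than this illustrative enum:

```rust
// Illustrative stand-ins for the CLI's real types.
enum EvalParse {
    Args { session: String, max_turns: u32 },
    Help,
}

fn parse_eval_args(args: &[&str]) -> Result<EvalParse, String> {
    let mut session = None;
    let mut max_turns = 50u32;
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        match *arg {
            // Short-circuit BEFORE the unknown-option arm, so `claw eval --help` works.
            "--help" | "-h" => return Ok(EvalParse::Help),
            "--session" => session = iter.next().map(|s| s.to_string()),
            "--max-turns" => {
                max_turns = iter
                    .next()
                    .and_then(|s| s.parse().ok())
                    .ok_or("--max-turns needs a number")?;
            }
            other => return Err(format!("unknown option: {other}")),
        }
    }
    Ok(EvalParse::Args {
        session: session.ok_or("--session is required")?,
        max_turns,
    })
}

fn main() {
    // The regression test the comment asks for: help flags must not be "unknown".
    assert!(matches!(parse_eval_args(&["--help"]), Ok(EvalParse::Help)));
    assert!(matches!(parse_eval_args(&["-h"]), Ok(EvalParse::Help)));
    match parse_eval_args(&["--session", "abc", "--max-turns", "3"]) {
        Ok(EvalParse::Args { session, max_turns }) => {
            assert_eq!(session, "abc");
            assert_eq!(max_turns, 3);
        }
        _ => panic!("expected parsed args"),
    }
    assert!(parse_eval_args(&["--bogus"]).is_err());
    println!("ok");
}
```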

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d6e46af6-2e82-4089-a5bd-c24a79bbe776

📥 Commits

Reviewing files that changed from the base of the PR and between 2d438a8 and 1ed8b67.

⛔ Files ignored due to path filters (1)
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • deploy/router/README.md
  • deploy/router/docker-compose.yml
  • rust/crates/eval-router/Cargo.toml
  • rust/crates/eval-router/src/lib.rs
  • rust/crates/memory-client/Cargo.toml
  • rust/crates/memory-client/src/lib.rs
  • rust/crates/memory-client/tests/zep_http.rs
  • rust/crates/runtime/Cargo.toml
  • rust/crates/runtime/src/config.rs
  • rust/crates/runtime/src/config_validate.rs
  • rust/crates/runtime/src/conversation.rs
  • rust/crates/runtime/src/lib.rs
  • rust/crates/rusty-claude-cli/Cargo.toml
  • rust/crates/rusty-claude-cli/src/main.rs
✅ Files skipped from review due to trivial changes (3)
  • rust/crates/rusty-claude-cli/Cargo.toml
  • rust/crates/runtime/Cargo.toml
  • deploy/router/docker-compose.yml
🚧 Files skipped from review as they are similar to previous changes (2)
  • rust/crates/runtime/src/config_validate.rs
  • rust/crates/runtime/src/lib.rs

Comment thread deploy/router/README.md
Comment on lines +9 to +11
```
claw --> gptcache:8100 --> routellm:6060 --> Anthropic / OpenAI / ...
```

⚠️ Potential issue | 🟡 Minor

Add language tags to these fenced blocks.

markdownlint is already flagging these unlabeled fences. Marking them as text keeps the docs lint-clean without changing the rendered content.

Also applies to: 52-54, 87-89

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 9-9: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/router/README.md` around lines 9 - 11, The fenced code blocks like the
one containing "claw --> gptcache:8100 --> routellm:6060 --> Anthropic / OpenAI
/ ..." are unlabeled and triggering markdownlint; add a language tag `text` to
each triple-backtick fence (e.g., replace ``` with ```text) for the blocks at
the noted locations (including the blocks referenced at lines 52-54 and 87-89)
so the content renders the same but satisfies the linter.

Comment thread deploy/router/README.md
Comment on lines +168 to +174
## Observing router decisions

When routing is enabled claw prints one line per turn on stderr:

```
[router] user_model=opus routed_to=claude-haiku-4-5
```

⚠️ Potential issue | 🟡 Minor

Align this stderr example with the actual documented router log format.

Line 173 now shows [router] user_model=... routed_to=..., which conflicts with the earlier [router] bucket=... selected=... reason=... example on Lines 87-89. Keeping both formats in the same README will confuse anyone parsing router decisions.

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 172-172: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/router/README.md` around lines 168 - 174, Update the stderr example to
match the documented router log format used elsewhere: replace the conflicting
line showing "[router] user_model=... routed_to=..." with the canonical format
"[router] bucket=... selected=... reason=..." so the README consistently shows
the same router decision fields; ensure the example values mirror the earlier
example's field names (bucket, selected, reason) and wording.

Comment on lines +166 to +175
fn apply_auth(
    &self,
    builder: reqwest::blocking::RequestBuilder,
) -> reqwest::blocking::RequestBuilder {
    if let Some(api_key) = &self.config.api_key {
        builder.header("Authorization", format!("Api-Key {api_key}"))
    } else {
        builder
    }
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Zep API authentication header format Api-Key

💡 Result:

The authentication header format for the Zep API (getzep.com, agent memory platform) using Api-Key is: `Authorization: Api-Key {api_key}`. This applies specifically to Zep Cloud API keys, which start with "z_". For self-hosted Zep Open Source (keys not starting with "z_"), it uses `Authorization: Bearer {api_key}` instead. Example from the official Python SDK source code:

if api_key and api_key.startswith("z_"):
    headers["Authorization"] = f"Api-Key {api_key}"

Official docs reference SDK initialization with api_key but do not explicitly show the raw HTTP header; the SDK implementation confirms the format.


Implement conditional authentication for Zep Cloud and self-hosted instances.

The code currently uses Authorization: Api-Key for all keys, but Zep requires different formats based on key type: Cloud API keys (starting with z_) use Api-Key, while self-hosted Zep Open Source requires Authorization: Bearer. Update the code to check the key prefix and apply the correct authentication scheme:

if let Some(api_key) = &self.config.api_key {
    if api_key.starts_with("z_") {
        builder.header("Authorization", format!("Api-Key {api_key}"))
    } else {
        builder.header("Authorization", format!("Bearer {api_key}"))
    }
} else {
    builder
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/memory-client/src/lib.rs` around lines 166 - 175, The apply_auth
method currently always sets Authorization to "Api-Key {key}"; change it to
inspect self.config.api_key and, when Some(api_key), use
api_key.starts_with("z_") to choose the header value—use "Api-Key {api_key}" for
keys starting with "z_" and "Bearer {api_key}" for all other non-empty keys;
return the original builder when api_key is None. Target the apply_auth function
and the self.config.api_key check to implement this conditional header
selection.

Zero Rust changes. Both tools integrate via claw's existing hook and
skill mechanisms; the added files are wrapper + docs.

- deploy/efficiency/rtk/hook.sh — PreToolUse wrapper that bridges
  claw's hook ABI (HOOK_TOOL_NAME + HOOK_TOOL_INPUT env vars, JSON
  stdout with hookSpecificOutput.updatedInput) to rtk's expectations.
  Filters for Bash tool calls, rewrites known rtk-supported
  commands (git, cargo, npm, docker, etc.) to `rtk <cmd>`, passes
  everything else through. Gracefully degrades if rtk isn't on PATH.
  Smoke-tested: bash+git is rewritten, bash+ls passes through, non-
  bash tools pass through.

- deploy/efficiency/rtk/settings.json.example — one-line .claw.json
  snippet pointing PreToolUse at the wrapper.

- deploy/efficiency/caveman/README.md — manual install steps for
  claw. Caveman upstream targets Claude Code's plugin marketplace
  which claw doesn't implement, so this documents copying SKILL.md
  files directly into .claw/skills/. Caveman's SessionStart auto-
  activation doesn't port (claw has no SessionStart event), so
  activation is manual via /caveman slash commands.

- deploy/efficiency/README.md — overview showing how RTK, Caveman,
  the router scoreboard, and prompt/sidecar caches stack.
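The wrapper's control flow can be sketched roughly as follows. The env var names (`HOOK_TOOL_NAME`, `HOOK_TOOL_INPUT`) and the `hookSpecificOutput.updatedInput` shape come from the bullet above; the command list and JSON key are assumptions, and the real script additionally checks `command -v rtk` before rewriting:

```shell
#!/usr/bin/env bash
# Simulated hook invocation; claw would set these for a PreToolUse call.
HOOK_TOOL_NAME="Bash"
HOOK_TOOL_INPUT='{"command": "git status"}'

# Non-Bash tools pass through untouched.
if [ "$HOOK_TOOL_NAME" != "Bash" ]; then
  exit 0
fi

# Pull the command out of the tool input JSON.
cmd=$(printf '%s' "$HOOK_TOOL_INPUT" \
  | python3 -c 'import json,sys; print(json.load(sys.stdin).get("command", ""))')

# Rewrite only known rtk-supported tool families; everything else passes through.
case "$cmd" in
  git\ *|cargo\ *|npm\ *|docker\ *) rewritten="rtk $cmd" ;;
  *) exit 0 ;;
esac

# Emit the hook ABI response on stdout.
python3 - "$rewritten" <<'PY'
import json, sys
print(json.dumps({"hookSpecificOutput": {"updatedInput": {"command": sys.argv[1]}}}))
PY
```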

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deploy/efficiency/caveman/README.md`:
- Around line 41-46: Add explicit fenced code block languages to satisfy
markdownlint MD040 by updating the triple-backtick fences that wrap the command
lists to include a language specifier (e.g., "text"); specifically, replace the
opening fences for the blocks containing the "/caveman", "/caveman lite",
"/caveman full", "/caveman ultra" lines and the block containing "/skills" with
fenced blocks like ```text so the blocks for those command lists are properly
annotated.
- Around line 18-33: The README's filesystem auto-discovery steps are incorrect
for this repo because skills are loaded from a snapshot rather than dynamically
from the filesystem; update the install instructions to explain adding Caveman
to the skills snapshot or project skills directory used by the snapshot loader
(referencing the snapshot-loading behavior in src/commands.py,
src/skills/__init__.py, and src/command_graph.py) and remove/replace the
mkdir/cp steps with instructions to place SKILL.md into the project's snapshot
source (./.claw/skills/) or to run the project's snapshot
generation/registration workflow so the skill appears in the snapshot used at
runtime.

In `@deploy/efficiency/README.md`:
- Around line 5-7: The README's description of "existing hook/skill mechanisms"
conflicts with the Python runtime implementation; update the README text around
the integration model (the paragraphs covering lines 5-7 and 53-54) to either
scope the docs to the Rust/runtime that actually supports PreToolUse hooks or
add an explicit caveat for the Python port stating that src/runtime.py (see
runtime execution flow around the current PreToolUse handling at
src/runtime.py:120-133), src/query_engine.py (query handling at
src/query_engine.py:61-66) and the snapshot-based skills initialization in
src/skills/__init__.py (lines ~1-12) do not execute PreToolUse hooks and instead
use snapshot-based skills; make the README clear which runtime supports
hook/skill execution and add a short note on migration or limitations for the
Python port.
- Around line 29-49: The fenced diagram block in README.md is missing a language
specifier (MD040); update the opening fence from ``` to include a language such
as ```text so the diagram is treated as a code block (locate the block starting
with the backticks and the line "user turn" and add the specifier), and ensure
the closing fence remains ``` to properly terminate the block.

In `@deploy/efficiency/rtk/hook.sh`:
- Around line 13-16: The comment in deploy/efficiency/rtk/hook.sh incorrectly
claims PreToolUse hooks run for every tool call; reconcile this by either
updating the comment to reflect the current runtime (remove/clarify the "every
tool call" and Bash behavior) or by adding the missing plumbing in
src/runtime.py and/or src/query_engine.py to actually invoke the PreToolUse hook
and permit Bash tools: locate the PreToolUse logic referenced in hook.sh, then
add a hook invocation in the runtime call path (the runtime entrypoint and query
engine paths currently shown in src/runtime.py and src/query_engine.py) so calls
flow through the wrapper, and ensure the Bash tool exclusion at the existing
runtime.py check is adjusted to allow the wrapper to run when intended.
- Around line 43-44: The comment in hook.sh states there is a "jq fallback" but
the script only uses python3 to parse the tool input JSON; either implement the
jq fallback or change the comment to reflect reality. Fix options: (A) add a jq
branch that checks for jq availability and parses the same JSON key as the
existing python3 command (mirror the extraction logic used by the python3
invocation), or (B) update the comment text above the python3 parsing to remove
mention of jq so it no longer claims a fallback. Look for the python3 invocation
in hook.sh and add the jq check/parse or update that comment accordingly.
- Around line 80-86: The Python block currently embeds the Bash variable
`rewritten` into the Python source using `${rewritten@Q}`, which breaks on
special chars; instead invoke Python with the `rewritten` value as an argument
and read it from sys.argv inside the Python snippet. Replace the current
`python3 - <<PY` block that prints JSON with an invocation like `python3 -
"$rewritten" <<'PY'` (use a quoted heredoc delimiter) and inside the snippet
read the value via `import sys` and `sys.argv[1]` when building the JSON output;
update the JSON construction code that references the embedded value to use that
sys.argv variable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 01f91dce-9643-465a-ae51-fe48447835bd

📥 Commits

Reviewing files that changed from the base of the PR and between 1ed8b67 and 6f585aa.

📒 Files selected for processing (4)
  • deploy/efficiency/README.md
  • deploy/efficiency/caveman/README.md
  • deploy/efficiency/rtk/hook.sh
  • deploy/efficiency/rtk/settings.json.example
✅ Files skipped from review due to trivial changes (1)
  • deploy/efficiency/rtk/settings.json.example

Comment on lines +18 to +33
# claw auto-discovers skills in .claw/skills/<name>/SKILL.md — user-level.
mkdir -p ~/.claw/skills/caveman
cp /tmp/caveman/skills/caveman/SKILL.md ~/.claw/skills/caveman/SKILL.md

# (repeat for any lite/full/ultra variants you want)
for v in caveman-lite caveman-full caveman-ultra; do
  [ -f /tmp/caveman/skills/$v/SKILL.md ] || continue
  mkdir -p ~/.claw/skills/$v
  cp /tmp/caveman/skills/$v/SKILL.md ~/.claw/skills/$v/SKILL.md
done

rm -rf /tmp/caveman
```

Project-scoped install is the same with `./.claw/skills/` instead of
`~/.claw/skills/`.

⚠️ Potential issue | 🟠 Major

Filesystem “auto-discovery” instructions don’t match the current skills loading model.

Line 18 claims claw auto-discovers ~/.claw/skills/<name>/SKILL.md, and Lines 63-69 say /skills should reflect manual installs. In this repo, src/commands.py:23-33, src/skills/__init__.py:1-12, and src/command_graph.py:30-34 indicate snapshot-based loading, not dynamic filesystem discovery. These steps will likely not activate Caveman here.

Also applies to: 63-69

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/efficiency/caveman/README.md` around lines 18 - 33, The README's
filesystem auto-discovery steps are incorrect for this repo because skills are
loaded from a snapshot rather than dynamically from the filesystem; update the
install instructions to explain adding Caveman to the skills snapshot or project
skills directory used by the snapshot loader (referencing the snapshot-loading
behavior in src/commands.py, src/skills/__init__.py, and src/command_graph.py)
and remove/replace the mkdir/cp steps with instructions to place SKILL.md into
the project's snapshot source (./.claw/skills/) or to run the project's snapshot
generation/registration workflow so the skill appears in the snapshot used at
runtime.

Comment on lines +41 to +46
```
/caveman # compress this turn
/caveman lite # lighter compression
/caveman full # aggressive
/caveman ultra # maximum, near-telegram
```

⚠️ Potential issue | 🟡 Minor

Add fenced code block languages to satisfy markdownlint (MD040).

Proposed fix
-```
+```text
 /caveman            # compress this turn
 /caveman lite       # lighter compression
 /caveman full       # aggressive
 /caveman ultra      # maximum, near-telegram

@@
-```
+```text
 /skills

Also applies to: 65-67

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 41-41: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/efficiency/caveman/README.md` around lines 41 - 46, Add explicit
fenced code block languages to satisfy markdownlint MD040 by updating the
triple-backtick fences that wrap the command lists to include a language
specifier (e.g., "text"); specifically, replace the opening fences for the
blocks containing the "/caveman", "/caveman lite", "/caveman full", "/caveman
ultra" lines and the block containing "/skills" with fenced blocks like ```text
so the blocks for those command lists are properly annotated.

Comment on lines +5 to +7
separately-installed binary or skill via its existing hook / skill
mechanisms. No Rust changes in claw itself; each integration lands and
unlands by editing `.claw.json`.

⚠️ Potential issue | 🟠 Major

Integration model in docs conflicts with current runtime capabilities.

Lines 5-7 and 53-54 describe existing hook/skill mechanisms as if they are active, but current Python paths (src/runtime.py:120-133, src/query_engine.py:61-66, src/skills/__init__.py:1-12) indicate no PreToolUse execution and snapshot-based skills. Please scope this README to the runtime that supports these features, or add explicit caveats for the Python port.

Also applies to: 53-54

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/efficiency/README.md` around lines 5 - 7, The README's description of
"existing hook/skill mechanisms" conflicts with the Python runtime
implementation; update the README text around the integration model (the
paragraphs covering lines 5-7 and 53-54) to either scope the docs to the
Rust/runtime that actually supports PreToolUse hooks or add an explicit caveat
for the Python port stating that src/runtime.py (see runtime execution flow
around the current PreToolUse handling at src/runtime.py:120-133),
src/query_engine.py (query handling at src/query_engine.py:61-66) and the
snapshot-based skills initialization in src/skills/__init__.py (lines ~1-12) do
not execute PreToolUse hooks and instead use snapshot-based skills; make the
README clear which runtime supports hook/skill execution and add a short note on
migration or limitations for the Python port.

Comment on lines +29 to +49
```
user turn
Caveman (per-turn) ──► shorter model output, shorter user text
Router (scoreboard) ─► cheapest capable model
Cache (sidecar / prompt-cache)
Provider
Tool call? ─► RTK PreToolUse hook ─► compressed output back to model
Memory ingest (Zep) ──► persisted facts for future recall
```

⚠️ Potential issue | 🟡 Minor

Specify a language for the fenced diagram block (MD040).

Proposed fix
-```
+```text
   user turn
      │
      ▼
@@
   Memory ingest (Zep) ──► persisted facts for future recall

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 29-29: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@deploy/efficiency/README.md` around lines 29 - 49, The fenced diagram block
in README.md is missing a language specifier (MD040); update the opening fence
from ``` to include a language such as ```text so the diagram is treated as a
code block (locate the block starting with the backticks and the line "user
turn" and add the specifier), and ensure the closing fence remains ``` to
properly terminate the block.



Comment on lines +13 to +16
# claw runs PreToolUse hooks for EVERY tool call, not just Bash. This
# wrapper short-circuits for non-Bash tools and only rewrites the
# command when it matches a known rtk-supported tool family. All other
# tool calls pass through untouched.

⚠️ Potential issue | 🟠 Major

Behavior claim conflicts with current runtime integration path in this repo.

Lines 13-16 state PreToolUse hooks run for every tool call, but src/runtime.py:120-133 and src/query_engine.py:61-66 show no hook invocation path, and src/runtime.py:169-174 explicitly denies Bash tools. This wrapper appears non-functional in the current Python runtime unless additional plumbing exists outside these paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/efficiency/rtk/hook.sh` around lines 13 - 16, The comment in
deploy/efficiency/rtk/hook.sh incorrectly claims PreToolUse hooks run for every
tool call; reconcile this by either updating the comment to reflect the current
runtime (remove/clarify the "every tool call" and Bash behavior) or by adding
the missing plumbing in src/runtime.py and/or src/query_engine.py to actually
invoke the PreToolUse hook and permit Bash tools: locate the PreToolUse logic
referenced in hook.sh, then add a hook invocation in the runtime call path (the
runtime entrypoint and query engine paths currently shown in src/runtime.py and
src/query_engine.py) so calls flow through the wrapper, and ensure the Bash tool
exclusion at the existing runtime.py check is adjusted to allow the wrapper to
run when intended.

Comment on lines +43 to +44
# Pull the command out of the tool input JSON. Using python3 because it's
# almost always present; fall back to jq if you've got it.

⚠️ Potential issue | 🟡 Minor

Comment says jq fallback exists, but code doesn’t implement it.

Line 44 mentions a jq fallback, but only Python parsing is implemented. Either add the fallback or update the comment to avoid misleading operators.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/efficiency/rtk/hook.sh` around lines 43 - 44, The comment in hook.sh
states there is a "jq fallback" but the script only uses python3 to parse the
tool input JSON; either implement the jq fallback or change the comment to
reflect reality. Fix options: (A) add a jq branch that checks for jq
availability and parses the same JSON key as the existing python3 command
(mirror the extraction logic used by the python3 invocation), or (B) update the
comment text above the python3 parsing to remove mention of jq so it no longer
claims a fallback. Look for the python3 invocation in hook.sh and add the jq
check/parse or update that comment accordingly.
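If option (A) is taken, the fallback could look like the sketch below. The `"command"` key matches the Bash tool input shown elsewhere in this PR; the rest is illustrative:

```shell
HOOK_TOOL_INPUT='{"command": "git status"}'

# Prefer jq when it's installed; otherwise fall back to python3 (both extract
# the same JSON key, as the comment requires).
if command -v jq >/dev/null 2>&1; then
  cmd=$(printf '%s' "$HOOK_TOOL_INPUT" | jq -r '.command // empty')
else
  cmd=$(printf '%s' "$HOOK_TOOL_INPUT" \
    | python3 -c 'import json,sys; print(json.load(sys.stdin).get("command", ""))')
fi

echo "extracted: $cmd"
```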

Comment on lines +80 to +86
python3 - <<PY
import json
print(json.dumps({
    "hookSpecificOutput": {
        "updatedInput": {"command": ${rewritten@Q}}
    }
}))

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
rewritten=$'rtk printf "line1\nline2"\nrtk echo done'

# Reproduce current embedding behavior; expected: Python parse failure on some bash versions/inputs.
python3 - <<PY
import json
print(json.dumps({"command": ${rewritten@Q}}))
PY

Repository: aryehlev/claw-code

Length of output: 229


🏁 Script executed:

cat -n deploy/efficiency/rtk/hook.sh | sed -n '75,90p'

Repository: aryehlev/claw-code

Length of output: 440


🏁 Script executed:

# Check how rewritten variable is set and if it could contain problematic characters
cat -n deploy/efficiency/rtk/hook.sh | head -80

# Test the proposed fix to ensure it works correctly
python3 - "$'rtk printf \"line1\nline2\"\nrtk echo done'" <<'PY'
import json, sys
print(json.dumps({
    "hookSpecificOutput": {
        "updatedInput": {"command": sys.argv[1]}
    }
}))
PY

Repository: aryehlev/claw-code

Length of output: 3367


🏁 Script executed:

# Search for other python3 - << patterns in the file to check if similar issue exists elsewhere
grep -n 'python3 -' deploy/efficiency/rtk/hook.sh

Repository: aryehlev/claw-code

Length of output: 122


Pass rewritten as an argument instead of embedding in Python source code.

Line 80 uses ${rewritten@Q} to embed a Bash-escaped variable directly into Python source. The variable contains user commands (set at line 71) that may include special characters, newlines, or quotes—causing Python parse failures. The test case confirms this causes a SyntaxError. Additionally, line 46 in the same file already demonstrates the correct pattern: use a quoted heredoc delimiter and pass the variable as an argument via sys.argv.

Proposed fix
-python3 - <<PY
+python3 - "$rewritten" <<'PY'
 import json
-print(json.dumps({
+import sys
+print(json.dumps({
     "hookSpecificOutput": {
-        "updatedInput": {"command": ${rewritten@Q}}
+        "updatedInput": {"command": sys.argv[1]}
     }
 }))
 PY
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
python3 - <<PY
import json
print(json.dumps({
    "hookSpecificOutput": {
        "updatedInput": {"command": ${rewritten@Q}}
    }
}))
python3 - "$rewritten" <<'PY'
import json
import sys
print(json.dumps({
    "hookSpecificOutput": {
        "updatedInput": {"command": sys.argv[1]}
    }
}))
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/efficiency/rtk/hook.sh` around lines 80 - 86, The Python block
currently embeds the Bash variable `rewritten` into the Python source using
`${rewritten@Q}`, which breaks on special chars; instead invoke Python with the
`rewritten` value as an argument and read it from sys.argv inside the Python
snippet. Replace the current `python3 - <<PY` block that prints JSON with an
invocation like `python3 - "$rewritten" <<'PY'` (use a quoted heredoc delimiter)
and inside the snippet read the value via `import sys` and `sys.argv[1]` when
building the JSON output; update the JSON construction code that references the
embedded value to use that sys.argv variable.

…king, e2e test

Closes the top four gaps flagged in the branch's gap analysis.

1. **Fallback on provider error**. `AnthropicRuntimeClient::stream` now
   loops up to MAX_ROUTER_ATTEMPTS times (2) when the eval router is
   configured: first attempt uses the scoreboard's default `select`,
   each subsequent try calls a new `select_excluding(bucket, tried)`
   that filters out every model already tried this turn. One flaky
   provider can no longer take down a turn another candidate would
   have handled. The failing candidate gets a failure recorded in the
   scoreboard before we cascade, so the router learns from the
   outage.
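The retry shape described above can be sketched as follows. `MAX_ROUTER_ATTEMPTS` matches the commit message, but `run_with_fallback`, `try_model`, and the `HashSet` exclusion bookkeeping are illustrative stand-ins for the real scoreboard's `select`/`select_excluding` API, not the actual implementation:

```rust
use std::collections::HashSet;

const MAX_ROUTER_ATTEMPTS: usize = 2;

fn run_with_fallback(
    candidates: &[String],
    mut try_model: impl FnMut(&str) -> Result<String, String>,
) -> Result<String, String> {
    let mut tried: HashSet<String> = HashSet::new();
    let mut last_err = String::from("no candidates available");
    for _ in 0..MAX_ROUTER_ATTEMPTS {
        // select_excluding stand-in: pick a candidate not yet tried this turn.
        let Some(model) = candidates.iter().find(|m| !tried.contains(*m)) else {
            break;
        };
        match try_model(model) {
            Ok(reply) => return Ok(reply),
            Err(err) => {
                // The failure is noted before cascading, so (in the real
                // code) the scoreboard learns from the outage.
                tried.insert(model.clone());
                last_err = err;
            }
        }
    }
    Err(last_err)
}
```

One flaky first choice then falls through to the next candidate within the same turn instead of failing it outright.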

2. **Scoreboard decay**. `ModelStats` gains `last_observation_ms`;
   a new `record_outcome_with_decay` applies exponential decay
   (`0.5 ^ (elapsed_ms / half_life_ms)`) to `successes`, `failures`,
   and `total_latency_ms` before folding in the new sample. Token
   totals and the new cost field are not decayed — they're a
   lifetime ledger, not a routing input. Configured via
   `router.halfLifeHours` in .claw.json (default 168 = one week; 0
   disables decay). Old scoreboards load as-is (pre-decay rows have
   `last_observation_ms = 0`, which skips the first decay pass).

3. **Cost tracking**. `ModelStats` also gains `total_cost_micros`
   (micro-dollars stored as u64 so the on-disk JSON stays
   float-free). The CLI computes per-turn cost via the existing
   `runtime::pricing_for_model` and passes it to
   `record_outcome_with_decay`. `claw eval` now emits
   `cost=$0.XXXX` in its stderr summary and adds `cost_micros` to
   every per-turn JSONL record plus `total_cost_micros` /
   `total_cost_usd` in the final JSON summary. Unknown models
   (no pricing entry) fall through as 0 — missing data is better
   than wrong data.

4. **End-to-end integration test**
   (`rusty-claude-cli/tests/router_e2e.rs`). Spins up the
   `MockAnthropicService` and exercises two paths: a clean warm-start
   that confirms the scoreboard JSON lands on disk with the expected
   shape (`successes = 1`, populated `last_observation_ms`, the
   `[router]` stderr line), and a pre-seeded scoreboard that
   exercises the select-excluding branch without needing a failing
mock. Both tests pass without contacting real providers.

Tests: 18 eval-router unit, 10 memory-client, 28 runtime config,
23 conversation, 19 CLI (3 new cost tests), 2 new router_e2e.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (3)
rust/crates/rusty-claude-cli/src/main.rs (3)

6371-6447: ⚠️ Potential issue | 🟠 Major

Keep eval stdout single-format in JSON mode.

When --output-format json is used without --output, this writes JSONL turn records to stdout and then appends a pretty JSON summary to stdout as well. That makes stdout neither valid JSONL nor a single JSON document. Emit the summary to stderr in that case, or only print the pretty summary when output_path is set.


371-380: ⚠️ Potential issue | 🟡 Minor

Advertise claw eval in top-level help.

This adds a new public subcommand, but claw --help still doesn't expose it, so the CLI surface is stale for users discovering commands. Please add a usage line and example for claw eval --session <SESSION> --output <FILE> [--max-turns N].

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 371 - 380, The
top-level CLI help doesn't mention the new Eval subcommand (the Eval enum
variant), so update the top-level clap command definition (where the CLI is
constructed—e.g., the function that builds the clap::Command or the struct
implementing clap derives) to include usage and an example for the eval
subcommand; add a usage line like "claw eval --session <SESSION> --output <FILE>
[--max-turns N]" and an example invocation to the top-level
help/after_help/long_about so `claw --help` advertises Eval. Ensure you
reference the Eval variant in the subcommand list so it appears in the printed
subcommands.

1330-1394: ⚠️ Potential issue | 🟡 Minor

Handle claw eval --help like the other top-level commands.

parse_eval_args() still treats --help/-h as unknown options, so claw eval --help errors instead of showing help. Please short-circuit to the normal help path here and add a regression test for it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 1330 - 1394,
parse_eval_args currently treats "--help"/"-h" as unknown; update the function
(parse_eval_args) to detect "--help" and "-h" early in the argument loop and
short-circuit by returning the global help action (return Ok(CliAction::Help) or
the equivalent top-level help variant used in your CLI enum) so "claw eval
--help" shows help instead of erroring, and add a regression test that calls the
eval subcommand parser (or runs the CLI entry) with ["eval", "--help"] /
["eval", "-h"] and asserts the help action/output is produced; touch the
parse_eval_args match to include the "--help" | "-h" arm and add the test in the
same test module where other CLI tests live.
🧹 Nitpick comments (2)
rust/crates/eval-router/src/lib.rs (2)

369-378: Consider eliminating the TOCTOU race by reading first.

There's a minor race window between path.exists() and fs::read(&path). While the error propagation is acceptable, you could simplify by attempting the read unconditionally and treating NotFound as "empty scoreboard":

♻️ Proposed refactor
-let state = if path.exists() {
-    let bytes = fs::read(&path)?;
-    if bytes.is_empty() {
-        ScoreboardState::default()
-    } else {
-        serde_json::from_slice(&bytes)?
-    }
-} else {
-    ScoreboardState::default()
+let state = match fs::read(&path) {
+    Ok(bytes) if bytes.is_empty() => ScoreboardState::default(),
+    Ok(bytes) => serde_json::from_slice(&bytes)?,
+    Err(e) if e.kind() == io::ErrorKind::NotFound => ScoreboardState::default(),
+    Err(e) => return Err(e.into()),
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/eval-router/src/lib.rs` around lines 369 - 378, The current
TOCTOU occurs because code checks path.exists() before fs::read; instead call
fs::read(&path) unconditionally, then match the Result: on Ok(bytes) return
ScoreboardState::default() if bytes.is_empty() else
serde_json::from_slice(&bytes)?; on Err(e) if e.kind() ==
std::io::ErrorKind::NotFound return ScoreboardState::default() else propagate
the error; update the block that builds state (references: ScoreboardState,
path, fs::read, serde_json::from_slice) accordingly.

521-530: Warm-start selection order may introduce sampling bias.

Using find() always selects the first under-sampled candidate in input order, meaning the first model gets all its min_samples observations before the second model receives any. Consider randomizing the warm-start selection for more balanced initial sampling:

♻️ Proposed alternative
-if let Some(needs_data) = candidates
-    .iter()
-    .find(|candidate| self.stats(bucket, candidate).samples() < u64::from(min_samples))
-{
+let under_sampled: Vec<_> = candidates
+    .iter()
+    .filter(|c| self.stats(bucket, c).samples() < u64::from(min_samples))
+    .collect();
+if !under_sampled.is_empty() {
+    let idx = rng.next_bounded(under_sampled.len());
     return Ok(Selection {
-        model: needs_data.clone(),
+        model: under_sampled[idx].clone(),
         bucket,
         reason: SelectionReason::WarmStart,
     });
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/eval-router/src/lib.rs` around lines 521 - 530, The current
warm-start logic uses candidates.iter().find(...) which always picks the first
under-sampled model in input order (introducing bias); change it to collect all
candidates where self.stats(bucket, candidate).samples() <
u64::from(min_samples) and then choose one at random (or shuffle and pick first)
before returning Selection { model: ..., bucket, reason:
SelectionReason::WarmStart }; update the selection site that references
candidates and self.stats(...) to use a random selection strategy (e.g., via the
rand crate or an existing RNG) so all under-sampled models get balanced
warm-start traffic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rust/crates/runtime/src/config.rs`:
- Around line 622-625: The current epsilon() method silently clamps
epsilon_percent which hides bad configs; remove the min(100) clamp and make
epsilon() simply convert epsilon_percent to fraction (e.g., percent / 100.0) or
return None if absent, and add validation in parse_optional_router_config() to
reject/return an error if epsilonPercent > 100 (and similarly validate negative
values if applicable); update both occurrences referenced (around epsilon() and
the other instance at lines ~1086-1102) so parse_optional_router_config()
performs the bounded check and epsilon() is a straight conversion from the
epsilon_percent field.

In `@rust/crates/rusty-claude-cli/src/main.rs`:
- Around line 7080-7116: default_scoreboard_path() currently only falls back to
XDG_DATA_HOME/HOME (POSIX), causing build_eval_router() to disable eval-driven
routing on Windows; change default_scoreboard_path to be platform-aware by using
a cross-platform data directory helper (e.g., dirs::data_dir() or the
`directories` crate) as the primary fallback, and if that returns Some(dir)
append "claw/router-scoreboard.json"; retain XDG/HOME checks only as secondary
fallbacks so build_eval_router() receives a valid PathBuf on Windows without
requiring router.scoreboardPath to be set manually.
- Around line 7359-7418: cost_micros_for currently only accounts for
input_tokens and output_tokens and thus misses cache create/read charges; change
its signature to accept full usage (or additional cache token fields) — e.g.,
cost_micros_for(model: &str, input_tokens: u32, output_tokens: u32,
cache_read_tokens: u32, cache_write_tokens: u32) or accept the runtime::Usage
struct — and update the implementation to include cache read/write token counts
multiplied by pricing.cache_read_cost_per_million and
pricing.cache_write_cost_per_million (add them to input_micros/output_micros
before rounding). Then update all call sites that compute costs (notably where
claw eval output cost is computed and EvalRouterState::record(...) /
tokens_from_events is used) to pass through the cache token values from the
Usage/AssistantEvent::Usage so cache charges are included. Ensure numeric types
and rounding behavior remain consistent with the existing implementation.

In `@rust/crates/rusty-claude-cli/tests/router_e2e.rs`:
- Around line 138-145: The test
router_falls_back_to_second_candidate_when_first_fails is misleading because it
only seeds the scoreboard and never causes candidates[0] to fail; either rename
the test to reflect that it validates a warm-start scenario (e.g.,
router_uses_warm_candidate_when_available) or modify the test to inject a
failure for candidates[0] and assert the fallback branch: arrange a
mock/provider or request handler to return an error for candidate index 0, run
the turn, and assert that the router selects candidates[1] (and that
select_excluding or retry behavior is exercised) while keeping existing
references to scoreboard, candidates[0]/candidates[1], and any
select_excluding/eval-router hooks used in the test.

---

Duplicate comments:
In `@rust/crates/rusty-claude-cli/src/main.rs`:
- Around line 371-380: The top-level CLI help doesn't mention the new Eval
subcommand (the Eval enum variant), so update the top-level clap command
definition (where the CLI is constructed—e.g., the function that builds the
clap::Command or the struct implementing clap derives) to include usage and an
example for the eval subcommand; add a usage line like "claw eval --session
<SESSION> --output <FILE> [--max-turns N]" and an example invocation to the
top-level help/after_help/long_about so `claw --help` advertises Eval. Ensure
you reference the Eval variant in the subcommand list so it appears in the
printed subcommands.
- Around line 1330-1394: parse_eval_args currently treats "--help"/"-h" as
unknown; update the function (parse_eval_args) to detect "--help" and "-h" early
in the argument loop and short-circuit by returning the global help action
(return Ok(CliAction::Help) or the equivalent top-level help variant used in
your CLI enum) so "claw eval --help" shows help instead of erroring, and add a
regression test that calls the eval subcommand parser (or runs the CLI entry)
with ["eval", "--help"] / ["eval", "-h"] and asserts the help action/output is
produced; touch the parse_eval_args match to include the "--help" | "-h" arm and
add the test in the same test module where other CLI tests live.

---

Nitpick comments:
In `@rust/crates/eval-router/src/lib.rs`:
- Around line 369-378: The current TOCTOU occurs because code checks
path.exists() before fs::read; instead call fs::read(&path) unconditionally,
then match the Result: on Ok(bytes) return ScoreboardState::default() if
bytes.is_empty() else serde_json::from_slice(&bytes)?; on Err(e) if e.kind() ==
std::io::ErrorKind::NotFound return ScoreboardState::default() else propagate
the error; update the block that builds state (references: ScoreboardState,
path, fs::read, serde_json::from_slice) accordingly.
- Around line 521-530: The current warm-start logic uses
candidates.iter().find(...) which always picks the first under-sampled model in
input order (introducing bias); change it to collect all candidates where
self.stats(bucket, candidate).samples() < u64::from(min_samples) and then choose
one at random (or shuffle and pick first) before returning Selection { model:
..., bucket, reason: SelectionReason::WarmStart }; update the selection site
that references candidates and self.stats(...) to use a random selection
strategy (e.g., via the rand crate or an existing RNG) so all under-sampled
models get balanced warm-start traffic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 11f1c5fa-429a-4133-b499-1b017a1b25c1

📥 Commits

Reviewing files that changed from the base of the PR and between 6f585aa and 46af183.

📒 Files selected for processing (5)
  • rust/crates/eval-router/src/lib.rs
  • rust/crates/runtime/src/config.rs
  • rust/crates/runtime/src/config_validate.rs
  • rust/crates/rusty-claude-cli/src/main.rs
  • rust/crates/rusty-claude-cli/tests/router_e2e.rs

Comment on lines +622 to +625
pub fn epsilon(&self) -> Option<f64> {
self.epsilon_percent
.map(|percent| f64::from(percent.min(100)) / 100.0)
}

⚠️ Potential issue | 🟡 Minor

Reject out-of-range epsilonPercent instead of silently clamping it.

An epsilonPercent of 250 currently becomes 1.0 here, so a bad config is silently accepted and changes routing behavior instead of failing fast. Validate epsilonPercent <= 100 in parse_optional_router_config() and keep epsilon() as a straight conversion.

Suggested fix
     let epsilon_percent = optional_u32(entry, "epsilonPercent", "merged settings.router")?;
+    if let Some(percent) = epsilon_percent {
+        if percent > 100 {
+            return Err(ConfigError::Parse(
+                "merged settings.router.epsilonPercent: field must be between 0 and 100"
+                    .to_string(),
+            ));
+        }
+    }
     let min_samples = optional_u32(entry, "minSamples", "merged settings.router")?;
@@
     pub fn epsilon(&self) -> Option<f64> {
-        self.epsilon_percent
-            .map(|percent| f64::from(percent.min(100)) / 100.0)
+        self.epsilon_percent.map(|percent| f64::from(percent) / 100.0)
     }

Also applies to: 1086-1102

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/runtime/src/config.rs` around lines 622 - 625, The current
epsilon() method silently clamps epsilon_percent which hides bad configs; remove
the min(100) clamp and make epsilon() simply convert epsilon_percent to fraction
(e.g., percent / 100.0) or return None if absent, and add validation in
parse_optional_router_config() to reject/return an error if epsilonPercent > 100
(and similarly validate negative values if applicable); update both occurrences
referenced (around epsilon() and the other instance at lines ~1086-1102) so
parse_optional_router_config() performs the bounded check and epsilon() is a
straight conversion from the epsilon_percent field.

Comment on lines +7080 to +7116
fn default_scoreboard_path() -> Option<PathBuf> {
env::var_os("XDG_DATA_HOME")
.map(|home| {
PathBuf::from(home)
.join("claw")
.join("router-scoreboard.json")
})
.or_else(|| {
env::var_os("HOME").map(|home| {
PathBuf::from(home)
.join(".local")
.join("share")
.join("claw")
.join("router-scoreboard.json")
})
})
}

fn build_eval_router(config: &runtime::RouterConfig) -> Option<EvalRouterState> {
if !config.enabled() || config.mode() != runtime::RouterMode::EvalDriven {
return None;
}
if config.candidates().is_empty() {
eprintln!(
"warning: router.mode=eval-driven requires router.candidates with at least one model; skipping"
);
return None;
}
let scoreboard_path = config
.scoreboard_path()
.map(PathBuf::from)
.or_else(default_scoreboard_path);
let Some(scoreboard_path) = scoreboard_path else {
eprintln!(
"warning: router.mode=eval-driven requires router.scoreboardPath when HOME/XDG_DATA_HOME are unset; skipping"
);
return None;

⚠️ Potential issue | 🟠 Major

Use a cross-platform default scoreboard path.

This fallback is POSIX-only. On Windows, XDG_DATA_HOME and HOME are often unset, so build_eval_router() disables eval-driven routing unless router.scoreboardPath is set manually. Please use a platform-aware data-dir fallback here (LOCALAPPDATA/APPDATA or a shared helper) so the default path also works on non-Linux platforms.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 7080 - 7116,
default_scoreboard_path() currently only falls back to XDG_DATA_HOME/HOME
(POSIX), causing build_eval_router() to disable eval-driven routing on Windows;
change default_scoreboard_path to be platform-aware by using a cross-platform
data directory helper (e.g., dirs::data_dir() or the `directories` crate) as the
primary fallback, and if that returns Some(dir) append
"claw/router-scoreboard.json"; retain XDG/HOME checks only as secondary
fallbacks so build_eval_router() receives a valid PathBuf on Windows without
requiring router.scoreboardPath to be set manually.
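A std-only sketch of the platform-aware fallback this comment asks for; the review's preferred route is `dirs::data_dir()`, which wraps the same per-OS branching (APPDATA on Windows, XDG_DATA_HOME/~/.local/share on Linux, Library/Application Support on macOS). The LOCALAPPDATA branch below is a rough approximation, not the crate's exact behavior:

```rust
use std::env;
use std::path::PathBuf;

// Platform-aware data-dir fallback sketched with std only.
fn default_scoreboard_path() -> Option<PathBuf> {
    let base = if cfg!(windows) {
        env::var_os("LOCALAPPDATA").map(PathBuf::from)
    } else {
        env::var_os("XDG_DATA_HOME").map(PathBuf::from).or_else(|| {
            env::var_os("HOME").map(|home| PathBuf::from(home).join(".local").join("share"))
        })
    };
    base.map(|dir| dir.join("claw").join("router-scoreboard.json"))
}
```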

Comment on lines +7359 to +7418
fn cost_micros_for(model: &str, input_tokens: u32, output_tokens: u32) -> u64 {
let Some(pricing) = runtime::pricing_for_model(model) else {
return 0;
};
// Pricing is dollars per million tokens. cost_dollars = tokens / 1e6 *
// price_per_million, and micro-dollars are dollars * 1e6, so the 1e6
// factors cancel: multiplying the raw token count by the per-million
// price yields micro-dollars directly.
let input_micros = f64::from(input_tokens) * pricing.input_cost_per_million;
let output_micros = f64::from(output_tokens) * pricing.output_cost_per_million;
let total = (input_micros + output_micros).round();
if total <= 0.0 {
0
} else {
#[allow(
clippy::cast_precision_loss,
clippy::cast_possible_truncation,
clippy::cast_sign_loss
)]
let as_u64 = total as u64;
as_u64
}
}

/// Length of the latest user message in characters — that's what bucketing
/// keys off. If the latest message is not from the user we fall back to
/// the total user-text length so a tool-result continuation still gets a
/// stable bucket.
fn prompt_bucket_for_request(request: &ApiRequest) -> PromptBucket {
use runtime::MessageRole as RtRole;
let latest_user_len = request
.messages
.iter()
.rev()
.find(|message| message.role == RtRole::User)
.map_or(0, |message| {
message
.blocks
.iter()
.filter_map(|block| match block {
runtime::ContentBlock::Text { text } => Some(text.chars().count()),
_ => None,
})
.sum::<usize>()
});
PromptBucket::from_prompt_chars(latest_user_len)
}

/// Extract the cumulative input/output token counts from the last
/// [`AssistantEvent::Usage`] emitted during a turn, if any.
fn tokens_from_events(events: &[AssistantEvent]) -> (u32, u32) {
events
.iter()
.rev()
.find_map(|event| match event {
AssistantEvent::Usage(usage) => Some((usage.input_tokens, usage.output_tokens)),
_ => None,
})
.unwrap_or((0, 0))
}

⚠️ Potential issue | 🟠 Major

Cost tracking drops cache-create/read billing.

runtime::pricing_for_model() includes cache creation/read rates, but this path only carries input_tokens and output_tokens. That underreports cost for cached turns in both claw eval output and EvalRouterState::record(...), which can skew the cost-based router stats. Please pass the full usage through here and include cache token charges in cost_micros_for().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/src/main.rs` around lines 7359 - 7418,
cost_micros_for currently only accounts for input_tokens and output_tokens and
thus misses cache create/read charges; change its signature to accept full usage
(or additional cache token fields) — e.g., cost_micros_for(model: &str,
input_tokens: u32, output_tokens: u32, cache_read_tokens: u32,
cache_write_tokens: u32) or accept the runtime::Usage struct — and update the
implementation to include cache read/write token counts multiplied by
pricing.cache_read_cost_per_million and pricing.cache_write_cost_per_million
(add them to input_micros/output_micros before rounding). Then update all call
sites that compute costs (notably where claw eval output cost is computed and
EvalRouterState::record(...) / tokens_from_events is used) to pass through the
cache token values from the Usage/AssistantEvent::Usage so cache charges are
included. Ensure numeric types and rounding behavior remain consistent with the
existing implementation.
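A sketch of the extended signature the comment asks for. The `Pricing` field names mirror the review text (`cache_read_cost_per_million`, `cache_write_cost_per_million`), but the struct and the example rates below are assumptions, not the actual `runtime` types:

```rust
// Hypothetical pricing struct; field names follow the review comment.
struct Pricing {
    input_cost_per_million: f64,
    output_cost_per_million: f64,
    cache_read_cost_per_million: f64,
    cache_write_cost_per_million: f64,
}

// Cache-aware cost: all four token classes priced per million tokens,
// summed in micro-dollars, rounded once at the end like the original.
fn cost_micros_for(
    pricing: &Pricing,
    input_tokens: u32,
    output_tokens: u32,
    cache_read_tokens: u32,
    cache_write_tokens: u32,
) -> u64 {
    let micros = f64::from(input_tokens) * pricing.input_cost_per_million
        + f64::from(output_tokens) * pricing.output_cost_per_million
        + f64::from(cache_read_tokens) * pricing.cache_read_cost_per_million
        + f64::from(cache_write_tokens) * pricing.cache_write_cost_per_million;
    if micros <= 0.0 { 0 } else { micros.round() as u64 }
}
```

With this shape, `tokens_from_events` would also need to surface the cache counts from the last `AssistantEvent::Usage` so call sites can pass them through.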

Comment on lines +138 to +145
fn router_falls_back_to_second_candidate_when_first_fails() {
// We can't easily make the mock provider fail per-model, so exercise
// fallback at the scoreboard level: seed the scoreboard so
// candidates[0] is already warm and candidates[1] is not, then run a
// turn. We only want to confirm the wiring builds and doesn't
// regress happy-path behavior — the failing-first-candidate path is
// covered by unit tests in eval-router and by the select_excluding
// unit tests in main.rs.

⚠️ Potential issue | 🟡 Minor

This test does not cover the failure fallback path it claims to.

The setup only seeds scoreboard state; it never makes candidates[0] fail, so retry/select-excluding regressions would still pass here. Please either rename the test to the warm-start scenario it actually exercises or add a failure injection and assert the fallback branch directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rust/crates/rusty-claude-cli/tests/router_e2e.rs` around lines 138 - 145, The
test router_falls_back_to_second_candidate_when_first_fails is misleading
because it only seeds the scoreboard and never causes candidates[0] to fail;
either rename the test to reflect that it validates a warm-start scenario (e.g.,
router_uses_warm_candidate_when_available) or modify the test to inject a
failure for candidates[0] and assert the fallback branch: arrange a
mock/provider or request handler to return an error for candidate index 0, run
the turn, and assert that the router selects candidates[1] (and that
select_excluding or retry behavior is exercised) while keeping existing
references to scoreboard, candidates[0]/candidates[1], and any
select_excluding/eval-router hooks used in the test.
