e6qu · Mar 17, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -2,6 +2,7 @@
 - ALWAYS USE PARALLEL TOOLS WHEN APPLICABLE.
 - The default branch in this repo is `dev`.
 - Local `main` ref may not exist; use `dev` or `origin/dev` for diffs.
+- **NEVER commit directly to `dev`.** Always create a feature branch, rebase on `origin/dev`, and open a PR. No exceptions.
 - Prefer automation: execute requested actions without confirmation unless blocked by missing info or safety/irreversibility.
 
 ## Style Guide
@@ -153,8 +154,12 @@ Use `context_history` to navigate the edit DAG:
 - Test actual implementation, do not duplicate logic into tests
 - Tests cannot run from repo root (guard: `do-not-run-tests-from-root`); run from package dirs like `packages/opencode`.
 
-## Git Hooks
+## Git Workflow
 
+- **NEVER commit or push directly to `dev`.** Always work on a feature/fix/docs branch and create a PR.
+- All changes go through PRs — no exceptions, not even "quick fixes" or docs-only changes.
+- **Always rebase on `origin/dev` before creating a PR.** Run `git fetch origin && git rebase origin/dev` on your branch first. No exceptions.
+- When creating a PR with `gh pr create`, always target the `origin` repo explicitly: `gh pr create --repo e6qu/frankencode --base dev`.
 - NEVER bypass pre-commit hooks. No `HUSKY=0`, no `--no-verify`. Fix the issue instead.
 - Pre-commit runs: prettier format, typecheck, tests. All must pass before commit.
 - Commit messages must follow conventional commits (`feat:`, `fix:`, `chore:`, etc).
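
The Git Workflow rules above could be enforced by a small pre-flight check. The following is a minimal sketch and not part of the repo: the helper names `isProtectedBranch` and `isConventionalCommit` are hypothetical, and the list of commit types beyond `feat`, `fix`, and `chore` is an assumption, since AGENTS.md only says "etc".

```typescript
// Hypothetical helpers; neither name exists in the Frankencode repo.

// Branches that must never receive direct commits (AGENTS.md names `dev`).
const PROTECTED_BRANCHES: readonly string[] = ["dev"];

// Assumed type list: AGENTS.md names feat, fix, chore "etc"; the remaining
// entries follow common conventional-commits usage and may not match the hook.
const COMMIT_TYPES: readonly string[] = [
  "feat", "fix", "chore", "docs", "refactor", "test", "perf", "build", "ci", "style", "revert",
];

function isProtectedBranch(branch: string): boolean {
  return PROTECTED_BRANCHES.includes(branch.trim());
}

// Accepts "type: subject", "type(scope): subject", and the "!" breaking marker.
function isConventionalCommit(message: string): boolean {
  const header = message.split("\n")[0] ?? "";
  const match = /^([a-z]+)(\([^)]+\))?!?: \S/.exec(header);
  return match !== null && COMMIT_TYPES.includes(match[1] ?? "");
}
```

A wrapper script might call `isProtectedBranch` on the current branch name before committing and refuse to continue when it returns `true`. The rebase and `gh pr create` steps stay as written in the rules.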

diff --git a/DO_NEXT.md b/DO_NEXT.md
@@ -1,65 +1,50 @@
 # Frankencode — Do Next
 
-## Completed
-
-- [x] Plan Mode Fixes (removed experimental flag, enabled plan_enter tool)
-- [x] **Verification tool** implementation
-- [x] Fixed `focus-rewrite-history` agent missing tool permissions
-- [x] Added `verification` config schema to config.ts
-- [x] Fixed TypeScript errors in verify.ts
-- [x] Added VerifyTool to registry
-- [x] Added `/verify` command
-- [x] Typecheck passes
-- [x] All 1384 tests pass
-- [x] Progressive Disclosure for Skills - lazy load content on demand
-- [x] Evaluator-Optimizer - evaluator/optimizer agents + refine tool
-- [x] Skills as Scripts - scripts in skill directories become callable tools
-- [x] Code review — found 16 bugs (#21-#36)
-
-## In Progress — Bug Fix Pass
-
-### P0 — Critical (do first)
-
-- [ ] **#28** Refine tool: pass `git diff` output or changed file paths into evaluator prompt
-- [ ] **#29** Refine tool: verify `tools: {}` behavior — if it blocks tools, pass correct tool set
-- [ ] **#32** Skill template: verify all `template` consumers handle `Promise<string>`
-
-### P1 — High
-
-- [ ] **#21** Circuit breaker: move `this.lastFailure = now` before the throw
-- [ ] **#27** Verify config: replace shallow spread with deep merge (`mergeDeep` from remeda)
-- [ ] **#36** Evaluator agent: remove `bash: "allow"` from permission set
-
-### P2 — Medium
-
-- [ ] **#22** Circuit breaker: call `breaker.reset()` after successful check
-- [ ] **#24** Circuit breaker: increase default cooldownMs to 30000+
-- [ ] **#25** Verify tool: implement `scope`/`files` filtering or remove unused params
-- [ ] **#30** Refine parseEvaluation: add fallback parsing for score/passed
-- [ ] **#34** Scripts: add argument validation/sanitization
-
-### P3 — Low (can defer)
-
-- [ ] **#23** Circuit breaker: rename `open` to `closed` or `tripped`
-- [ ] **#26** Verify: use shell-word splitter instead of `split(" ")`
-- [ ] **#31** Refine: add child session cleanup after loop
-- [ ] **#33** Skill.get(): add content caching layer
-- [ ] **#35** Scripts: use `::` separator for tool IDs
-
-## Backlog — Testing
-
-- [ ] Unit tests for VerifyTool (circuit-breaker, config loading, error parsing)
-- [ ] Unit tests for RefineTool (evaluation parsing, iteration loop)
-- [ ] Unit tests for Scripts (discovery, tool generation)
-- [ ] Unit tests for CAS, filterEdited, EditGraph, SideThread CRUD
-- [ ] Test classifier_threads + distill_threads
-- [ ] Test /btw command
-- [ ] CAS garbage collection
-- [ ] TUI rendering of edit indicators
-
-## Backlog — Documentation
-
-- [ ] Document /verify command in user guide
-- [ ] Document refine tool usage patterns
-- [ ] Document skill scripts feature
-- [ ] Update README with new features
+## Implemented
+
+- [x] CAS (SQLite) + Part Editing (EditMeta, LifecycleMeta, filterEdited, context_edit, context_deref)
+- [x] Conversation Graph (edit_graph DAG, context_history with log/tree/checkout/fork)
+- [x] Focus Agent + Side Threads (side_thread table, thread_park, thread_list, classifier, focus agents)
+- [x] Integration (system prompt injection, plugin hooks, lifecycle sweeper)
+- [x] v2: query/toolName targeting, classifier_threads, distill_threads, /btw, /focus, /reset-context
+- [x] Config-based control (no feature toggles)
+- [x] Documentation (README, docs/context-editing, docs/schema, docs/agents, AGENTS.md)
+- [x] Ephemeral commands (/threads, /history, /tree, /deref, /classify)
+- [x] /cost TUI command with usage dialog
+- [x] Verify tool (test/lint/typecheck with circuit breaker)
+- [x] Refine tool (evaluator-optimizer loop)
+- [x] Script discovery and execution from skills
+- [x] 40 bugs fixed (code review audits + ephemeral fixes)
+- [x] 25 regression tests for bug fixes
+
+## Next — Upstream Sync
+
+The upstream `anomalyco/opencode` has diverged significantly (~50 commits). Key conflict areas:
+
+- [ ] **Rebase onto upstream/dev** — resolve conflicts in `skill.ts` (Effect service rewrite), `prompt.ts`, `message-v2.ts`, `instance.ts`
+- [ ] **Adapt to Effect-ification** — upstream moved to `LayerMap` and scoped services for Skill, File, Format, VCS, FileTime, FileWatcher; our `Instance.state()` usage in `skill.ts` may need to adapt to `SkillService`
+- [ ] **Verify `instance-state.ts` deletion** — upstream deleted this; check if our code depends on it (used by `Skill.state`, `Command.state`)
+- [ ] **Test after rebase** — run full suite, fix any breakage from upstream changes
+
+## Next — Testing
+
+- [ ] Unit tests for CAS (store, get, dedup via ON CONFLICT)
+- [ ] Unit tests for filterEdited (hidden parts stripped, empty messages dropped)
+- [ ] Unit tests for EditGraph (commit chain, log walk, checkout restore)
+- [ ] Unit tests for SideThread CRUD
+- [ ] Unit tests for ContextEdit validation (ownership, budget, recency, privileged agents)
+- [ ] Unit tests for lifecycle sweeper (discardable auto-hide, ephemeral auto-externalize)
+- [ ] Test classifier_threads + distill_threads with a real session
+- [ ] Test /btw command (verify it forks, doesn't pollute main thread)
+
+## Next — Features
+
+- [ ] CAS garbage collection (orphan cleanup, size limits)
+- [ ] TUI rendering of edit indicators (hidden/replaced/annotated parts)
+- [ ] Session.remove() cleanup of EditGraph rows (add CASCADE or explicit delete)
+- [ ] CAS.store() ownership: stop overwriting session_id on hash collision
+
+## Next — Design Decisions
+
+- [ ] Explore: make /btw use Session.fork() for true message-level isolation
+- [ ] Evaluate upstream's `tools` deprecation and migration to permission-only model
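
Two items in the lists above, "dedup via ON CONFLICT" and "stop overwriting session_id on hash collision", describe the same invariant: storing identical content twice should yield one row that keeps its first owner. The sketch below illustrates that behaviour with an in-memory map standing in for the SQLite table. All names (`MemoryCas`, `toyHash`, `CasEntry`) are hypothetical, the toy FNV-1a digest is only there to keep the example dependency-free, and the `DO NOTHING` conflict policy is an assumption about the intended fix.

```typescript
// In-memory stand-in for the SQLite-backed CAS; all names are hypothetical.
interface CasEntry {
  content: string;
  sessionId: string; // the first session that stored this content
}

// Toy 32-bit FNV-1a digest, used only to avoid external dependencies.
// The real store presumably uses a cryptographic hash.
function toyHash(content: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < content.length; i++) {
    h ^= content.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

class MemoryCas {
  private readonly entries = new Map<string, CasEntry>();

  // Behaves like INSERT ... ON CONFLICT DO NOTHING (assumed policy):
  // a repeated hash leaves the existing entry and its sessionId untouched.
  store(content: string, sessionId: string): string {
    const hash = toyHash(content);
    if (!this.entries.has(hash)) {
      this.entries.set(hash, { content, sessionId });
    }
    return hash;
  }

  get(hash: string): CasEntry | undefined {
    return this.entries.get(hash);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A unit test for the real store could follow the same shape: store the same content from two sessions, then assert one row and an unchanged owner.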
diff --git a/PLAN.md b/PLAN.md
@@ -1,140 +1,74 @@
 # Frankencode Feature Roadmap
 
-Features derived from Claude Blog research (Sept 2025 - Mar 2026) + plan mode fixes + circuit-breaker.
+> **Frankencode** is a fork of [OpenCode](https://github.com/anomalyco/opencode) (`dev` branch) that adds surgical, reversible, agent-driven context editing with content-addressable storage and a conversation history graph.
+
+**Status (2026-03-18):** All features implemented. 40 bugs fixed. 1401 tests passing. See `STATUS.md` for current state, `DO_NEXT.md` for what's next.
 
 ---
 
-## 0. Plan Mode Fixes ✅ COMPLETE
+## Next: Upstream Rebase Plan
 
-**Done**: Removed experimental flag, enable `plan_enter` tool, ensure seamless mode switching.
+The upstream `anomalyco/opencode` has diverged by ~50 commits. The major change is an **Effect-ification wave** that rewrites services (Skill, File, Format, VCS, etc.) from namespace-based modules to Effect scoped services with `LayerMap`.
 
-### Files Modified
+### High-conflict files (require manual resolution):
 
-| File | Change |
-| `src/tool/registry.ts` | Remove flag check, always include `PlanExitTool`, add `PlanEnterTool` |
-| `src/session/prompt.ts` | Remove experimental flag branching, use "new" plan mode logic |
-| `src/tool/plan.ts` | Uncomment `PlanEnterTool`, add import for `ENTER_DESCRIPTION` |
-| `src/config/config.ts` | Add `verification` config schema |
+| File | Upstream change | Our change | Strategy |
+|------|----------------|------------|----------|
+| `skill/skill.ts` | Rewritten to `SkillService` (Effect) | Content cache added | Reimplement cache inside Effect service |
+| `session/prompt.ts` | ~99 lines changed | +filterEdited, +filterEphemeral, +focus injection | Apply our additions to new upstream base |
+| `session/message-v2.ts` | ~107 lines changed | +EditMeta, +LifecycleMeta, +filterEdited | Merge schema additions into new shape |
+| `project/instance.ts` | Refactored, `instance-state.ts` deleted | We use `Instance.state()` | Adapt to new Instance API |
 
----
+### Low-conflict files (additive, straightforward merge):
 
-## 1. Verification Tool ✅ COMPLETE
+All Frankencode-only files (CAS, edit graph, context tools, side threads, agents) should merge cleanly since upstream doesn't have them.
 
-**Problem**: Generated code often passes the "looks right" test but fails on execution, edge cases, or style requirements. Agents need a way to self-verify their work.
+---
 
-**Solution**: A verification tool that agents can call to validate their work against objective criteria. Also exposed as slash command and CLI command.
+## 0. Plan Mode Fixes ✅ COMPLETE
 
-### Files
+**Done**: Removed the experimental flag, enabled the `plan_enter` tool, and ensured seamless mode switching.
 
-| File                              | Status      |
-| --------------------------------- | ----------- |
-| `src/tool/verify.ts`              | ✅ COMPLETE |
-| `src/config/config.ts`            | ✅ COMPLETE |
-| `src/tool/registry.ts`            | ✅ COMPLETE |
-| `src/command/index.ts`            | ✅ COMPLETE |
-| `src/command/template/verify.txt` | ✅ COMPLETE |
+---
 
-### Completed Tasks
+## 1. Verification Tool ✅ COMPLETE
 
-- [x] Fix LSP errors in `src/tool/verify.ts`
-- [x] Import `Config` from `../config/config`
-- [x] Add `VerifyTool` to `src/tool/registry.ts`
-- [x] Create `/verify` command template
-- [x] Add `/verify` command to `src/command/index.ts`
-- [x] Typecheck passes
-- [x] All 1384 tests pass
+Verification tool with circuit-breaker for test/lint/typecheck. Exposed as `/verify` command.
 
 ---
 
 ## 2. Progressive Disclosure for Skills ✅ COMPLETE
 
-**Problem**: Loading many skills bloats context window.
-
-**Solution**: Three-tier loading — metadata always, full content on demand.
-
-### Tasks
-
-- [x] Update skill schema to separate `Meta` (name, description, location) from `Loaded` (includes content)
-- [x] Implement lazy content loading in `Skill.get()`
-- [x] Update `all()` and `available()` to return `Meta[]` without content
-- [x] Update command registration to lazy-load skill content
-- [x] Typecheck passes
+Three-tier loading — metadata always, full content on demand. `Meta` vs `Loaded` types.
 
 ---
 
 ## 3. Skills as Scripts ✅ COMPLETE
 
-- [x] Create `src/skill/scripts.ts`
-- [x] Integrate with registry
-- [x] Typecheck passes
+Scripts in skill `scripts/` directories become callable tools. Auto-discovered and registered.
 
 ---
 
 ## 4. Evaluator-Optimizer ✅ COMPLETE
 
-- [x] Create evaluator/optimizer agents
-- [x] Create `src/tool/refine.ts`
-- [x] Add to tool registry
-- [x] Typecheck passes
-
----
-
-## 5. Bug Fix Pass ⬜ IN PROGRESS
-
-**Problem**: Code review on 2026-03-17 found 16 bugs across new features (2 critical, 4 high, 6 medium, 4 low). See `BUGS.md` for full details.
-
-### P0 — Critical (blocks feature correctness)
-
-- [ ] **#28** Refine tool: evaluator/optimizer have no visibility into actual changes — include `git diff` or file paths in prompt
-- [ ] **#29** Refine tool: `tools: {}` may prevent agents from using tools — verify behavior and fix
-- [ ] **#32** Skill template returns `Promise<string>` not `string` — verify consumer compatibility
-
-### P1 — High (incorrect behavior)
-
-- [ ] **#21** Circuit breaker `lastFailure` set after throw — move assignment before throw
-- [ ] **#27** Verify config shallow merge loses nested keys — use deep merge
-- [ ] **#36** Evaluator agent has bash access — remove or restrict
-
-### P2 — Medium (suboptimal behavior)
-
-- [ ] **#22** Circuit breaker never resets on success — add `breaker.reset()` on pass
-- [ ] **#24** Default cooldown (1s) effectively zero — increase to 30s+
-- [ ] **#25** `scope`/`files`/`criteria` params unused — implement or remove
-- [ ] **#30** `parseEvaluation` brittle against LLM template echoing — improve parsing
-- [ ] **#34** Scripts: arbitrary argument injection — add validation
-- [ ] **#36** Evaluator agent has bash (also medium from security angle)
-
-### P3 — Low (cosmetic/minor)
-
-- [ ] **#23** Circuit breaker inverted `open` semantics — rename to `closed` or `tripped`
-- [ ] **#26** `command.split(" ")` breaks quoted args — use shell-word splitter
-- [ ] **#31** Refine child sessions never cleaned up — add cleanup
-- [ ] **#33** `Skill.get()` re-parses file every call — add content cache
-- [ ] **#35** Scripts tool ID collision — use `::` separator
+Evaluator agent reviews changes, optimizer improves based on feedback. Refine tool orchestrates the loop.
 
 ---
 
-## Priority & Dependencies
+## 5. Bug Fix Pass ✅ COMPLETE
 
-| Feature                | Priority | Status        | Dependencies      |
-| ---------------------- | -------- | ------------- | ----------------- |
-| Plan Mode Fixes        | P0       | ✅ Complete   | None              |
-| Verification Tool      | P1       | ⚠️ Has bugs   | None              |
-| Progressive Disclosure | P1       | ⚠️ Has bugs   | None              |
-| Skills as Scripts      | P2       | ⚠️ Has bugs   | None              |
-| Evaluator-Optimizer    | P2       | ⚠️ Has bugs   | Verification Tool |
-| Bug Fix Pass           | P0       | ⬜ Not started | All above         |
+All 16 bugs (#21-#36) from the 2026-03-17 code review fixed in PR #12. 25 regression tests added.
 
 ---
 
-## Estimated LOC
-
-| Feature                | ~LOC | Actual |
-| ---------------------- | ---- | ------ |
-| Plan Mode Fixes        | ~20  | ~20    |
-| Verification Tool      | ~120 | 259    |
-| Progressive Disclosure | ~60  | ~30    |
-| Skills as Scripts      | ~100 | 107    |
-| Evaluator-Optimizer    | ~150 | 207    |
-| Bug Fix Pass           | ~100 | TBD    |
-| **Total**              | ~630 | ~623+  |
+## Summary
+
+| Feature                | Status        |
+| ---------------------- | ------------- |
+| Plan Mode Fixes        | ✅ Complete   |
+| Verification Tool      | ✅ Complete   |
+| Progressive Disclosure | ✅ Complete   |
+| Skills as Scripts      | ✅ Complete   |
+| Evaluator-Optimizer    | ✅ Complete   |
+| Bug Fix Pass (16 bugs) | ✅ Complete   |
+| Upstream Rebase        | ⬜ Next       |
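
The circuit-breaker fixes listed in the removed Bug Fix Pass section (#21 to #24) amount to four rules: record the failure time before rethrowing, reset on success, use a cooldown of at least 30 seconds, and name the state `tripped` rather than `open`. The sketch below shows one way those rules fit together. It is not the code in `src/tool/verify.ts`; the class shape, the `run` method, and the injectable clock are assumptions made for illustration.

```typescript
// Hypothetical breaker; the real one in src/tool/verify.ts may differ.
class CircuitBreaker {
  private lastFailure: number | null = null;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  // `now` is injectable so tests can control time (an assumption of this sketch).
  constructor(cooldownMs: number = 30000, now: () => number = () => Date.now()) {
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // True while the most recent failure is still inside the cooldown window.
  get tripped(): boolean {
    return this.lastFailure !== null && this.now() - this.lastFailure < this.cooldownMs;
  }

  reset(): void {
    this.lastFailure = null;
  }

  // Runs `check`, clearing the breaker on success and recording the
  // failure time before rethrowing on error.
  run<T>(check: () => T): T {
    if (this.tripped) {
      throw new Error("circuit breaker tripped; retry after cooldown");
    }
    try {
      const result = check();
      this.reset();
      return result;
    } catch (err) {
      this.lastFailure = this.now();
      throw err;
    }
  }
}
```

Injecting the clock keeps regression tests deterministic: a test can fail a check, advance the fake time past the cooldown, and confirm the breaker allows the next run.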
diff --git a/STATUS.md b/STATUS.md
@@ -1,37 +1,58 @@
-# Frankencode Status
+# Frankencode — Project Status
 
-## Session: 2026-03-17
+**Date:** 2026-03-18
+**Upstream:** `anomalyco/opencode` @ `dev`
+**Fork:** `e6qu/frankencode` @ `dev`
 
-### Completed
+## Overview
 
-- ✅ Plan Mode Fixes (removed experimental flag, enabled plan_enter tool)
-- ✅ Fixed `focus-rewrite-history` agent missing tool permissions
-- ✅ Researched 6 months of Claude Blog posts
-- ✅ Designed 4 new features (Verification, Progressive Disclosure, Skills as Scripts, Evaluator-Optimizer)
-- ✅ Verification Tool (`/verify` command with circuit-breaker)
-- ✅ Progressive Disclosure for Skills (lazy load content on demand)
-- ✅ Evaluator-Optimizer (evaluator/optimizer agents + refine tool)
-- ✅ Skills as Scripts (scripts in skill directories become callable tools)
-- ✅ Code review of all new features — found 16 bugs (2 critical, 4 high, 6 medium, 4 low)
+Frankencode is a fork of OpenCode that adds surgical, reversible, agent-driven context editing with content-addressable storage and a conversation history graph. All 4 planned phases are implemented. The project is currently in a hardening/testing phase.
 
-### In Progress
+## Branch Status
 
-- ⬜ Bug fix pass — 16 bugs logged in `BUGS.md` (#21-#36)
+| Branch | Status | PR |
+|--------|--------|----|
+| `dev` | Main development branch | — |
+| `fix/code-review-bugs` | 16 bug fixes + 25 tests | [#12](https://github.com/e6qu/frankencode/pull/12) (merged) |
+| `docs/upstream-sync-notes` | Docs update with upstream analysis | [#13](https://github.com/e6qu/frankencode/pull/13) |
 
-### Blocked
+## Upstream Divergence
 
-- Refine tool (#28, #29) — fundamentally incomplete, evaluator/optimizer have no context about actual changes
+- **10 commits ahead** of upstream (Frankencode features)
+- **~50 commits behind** upstream (Effect refactors, bug fixes, model updates)
 
-### Critical Issues to Resolve Before Merge
+### Upstream changes requiring attention:
 
-1. **Refine tool is non-functional** — evaluator receives no code context, optimizer may have no tools
-2. **Skill template returns Promise** — may inject `"[object Promise]"` into prompts
-3. **Verify circuit breaker has 4 interacting bugs** — lastFailure timing, no success reset, 1s cooldown, shallow config merge
+1. **Effect-ification** — `SkillService`, `FileService`, `FormatService`, `VcsService`, etc. refactored to Effect scoped services
+2. **`instance-state.ts` deleted** — our `Instance.state()` usage needs review
+3. **`skill.ts` rewritten** (333 lines changed) — conflicts with our content cache
+4. **`prompt.ts` changed** (~99 lines) — conflicts with our filterEdited/filterEphemeral pipeline
+5. **`message-v2.ts` changed** (~107 lines) — conflicts with our EditMeta/LifecycleMeta additions
 
-### Next Steps
+## Test Status
 
-1. Fix P0 critical bugs (#28, #29, #32)
-2. Fix P1 high bugs (#21, #27, #36)
-3. Fix P2 medium bugs (#22, #24, #25, #30, #34)
-4. Unit tests for all new features
-5. Integration testing
+- **1401 tests passing**, 0 failures, 8 skipped
+- **25 new regression tests** for bug fixes (verify, refine, scripts, skill cache, agent permissions)
+- **Typecheck:** clean (`bun typecheck`)
+
+## Bug Status
+
+- **0 active bugs**
+- **40 bugs fixed** (tracked in BUGS.md)
+- **4 open design issues** (CAS GC, objective staleness, EditGraph leak, CAS ownership)
+
+## Feature Inventory
+
+| Feature | Status | Files |
+|---------|--------|-------|
+| Content-Addressable Store | Done | `src/cas/` |
+| Context editing (6 operations) | Done | `src/context-edit/`, `src/tool/context-edit.ts` |
+| Edit graph (DAG history) | Done | `src/cas/graph.ts`, `src/tool/context-history.ts` |
+| Side threads | Done | `src/session/side-thread.ts`, `src/tool/thread-*.ts` |
+| Focus agent | Done | `src/agent/agent.ts`, `src/agent/prompt/focus.txt` |
+| Classifier + distill | Done | `src/tool/classifier-threads.ts`, `src/tool/distill-threads.ts` |
+| Ephemeral commands | Done | `src/command/index.ts`, `src/session/prompt.ts` |
+| Verify tool | Done | `src/tool/verify.ts` |
+| Refine tool | Done | `src/tool/refine.ts` |
+| Script discovery | Done | `src/skill/scripts.ts` |
+| /cost command | Done | TUI dialog |
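
The `filterEdited` step named in the upstream-conflict notes is described elsewhere in this change as "hidden parts stripped, empty messages dropped". A minimal sketch of that behaviour follows, assuming simplified shapes: the `Part` and `Message` interfaces and the `hidden` flag are hypothetical stand-ins for the real `EditMeta` schema in `session/message-v2.ts`, and `filterEditedSketch` is not the project's function.

```typescript
// Hypothetical, simplified shapes; the real types live in session/message-v2.ts.
interface Part {
  id: string;
  text: string;
  edit?: { hidden: boolean }; // assumed field standing in for EditMeta
}

interface Message {
  id: string;
  parts: Part[];
}

// Strips hidden parts, then drops any message left with no parts.
// Returns new objects and leaves the input untouched.
function filterEditedSketch(messages: Message[]): Message[] {
  return messages
    .map((m) => ({ ...m, parts: m.parts.filter((p) => !p.edit?.hidden) }))
    .filter((m) => m.parts.length > 0);
}
```

Keeping the step as a pure function over the message list may make it easier to reapply on top of the rewritten upstream `prompt.ts`, since it does not depend on how messages are loaded.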