Merged
7 changes: 6 additions & 1 deletion AGENTS.md
@@ -2,6 +2,7 @@
- ALWAYS USE PARALLEL TOOLS WHEN APPLICABLE.
- The default branch in this repo is `dev`.
- Local `main` ref may not exist; use `dev` or `origin/dev` for diffs.
- **NEVER commit directly to `dev`.** Always create a feature branch, rebase on `origin/dev`, and open a PR. No exceptions.
- Prefer automation: execute requested actions without confirmation unless blocked by missing info or safety/irreversibility.

## Style Guide
@@ -153,8 +154,12 @@ Use `context_history` to navigate the edit DAG:
- Test actual implementation, do not duplicate logic into tests
- Tests cannot run from repo root (guard: `do-not-run-tests-from-root`); run from package dirs like `packages/opencode`.

## Git Hooks
## Git Workflow

- **NEVER commit or push directly to `dev`.** Always work on a feature/fix/docs branch and create a PR.
- All changes go through PRs — no exceptions, not even "quick fixes" or docs-only changes.
- **Always rebase on `origin/dev` before creating a PR.** Run `git fetch origin && git rebase origin/dev` on your branch first. No exceptions.
- When creating a PR with `gh pr create`, always target the `origin` repo explicitly: `gh pr create --repo e6qu/frankencode --base dev`.
- NEVER bypass pre-commit hooks. No `HUSKY=0`, no `--no-verify`. Fix the issue instead.
- Pre-commit runs: prettier format, typecheck, tests. All must pass before commit.
- Commit messages must follow conventional commits (`feat:`, `fix:`, `chore:`, etc).
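
The conventional-commit rule above can be checked mechanically. The following is a minimal sketch, not the repo's actual hook: `isConventionalCommit` is a hypothetical helper, and the list of accepted types is an assumption, since the guideline only names `feat:`, `fix:`, `chore:`, "etc".

```typescript
// Hypothetical helper: validates the header line of a commit message.
// ASSUMPTION: the accepted type list; the source only names feat, fix, chore, "etc".
const COMMIT_TYPES = ["feat", "fix", "chore", "docs", "refactor", "test"] as const;

// <type>, optional "(scope)", optional "!", then ": " and a non-empty description.
const HEADER_PATTERN = new RegExp(
  `^(${COMMIT_TYPES.join("|")})(\\([^()\\s]+\\))?!?: \\S.*$`,
);

function isConventionalCommit(message: string): boolean {
  // Only the first line (the header) is validated; body and footers are ignored.
  const header = message.split("\n", 1)[0] ?? "";
  return HEADER_PATTERN.test(header);
}
```

A check like this would reject `update stuff` and `feat:missing space` while accepting `fix(cas): keep first owner`.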
111 changes: 48 additions & 63 deletions DO_NEXT.md
@@ -1,65 +1,50 @@
# Frankencode — Do Next

## Completed

- [x] Plan Mode Fixes (removed experimental flag, enabled plan_enter tool)
- [x] **Verification tool** implementation
- [x] Fixed `focus-rewrite-history` agent missing tool permissions
- [x] Added `verification` config schema to config.ts
- [x] Fixed TypeScript errors in verify.ts
- [x] Added VerifyTool to registry
- [x] Added `/verify` command
- [x] Typecheck passes
- [x] All 1384 tests pass
- [x] Progressive Disclosure for Skills - lazy load content on demand
- [x] Evaluator-Optimizer - evaluator/optimizer agents + refine tool
- [x] Skills as Scripts - scripts in skill directories become callable tools
- [x] Code review — found 16 bugs (#21-#36)

## In Progress — Bug Fix Pass

### P0 — Critical (do first)

- [ ] **#28** Refine tool: pass `git diff` output or changed file paths into evaluator prompt
- [ ] **#29** Refine tool: verify `tools: {}` behavior — if it blocks tools, pass correct tool set
- [ ] **#32** Skill template: verify all `template` consumers handle `Promise<string>`

### P1 — High

- [ ] **#21** Circuit breaker: move `this.lastFailure = now` before the throw
- [ ] **#27** Verify config: replace shallow spread with deep merge (`mergeDeep` from remeda)
- [ ] **#36** Evaluator agent: remove `bash: "allow"` from permission set

### P2 — Medium

- [ ] **#22** Circuit breaker: call `breaker.reset()` after successful check
- [ ] **#24** Circuit breaker: increase default cooldownMs to 30000+
- [ ] **#25** Verify tool: implement `scope`/`files` filtering or remove unused params
- [ ] **#30** Refine parseEvaluation: add fallback parsing for score/passed
- [ ] **#34** Scripts: add argument validation/sanitization

### P3 — Low (can defer)

- [ ] **#23** Circuit breaker: rename `open` to `closed` or `tripped`
- [ ] **#26** Verify: use shell-word splitter instead of `split(" ")`
- [ ] **#31** Refine: add child session cleanup after loop
- [ ] **#33** Skill.get(): add content caching layer
- [ ] **#35** Scripts: use `::` separator for tool IDs
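
The circuit-breaker items above (#21, #22, #23, #24) interact, so a combined sketch may help. This is an illustration under stated assumptions, not the code in `verify.ts`: the class shape, the `threshold` parameter, the injectable clock, and the synchronous `run` signature are all hypothetical and chosen to keep the example small.

```typescript
// Hypothetical breaker illustrating fixes #21-#24; not the real verify.ts implementation.
class CircuitBreaker {
  private failures = 0;
  private lastFailure = 0;

  constructor(
    private readonly threshold = 3, // ASSUMPTION: failures allowed before tripping
    private readonly cooldownMs = 30_000, // #24: 30s default instead of 1s
    private readonly now: () => number = () => Date.now(), // injectable clock for tests
  ) {}

  // #23: named `tripped` so that true means "calls are currently rejected".
  get tripped(): boolean {
    return this.failures >= this.threshold && this.now() - this.lastFailure < this.cooldownMs;
  }

  reset(): void {
    this.failures = 0;
  }

  run<T>(check: () => T): T {
    if (this.tripped) throw new Error("circuit breaker tripped; cooling down");
    try {
      const result = check();
      this.reset(); // #22: a passing check clears the failure count
      return result;
    } catch (err) {
      this.failures += 1;
      this.lastFailure = this.now(); // #21: record the failure time before rethrowing
      throw err;
    }
  }
}
```

Once the cooldown has elapsed the breaker lets one call through again; a further failure re-trips it immediately because the failure count is only cleared on success.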

## Backlog — Testing

- [ ] Unit tests for VerifyTool (circuit-breaker, config loading, error parsing)
- [ ] Unit tests for RefineTool (evaluation parsing, iteration loop)
- [ ] Unit tests for Scripts (discovery, tool generation)
- [ ] Unit tests for CAS, filterEdited, EditGraph, SideThread CRUD
- [ ] Test classifier_threads + distill_threads
- [ ] Test /btw command
- [ ] CAS garbage collection
- [ ] TUI rendering of edit indicators

## Backlog — Documentation

- [ ] Document /verify command in user guide
- [ ] Document refine tool usage patterns
- [ ] Document skill scripts feature
- [ ] Update README with new features
## Implemented

- [x] CAS (SQLite) + Part Editing (EditMeta, LifecycleMeta, filterEdited, context_edit, context_deref)
- [x] Conversation Graph (edit_graph DAG, context_history with log/tree/checkout/fork)
- [x] Focus Agent + Side Threads (side_thread table, thread_park, thread_list, classifier, focus agents)
- [x] Integration (system prompt injection, plugin hooks, lifecycle sweeper)
- [x] v2: query/toolName targeting, classifier_threads, distill_threads, /btw, /focus, /reset-context
- [x] Config-based control (no feature toggles)
- [x] Documentation (README, docs/context-editing, docs/schema, docs/agents, AGENTS.md)
- [x] Ephemeral commands (/threads, /history, /tree, /deref, /classify)
- [x] /cost TUI command with usage dialog
- [x] Verify tool (test/lint/typecheck with circuit breaker)
- [x] Refine tool (evaluator-optimizer loop)
- [x] Script discovery and execution from skills
- [x] 40 bugs fixed (code review audits + ephemeral fixes)
- [x] 25 regression tests for bug fixes

## Next — Upstream Sync

The upstream `anomalyco/opencode` has diverged significantly (~50 commits). Key conflict areas:

- [ ] **Rebase onto upstream/dev** — resolve conflicts in `skill.ts` (Effect service rewrite), `prompt.ts`, `message-v2.ts`, `instance.ts`
- [ ] **Adapt to Effect-ification** — upstream moved to `LayerMap` and scoped services for Skill, File, Format, VCS, FileTime, FileWatcher; our `Instance.state()` usage in `skill.ts` may need to adapt to `SkillService`
- [ ] **Verify `instance-state.ts` deletion** — upstream deleted this; check if our code depends on it (used by `Skill.state`, `Command.state`)
- [ ] **Test after rebase** — run full suite, fix any breakage from upstream changes

## Next — Testing

- [ ] Unit tests for CAS (store, get, dedup via ON CONFLICT)
- [ ] Unit tests for filterEdited (hidden parts stripped, empty messages dropped)
- [ ] Unit tests for EditGraph (commit chain, log walk, checkout restore)
- [ ] Unit tests for SideThread CRUD
- [ ] Unit tests for ContextEdit validation (ownership, budget, recency, privileged agents)
- [ ] Unit tests for lifecycle sweeper (discardable auto-hide, ephemeral auto-externalize)
- [ ] Test classifier_threads + distill_threads with a real session
- [ ] Test /btw command (verify it forks, doesn't pollute main thread)
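
As a starting point for the `filterEdited` tests, the behavior named above (hidden parts stripped, empty messages dropped) could be sketched as follows. The `Part` and `Message` shapes are hypothetical simplifications; the real `EditMeta` schema in `message-v2.ts` is not shown here and likely carries more fields.

```typescript
// ASSUMPTION: simplified shapes standing in for the real message and EditMeta types.
interface Part {
  id: string;
  text: string;
  edit?: { hidden?: boolean };
}

interface Message {
  id: string;
  parts: Part[];
}

function filterEdited(messages: Message[]): Message[] {
  return messages
    // Strip parts that an edit has marked as hidden.
    .map((m) => ({ ...m, parts: m.parts.filter((p) => !p.edit?.hidden) }))
    // Drop messages that have no visible parts left.
    .filter((m) => m.parts.length > 0);
}
```

The function returns new message objects rather than mutating its input, which keeps the original history available for a later checkout or restore.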

## Next — Features

- [ ] CAS garbage collection (orphan cleanup, size limits)
- [ ] TUI rendering of edit indicators (hidden/replaced/annotated parts)
- [ ] Session.remove() cleanup of EditGraph rows (add CASCADE or explicit delete)
- [ ] CAS.store() ownership: stop overwriting session_id on hash collision
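
The ownership item above can be illustrated with an in-memory stand-in for the SQLite-backed store. This is only a sketch: `MemoryCas`, `CasEntry`, and the choice of SHA-256 are assumptions, and the `Map` check plays the role that an `ON CONFLICT DO NOTHING` clause would play in SQL.

```typescript
import { createHash } from "node:crypto";

// ASSUMPTION: a simplified row shape; the real CAS table is not shown in this document.
interface CasEntry {
  hash: string;
  content: string;
  sessionId: string;
}

// Hypothetical in-memory stand-in for the SQLite-backed store.
class MemoryCas {
  private readonly entries = new Map<string, CasEntry>();

  store(content: string, sessionId: string): string {
    const hash = createHash("sha256").update(content).digest("hex");
    // Identical content maps to the same key, so it is stored once.
    // The first writer keeps ownership; later stores do not overwrite sessionId.
    if (!this.entries.has(hash)) {
      this.entries.set(hash, { hash, content, sessionId });
    }
    return hash;
  }

  get(hash: string): CasEntry | undefined {
    return this.entries.get(hash);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Keeping the first owner is one possible policy; tracking every referencing session would be an alternative if garbage collection needs per-session reference counts.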

## Next — Design Decisions

- [ ] Explore: make /btw use Session.fork() for true message-level isolation
- [ ] Evaluate upstream's `tools` deprecation and migration to permission-only model
138 changes: 36 additions & 102 deletions PLAN.md
@@ -1,140 +1,74 @@
# Frankencode Feature Roadmap

Features derived from Claude Blog research (Sept 2025 - Mar 2026) + plan mode fixes + circuit-breaker.
> **Frankencode** is a fork of [OpenCode](https://github.com/anomalyco/opencode) (`dev` branch) that adds surgical, reversible, agent-driven context editing with content-addressable storage and a conversation history graph.

**Status (2026-03-18):** All features implemented. 40 bugs fixed. 1401 tests passing. See `STATUS.md` for current state, `DO_NEXT.md` for what's next.

---

## 0. Plan Mode Fixes ✅ COMPLETE
## Next: Upstream Rebase Plan

**Done**: Removed the experimental flag, enabled the `plan_enter` tool, and ensured seamless mode switching.
The upstream `anomalyco/opencode` has diverged by ~50 commits. The major change is an **Effect-ification wave** that rewrites services (Skill, File, Format, VCS, etc.) from namespace-based modules to Effect scoped services with `LayerMap`.

### Files Modified
### High-conflict files (require manual resolution):

| File | Change |
| `src/tool/registry.ts` | Remove flag check, always include `PlanExitTool`, add `PlanEnterTool` |
| `src/session/prompt.ts` | Remove experimental flag branching, use "new" plan mode logic |
| `src/tool/plan.ts` | Uncomment `PlanEnterTool`, add import for `ENTER_DESCRIPTION` |
| `src/config/config.ts` | Add `verification` config schema |
| File | Upstream change | Our change | Strategy |
|------|----------------|------------|----------|
| `skill/skill.ts` | Rewritten to `SkillService` (Effect) | Content cache added | Reimplement cache inside Effect service |
| `session/prompt.ts` | ~99 lines changed | +filterEdited, +filterEphemeral, +focus injection | Apply our additions to new upstream base |
| `session/message-v2.ts` | ~107 lines changed | +EditMeta, +LifecycleMeta, +filterEdited | Merge schema additions into new shape |
| `project/instance.ts` | Refactored, `instance-state.ts` deleted | We use `Instance.state()` | Adapt to new Instance API |

---
### Low-conflict files (additive, straightforward merge):

## 1. Verification Tool ✅ COMPLETE
All Frankencode-only files (CAS, edit graph, context tools, side threads, agents) should merge cleanly since upstream doesn't have them.

**Problem**: Generated code often passes the "looks right" test but fails on execution, edge cases, or style requirements. Agents need a way to self-verify their work.
---

**Solution**: A verification tool that agents can call to validate their work against objective criteria. Also exposed as slash command and CLI command.
## 0. Plan Mode Fixes ✅ COMPLETE

### Files
**Done**: Removed the experimental flag, enabled the `plan_enter` tool, and ensured seamless mode switching.

| File | Status |
| --------------------------------- | ----------- |
| `src/tool/verify.ts` | ✅ COMPLETE |
| `src/config/config.ts` | ✅ COMPLETE |
| `src/tool/registry.ts` | ✅ COMPLETE |
| `src/command/index.ts` | ✅ COMPLETE |
| `src/command/template/verify.txt` | ✅ COMPLETE |
---

### Completed Tasks
## 1. Verification Tool ✅ COMPLETE

- [x] Fix LSP errors in `src/tool/verify.ts`
- [x] Import `Config` from `../config/config`
- [x] Add `VerifyTool` to `src/tool/registry.ts`
- [x] Create `/verify` command template
- [x] Add `/verify` command to `src/command/index.ts`
- [x] Typecheck passes
- [x] All 1384 tests pass
Verification tool with circuit-breaker for test/lint/typecheck. Exposed as `/verify` command.

---

## 2. Progressive Disclosure for Skills ✅ COMPLETE

**Problem**: Loading many skills bloats context window.

**Solution**: Three-tier loading — metadata always, full content on demand.

### Tasks

- [x] Update skill schema to separate `Meta` (name, description, location) from `Loaded` (includes content)
- [x] Implement lazy content loading in `Skill.get()`
- [x] Update `all()` and `available()` to return `Meta[]` without content
- [x] Update command registration to lazy-load skill content
- [x] Typecheck passes
Three-tier loading — metadata always, full content on demand. `Meta` vs `Loaded` types.

---

## 3. Skills as Scripts ✅ COMPLETE

- [x] Create `src/skill/scripts.ts`
- [x] Integrate with registry
- [x] Typecheck passes
Scripts in skill `scripts/` directories become callable tools. Auto-discovered and registered.

---

## 4. Evaluator-Optimizer ✅ COMPLETE

- [x] Create evaluator/optimizer agents
- [x] Create `src/tool/refine.ts`
- [x] Add to tool registry
- [x] Typecheck passes

---

## 5. Bug Fix Pass ⬜ IN PROGRESS

**Problem**: Code review on 2026-03-17 found 16 bugs across new features (2 critical, 4 high, 6 medium, 4 low). See `BUGS.md` for full details.

### P0 — Critical (blocks feature correctness)

- [ ] **#28** Refine tool: evaluator/optimizer have no visibility into actual changes — include `git diff` or file paths in prompt
- [ ] **#29** Refine tool: `tools: {}` may prevent agents from using tools — verify behavior and fix
- [ ] **#32** Skill template returns `Promise<string>` not `string` — verify consumer compatibility

### P1 — High (incorrect behavior)

- [ ] **#21** Circuit breaker `lastFailure` set after throw — move assignment before throw
- [ ] **#27** Verify config shallow merge loses nested keys — use deep merge
- [ ] **#36** Evaluator agent has bash access — remove or restrict

### P2 — Medium (suboptimal behavior)

- [ ] **#22** Circuit breaker never resets on success — add `breaker.reset()` on pass
- [ ] **#24** Default cooldown (1s) effectively zero — increase to 30s+
- [ ] **#25** `scope`/`files`/`criteria` params unused — implement or remove
- [ ] **#30** `parseEvaluation` brittle against LLM template echoing — improve parsing
- [ ] **#34** Scripts: arbitrary argument injection — add validation
- [ ] **#36** Evaluator agent has bash (also medium from security angle)

### P3 — Low (cosmetic/minor)

- [ ] **#23** Circuit breaker inverted `open` semantics — rename to `closed` or `tripped`
- [ ] **#26** `command.split(" ")` breaks quoted args — use shell-word splitter
- [ ] **#31** Refine child sessions never cleaned up — add cleanup
- [ ] **#33** `Skill.get()` re-parses file every call — add content cache
- [ ] **#35** Scripts tool ID collision — use `::` separator
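
Bug #27 in the list above is easy to reproduce in isolation. The sketch below uses a small hand-rolled `deepMerge` as a stand-in for a library deep merge so the example stays dependency-free; the config keys (`circuitBreaker`, `threshold`, `cooldownMs`) are hypothetical and only illustrate the nested-key loss.

```typescript
type Plain = { [key: string]: unknown };

function isPlain(value: unknown): value is Plain {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hand-rolled stand-in for a library deep merge; arrays and primitives are replaced, not merged.
function deepMerge(base: Plain, override: Plain): Plain {
  const out: Plain = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const existing = out[key];
    out[key] = isPlain(existing) && isPlain(value) ? deepMerge(existing, value) : value;
  }
  return out;
}

// ASSUMPTION: illustrative config keys, not the real `verification` schema.
const defaults = { circuitBreaker: { threshold: 3, cooldownMs: 30_000 } };
const user = { circuitBreaker: { cooldownMs: 60_000 } };

// Shallow spread replaces the whole nested object, so `threshold` is lost.
const shallow = { ...defaults, ...user };

// Deep merge keeps `threshold` and overrides only `cooldownMs`.
const deep = deepMerge(defaults, user);
```

With the shallow spread, a user who overrides one nested setting silently loses every sibling default, which is why the fix calls for a deep merge.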
Evaluator agent reviews changes, optimizer improves based on feedback. Refine tool orchestrates the loop.

---

## Priority & Dependencies
## 5. Bug Fix Pass ✅ COMPLETE

| Feature | Priority | Status | Dependencies |
| ---------------------- | -------- | ------------- | ----------------- |
| Plan Mode Fixes | P0 | ✅ Complete | None |
| Verification Tool | P1 | ⚠️ Has bugs | None |
| Progressive Disclosure | P1 | ⚠️ Has bugs | None |
| Skills as Scripts | P2 | ⚠️ Has bugs | None |
| Evaluator-Optimizer | P2 | ⚠️ Has bugs | Verification Tool |
| Bug Fix Pass | P0 | ⬜ Not started | All above |
All 16 bugs (#21-#36) from the 2026-03-17 code review fixed in PR #12. 25 regression tests added.

---

## Estimated LOC

| Feature | ~LOC | Actual |
| ---------------------- | ---- | ------ |
| Plan Mode Fixes | ~20 | ~20 |
| Verification Tool | ~120 | 259 |
| Progressive Disclosure | ~60 | ~30 |
| Skills as Scripts | ~100 | 107 |
| Evaluator-Optimizer | ~150 | 207 |
| Bug Fix Pass | ~100 | TBD |
| **Total** | ~630 | ~623+ |
## Summary

| Feature | Status |
| ---------------------- | ------------- |
| Plan Mode Fixes | ✅ Complete |
| Verification Tool | ✅ Complete |
| Progressive Disclosure | ✅ Complete |
| Skills as Scripts | ✅ Complete |
| Evaluator-Optimizer | ✅ Complete |
| Bug Fix Pass (16 bugs) | ✅ Complete |
| Upstream Rebase | ⬜ Next |
73 changes: 47 additions & 26 deletions STATUS.md
@@ -1,37 +1,58 @@
# Frankencode Status
# Frankencode — Project Status

## Session: 2026-03-17
**Date:** 2026-03-18
**Upstream:** `anomalyco/opencode` @ `dev`
**Fork:** `e6qu/frankencode` @ `dev`

### Completed
## Overview

- ✅ Plan Mode Fixes (removed experimental flag, enabled plan_enter tool)
- ✅ Fixed `focus-rewrite-history` agent missing tool permissions
- ✅ Researched 6 months of Claude Blog posts
- ✅ Designed 4 new features (Verification, Progressive Disclosure, Skills as Scripts, Evaluator-Optimizer)
- ✅ Verification Tool (`/verify` command with circuit-breaker)
- ✅ Progressive Disclosure for Skills (lazy load content on demand)
- ✅ Evaluator-Optimizer (evaluator/optimizer agents + refine tool)
- ✅ Skills as Scripts (scripts in skill directories become callable tools)
- ✅ Code review of all new features — found 16 bugs (2 critical, 4 high, 6 medium, 4 low)
Frankencode is a fork of OpenCode that adds surgical, reversible, agent-driven context editing with content-addressable storage and a conversation history graph. All 4 planned phases are implemented. Currently in hardening/testing phase.

### In Progress
## Branch Status

- ⬜ Bug fix pass — 16 bugs logged in `BUGS.md` (#21-#36)
| Branch | Status | PR |
|--------|--------|----|
| `dev` | Main development branch | — |
| `fix/code-review-bugs` | 16 bug fixes + 25 tests | [#12](https://github.com/e6qu/frankencode/pull/12) (merged) |
| `docs/upstream-sync-notes` | Docs update with upstream analysis | [#13](https://github.com/e6qu/frankencode/pull/13) |

### Blocked
## Upstream Divergence

- Refine tool (#28, #29) — fundamentally incomplete, evaluator/optimizer have no context about actual changes
- **10 commits ahead** of upstream (Frankencode features)
- **~50 commits behind** upstream (Effect refactors, bug fixes, model updates)

### Critical Issues to Resolve Before Merge
### Upstream changes requiring attention:

1. **Refine tool is non-functional** — evaluator receives no code context, optimizer may have no tools
2. **Skill template returns Promise** — may inject `"[object Promise]"` into prompts
3. **Verify circuit breaker has 4 interacting bugs** — lastFailure timing, no success reset, 1s cooldown, shallow config merge
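
Issue 2 in the list above follows from how JavaScript stringifies a Promise. A minimal reproduction, assuming a hypothetical `skill` object whose `template` is loaded lazily (the real skill API is not shown here):

```typescript
// ASSUMPTION: a hypothetical skill whose template content is produced asynchronously.
const skill = {
  template: (): Promise<string> => Promise.resolve("Run the verify checks."),
};

function buildPromptWrong(): string {
  // Interpolating the Promise itself stringifies the object, not its resolved value.
  return `Instructions: ${skill.template()}`;
}

async function buildPrompt(): Promise<string> {
  // Awaiting first interpolates the resolved string.
  return `Instructions: ${await skill.template()}`;
}
```

Every consumer of `template` therefore has to either await the value or be typed so that passing an unresolved Promise is a compile-time error.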
1. **Effect-ification** — `SkillService`, `FileService`, `FormatService`, `VcsService`, etc. refactored to Effect scoped services
2. **`instance-state.ts` deleted** — our `Instance.state()` usage needs review
3. **`skill.ts` rewritten** (333 lines changed) — conflicts with our content cache
4. **`prompt.ts` changed** (~99 lines) — conflicts with our filterEdited/filterEphemeral pipeline
5. **`message-v2.ts` changed** (~107 lines) — conflicts with our EditMeta/LifecycleMeta additions

### Next Steps
## Test Status

1. Fix P0 critical bugs (#28, #29, #32)
2. Fix P1 high bugs (#21, #27, #36)
3. Fix P2 medium bugs (#22, #24, #25, #30, #34)
4. Unit tests for all new features
5. Integration testing
- **1401 tests passing**, 0 failures, 8 skipped
- **25 new regression tests** for bug fixes (verify, refine, scripts, skill cache, agent permissions)
- **Typecheck:** clean (`bun typecheck`)

## Bug Status

- **0 active bugs**
- **40 bugs fixed** (tracked in BUGS.md)
- **4 open design issues** (CAS GC, objective staleness, EditGraph leak, CAS ownership)

## Feature Inventory

| Feature | Status | Files |
|---------|--------|-------|
| Content-Addressable Store | Done | `src/cas/` |
| Context editing (6 operations) | Done | `src/context-edit/`, `src/tool/context-edit.ts` |
| Edit graph (DAG history) | Done | `src/cas/graph.ts`, `src/tool/context-history.ts` |
| Side threads | Done | `src/session/side-thread.ts`, `src/tool/thread-*.ts` |
| Focus agent | Done | `src/agent/agent.ts`, `src/agent/prompt/focus.txt` |
| Classifier + distill | Done | `src/tool/classifier-threads.ts`, `src/tool/distill-threads.ts` |
| Ephemeral commands | Done | `src/command/index.ts`, `src/session/prompt.ts` |
| Verify tool | Done | `src/tool/verify.ts` |
| Refine tool | Done | `src/tool/refine.ts` |
| Script discovery | Done | `src/skill/scripts.ts` |
| /cost command | Done | TUI dialog |