refactor: decouple storage and commands from concrete I/O for testability by spachava753 · Pull Request #174 · spachava753/cpe

spachava753 · 2026-02-17T04:06:20Z

Summary

Overhaul of the testing infrastructure and storage layer to make the codebase testable without real databases, file I/O, or API calls. This branch removes all snapshot-based tests (cupaloy), redesigns the storage interfaces, and progressively decouples internal components from concrete infrastructure.

482 files changed, +5,056 / −23,898 lines across 12 commits.

Motivation

The codebase was tightly coupled to concrete implementations — ExecuteRoot opened its own SQLite database, config was loaded from disk inside business logic, and tests relied on snapshot files that were brittle and hard to maintain. This made unit testing impractical and forced integration tests to hit real APIs and databases.

Changes

Testing strategy reset

Remove all snapshot-based tests (cupaloy) and the testing strategy spec. This clears the way for table-driven unit tests with exact matching, as specified in AGENTS.md.

Storage layer redesign

The concrete DialogStorage type is replaced with a composed MessageDB interface (DialogSaver + MessagesDeleter + MessagesLister + MessagesGetter), allowing consumers to depend on exactly the subset they need. The SaveMessages method is replaced with a transactional SaveDialog iterator API that saves messages lazily as the caller consumes the iterator. Message identity and lineage are conveyed through ExtraFields rather than auxiliary return values. The SQLite implementation is renamed from DialogStorage to Sqlite and accepts an injected DB interface instead of opening its own connection.

In-memory MessageDB

A new storage.MemDB implements the full MessageDB interface using a tree of message nodes with parent/child references. This enables tests to assert on conversation structure (forks, linear chains, deletions) without SQLite. Includes 11 unit tests covering save, continue, fork, get, list, delete (leaf/recursive/atomic), error paths, and dialog reconstruction.

ExecuteRoot dependency injection

ExecuteRoot no longer performs config file loading or database initialization internally. It accepts a pre-resolved *config.Config and an optional storage.MessageDB, with the CLI entrypoint (cmd/root.go) owning the wiring. Integration tests construct configs in-memory via config.ResolveFromRaw.

Other

Rename scripts/ → build/ and simplify the Goyek task runner
Add URL handler tests achieving 100% coverage
Comprehensive storage test suite with behavioral coverage (2,080 lines)

Breaking Changes

ExecuteRootOptions no longer accepts ConfigPath, ModelRef, GenParams, Timeout, or IncognitoMode. Callers must provide a resolved *config.Config and optional storage.MessageDB.
DialogStorage type removed; replaced by Sqlite + MessageDB interface.
SaveMessages replaced by SaveDialog iterator API.

Remove all test files (46 *_test.go files), snapshot directories (10 .snapshots dirs), and the testing strategy specification document. Drop the cupaloy/v2 and go-faker/v4 dependencies which were test-only, and demote stretchr/testify to indirect. Tests will be rewritten incrementally with a focus on exact-match assertions, higher code coverage, and improved testability through refactoring. The AGENTS.md testing guidelines are updated to reflect this and to explicitly prohibit snapshot-based testing going forward.

Thread an io.Writer for stdout through the generator pipeline so tests can silence or capture model output. Add a WithStdout generator option and a Stdout field on ExecuteRootOptions, both defaulting to os.Stdout when nil. Add root_test.go with integration tests covering the full generation flow: new conversation creation, same-provider continuation, cross-provider forked continuation via ContinueID, context cancellation with partial persistence, and incognito mode storage suppression. Tests run against live Anthropic and Gemini APIs, verify message chains and content blocks via direct SQLite queries on the .cpeconvo database.

Consumers of the storage package were tightly coupled to *DialogStorage and its dozen exported methods, making it impossible to depend on only the subset of operations actually needed. This made testing harder and obscured the real dependency surface of each caller. Introduce a composed MessageDB interface (MessagesSaver, MessagesGetter, MessagesLister, MessagesDeleter) so that each consumer declares exactly what it requires — the saving middleware needs only MessagesSaver, the conversation list command needs only MessagesLister, and so on. Message identity and lineage are now conveyed through gai.Message.ExtraFields rather than auxiliary return values, eliminating the need for separate ID slices and parent-ID returns that previously leaked storage internals into every call site. GetDialogForMessage becomes a standalone function over the MessagesGetter interface, making dialog reconstruction testable without a real database. The tree display type MessageIdNode moves to the commands package where it belongs as a presentation concern. Comprehensive tests verify all four interface methods and the dialog reconstruction utility through the public API only.

The scripts/ directory name was generic and didn't convey that it contains build tooling. Renaming to build/ follows the common Go project convention for auxiliary build programs. Flag accessor functions (GetTargetURL, GetPort, etc.) added unnecessary indirection over the package-level flag pointers. Tasks now reference the flag variables directly, and the logFile variable in the MCP debug proxy is renamed to avoid shadowing the flag. Running with no arguments now defaults to a new list task that prints all available tasks, improving discoverability.

…used queries The hand-written checkMessageIDExists method redundantly checked sqlite_master for table existence before falling back to GetMessage with ErrNoRows handling. Since InitDialogStorage always creates the schema before any queries run, the table-existence guard was unnecessary. A dedicated sqlc-managed CheckMessageIDExists query using SELECT EXISTS replaces the entire method body. Five other sqlc queries had no callers anywhere in the codebase: GetMostRecentMessage, GetChildrenCount, ListRootMessages, GetMessageChildrenId, and DeleteBlock. Removing them shrinks the generated query surface to only what is actually used.

The eager slice-in/iterator-out signature forced all messages to be saved inside a single transaction before any ID was available to the caller. This prevented streaming save-and-use patterns where each saved ID feeds into the next message's ParentID, and coupled the API to batch semantics that most callers never needed — every call site was already passing a single-element slice. Accept iter.Seq[SaveMessageOptions] and return iter.Seq2[string, error] so that each message is persisted independently as the caller pulls from the output iterator. This turns SaveMessages into a composable pipeline stage: pull an option, save it, yield its ID, repeat. Callers that need the old slice-based convenience wrap their slice with slices.Values. BREAKING CHANGE: MessagesSaver.SaveMessages signature changed from (ctx, []SaveMessageOptions) (iter.Seq[string], error) to (ctx, iter.Seq[SaveMessageOptions]) iter.Seq2[string, error]. Messages are no longer saved atomically in a shared transaction.

… API SaveMessages persisted each message independently without transactions, requiring callers to manually thread parent IDs via SaveMessageOptions. This made partial saves possible on failure and spread parent-chain bookkeeping across every call site. SaveDialog accepts an iterator of messages representing a conversation thread and saves them atomically in a single transaction. Already- persisted messages (those with MessageIDKey set) are verified against storage but not re-saved, so callers can pass a full dialog where only the tail is new. Titles are now conveyed through ExtraFields rather than a wrapper struct, eliminating SaveMessageOptions entirely. The SavingMiddleware propagates the saved assistant message back into resp.Candidates so outer middlewares (ResponsePrinter) can read the assigned message ID and print it to the user.

The type name now reflects the implementation rather than the domain concept, which is already captured by the interfaces it satisfies. Callers reference the concrete type only at initialization boundaries; everywhere else they depend on DialogSaver or MessageDB.

…internally Sqlite previously opened and owned the *sql.DB connection, making it impossible to inject faults or wrap the database layer for testing without touching real SQLite files. Accepting an interface pushes connection lifecycle management to callers, so each call site explicitly opens, uses, and closes the database. This enables wrapping the concrete *sql.DB with decorators that simulate errors, record calls, or enforce invariants—improving testability and making the dependency surface visible at each call site.

The test suite targeted 100% line coverage but had redundant tests, copy-pasted SQL patterns, scattered error-path tests, and no verification of important behavioral invariants like tree branching, block ordering, or special character handling. Consolidate test helpers (single newTestDB returning both *Sqlite and *sql.DB, shared trigger/poison helpers, expectSaveDialogError), combine related error tests into table-driven groups, and remove tests that duplicated coverage without adding behavioral value. This reduces the file from 1942 to ~1480 lines for the coverage tests while keeping 100% statement coverage. Add 15 behavioral tests that catch classes of bugs invisible to line coverage: block sequence ordering, unicode/NUL/SQL-injection round-trips, conversation tree branching with independent branch deletion, mid-chain dialog retrieval, orphan detection after middle-node deletion, ON DELETE CASCADE verification, empty/duplicate ID edge cases, and message ExtraFields persistence semantics. Fix a bug where getMessage omitted the title from ExtraFields on retrieval despite it being stored as a dedicated column. All known message metadata keys (id, parent_id, created_at, title) are now returned consistently. Add doc comments to every test function explaining what invariant it verifies and why the test exists, so future contributors and agents can navigate the suite without reading every assertion.

The urlhandler package had no tests. This adds comprehensive tests for IsURL, DefaultConfig, and DownloadContent covering all branches including edge cases like control characters in URLs, chunked transfers exceeding size limits, mid-stream connection resets, empty Content-Type fallback, and cancelled contexts.

ExecuteRoot previously performed its own config file loading and SQLite database initialization, making it impossible to test without real file I/O and database connections. Callers had no control over how these dependencies were created, coupling the core generation flow to concrete infrastructure. Push dependency wiring to the caller by replacing ConfigPath, ModelRef, GenParams, Timeout, and IncognitoMode fields with a pre-resolved *config.Config and an optional storage.MessageDB. The CLI entrypoint in cmd/root.go now owns config resolution and database lifecycle, while ExecuteRoot operates purely on injected dependencies. Add an in-memory MessageDB implementation (MemDB) backed by a tree of message nodes with parent/child references, enabling tests to assert on conversation structure without SQLite. The tree mirrors the branching history model used in production: each node holds a message and a slice of children, supporting forks and linear continuations alike. BREAKING CHANGE: ExecuteRootOptions no longer accepts ConfigPath, ModelRef, GenParams, Timeout, or IncognitoMode. Callers must provide a resolved *config.Config via the Config field and an optional storage.MessageDB via the MessageDB field.

spachava753 added 12 commits February 13, 2026 22:55

spachava753 changed the title ~~refactor: decouple ExecuteRoot from I/O, add in-memory MessageDB for testing~~ refactor: decouple storage and commands from concrete I/O for testability Feb 17, 2026

spachava753 merged commit 187a4f3 into main Feb 17, 2026

spachava753 deleted the testing-refactor branch April 10, 2026 00:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: decouple storage and commands from concrete I/O for testability#174

refactor: decouple storage and commands from concrete I/O for testability#174
spachava753 merged 12 commits intomainfrom
testing-refactor

spachava753 commented Feb 17, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

spachava753 commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Changes

Testing strategy reset

Storage layer redesign

In-memory MessageDB

ExecuteRoot dependency injection

Other

Breaking Changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

spachava753 commented Feb 17, 2026 •

edited

Loading