refactor: decouple storage and commands from concrete I/O for testability#174
Merged
spachava753 merged 12 commits intomainfrom Feb 17, 2026
Merged
refactor: decouple storage and commands from concrete I/O for testability#174spachava753 merged 12 commits intomainfrom
spachava753 merged 12 commits intomainfrom
Conversation
Remove all test files (46 *_test.go files), snapshot directories (10 .snapshots dirs), and the testing strategy specification document. Drop the cupaloy/v2 and go-faker/v4 dependencies which were test-only, and demote stretchr/testify to indirect. Tests will be rewritten incrementally with a focus on exact-match assertions, higher code coverage, and improved testability through refactoring. The AGENTS.md testing guidelines are updated to reflect this and to explicitly prohibit snapshot-based testing going forward.
Thread an io.Writer for stdout through the generator pipeline so tests can silence or capture model output. Add a WithStdout generator option and a Stdout field on ExecuteRootOptions, both defaulting to os.Stdout when nil. Add root_test.go with integration tests covering the full generation flow: new conversation creation, same-provider continuation, cross-provider forked continuation via ContinueID, context cancellation with partial persistence, and incognito mode storage suppression. Tests run against live Anthropic and Gemini APIs, verify message chains and content blocks via direct SQLite queries on the .cpeconvo database.
Consumers of the storage package were tightly coupled to *DialogStorage and its dozen exported methods, making it impossible to depend on only the subset of operations actually needed. This made testing harder and obscured the real dependency surface of each caller. Introduce a composed MessageDB interface (MessagesSaver, MessagesGetter, MessagesLister, MessagesDeleter) so that each consumer declares exactly what it requires — the saving middleware needs only MessagesSaver, the conversation list command needs only MessagesLister, and so on. Message identity and lineage are now conveyed through gai.Message.ExtraFields rather than auxiliary return values, eliminating the need for separate ID slices and parent-ID returns that previously leaked storage internals into every call site. GetDialogForMessage becomes a standalone function over the MessagesGetter interface, making dialog reconstruction testable without a real database. The tree display type MessageIdNode moves to the commands package where it belongs as a presentation concern. Comprehensive tests verify all four interface methods and the dialog reconstruction utility through the public API only.
The scripts/ directory name was generic and didn't convey that it contains build tooling. Renaming to build/ follows the common Go project convention for auxiliary build programs. Flag accessor functions (GetTargetURL, GetPort, etc.) added unnecessary indirection over the package-level flag pointers. Tasks now reference the flag variables directly, and the logFile variable in the MCP debug proxy is renamed to avoid shadowing the flag. Running with no arguments now defaults to a new list task that prints all available tasks, improving discoverability.
…used queries The hand-written checkMessageIDExists method redundantly checked sqlite_master for table existence before falling back to GetMessage with ErrNoRows handling. Since InitDialogStorage always creates the schema before any queries run, the table-existence guard was unnecessary. A dedicated sqlc-managed CheckMessageIDExists query using SELECT EXISTS replaces the entire method body. Five other sqlc queries had no callers anywhere in the codebase: GetMostRecentMessage, GetChildrenCount, ListRootMessages, GetMessageChildrenId, and DeleteBlock. Removing them shrinks the generated query surface to only what is actually used.
The eager slice-in/iterator-out signature forced all messages to be saved inside a single transaction before any ID was available to the caller. This prevented streaming save-and-use patterns where each saved ID feeds into the next message's ParentID, and coupled the API to batch semantics that most callers never needed — every call site was already passing a single-element slice. Accept iter.Seq[SaveMessageOptions] and return iter.Seq2[string, error] so that each message is persisted independently as the caller pulls from the output iterator. This turns SaveMessages into a composable pipeline stage: pull an option, save it, yield its ID, repeat. Callers that need the old slice-based convenience wrap their slice with slices.Values. BREAKING CHANGE: MessagesSaver.SaveMessages signature changed from (ctx, []SaveMessageOptions) (iter.Seq[string], error) to (ctx, iter.Seq[SaveMessageOptions]) iter.Seq2[string, error]. Messages are no longer saved atomically in a shared transaction.
… API SaveMessages persisted each message independently without transactions, requiring callers to manually thread parent IDs via SaveMessageOptions. This made partial saves possible on failure and spread parent-chain bookkeeping across every call site. SaveDialog accepts an iterator of messages representing a conversation thread and saves them atomically in a single transaction. Already- persisted messages (those with MessageIDKey set) are verified against storage but not re-saved, so callers can pass a full dialog where only the tail is new. Titles are now conveyed through ExtraFields rather than a wrapper struct, eliminating SaveMessageOptions entirely. The SavingMiddleware propagates the saved assistant message back into resp.Candidates so outer middlewares (ResponsePrinter) can read the assigned message ID and print it to the user.
The type name now reflects the implementation rather than the domain concept, which is already captured by the interfaces it satisfies. Callers reference the concrete type only at initialization boundaries; everywhere else they depend on DialogSaver or MessageDB.
…internally Sqlite previously opened and owned the *sql.DB connection, making it impossible to inject faults or wrap the database layer for testing without touching real SQLite files. Accepting an interface pushes connection lifecycle management to callers, so each call site explicitly opens, uses, and closes the database. This enables wrapping the concrete *sql.DB with decorators that simulate errors, record calls, or enforce invariants—improving testability and making the dependency surface visible at each call site.
The test suite targeted 100% line coverage but had redundant tests, copy-pasted SQL patterns, scattered error-path tests, and no verification of important behavioral invariants like tree branching, block ordering, or special character handling. Consolidate test helpers (single newTestDB returning both *Sqlite and *sql.DB, shared trigger/poison helpers, expectSaveDialogError), combine related error tests into table-driven groups, and remove tests that duplicated coverage without adding behavioral value. This reduces the file from 1942 to ~1480 lines for the coverage tests while keeping 100% statement coverage. Add 15 behavioral tests that catch classes of bugs invisible to line coverage: block sequence ordering, unicode/NUL/SQL-injection round-trips, conversation tree branching with independent branch deletion, mid-chain dialog retrieval, orphan detection after middle-node deletion, ON DELETE CASCADE verification, empty/duplicate ID edge cases, and message ExtraFields persistence semantics. Fix a bug where getMessage omitted the title from ExtraFields on retrieval despite it being stored as a dedicated column. All known message metadata keys (id, parent_id, created_at, title) are now returned consistently. Add doc comments to every test function explaining what invariant it verifies and why the test exists, so future contributors and agents can navigate the suite without reading every assertion.
The urlhandler package had no tests. This adds comprehensive tests for IsURL, DefaultConfig, and DownloadContent covering all branches including edge cases like control characters in URLs, chunked transfers exceeding size limits, mid-stream connection resets, empty Content-Type fallback, and cancelled contexts.
ExecuteRoot previously performed its own config file loading and SQLite database initialization, making it impossible to test without real file I/O and database connections. Callers had no control over how these dependencies were created, coupling the core generation flow to concrete infrastructure. Push dependency wiring to the caller by replacing ConfigPath, ModelRef, GenParams, Timeout, and IncognitoMode fields with a pre-resolved *config.Config and an optional storage.MessageDB. The CLI entrypoint in cmd/root.go now owns config resolution and database lifecycle, while ExecuteRoot operates purely on injected dependencies. Add an in-memory MessageDB implementation (MemDB) backed by a tree of message nodes with parent/child references, enabling tests to assert on conversation structure without SQLite. The tree mirrors the branching history model used in production: each node holds a message and a slice of children, supporting forks and linear continuations alike. BREAKING CHANGE: ExecuteRootOptions no longer accepts ConfigPath, ModelRef, GenParams, Timeout, or IncognitoMode. Callers must provide a resolved *config.Config via the Config field and an optional storage.MessageDB via the MessageDB field.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Overhaul of the testing infrastructure and storage layer to make the codebase testable without real databases, file I/O, or API calls. This branch removes all snapshot-based tests (cupaloy), redesigns the storage interfaces, and progressively decouples internal components from concrete infrastructure.
482 files changed, +5,056 / −23,898 lines across 12 commits.
Motivation
The codebase was tightly coupled to concrete implementations —
ExecuteRootopened its own SQLite database, config was loaded from disk inside business logic, and tests relied on snapshot files that were brittle and hard to maintain. This made unit testing impractical and forced integration tests to hit real APIs and databases.Changes
Testing strategy reset
Remove all snapshot-based tests (cupaloy) and the testing strategy spec. This clears the way for table-driven unit tests with exact matching, as specified in AGENTS.md.
Storage layer redesign
The concrete
DialogStoragetype is replaced with a composedMessageDBinterface (DialogSaver+MessagesDeleter+MessagesLister+MessagesGetter), allowing consumers to depend on exactly the subset they need. TheSaveMessagesmethod is replaced with a transactionalSaveDialogiterator API that saves messages lazily as the caller consumes the iterator. Message identity and lineage are conveyed throughExtraFieldsrather than auxiliary return values. The SQLite implementation is renamed fromDialogStoragetoSqliteand accepts an injectedDBinterface instead of opening its own connection.In-memory MessageDB
A new
storage.MemDBimplements the fullMessageDBinterface using a tree of message nodes with parent/child references. This enables tests to assert on conversation structure (forks, linear chains, deletions) without SQLite. Includes 11 unit tests covering save, continue, fork, get, list, delete (leaf/recursive/atomic), error paths, and dialog reconstruction.ExecuteRoot dependency injection
ExecuteRootno longer performs config file loading or database initialization internally. It accepts a pre-resolved*config.Configand an optionalstorage.MessageDB, with the CLI entrypoint (cmd/root.go) owning the wiring. Integration tests construct configs in-memory viaconfig.ResolveFromRaw.Other
scripts/→build/and simplify the Goyek task runnerBreaking Changes
ExecuteRootOptionsno longer acceptsConfigPath,ModelRef,GenParams,Timeout, orIncognitoMode. Callers must provide a resolved*config.Configand optionalstorage.MessageDB.DialogStoragetype removed; replaced bySqlite+MessageDBinterface.SaveMessagesreplaced bySaveDialogiterator API.