Skip to content

refactor: decouple storage and commands from concrete I/O for testability#174

Merged
spachava753 merged 12 commits intomainfrom
testing-refactor
Feb 17, 2026
Merged

refactor: decouple storage and commands from concrete I/O for testability#174
spachava753 merged 12 commits intomainfrom
testing-refactor

Conversation

@spachava753
Copy link
Copy Markdown
Owner

@spachava753 spachava753 commented Feb 17, 2026

Summary

Overhaul of the testing infrastructure and storage layer to make the codebase testable without real databases, file I/O, or API calls. This branch removes all snapshot-based tests (cupaloy), redesigns the storage interfaces, and progressively decouples internal components from concrete infrastructure.

482 files changed, +5,056 / −23,898 lines across 12 commits.

Motivation

The codebase was tightly coupled to concrete implementations — ExecuteRoot opened its own SQLite database, config was loaded from disk inside business logic, and tests relied on snapshot files that were brittle and hard to maintain. This made unit testing impractical and forced integration tests to hit real APIs and databases.

Changes

Testing strategy reset

Remove all snapshot-based tests (cupaloy) and the testing strategy spec. This clears the way for table-driven unit tests with exact matching, as specified in AGENTS.md.

Storage layer redesign

The concrete DialogStorage type is replaced with a composed MessageDB interface (DialogSaver + MessagesDeleter + MessagesLister + MessagesGetter), allowing consumers to depend on exactly the subset they need. The SaveMessages method is replaced with a transactional SaveDialog iterator API that saves messages lazily as the caller consumes the iterator. Message identity and lineage are conveyed through ExtraFields rather than auxiliary return values. The SQLite implementation is renamed from DialogStorage to Sqlite and accepts an injected DB interface instead of opening its own connection.

In-memory MessageDB

A new storage.MemDB implements the full MessageDB interface using a tree of message nodes with parent/child references. This enables tests to assert on conversation structure (forks, linear chains, deletions) without SQLite. Includes 11 unit tests covering save, continue, fork, get, list, delete (leaf/recursive/atomic), error paths, and dialog reconstruction.

ExecuteRoot dependency injection

ExecuteRoot no longer performs config file loading or database initialization internally. It accepts a pre-resolved *config.Config and an optional storage.MessageDB, with the CLI entrypoint (cmd/root.go) owning the wiring. Integration tests construct configs in-memory via config.ResolveFromRaw.

Other

  • Rename scripts/build/ and simplify the Goyek task runner
  • Add URL handler tests achieving 100% coverage
  • Comprehensive storage test suite with behavioral coverage (2,080 lines)

Breaking Changes

  • ExecuteRootOptions no longer accepts ConfigPath, ModelRef, GenParams, Timeout, or IncognitoMode. Callers must provide a resolved *config.Config and optional storage.MessageDB.
  • DialogStorage type removed; replaced by Sqlite + MessageDB interface.
  • SaveMessages replaced by SaveDialog iterator API.

Remove all test files (46 *_test.go files), snapshot directories (10
.snapshots dirs), and the testing strategy specification document. Drop
the cupaloy/v2 and go-faker/v4 dependencies which were test-only, and
demote stretchr/testify to indirect.

Tests will be rewritten incrementally with a focus on exact-match
assertions, higher code coverage, and improved testability through
refactoring. The AGENTS.md testing guidelines are updated to reflect
this and to explicitly prohibit snapshot-based testing going forward.
Thread an io.Writer for stdout through the generator pipeline so tests
can silence or capture model output. Add a WithStdout generator option
and a Stdout field on ExecuteRootOptions, both defaulting to os.Stdout
when nil.

Add root_test.go with integration tests covering the full generation
flow: new conversation creation, same-provider continuation,
cross-provider forked continuation via ContinueID, context cancellation
with partial persistence, and incognito mode storage suppression. Tests
run against live Anthropic and Gemini APIs, verify message chains and
content blocks via direct SQLite queries on the .cpeconvo database.
Consumers of the storage package were tightly coupled to *DialogStorage
and its dozen exported methods, making it impossible to depend on only
the subset of operations actually needed. This made testing harder and
obscured the real dependency surface of each caller.

Introduce a composed MessageDB interface (MessagesSaver, MessagesGetter,
MessagesLister, MessagesDeleter) so that each consumer declares exactly
what it requires — the saving middleware needs only MessagesSaver, the
conversation list command needs only MessagesLister, and so on. Message
identity and lineage are now conveyed through gai.Message.ExtraFields
rather than auxiliary return values, eliminating the need for separate
ID slices and parent-ID returns that previously leaked storage internals
into every call site.

GetDialogForMessage becomes a standalone function over the MessagesGetter
interface, making dialog reconstruction testable without a real database.
The tree display type MessageIdNode moves to the commands package where
it belongs as a presentation concern. Comprehensive tests verify all
four interface methods and the dialog reconstruction utility through the
public API only.
The scripts/ directory name was generic and didn't convey that it
contains build tooling. Renaming to build/ follows the common Go
project convention for auxiliary build programs.

Flag accessor functions (GetTargetURL, GetPort, etc.) added unnecessary
indirection over the package-level flag pointers. Tasks now reference
the flag variables directly, and the logFile variable in the MCP debug
proxy is renamed to avoid shadowing the flag. Running with no arguments
now defaults to a new list task that prints all available tasks,
improving discoverability.
…used queries

The hand-written checkMessageIDExists method redundantly checked
sqlite_master for table existence before falling back to GetMessage with
ErrNoRows handling. Since InitDialogStorage always creates the schema
before any queries run, the table-existence guard was unnecessary. A
dedicated sqlc-managed CheckMessageIDExists query using SELECT EXISTS
replaces the entire method body.

Five other sqlc queries had no callers anywhere in the codebase:
GetMostRecentMessage, GetChildrenCount, ListRootMessages,
GetMessageChildrenId, and DeleteBlock. Removing them shrinks the
generated query surface to only what is actually used.
The eager slice-in/iterator-out signature forced all messages to be
saved inside a single transaction before any ID was available to the
caller. This prevented streaming save-and-use patterns where each
saved ID feeds into the next message's ParentID, and coupled the API
to batch semantics that most callers never needed — every call site
was already passing a single-element slice.

Accept iter.Seq[SaveMessageOptions] and return iter.Seq2[string, error]
so that each message is persisted independently as the caller pulls
from the output iterator. This turns SaveMessages into a composable
pipeline stage: pull an option, save it, yield its ID, repeat. Callers
that need the old slice-based convenience wrap their slice with
slices.Values.

BREAKING CHANGE: MessagesSaver.SaveMessages signature changed from
(ctx, []SaveMessageOptions) (iter.Seq[string], error) to
(ctx, iter.Seq[SaveMessageOptions]) iter.Seq2[string, error].
Messages are no longer saved atomically in a shared transaction.
… API

SaveMessages persisted each message independently without transactions,
requiring callers to manually thread parent IDs via SaveMessageOptions.
This made partial saves possible on failure and spread parent-chain
bookkeeping across every call site.

SaveDialog accepts an iterator of messages representing a conversation
thread and saves them atomically in a single transaction. Already-
persisted messages (those with MessageIDKey set) are verified against
storage but not re-saved, so callers can pass a full dialog where only
the tail is new. Titles are now conveyed through ExtraFields rather than
a wrapper struct, eliminating SaveMessageOptions entirely.

The SavingMiddleware propagates the saved assistant message back into
resp.Candidates so outer middlewares (ResponsePrinter) can read the
assigned message ID and print it to the user.
The type name now reflects the implementation rather than the domain
concept, which is already captured by the interfaces it satisfies.
Callers reference the concrete type only at initialization boundaries;
everywhere else they depend on DialogSaver or MessageDB.
…internally

Sqlite previously opened and owned the *sql.DB connection, making it
impossible to inject faults or wrap the database layer for testing
without touching real SQLite files. Accepting an interface pushes
connection lifecycle management to callers, so each call site
explicitly opens, uses, and closes the database.

This enables wrapping the concrete *sql.DB with decorators that
simulate errors, record calls, or enforce invariants—improving
testability and making the dependency surface visible at each call site.
The test suite targeted 100% line coverage but had redundant tests, copy-pasted
SQL patterns, scattered error-path tests, and no verification of important
behavioral invariants like tree branching, block ordering, or special character
handling.

Consolidate test helpers (single newTestDB returning both *Sqlite and *sql.DB,
shared trigger/poison helpers, expectSaveDialogError), combine related error
tests into table-driven groups, and remove tests that duplicated coverage
without adding behavioral value. This reduces the file from 1942 to ~1480 lines
for the coverage tests while keeping 100% statement coverage.

Add 15 behavioral tests that catch classes of bugs invisible to line coverage:
block sequence ordering, unicode/NUL/SQL-injection round-trips, conversation
tree branching with independent branch deletion, mid-chain dialog retrieval,
orphan detection after middle-node deletion, ON DELETE CASCADE verification,
empty/duplicate ID edge cases, and message ExtraFields persistence semantics.

Fix a bug where getMessage omitted the title from ExtraFields on retrieval
despite it being stored as a dedicated column. All known message metadata keys
(id, parent_id, created_at, title) are now returned consistently.

Add doc comments to every test function explaining what invariant it verifies
and why the test exists, so future contributors and agents can navigate the
suite without reading every assertion.
The urlhandler package had no tests. This adds comprehensive tests for
IsURL, DefaultConfig, and DownloadContent covering all branches
including edge cases like control characters in URLs, chunked transfers
exceeding size limits, mid-stream connection resets, empty Content-Type
fallback, and cancelled contexts.
ExecuteRoot previously performed its own config file loading and SQLite
database initialization, making it impossible to test without real file
I/O and database connections. Callers had no control over how these
dependencies were created, coupling the core generation flow to concrete
infrastructure.

Push dependency wiring to the caller by replacing ConfigPath, ModelRef,
GenParams, Timeout, and IncognitoMode fields with a pre-resolved
*config.Config and an optional storage.MessageDB. The CLI entrypoint in
cmd/root.go now owns config resolution and database lifecycle, while
ExecuteRoot operates purely on injected dependencies.

Add an in-memory MessageDB implementation (MemDB) backed by a tree of
message nodes with parent/child references, enabling tests to assert on
conversation structure without SQLite. The tree mirrors the branching
history model used in production: each node holds a message and a slice
of children, supporting forks and linear continuations alike.

BREAKING CHANGE: ExecuteRootOptions no longer accepts ConfigPath,
ModelRef, GenParams, Timeout, or IncognitoMode. Callers must provide a
resolved *config.Config via the Config field and an optional
storage.MessageDB via the MessageDB field.
@spachava753 spachava753 changed the title refactor: decouple ExecuteRoot from I/O, add in-memory MessageDB for testing refactor: decouple storage and commands from concrete I/O for testability Feb 17, 2026
@spachava753 spachava753 merged commit 187a4f3 into main Feb 17, 2026
@spachava753 spachava753 deleted the testing-refactor branch April 10, 2026 00:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant