Persist step/stage state and improve error visibility#112

Merged
nicpottier merged 5 commits into main from nicpottier/db-error-handling
Feb 24, 2026

@nicpottier

Summary

Replaces fragile multi-source state tracking (in-memory StageRunJob, in-memory runningSteps, DB step_completions) with a unified step_runs table as the single source of truth. Errors now persist across page refreshes and server restarts, per-step error messages display in the UI, and queued stages correctly show when a book is added.

Key Changes

  • New step_runs table (schema v8): Tracks full step lifecycle with status (running/done/error/skipped), timestamps, and error messages
  • All step events persisted: Backend writes every step-start, step-progress, step-complete, step-skip, step-error event to DB
  • Pure stage state derivation: Stage state is now resolved by precedence (running > queued > error > done > idle) and computed from step states plus the in-memory run queue
  • Per-step error messages: Displayed below step labels when errored (e.g., "LLM rate limit exceeded")
  • Fixed "no icon on book add": getQueuedStages() now includes the active run's stage range so not-yet-started stages show as queued
  • Improved SSE race handling: In-flight fetches are cancelled only if baseline data has already loaded, which prevents a UI hang on page load
  • Skipped steps show as done: Steps that are skipped (e.g., translation for a single-language book) now render with checkmarks
  • Startup recovery: Server startup sweep marks any step_runs with status='running' as status='error' to handle mid-run crashes
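
The precedence rule above can be sketched as a pure function. This is a minimal illustration, not the code from this PR: `deriveStageState`, `StepRun`, and the `isQueued` flag are hypothetical names, and the sketch assumes the caller already knows which steps a stage expects and whether the run queue currently covers the stage.

```typescript
// Hypothetical types; the real step_runs rows may carry more fields.
type StepStatus = "running" | "done" | "error" | "skipped";
type StageState = "running" | "queued" | "error" | "done" | "idle";

interface StepRun {
  step: string;
  status: StepStatus;
  errorMessage?: string;
}

// Resolve one stage's state by precedence: running > queued > error > done > idle.
function deriveStageState(
  expectedSteps: string[],
  runs: StepRun[],
  isQueued: boolean
): StageState {
  // Index the persisted runs by step name.
  const byStep = new Map<string, StepRun>();
  for (const run of runs) {
    byStep.set(run.step, run);
  }
  // A step with no row yet has an undefined status.
  const statuses = expectedSteps.map((step) => byStep.get(step)?.status);

  if (statuses.some((s) => s === "running")) return "running";
  if (isQueued) return "queued";
  if (statuses.some((s) => s === "error")) return "error";

  // Skipped steps count as finished, matching the "skipped shows as done" rule.
  const allFinished =
    statuses.length > 0 &&
    statuses.every((s) => s === "done" || s === "skipped");
  return allFinished ? "done" : "idle";
}
```

Because the function only reads its arguments, it can be unit tested without a database or a running server, which is presumably the point of extracting pure helpers.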

Tests

All 623 tests passing, including 13 new tests for step/stage completion logic, queued stages during active runs, and error state preservation.

Files Changed

  • Backend: storage layer (new table + migration), stage-runner (event writing), stage-service (queued stage calculation), routes (step-status endpoint)
  • Frontend: hooks (context + SSE logic), components (error display), pure helpers (extracted for testability)

Replace fragile in-memory state tracking with step_runs table as single
source of truth. Persist full step lifecycle (running/done/error/skipped)
with error messages. Derive stage state purely from steps, add per-step
error display in UI, show queued stages on book add, and clean up
interrupted steps on server restart.
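
The restart cleanup can be modelled in memory as follows. This is a sketch under assumptions: the row shape, the `errorMessage` and `finishedAt` fields, and the message text are hypothetical, and the actual change presumably runs an update against the step_runs table rather than mapping over an array.

```typescript
type StepStatus = "running" | "done" | "error" | "skipped";

// Hypothetical in-memory view of a step_runs row.
interface StepRunRow {
  step: string;
  status: StepStatus;
  errorMessage: string | null;
  finishedAt: string | null;
}

// Assumed wording; the message used by the real sweep is not shown in this PR text.
const INTERRUPTED_MESSAGE = "Interrupted by server restart";

// Any row still marked running at startup cannot be running any more,
// because the process that owned it is gone. Mark it as errored.
// Roughly equivalent SQL: UPDATE step_runs SET status = 'error' WHERE status = 'running'
function sweepInterruptedRuns(rows: StepRunRow[], now: Date): StepRunRow[] {
  return rows.map((row) =>
    row.status === "running"
      ? {
          ...row,
          status: "error" as const,
          errorMessage: INTERRUPTED_MESSAGE,
          finishedAt: now.toISOString(),
        }
      : row
  );
}
```

Rows in any other state pass through untouched, so the sweep is safe to run on every startup.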

Key changes:
- New step_runs table (schema v8) with status/error/message
- All step events written to DB in real-time
- Stage state: running > queued > error > done > idle (pure derivation)
- Per-step error messages displayed in StageRunCard
- Active run range included in queued stages (fixes "no icon on book add")
- Improved frontend-backend state sync via SSE and query prioritization
- 12 new tests for step/stage completion logic

Separate explicit queue items (re-runs behind another job) from the
active run range. Completed stages within the active range now correctly
show "done" instead of staying "queued" after their steps finish.
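
One way to picture the split between explicit queue items and the active run range is the sketch below. It is illustrative only: the function name, its signature, and the stage names in the usage note are hypothetical and do not reflect the real `getQueuedStages()` implementation.

```typescript
// Hypothetical description of the run currently in progress.
interface ActiveRange {
  from: string;
  to: string;
}

// Compute which stages should display as queued.
// - Explicit queue items (re-runs waiting behind another job) are always queued.
// - Stages inside the active run's range are queued only until they finish.
function queuedStagesSketch(
  stageOrder: string[],
  activeRange: ActiveRange | null,
  explicitQueue: string[],
  finished: ReadonlySet<string>
): Set<string> {
  const queued = new Set<string>(explicitQueue);
  if (activeRange === null) return queued;

  const start = stageOrder.indexOf(activeRange.from);
  const end = stageOrder.indexOf(activeRange.to);
  if (start === -1 || end === -1) return queued;

  for (const stage of stageOrder.slice(start, end + 1)) {
    // A finished stage in the active range drops out and can show "done".
    if (!finished.has(stage)) queued.add(stage);
  }
  return queued;
}
```

For example, with a hypothetical stage order of `extract`, `translate`, `render` and an active range covering all three, finishing `extract` leaves only `translate` and `render` queued. A stage that is currently running may still appear in the set; the precedence rule (running > queued) is what keeps it displayed as running.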

- Clean up stale .lock dir, -journal, -wal, -shm files on startup
  before opening any book DB (node-sqlite3-wasm can't recover once
  it hits a stale lock)
- Close DB handles on open failure to prevent leaked locks
- Add busy_timeout pragma for concurrent request contention
- Gracefully degrade on "database is locked" in book list/detail
- Red retry button on errored stages, enable extract retry on error
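
The lock-handling items above could look roughly like this. Everything here is a hedged sketch: the helper names are hypothetical, the assumption that the lock directory sits next to the database file as `<db>.lock` should be checked against node-sqlite3-wasm, and matching on the error message text is a simplification.

```typescript
// List the sidecar paths that may be left behind by a crashed process.
// Assumes they live next to the database file with these suffixes.
function staleArtifactPaths(dbPath: string): string[] {
  return [
    `${dbPath}.lock`,
    `${dbPath}-journal`,
    `${dbPath}-wal`,
    `${dbPath}-shm`,
  ];
}

// Detect the "database is locked" failure by its message text.
function isDatabaseLocked(err: unknown): boolean {
  return (
    err instanceof Error &&
    err.message.toLowerCase().indexOf("database is locked") !== -1
  );
}

// Run a read and fall back to a default value if the database is locked.
// Any other error is rethrown so real failures stay visible.
function withLockFallback<T>(read: () => T, fallback: T): T {
  try {
    return read();
  } catch (err) {
    if (isDatabaseLocked(err)) return fallback;
    throw err;
  }
}
```

At startup, each path from `staleArtifactPaths` would be removed before any book DB is opened, for instance with Node's `fs.rmSync(path, { recursive: true, force: true })`. Contention between live requests is a separate concern, handled by a pragma such as `PRAGMA busy_timeout = 5000;` (the timeout value here is an assumption).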

@nicpottier merged commit f90112c into main on Feb 24, 2026
1 check passed