
Add integration test for full pipeline with LLM cache fixtures #16

Merged
nicpottier merged 1 commit into main from nicpottier/raven-integration-test
Feb 11, 2026

Conversation

@nicpottier
Contributor

Summary

Refactored the CLI to extract core pipeline orchestration into a reusable runPipeline() function, eliminating logic duplication between the CLI and tests. The CLI is now a thin wrapper that translates progress events to terminal UI.
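
For readers who have not opened the diff, the split described above might look roughly like the sketch below. This is an illustration under assumptions, not the code in pipeline.ts: the step names, the ProgressEvent shape, the fields of RunPipelineOptions, and the runStep and cliMain helpers are all hypothetical.

```typescript
// Hypothetical step names; the real pipeline may label its steps differently.
type StepName =
  | "extract-pdf"
  | "extract-metadata"
  | "classify-text"
  | "classify-images"
  | "section-pages"
  | "render-web";

// Hypothetical event shape emitted by the pipeline as it runs.
interface ProgressEvent {
  step: StepName;
  status: "start" | "done";
}

// Hypothetical options; the real RunPipelineOptions may have other fields.
interface RunPipelineOptions {
  pdfPath: string;
  startPage: number;
  endPage: number;
  // The CLI renders these events; a test can record or ignore them.
  onProgress?: (event: ProgressEvent) => void;
}

const STEPS: StepName[] = [
  "extract-pdf",
  "extract-metadata",
  "classify-text",
  "classify-images",
  "section-pages",
  "render-web",
];

// Placeholder for the real per-step work (PDF parsing, LLM calls, rendering).
async function runStep(_step: StepName, _options: RunPipelineOptions): Promise<void> {
  // Intentionally empty in this sketch.
}

// Core orchestration: knows nothing about terminals, only emits events.
async function runPipeline(options: RunPipelineOptions): Promise<StepName[]> {
  const completed: StepName[] = [];
  for (const step of STEPS) {
    options.onProgress?.({ step, status: "start" });
    await runStep(step, options);
    options.onProgress?.({ step, status: "done" });
    completed.push(step);
  }
  return completed;
}

// Thin CLI wrapper: its only job is turning progress events into output lines.
async function cliMain(pdfPath: string, write: (line: string) => void): Promise<void> {
  await runPipeline({
    pdfPath,
    startPage: 1,
    endPage: 3,
    onProgress: (event) => {
      if (event.status === "done") write(`done: ${event.step}`);
    },
  });
}
```

Because the orchestration takes a callback instead of writing to the terminal itself, a test can call runPipeline() directly and either omit onProgress or pass one that collects events.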

Created an integration test that runs the full pipeline against raven.pdf (pages 1-3) with pre-populated LLM cache fixtures for reproducible, fast execution (~5 seconds with no API calls).

Details

  • pipeline.ts: New file with runPipeline() and RunPipelineOptions interface
  • cli.ts: Refactored to be a thin CLI wrapper; all pipeline logic moved to pipeline.ts
  • index.ts: Exports runPipeline and RunPipelineOptions
  • pipeline-integration.test.ts: Integration test validating all 6 pipeline steps end-to-end
  • fixtures/raven-cache/: Git-tracked LLM cache fixtures (7 JSON files, ~3.5KB total)

Cache regeneration is simple: delete fixtures/raven-cache/ and rerun the test with OPENAI_API_KEY set.
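
The replay-or-record behaviour that makes this work can be sketched as follows. This is a minimal illustration and not the project's cache implementation: LLMRequest, CacheStore, cacheKey, and cachedCompletion are hypothetical names, and an in-memory Map stands in for the JSON files under fixtures/raven-cache/.

```typescript
// Hypothetical request shape; the real cache may key on more fields.
interface LLMRequest {
  model: string;
  prompt: string;
}

// Stand-in for the fixtures directory: entry name -> JSON text.
type CacheStore = Map<string, string>;

// Deterministic key, so the same request always maps to the same entry.
function cacheKey(request: LLMRequest): string {
  // Fixed field order keeps the key independent of property insertion order.
  return JSON.stringify([request.model, request.prompt]);
}

// Replay a recorded response when one exists; otherwise call the model and
// record the answer. With no model client (for example, no API key), a miss
// is an error rather than a silent network call.
async function cachedCompletion(
  store: CacheStore,
  request: LLMRequest,
  callModel: ((request: LLMRequest) => Promise<string>) | undefined,
): Promise<string> {
  const key = cacheKey(request);
  const hit = store.get(key);
  if (hit !== undefined) {
    return JSON.parse(hit).response as string;
  }
  if (callModel === undefined) {
    throw new Error(`cache miss and no model client configured: ${request.prompt}`);
  }
  const response = await callModel(request);
  store.set(key, JSON.stringify({ request, response }));
  return response;
}
```

Under this design, deleting the stored entries forces every request down the record path on the next run, which is why regeneration needs a working API key while normal test runs do not.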

Tests

All 272 tests pass. The integration test confirms all pipeline steps complete successfully:

  • PDF extraction (3 pages)
  • Metadata extraction (title + metadata)
  • Text classification (groups per page)
  • Image classification (filtered by size)
  • Page sectioning (sections per page)
  • Web rendering (HTML per section)
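
As a rough picture of what checking the six steps listed above could involve, here is a small validator over a pipeline result. The PipelineResult and PageResult shapes and the findProblems helper are assumptions made for illustration; the actual assertions in pipeline-integration.test.ts are not shown in this description and may differ.

```typescript
// Hypothetical output shapes, one field per pipeline step being checked.
interface PageResult {
  pageNumber: number;
  textGroups: string[];                          // text classification
  images: { width: number; height: number }[];   // image classification
  sections: { id: string; html: string }[];      // sectioning + web rendering
}

interface PipelineResult {
  title: string;        // metadata extraction
  pages: PageResult[];  // PDF extraction
}

// Returns a list of human-readable problems; an empty list means the result
// has the expected overall shape.
function findProblems(result: PipelineResult, expectedPages: number): string[] {
  const problems: string[] = [];
  if (result.pages.length !== expectedPages) {
    problems.push(`expected ${expectedPages} pages, got ${result.pages.length}`);
  }
  if (result.title.trim() === "") {
    problems.push("missing title");
  }
  for (const page of result.pages) {
    // Assumption: every page in the range yields text groups and sections.
    if (page.textGroups.length === 0) {
      problems.push(`page ${page.pageNumber}: no text groups`);
    }
    if (page.sections.length === 0) {
      problems.push(`page ${page.pageNumber}: no sections`);
    }
    for (const section of page.sections) {
      if (section.html.trim() === "") {
        problems.push(`page ${page.pageNumber}: section ${section.id} has empty HTML`);
      }
    }
  }
  return problems;
}
```

Collecting problems into a list instead of failing on the first one means a single test run reports every step that regressed.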

Refactor cli.ts to extract core pipeline orchestration into a reusable runPipeline() function in pipeline.ts. Both the CLI and integration tests call this same function, eliminating pipeline logic duplication.

The integration test runs the full pipeline against raven.pdf (pages 1-3) using pre-populated LLM cache fixtures for reproducible, fast execution (~5s with no API calls). Cache files are git-tracked; regenerate by deleting fixtures/raven-cache/ and rerunning with OPENAI_API_KEY set.

All 272 tests pass; the integration test validates the complete pipeline: PDF extraction, metadata extraction, text classification, image classification, page sectioning, and web rendering.
@nicpottier nicpottier merged commit 8e975df into main Feb 11, 2026
1 check passed
@nicpottier nicpottier deleted the nicpottier/raven-integration-test branch February 11, 2026 19:38