Add integration test for full pipeline with LLM cache fixtures by nicpottier · Pull Request #16 · unicef/adt-studio

nicpottier · 2026-02-11T19:35:36Z

Summary

Refactored the CLI to extract core pipeline orchestration into a reusable runPipeline() function, eliminating logic duplication between the CLI and tests. The CLI is now a thin wrapper that translates progress events to terminal UI.
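A rough sketch of the split described above, for readers who have not opened the diff. Only the names runPipeline() and RunPipelineOptions come from this PR. The option fields, the ProgressEvent shape, the step identifiers, and the formatProgress() and cliMain() helpers are assumptions made for illustration, not the actual code in pipeline.ts or cli.ts.

```typescript
// Hypothetical shapes: the PR names runPipeline() and RunPipelineOptions,
// but every field, event, and step identifier below is an assumption.
type StepName =
  | "pdf-extraction"
  | "metadata-extraction"
  | "text-classification"
  | "image-classification"
  | "page-sectioning"
  | "web-rendering";

interface ProgressEvent {
  step: StepName;
  status: "started" | "completed";
}

interface RunPipelineOptions {
  pdfPath: string;
  startPage: number;
  endPage: number;
  // Callers decide how (or whether) to display progress.
  onProgress?: (event: ProgressEvent) => void;
}

const STEPS: StepName[] = [
  "pdf-extraction",
  "metadata-extraction",
  "text-classification",
  "image-classification",
  "page-sectioning",
  "web-rendering",
];

// Core orchestration: emits events and knows nothing about terminals.
async function runPipeline(options: RunPipelineOptions): Promise<StepName[]> {
  const completed: StepName[] = [];
  for (const step of STEPS) {
    options.onProgress?.({ step, status: "started" });
    // The real work for each step would run here; omitted in this sketch.
    await Promise.resolve();
    options.onProgress?.({ step, status: "completed" });
    completed.push(step);
  }
  return completed;
}

// Pure helper that turns a progress event into one line of terminal output.
function formatProgress(event: ProgressEvent): string {
  const mark = event.status === "completed" ? "done" : "...";
  return `[${mark}] ${event.step}`;
}

// Thin CLI wrapper: its only job is translating events to terminal output.
async function cliMain(pdfPath: string): Promise<void> {
  await runPipeline({
    pdfPath,
    startPage: 1,
    endPage: 3,
    onProgress: (event) => console.log(formatProgress(event)),
  });
}
```

With this shape, a test can call runPipeline() directly and either omit onProgress or collect the events in an array, so no terminal code runs under test.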

Created an integration test that runs the full pipeline against raven.pdf (pages 1-3) with pre-populated LLM cache fixtures for reproducible, fast execution (~5 seconds with no API calls).

Details

pipeline.ts: New file with runPipeline() and RunPipelineOptions interface
cli.ts: Refactored to be a thin CLI wrapper; all pipeline logic moved to pipeline.ts
index.ts: Exports runPipeline and RunPipelineOptions
pipeline-integration.test.ts: Integration test validating all 6 pipeline steps end-to-end
fixtures/raven-cache/: Git-tracked LLM cache fixtures (7 JSON files, ~3.5KB total)

To regenerate the cache, delete fixtures/raven-cache/ and rerun the test with OPENAI_API_KEY set.
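For context, a file-based LLM cache of this kind could look roughly like the sketch below. It is not the cache implementation in this repo: the LlmRequest shape, the SHA-256 file naming, the JSON layout, and all function names are assumptions. It only illustrates why deleting the fixture directory forces fresh API calls on the next run.

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical request shape; the real cache key inputs are not described in the PR.
interface LlmRequest {
  model: string;
  prompt: string;
}

// Derive a stable file name from the request contents.
// Assumes requests are always built with the same field order.
function cacheKey(request: LlmRequest): string {
  return createHash("sha256").update(JSON.stringify(request)).digest("hex");
}

// Return the cached response, or undefined on a cache miss.
function readCached(cacheDir: string, request: LlmRequest): string | undefined {
  const file = join(cacheDir, `${cacheKey(request)}.json`);
  if (!existsSync(file)) return undefined;
  return JSON.parse(readFileSync(file, "utf8")).response as string;
}

// Persist a response so later runs can replay it without the API.
function writeCached(cacheDir: string, request: LlmRequest, response: string): void {
  mkdirSync(cacheDir, { recursive: true });
  const file = join(cacheDir, `${cacheKey(request)}.json`);
  writeFileSync(file, JSON.stringify({ request, response }, null, 2));
}

// Cache-through call: hit the API only when no fixture exists yet.
async function cachedCall(
  cacheDir: string,
  request: LlmRequest,
  callApi: (request: LlmRequest) => Promise<string>,
): Promise<string> {
  const hit = readCached(cacheDir, request);
  if (hit !== undefined) return hit;
  const response = await callApi(request);
  writeCached(cacheDir, request, response);
  return response;
}
```

Under these assumptions, a run with fixtures present never invokes callApi, and a run after the directory is deleted repopulates it one file per distinct request.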

Tests

All 272 tests pass. The integration test confirms all pipeline steps complete successfully:

PDF extraction (3 pages)
Metadata extraction (title + metadata)
Text classification (groups per page)
Image classification (filtered by size)
Page sectioning (sections per page)
Web rendering (HTML per section)
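The test body is not reproduced here, so the following is only a sketch of how a check over the six steps listed above might be written. The step labels are taken from that list; EXPECTED_STEPS, missingSteps(), and assertAllStepsCompleted() are hypothetical names, and the real test in pipeline-integration.test.ts may assert on step outputs quite differently.

```typescript
// Step labels copied from the list above; the identifiers are hypothetical.
const EXPECTED_STEPS = [
  "PDF extraction",
  "Metadata extraction",
  "Text classification",
  "Image classification",
  "Page sectioning",
  "Web rendering",
] as const;

type ExpectedStep = (typeof EXPECTED_STEPS)[number];

// Return the expected steps that never reported completion, in pipeline order.
function missingSteps(completed: readonly string[]): ExpectedStep[] {
  const seen = new Set(completed);
  return EXPECTED_STEPS.filter((step) => !seen.has(step));
}

// Throw with a readable message if any step is missing.
function assertAllStepsCompleted(completed: readonly string[]): void {
  const missing = missingSteps(completed);
  if (missing.length > 0) {
    throw new Error(`Pipeline steps did not complete: ${missing.join(", ")}`);
  }
}
```

Reporting the missing steps by name, rather than only comparing counts, makes a failure point at the step that stopped the run.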

Refactor cli.ts to extract core pipeline orchestration into a reusable runPipeline() function in pipeline.ts. Both the CLI and integration tests call this same function, eliminating pipeline logic duplication. The integration test runs the full pipeline against raven.pdf (pages 1-3) using pre-populated LLM cache fixtures for reproducible, fast execution (~5s with no API calls). Cache files are git-tracked; regenerate by deleting fixtures/raven-cache/ and rerunning with OPENAI_API_KEY set. All 272 tests pass; integration test validates complete pipeline: PDF extraction, metadata extraction, text classification, image classification, page sectioning, and web rendering.

nicpottier merged commit 8e975df into main Feb 11, 2026
1 check passed

nicpottier deleted the nicpottier/raven-integration-test branch February 11, 2026 19:38

nicpottier merged 1 commit into main from nicpottier/raven-integration-test
