Add page sectioning, web rendering and replace Listr2 with cli-progress #4

Merged
nicpottier merged 1 commit into main from nicpottier/section-render
Feb 10, 2026

@nicpottier
Contributor

Summary

  • Add page sectioning pipeline step with LLM-based section grouping and pruned section type support
  • Add web rendering pipeline step that generates HTML for each section via LLM with HTML validation
  • Replace Listr2 with cli-progress for pipeline orchestration — fixes concurrency bug where nested subtask lists weren't properly awaited
  • Add per-step progress bars (Classify Images, Classify Text, Section Pages, Render Pages) and PDF extraction progress bar with metadata spinner
  • Add step and item_id fields to LLM log with v2→v3 schema migration
  • Make max retries configurable per step (default 8 for web rendering)
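
A minimal sketch of how a per-step retry limit could drive a generate-then-validate loop. Only the `max_retries` setting and the default of 8 for web rendering come from this PR; the `resolveMaxRetries` and `generateWithRetries` helpers, the `Validation` type, and the fallback of 3 for other steps are assumptions made for illustration.

```typescript
// Hypothetical shape: only max_retries is named in this PR.
interface StepConfig {
  max_retries?: number;
}

// The PR states a default of 8 for web rendering. The fallback used for
// other steps is an assumption of this sketch.
const WEB_RENDERING_DEFAULT_RETRIES = 8;
const ASSUMED_FALLBACK_RETRIES = 3;

function resolveMaxRetries(step: string, config: StepConfig = {}): number {
  if (config.max_retries !== undefined) return config.max_retries;
  return step === "web-rendering"
    ? WEB_RENDERING_DEFAULT_RETRIES
    : ASSUMED_FALLBACK_RETRIES;
}

type Validation = { ok: true } | { ok: false; errors: string[] };

// Calls `generate` until `validate` accepts the output or retries run out.
// Errors from the previous attempt are handed back to `generate`, so a
// caller can include them in the next prompt.
async function generateWithRetries(
  generate: (previousErrors: string[]) => Promise<string>,
  validate: (output: string) => Validation,
  maxRetries: number,
): Promise<string> {
  let errors: string[] = [];
  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const output = await generate(errors);
    const result = validate(output);
    if (result.ok) return output;
    errors = result.errors;
  }
  throw new Error(
    `Validation failed after ${maxRetries} retries: ${errors.join("; ")}`,
  );
}
```

Passing the previous errors into `generate` is a design choice of the sketch, not something this PR describes.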

Test plan

  • pnpm typecheck passes
  • pnpm test — all tests pass
  • Manual test with pnpm pipeline raven assets/raven.pdf — pages process through Classify→Section→Render sequentially per page, concurrently across pages

…progress

Implements the remaining pipeline steps (page-sectioning, web-rendering with
HTML validation) and replaces Listr2 with cli-progress for reliable concurrent
page processing. Listr2 had a fundamental bug where nested task.newListr()
subtasks were not properly awaited under concurrency, causing pipeline steps
to run out of order.
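
The ordering guarantee described above can be sketched with a small worker pool: each page awaits its steps one after another, while several pages are in flight at once. This is an illustrative sketch, not the code from this commit. `runPool`, `PageStep`, and the `onStepDone` callback are hypothetical names, and the callback stands in for wherever a per-step progress bar would be advanced.

```typescript
type PageStep<T> = (page: T) => Promise<void>;

// Runs every step for a page in order, with at most `concurrency` pages
// being processed at the same time.
async function runPool<T>(
  pages: T[],
  steps: PageStep<T>[],
  concurrency: number,
  onStepDone: (stepIndex: number) => void = () => {},
): Promise<void> {
  let next = 0; // index of the next unclaimed page

  async function worker(): Promise<void> {
    while (next < pages.length) {
      const page = pages[next++]; // claimed synchronously, before any await
      for (let i = 0; i < steps.length; i++) {
        await steps[i](page); // a step finishes before the next one starts
        onStepDone(i); // e.g. advance the progress bar for step i
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, pages.length));
  // If any step throws, the returned promise rejects with that error.
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
}
```

Because every step is awaited explicitly inside the worker, per-page order does not depend on how a task-list library schedules nested subtasks.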

Key changes:
- Add page-sectioning and web-rendering pipeline steps with Liquid prompts
- Add HTML validation (data-id uniqueness, text containment, image refs)
- Replace Listr2 with cli-progress MultiBar + custom async concurrency pool
- Generate unique per-text IDs (pg001_gp001_tx001) for web rendering
- Add configurable max_retries to StepConfig (default 8 for web rendering)
- Add step/item_id columns to llm_log table with v2→v3 migration
- Progress bars per pipeline step instead of per page for better scaling
- Spinner for metadata extraction, progress bar for PDF extraction
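
As a rough illustration of the ID format and the three validation checks listed above, here is a regex-based sketch. It assumes the ID parts are page, group, and text indices, that images are referenced through `<img src="...">`, and that the checks return a list of error strings. `textId`, `SectionInput`, and `validateHtml` are hypothetical names, and a real implementation would more likely use an HTML parser.

```typescript
// Zero-padded ID in the pg001_gp001_tx001 shape. Reading the parts as
// page / group / text indices is an assumption.
function textId(page: number, group: number, text: number): string {
  const pad = (n: number) => String(n).padStart(3, "0");
  return `pg${pad(page)}_gp${pad(group)}_tx${pad(text)}`;
}

interface SectionInput {
  texts: { id: string; text: string }[]; // source texts the HTML must contain
  imageIds: string[]; // images the HTML is allowed to reference
}

// Returns a list of problems; an empty list means the HTML passed.
function validateHtml(html: string, input: SectionInput): string[] {
  const errors: string[] = [];
  let m: RegExpExecArray | null;

  // 1. data-id uniqueness
  const seen = new Set<string>();
  const dataId = /data-id="([^"]*)"/g;
  while ((m = dataId.exec(html)) !== null) {
    if (seen.has(m[1])) errors.push(`duplicate data-id: ${m[1]}`);
    seen.add(m[1]);
  }

  // 2. text containment, compared on tag-stripped, whitespace-normalized
  //    text (entities are not decoded in this sketch)
  const plain = html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ");
  for (const t of input.texts) {
    const wanted = t.text.replace(/\s+/g, " ").trim();
    if (!plain.includes(wanted)) errors.push(`missing text for ${t.id}`);
  }

  // 3. image refs: every <img src> must name a known image
  const known = new Set(input.imageIds);
  const imgSrc = /<img\b[^>]*\bsrc="([^"]*)"/g;
  while ((m = imgSrc.exec(html)) !== null) {
    if (!known.has(m[1])) errors.push(`unknown image: ${m[1]}`);
  }
  return errors;
}
```

Returning error strings rather than throwing lets a caller feed the failures back into a retry prompt.
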
@nicpottier merged commit 1dbfbe8 into main on Feb 10, 2026
1 check passed