
Add glossary feature with dynamic translation step hiding#43

Merged
nicpottier merged 3 commits into main from nicpottier/glossary
Feb 14, 2026

Summary

Adds the glossary feature, ported from the Python adt-press repository and integrated as a post-storyboard proof step. It extracts pedagogically relevant vocabulary from the rendered HTML text and generates definitions with an LLM.
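The glossary step described above can be sketched as a batched, cached loop. This is a minimal sketch that assumes the page texts have already been extracted from the rendered HTML. `runGlossaryStep`, `GenerateFn`, the in-memory `Map` cache, and the default `model`/`language` values are hypothetical stand-ins, not the PR's actual code. Cross-batch deduplication is left out here.

```typescript
// Hypothetical item and result shapes; the PR's real Zod schemas may differ.
interface GlossaryItem {
  word: string;
  definition: string;
}

interface StoredGlossary {
  items: GlossaryItem[];
  pageCount: number;
  generatedAt: string; // ISO-8601 timestamp
}

// Stand-in for the LLM call; the real client and prompt are not part of this description.
type GenerateFn = (pageTexts: string[], language: string) => Promise<GlossaryItem[]>;

async function runGlossaryStep(
  pageTexts: string[],
  generate: GenerateFn,
  cache: Map<string, GlossaryItem[]>,
  opts: { batchSize?: number; language?: string; model?: string } = {},
): Promise<StoredGlossary> {
  // Defaults are illustrative assumptions (about 10 pages per call, per the PR).
  const { batchSize = 10, language = "en", model = "default" } = opts;
  const items: GlossaryItem[] = [];
  for (let i = 0; i < pageTexts.length; i += batchSize) {
    const batch = pageTexts.slice(i, i + batchSize);
    // The cache key covers every input that affects the LLM output for this batch.
    const key = JSON.stringify({ model, language, batch });
    let batchItems = cache.get(key);
    if (batchItems === undefined) {
      batchItems = await generate(batch, language);
      cache.set(key, batchItems);
    }
    items.push(...batchItems);
  }
  // Store the items together with the page count and a timestamp.
  return { items, pageCount: pageTexts.length, generatedAt: new Date().toISOString() };
}
```

Keying the cache on the model, the language, and the exact batch text means a rerun over unchanged pages makes no new LLM calls, which is also what lets tests pre-populate the cache instead of hitting a live model.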

Also implements dynamic translation step hiding: the backend now emits a step-skip event when it determines that translation is unnecessary (the editing language matches the book language), so the frontend can hide the step reactively instead of relying on possibly stale config data.
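A minimal sketch of the step-skip flow, assuming a discriminated-union event type. Only the step-skip event and the language comparison come from this PR; the `StepName` values, the `step-start`/`step-complete` events, and the helpers `translationEvents` and `visibleSteps` are hypothetical.

```typescript
// Hypothetical step names and event shapes; the real ProgressEvent schema may differ.
type StepName = "storyboard" | "image-captioning" | "glossary" | "translation";

type ProgressEvent =
  | { type: "step-start"; step: StepName }
  | { type: "step-complete"; step: StepName }
  | { type: "step-skip"; step: StepName };

// Backend side: emit a skip when the editing language matches the book language.
// Uses an exact string comparison (an assumption; the real check may normalize codes).
function translationEvents(editingLanguage: string, bookLanguage: string): ProgressEvent[] {
  return editingLanguage === bookLanguage
    ? [{ type: "step-skip", step: "translation" }]
    : [{ type: "step-start", step: "translation" }];
}

// Frontend side: collect skipped steps from the event stream and filter the display list.
function visibleSteps(allSteps: readonly StepName[], events: readonly ProgressEvent[]): StepName[] {
  const skipped = new Set<StepName>();
  for (const e of events) {
    if (e.type === "step-skip") skipped.add(e.step);
  }
  return allSteps.filter((s) => !skipped.has(s));
}
```

Because the frontend derives visibility from events the backend has already emitted, it never re-reads the language settings from config, which is where a stale value could hide or show the step incorrectly.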

Changes

  • Glossary generation: Extract text from web-rendering output, batch pages (about 10 per call), and generate vocabulary items with LLM caching
  • Proof stage integration: Glossary runs after image captioning and stores items with the page count and a timestamp
  • Dynamic step filtering: Backend-driven visibility prevents race conditions with config updates
  • Test coverage: 27 new tests (glossary pipeline, API, proof runner); all 389 tests pass

Test Plan

  • Typecheck: pnpm typecheck
  • All tests pass: pnpm test ✓ (389/389)
  • Glossary generation with batching and deduplication
  • Translation step hidden when languages match, shown when they differ

- Implement glossary generation from rendered HTML text with batching
  * Zod schemas for glossary items and LLM output
  * Pure extraction function with DOM parser (htmlparser2)
  * Configurable batching, model, and language settings
  * Automatic deduplication (case-insensitive, first-wins)

- Integrate glossary as post-storyboard proof step (after image captioning)
  * Add glossary to StepName enum and AppConfig schema
  * Run glossary generation per page batch with LLM caching
  * Store glossary items with page count and timestamp

- Dynamically hide translation step when not needed
  * Add step-skip event type to ProgressEvent schema
  * Backend emits step-skip when translation is determined unnecessary
  * Frontend tracks skipped steps and filters them from display
  * Fixes race condition with stale config data

- Add test coverage (27 new tests, all passing)
  * Pipeline glossary tests: HTML stripping, config, text collection, generation
  * API route tests: retrieval, error handling
  * Proof runner integration tests with cache pre-population
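The case-insensitive, first-wins deduplication listed in the commit notes can be sketched as a single pass with a set of seen keys. The `GlossaryItem` shape and the `dedupeGlossary` name are hypothetical, and trimming whitespace before comparing is an added assumption.

```typescript
// Hypothetical item shape; the PR's real Zod schema may differ.
interface GlossaryItem {
  word: string;
  definition: string;
}

// Keep the first occurrence of each word, comparing case-insensitively.
function dedupeGlossary(items: readonly GlossaryItem[]): GlossaryItem[] {
  const seen = new Set<string>();
  const result: GlossaryItem[] = [];
  for (const item of items) {
    // Assumption: surrounding whitespace is ignored when comparing words.
    const key = item.word.trim().toLowerCase();
    if (seen.has(key)) continue; // a later duplicate loses to the earlier entry
    seen.add(key);
    result.push(item);
  }
  return result;
}
```

First-wins keeps the output stable across reruns: since batches are processed in page order, the definition kept for a word is always the one generated from its earliest appearance.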
nicpottier merged commit d31e6d2 into main on Feb 14, 2026
1 check passed
nicpottier deleted the nicpottier/glossary branch on February 14, 2026 at 18:43