A Ruby script that scrapes Writebook sites and converts them to local Markdown files with downloaded images.
Requirements:

- Ruby 3.4+

Install dependencies:

```
bundle install
```

Run the scraper with a Writebook book URL:

```
bin/writebook_scraper <book_url>
```

Examples:

```
# Scrape a Writebook site
bin/writebook_scraper https://books.example.com/1/my-book

# Another example
bin/writebook_scraper https://docs.example.org/2/user-guide
```

The scraper creates an `output/<book-slug>/` directory containing:
```
output/my-book/
├── index.md                # Table of contents
├── getting-started.md      # Chapter files
├── hotkeys.md
├── themes.md
├── ...
└── images/
    ├── getting-started-screenshot.png
    ├── themes-tokyo-night.png
    └── ...
```
- Chapter content with proper Markdown formatting:
  - Headings (h1, h2, h3, etc.)
  - Tables
  - Code blocks and inline code
  - Links (internal and external)
  - Images (downloaded locally)
  - Lists (ordered and unordered)
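HTML-to-Markdown conversion is handled by the `reverse_markdown` gem. A minimal sketch of that step, using an illustrative HTML fragment and options rather than the script's actual code:

```ruby
require "reverse_markdown"

# Illustrative chapter fragment; the scraper feeds the extracted content here.
html = <<~HTML
  <h2>Getting Started</h2>
  <p>Install with <code>bundle install</code>, then read the
  <a href="https://books.example.com/1/my-book/2/hotkeys">hotkeys</a> chapter.</p>
HTML

# github_flavored: true emits fenced code blocks instead of indented ones.
puts ReverseMarkdown.convert(html, github_flavored: true)
```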
After scraping, you can generate an LLM-friendly summary:
```
bin/summarize output/my-book
```

This creates `my-book.md` with:
- Book overview
- Section summaries with key points
- Quick reference tables (hotkeys, config paths, troubleshooting)
Requires the `claude` CLI to be installed and configured.
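The `bin/summarize` script itself is not shown here; conceptually it feeds the scraped Markdown to the `claude` CLI and writes the result. A rough sketch of that idea, where the prompt, the `-p` (print-mode) invocation, and the file handling are assumptions rather than the script's actual behaviour:

```ruby
# Hypothetical sketch: concatenate the scraped chapters and ask the
# claude CLI (non-interactive print mode) for a condensed summary.
book_dir = ARGV.fetch(0)                      # e.g. "output/my-book"
chapters = Dir[File.join(book_dir, "*.md")].sort.map { |f| File.read(f) }

prompt = "Summarize this book for use as LLM context. " \
         "Keep commands, hotkeys, and configuration paths."

summary = IO.popen(["claude", "-p", prompt], "r+") do |io|
  io.write(chapters.join("\n\n"))
  io.close_write
  io.read
end

File.write("#{File.basename(book_dir)}.md", summary)
```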
For a sample 39-chapter technical manual:
| Metric | Full Book | Summary | Reduction |
|---|---|---|---|
| Characters | 89,441 | 9,422 | 89% |
| Words | 11,786 | 1,500 | 87% |
| Est. Tokens | ~22,360 | ~2,355 | ~90% |
The summary preserves key information, commands, and configuration paths while reducing token usage by approximately 90%.
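The token column appears to use the common rough heuristic of about four characters per token; as a sanity check (this is an assumption about how the estimate was made, not something the tool documents):

```ruby
# ~4 characters per token is a rough heuristic, not a real tokenizer.
chars = File.read("my-book.md").length
puts "~#{chars / 4} tokens"   # 9,422 characters -> ~2,355 tokens
```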
Run the test suite with:

```
bundle exec rake test
```

The scraper works as follows:

- Fetches the book index page and extracts all chapter links
- Downloads each chapter page
- Extracts the main content area
- Downloads all images to a local `images/` directory
- Converts HTML to Markdown using `reverse_markdown`
- Cleans up anchor links and duplicate titles
- Generates an `index.md` with a table of contents
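Put together, that pipeline looks roughly like the sketch below. It is a simplified illustration of the steps above, not the script's actual code; the CSS selectors, URL filtering, and file naming are all assumptions.

```ruby
require "nokogiri"
require "open-uri"
require "reverse_markdown"
require "fileutils"

book_url = ARGV.fetch(0)                      # e.g. https://books.example.com/1/my-book
slug     = File.basename(URI.parse(book_url).path)
out_dir  = File.join("output", slug)
img_dir  = File.join(out_dir, "images")
FileUtils.mkdir_p(img_dir)

# 1. Fetch the index page and collect chapter links.
index = Nokogiri::HTML(URI.open(book_url).read)
chapter_urls = index.css("a[href]")
                    .map { |a| URI.join(book_url, a["href"]).to_s }
                    .select { |u| u.start_with?(book_url) && u != book_url }
                    .uniq

chapter_urls.each do |url|
  # 2./3. Download the chapter page and extract the main content area.
  page    = Nokogiri::HTML(URI.open(url).read)
  content = page.at_css("main") || page.at_css("body")

  # 4. Download each image and rewrite its src to the local copy.
  content.css("img").each do |img|
    next unless img["src"]
    src  = URI.join(url, img["src"]).to_s
    name = File.basename(URI.parse(src).path)
    File.binwrite(File.join(img_dir, name), URI.open(src).read)
    img["src"] = "images/#{name}"
  end

  # 5. Convert the content to Markdown and write the chapter file.
  markdown = ReverseMarkdown.convert(content.to_html)
  File.write(File.join(out_dir, "#{File.basename(URI.parse(url).path)}.md"), markdown)
end

# 6./7. (Not shown) clean up anchor links / duplicate titles and build index.md.
```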