Writebook Scraper

A Ruby script that scrapes Writebook sites and converts them to local Markdown files with downloaded images.

Requirements

Ruby 3.4+

Installation

bundle install

Usage

bin/writebook_scraper <book_url>

Examples

# Scrape a Writebook site
bin/writebook_scraper https://books.example.com/1/my-book

# Scrape a book hosted on a different domain
bin/writebook_scraper https://docs.example.org/2/user-guide

Output

The scraper creates an output/<book-slug>/ directory containing:

output/my-book/
├── index.md                 # Table of contents
├── getting-started.md       # Chapter files
├── hotkeys.md
├── themes.md
├── ...
└── images/
    ├── getting-started-screenshot.png
    ├── themes-tokyo-night.png
    └── ...
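A minimal sketch of how the paths in the tree above could be derived from the book URL. It assumes the book slug is the last path segment of the URL and that each chapter file is named after its chapter slug. The helpers book_slug and output_paths are hypothetical and are not the script's actual API.

```ruby
require "uri"

# Hypothetical helper: assumes the book slug is the last non-empty
# path segment of the book URL (".../1/my-book" -> "my-book").
def book_slug(book_url)
  URI.parse(book_url).path.split("/").reject(&:empty?).last
end

# Hypothetical helper: builds the paths shown in the output tree
# for a book URL and a list of chapter slugs.
def output_paths(book_url, chapter_slugs)
  root = File.join("output", book_slug(book_url))
  {
    root: root,
    index: File.join(root, "index.md"),
    chapters: chapter_slugs.map { |slug| File.join(root, "#{slug}.md") },
    images: File.join(root, "images")
  }
end

paths = output_paths("https://books.example.com/1/my-book", %w[getting-started hotkeys])
puts paths[:index]    # prints output/my-book/index.md
puts paths[:chapters] # prints output/my-book/getting-started.md, then output/my-book/hotkeys.md
```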

What gets converted

Chapter content with proper Markdown formatting
Headings (h1, h2, h3, etc.)
Tables
Code blocks and inline code
Links (internal and external)
Images (downloaded locally)
Lists (ordered and unordered)

Summarizing a Book

After scraping, you can generate an LLM-friendly summary:

bin/summarize output/my-book

This creates my-book.md with:

Book overview
Section summaries with key points
Quick reference tables (hotkeys, config paths, troubleshooting)

Requires the claude CLI to be installed and configured.
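One way a summarizer like bin/summarize could assemble its input is to concatenate the chapters in table-of-contents order and pipe them to the claude CLI in non-interactive print mode (claude -p). This is a sketch under assumptions, not the script's implementation: it assumes index.md links chapters as [Title](chapter.md), and the helper names and prompt wording are hypothetical.

```ruby
require "open3"

# Hypothetical helper: concatenate a scraped book in table-of-contents
# order. Assumes index.md links each chapter as [Title](chapter.md).
def book_text(book_dir)
  index = File.read(File.join(book_dir, "index.md"))
  files = index.scan(/\]\(([^)\s]+\.md)\)/).flatten
  files.map { |file| File.read(File.join(book_dir, file)) }.join("\n\n")
end

# Hypothetical summarizer: pipes the book to `claude -p` on stdin and
# returns the printed response. The prompt wording is an assumption.
def summarize(book_dir)
  prompt = "Summarize this book: give an overview, section summaries " \
           "with key points, and quick reference tables."
  output, status = Open3.capture2("claude", "-p", prompt, stdin_data: book_text(book_dir))
  raise "claude exited with status #{status.exitstatus}" unless status.success?
  output
end

# Usage (requires the claude CLI):
# File.write("my-book.md", summarize("output/my-book"))
```

Sending the book on stdin rather than as a command-line argument keeps the prompt short and avoids shell-quoting issues with the book text.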

Token Reduction

For a sample 39-chapter technical manual:

Metric        Full Book   Summary   Reduction
Characters       89,441     9,422         89%
Words            11,786     1,500         87%
Est. Tokens     ~22,360    ~2,355        ~90%

The summary preserves key information, commands, and configuration paths while reducing token usage by approximately 90%.
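The estimated token counts in the table are consistent with a rule of thumb of about four characters per token (89,441 / 4 ≈ 22,360). The sketch below reproduces the table's figures under that assumption; the 4-characters-per-token ratio is inferred from the numbers and is not a property of any particular tokenizer.

```ruby
# Assumption: ~4 characters per token, inferred from the table
# (89,441 / 4 ~= 22,360). Real tokenizers vary by model and text.
CHARS_PER_TOKEN = 4.0

def estimated_tokens(chars)
  (chars / CHARS_PER_TOKEN).floor
end

# Percentage reduction from `full` to `summary`, rounded to a whole percent.
def reduction_percent(full, summary)
  ((1 - summary.to_f / full) * 100).round
end

full_chars, summary_chars = 89_441, 9_422
full_words, summary_words = 11_786, 1_500

full_tokens = estimated_tokens(full_chars)       # 22360
summary_tokens = estimated_tokens(summary_chars) # 2355

puts "Characters: #{reduction_percent(full_chars, summary_chars)}%"   # Characters: 89%
puts "Words:      #{reduction_percent(full_words, summary_words)}%"   # Words:      87%
puts "Tokens:     #{reduction_percent(full_tokens, summary_tokens)}%" # Tokens:     89%
```

Because the token estimate is derived from the character count, the token reduction works out to the same value as the character reduction (about 89.5%), which the table rounds to ~90%.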

Running Tests

bundle exec rake test
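A conventional Rakefile that makes bundle exec rake test run the suite, assuming the tests use Minitest through Rake's built-in Rake::TestTask. The test file pattern is an assumption; the repository's actual Rakefile may differ.

```ruby
require "rake/testtask"

# Assumption: tests live in test/ and are named *_test.rb.
Rake::TestTask.new(:test) do |t|
  t.libs << "test"                # lib/ is on the load path by default; add test/ too
  t.pattern = "test/**/*_test.rb" # files to run
end

# Running plain `rake` also runs the tests.
task default: :test
```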

How It Works

Fetches the book index page and extracts all chapter links
Downloads each chapter page
Extracts the main content area
Downloads all images to a local images/ directory
Converts HTML to Markdown using reverse_markdown
Cleans up anchor links and duplicate titles
Generates an index.md with table of contents
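The string-processing parts of the steps above can be sketched with Ruby's standard library alone. This is an illustration, not the script's code: the helper names, the link-filtering rule, the image naming scheme (inferred from the example output tree), and what counts as an "anchor link" or "duplicate title" are all assumptions. Fetching pages and the reverse_markdown conversion are left out so the helpers work on plain strings.

```ruby
require "uri"

# Extract chapter links from the index page HTML.
# Assumption: chapter links are <a href="..."> URLs nested under the book URL.
def chapter_links(index_html, book_url)
  base = URI.parse(book_url)
  prefix = base.path.chomp("/") + "/"
  index_html.scan(/<a\s[^>]*href="([^"]+)"/i).flatten
            .map { |href| base.merge(href) }
            .select { |uri| uri.host == base.host && uri.path.start_with?(prefix) }
            .map(&:to_s)
            .uniq
end

# Local path for a downloaded image.
# Assumption: files are named "<chapter-slug>-<original filename>",
# which matches names like images/getting-started-screenshot.png.
def local_image_path(chapter_slug, src)
  File.join("images", "#{chapter_slug}-#{File.basename(URI.parse(src).path)}")
end

# Clean up converted Markdown.
# Assumptions: "anchor links" are same-page links like [Setup](#setup),
# reduced to plain text; a "duplicate title" is any repeat of the
# chapter's "# Title" heading after its first occurrence.
def clean_markdown(markdown, title)
  text = markdown.gsub(/\[([^\]]*)\]\(#[^)]*\)/, '\1')
  heading = "# #{title}"
  seen = false
  text.lines.reject do |line|
    is_title = line.strip == heading
    duplicate = is_title && seen
    seen ||= is_title
    duplicate
  end.join
end

# Build index.md as a linked table of contents.
def index_markdown(book_title, chapters)
  toc = chapters.map { |ch| "- [#{ch[:title]}](#{ch[:slug]}.md)" }
  (["# #{book_title}", ""] + toc).join("\n") + "\n"
end
```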
