Summary Forge Module

An intelligent tool that uses OpenAI's GPT-5 to forge comprehensive summaries of ebooks in multiple formats.

Repository: git@github.com:profullstack/summary-forge-module.git

Features

  • 📚 Multiple Input Formats: Supports PDF, EPUB files, and web page URLs
  • 🌐 Web Page Summarization: Fetch and summarize any web page with automatic content extraction
  • 🤖 AI-Powered Summaries: Uses GPT-5 with direct PDF upload for better quality
  • 📊 Vision API: Preserves formatting, tables, diagrams, and images from PDFs
  • 🧩 Intelligent Chunking: Automatically processes large PDFs (500+ pages) without truncation
  • 🛡️ Directory Protection: Prompts before overwriting existing summaries (use --force to skip)
  • 📦 Multiple Output Formats: Creates Markdown, PDF, EPUB, plain text, and MP3 audio summaries
  • 🃏 Printable Flashcards: Generates double-sided flashcard PDFs for studying
  • 🖼️ Flashcard Images: Individual PNG images for web app integration (q-001.png, a-001.png, etc.)
  • 🎙️ Natural Audio Narration: AI-generated conversational audio script for better listening
  • 🗜️ Bundled Output: Packages everything into a convenient .tgz archive
  • 🔄 Auto-Conversion: Automatically converts EPUB to PDF using Calibre
  • 🔍 Book Search: Search Amazon by title using Rainforest API
  • 📖 Auto-Download: Downloads books from Anna's Archive with CAPTCHA solving
  • 💻 CLI & Module: Use as a command-line tool or import as an ESM module
  • 🎨 Interactive Mode: Guided workflow with inquirer prompts
  • 📥 EPUB Priority: Automatically prefers EPUB format (open standard, more flexible)

Installation

Global Installation (CLI)

pnpm install -g @profullstack/summary-forge-module

Local Installation (Module)

pnpm add @profullstack/summary-forge-module

Prerequisites

  1. Node.js v20 or newer

  2. Calibre (for EPUB conversion - provides ebook-convert command)

    # macOS
    brew install calibre
    
    # Ubuntu/Debian
    sudo apt-get install calibre
    
    # Arch Linux
    sudo pacman -S calibre
  3. Pandoc (for document conversion)

    # macOS
    brew install pandoc
    
    # Ubuntu/Debian
    sudo apt-get install pandoc
    
    # Arch Linux
    sudo pacman -S pandoc
  4. XeLaTeX (for PDF generation)

    # macOS
    brew install --cask mactex
    
    # Ubuntu/Debian
    sudo apt-get install texlive-xetex
    
    # Arch Linux
    sudo pacman -S texlive-core texlive-xetex

CLI Usage

First-Time Setup

Before using the CLI, configure your API keys:

summary setup

This interactive command will prompt you for:

  • OpenAI API Key (required)
  • Rainforest API Key (optional - for Amazon book search)
  • ElevenLabs API Key (optional - for audio generation; get a key at https://try.elevenlabs.io/oh7kgotrpjnv)
  • 2Captcha API Key (optional - for CAPTCHA solving; sign up at https://2captcha.com/?from=9630996)
  • Browserless API Key (optional)
  • Browser and proxy settings

Configuration is saved to ~/.config/summary-forge/settings.json and used automatically by all CLI commands.

Managing Configuration

# View current configuration
summary config

# Update configuration
summary setup

# Delete configuration
summary config --delete

Note: The CLI will use configuration in this priority order:

  1. Environment variables (.env file)
  2. Configuration file (~/.config/summary-forge/settings.json)

Interactive Mode (Recommended)

summary interactive
# or
summary i

This launches an interactive menu where you can:

  • Process local files (PDF/EPUB)
  • Process web page URLs
  • Search for books by title
  • Look up books by ISBN/ASIN

Process a File

summary file /path/to/book.pdf
summary file /path/to/book.epub

# Force overwrite if directory already exists
summary file /path/to/book.pdf --force
summary file /path/to/book.pdf -f

Process a Web Page URL

summary url https://example.com/article
summary url https://blog.example.com/post/123

# Force overwrite if directory already exists
summary url https://example.com/article --force
summary url https://example.com/article -f

Features:

  • Automatically fetches web page content using Puppeteer
  • Sanitizes HTML to remove navigation, ads, footers, and other non-content elements
  • Saves web page as PDF for processing
  • Generates clean title from page title or uses OpenAI to create one
  • Prompts specifically optimized for web page content (ignores nav/ads/footers)
  • Creates same output formats as book processing (MD, TXT, PDF, EPUB, MP3, flashcards)

Search by Title

# Search for books (defaults to 1lib.sk - faster, no DDoS protection)
summary search "LLM Fine Tuning"
summary search "JavaScript" --max-results 5 --extensions pdf,epub
summary search "Python" --year-from 2020 --year-to 2024
summary search "Machine Learning" --languages english --order date

# Use Anna's Archive instead (has DDoS protection, slower)
summary search "Clean Code" --source anna
summary search "Rare Book" --source anna --sources zlib,lgli

# Title search (shortcut for search command)
summary title "A Philosophy of Software Design"
summary title "Clean Code" --force  # Auto-select first result
summary title "Python" --source anna  # Use Anna's Archive

# ISBN lookup (defaults to 1lib.sk)
summary isbn 9780134685991
summary isbn B075HYVHWK --force  # Auto-select and process
summary isbn 9780134685991 --source anna  # Use Anna's Archive

# Common Options:
#   --source <source>              Search source: zlib (1lib.sk, default) or anna (Anna's Archive)
#   -n, --max-results <number>     Maximum results to display (default: 10)
#   -f, --force                    Auto-select first result and process immediately
#
# 1lib.sk Options (--source zlib, default):
#   --year-from <year>             Filter by publication year from (e.g., 2020)
#   --year-to <year>               Filter by publication year to (e.g., 2024)
#   -l, --languages <languages>    Language filter, comma-separated (default: english)
#   -e, --extensions <extensions>  File extensions, comma-separated (case-insensitive, default: PDF)
#   --content-types <types>        Content types, comma-separated (default: book)
#   -s, --order <order>            Sort order: date (newest) or empty for relevance
#   --view <view>                  View type: list or grid (default: list)
#
# Anna's Archive Options (--source anna):
#   -f, --format <format>          Filter by format: pdf, epub, pdf,epub, or all (default: pdf)
#   -s, --sort <sort>              Sort by: date (newest) or empty for relevance (default: '')
#   -l, --language <language>      Language code(s), comma-separated (e.g., en, es, fr) (default: en)
#   --sources <sources>            Data sources, comma-separated (default: all sources)
#                                  Options: zlib, lgli, lgrs, and others

Look up by ISBN/ASIN

summary isbn B075HYVHWK

# Force overwrite if directory already exists
summary isbn B075HYVHWK --force
summary isbn B075HYVHWK -f

Help

summary --help
summary file --help

Programmatic Usage

JSON API Format

All methods now return consistent JSON objects with the following structure:

{
  success: true | false,  // Indicates if operation succeeded
  ...data,                // Method-specific data fields
  error?: string,         // Error message (only when success is false)
  message?: string        // Success message (optional)
}

This enables:

  • ✅ Consistent error handling - Check success field instead of try-catch
  • ✅ REST API ready - Direct JSON responses for HTTP endpoints
  • ✅ Better debugging - Rich metadata in all responses
  • ✅ Type-safe - Predictable structure for TypeScript users

Basic Example

import { SummaryForge } from '@profullstack/summary-forge-module';
import { loadConfig } from '@profullstack/summary-forge-module/config';

// Load config from ~/.config/summary-forge/settings.json
const configResult = await loadConfig();
if (!configResult.success) {
  console.error('Failed to load config:', configResult.error);
  process.exit(1);
}

const forge = new SummaryForge(configResult.config);

const result = await forge.processFile('./my-book.pdf');
if (result.success) {
  console.log('Summary created:', result.archive);
  console.log('Files:', result.files);
  console.log('Costs:', result.costs);
} else {
  console.error('Processing failed:', result.error);
}
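
Because every method reports failure through the success field, the check in the example above can be factored into a small helper for callers who prefer exceptions. The sketch below is illustrative only: unwrap is a hypothetical name, not part of the module's API, and it assumes nothing beyond the { success, error, ...data } shape described under JSON API Format.

```javascript
// Hypothetical helper (not part of the module): turns a { success, ... } result
// into either its data fields or a thrown Error.
function unwrap(result, label = 'operation') {
  if (!result || result.success !== true) {
    const reason = result && result.error ? result.error : 'unknown error';
    throw new Error(`${label} failed: ${reason}`);
  }
  // Drop the bookkeeping fields and keep the method-specific data.
  const { success, error, ...data } = result;
  return data;
}

// Plain objects stand in for real method results here.
const ok = unwrap({ success: true, archive: './book_bundle.tgz' }, 'processFile');
console.log(ok.archive);
```

With a real instance this could wrap any call, for example unwrap(await forge.processFile('./my-book.pdf'), 'processFile').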

Configuration Options

import { SummaryForge } from '@profullstack/summary-forge-module';

const forge = new SummaryForge({
  // Required
  openaiApiKey: 'sk-...',
  
  // Optional API keys
  rainforestApiKey: 'your-key',      // For Amazon search
  elevenlabsApiKey: 'sk-...',        // For audio generation (get key: https://try.elevenlabs.io/oh7kgotrpjnv)
  twocaptchaApiKey: 'your-key',      // For CAPTCHA solving (sign up: https://2captcha.com/?from=9630996)
  browserlessApiKey: 'your-key',     // For browserless.io
  
  // Processing options
  maxChars: 500000,                  // Max chars to process
  maxTokens: 20000,                  // Max tokens in summary
  
  // Audio options
  voiceId: '21m00Tcm4TlvDq8ikWAM',  // ElevenLabs voice
  voiceSettings: {
    stability: 0.5,
    similarity_boost: 0.75
  },
  
  // Browser options
  headless: true,                    // Run browser in headless mode
  enableProxy: false,                // Enable proxy
  proxyUrl: 'http://proxy.com',     // Proxy URL
  proxyUsername: 'user',             // Proxy username
  proxyPassword: 'pass',             // Proxy password
  proxyPoolSize: 36                  // Number of proxies in pool (default: 36)
});

const result = await forge.processFile('./book.epub');
console.log('Archive:', result.archive);

Search for Books

Using Amazon/Rainforest API

const forge = new SummaryForge({
  openaiApiKey: process.env.OPENAI_API_KEY,
  rainforestApiKey: process.env.RAINFOREST_API_KEY
});

const searchResult = await forge.searchBookByTitle('Clean Code');
if (!searchResult.success) {
  console.error('Search failed:', searchResult.error);
  process.exit(1);
}

console.log(`Found ${searchResult.count} results:`);
console.log(searchResult.results.map(b => ({
  title: b.title,
  author: b.author,
  asin: b.asin
})));

// Get download URL
const url = forge.getAnnasArchiveUrl(searchResult.results[0].asin);
console.log('Download from:', url);

Using Anna's Archive Direct Search (No Rainforest API Required)

const forge = new SummaryForge({
  openaiApiKey: process.env.OPENAI_API_KEY,
  enableProxy: true,
  proxyUrl: process.env.PROXY_URL,
  proxyUsername: process.env.PROXY_USERNAME,
  proxyPassword: process.env.PROXY_PASSWORD
});

// Basic search
const searchResult = await forge.searchAnnasArchive('JavaScript', {
  maxResults: 10,
  format: 'pdf',
  sortBy: 'date'  // Sort by newest
});

if (!searchResult.success) {
  console.error('Search failed:', searchResult.error);
  process.exit(1);
}

console.log(`Found ${searchResult.count} results`);
console.log(searchResult.results.map(r => ({
  title: r.title,
  author: r.author,
  format: r.format,
  size: `${r.sizeInMB.toFixed(1)} MB`,
  url: r.url
})));

// Download the first result
if (searchResult.results.length > 0) {
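  const first = searchResult.results[0];
  // The MD5 hash is taken from the /md5/<hash> segment of the result link
  const md5 = first.href.match(/\/md5\/([a-f0-9]+)/)?.[1];
  if (!md5) {
    console.error('No MD5 hash found in result link:', first.href);
    process.exit(1);
  }
  const downloadResult = await forge.downloadFromAnnasArchive(md5, '.', first.title);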
  
  if (downloadResult.success) {
    console.log('Downloaded:', downloadResult.filepath);
    console.log('Directory:', downloadResult.directory);
  } else {
    console.error('Download failed:', downloadResult.error);
  }
}

Using 1lib.sk Search (Faster, No DDoS Protection)

const forge = new SummaryForge({
  openaiApiKey: process.env.OPENAI_API_KEY,
  enableProxy: true,
  proxyUrl: process.env.PROXY_URL,
  proxyUsername: process.env.PROXY_USERNAME,
  proxyPassword: process.env.PROXY_PASSWORD
});

// Basic search
const searchResult = await forge.search1lib('LLM Fine Tuning', {
  maxResults: 10,
  yearFrom: 2020,
  languages: ['english'],
  extensions: ['PDF']
});

if (!searchResult.success) {
  console.error('Search failed:', searchResult.error);
  process.exit(1);
}

console.log(`Found ${searchResult.count} results`);
console.log(searchResult.results.map(r => ({
  title: r.title,
  author: r.author,
  year: r.year,
  extension: r.extension,
  size: r.size,
  language: r.language,
  isbn: r.isbn,
  url: r.url
})));

// Download the first result
if (searchResult.results.length > 0) {
  const downloadResult = await forge.downloadFrom1lib(
    searchResult.results[0].url,
    '.',
    searchResult.results[0].title
  );
  
  if (downloadResult.success) {
    console.log('Downloaded:', downloadResult.filepath);
    
    // Process the downloaded book
    const processResult = await forge.processFile(downloadResult.filepath, downloadResult.identifier);
    if (processResult.success) {
      console.log('Summary created:', processResult.archive);
      console.log('Costs:', processResult.costs);
    } else {
      console.error('Processing failed:', processResult.error);
    }
  } else {
    console.error('Download failed:', downloadResult.error);
  }
}

Enhanced Error Handling:

The 1lib.sk download functionality includes robust error handling with automatic debugging:

  • Multiple Selector Fallbacks: Tries 6 different selectors to find download buttons
  • Debug HTML Capture: Saves page HTML when download button isn't found
  • Link Analysis: Lists all links on the page for troubleshooting
  • Detailed Error Messages: Provides actionable information for debugging

If a download fails, check the debug-book-page.html file in the book's directory for detailed page structure information.
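
The selector-fallback idea can be sketched as follows. This is not the module's actual code: findFirstMatch and the selector strings are hypothetical placeholders, and the only real API assumed is Puppeteer's page.$(selector), which resolves to an element handle or null.

```javascript
// Hypothetical sketch: try each selector in order and return the first hit.
// `page` is anything with a Puppeteer-style `$` method.
async function findFirstMatch(page, selectors) {
  const tried = [];
  for (const selector of selectors) {
    tried.push(selector);
    const handle = await page.$(selector);
    if (handle) return { found: true, selector, handle, tried };
  }
  // Nothing matched: report every selector that was tried, for debugging.
  return { found: false, selector: null, handle: null, tried };
}

// Placeholder selectors for illustration only; the real list is not documented here.
const EXAMPLE_SELECTORS = ['a.download-button', 'a[href*="/dl/"]', 'button.download'];
```

Returning the list of tried selectors mirrors the error output described above, where a failed lookup reports what was attempted.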

API Reference

Constructor Options

new SummaryForge({
  // API Keys
  openaiApiKey: string,      // Required: OpenAI API key
  rainforestApiKey: string,  // Optional: For title search
  elevenlabsApiKey: string,  // Optional: For audio generation
  twocaptchaApiKey: string,  // Optional: For CAPTCHA solving
  browserlessApiKey: string, // Optional: For browserless.io
  
  // Processing Options
  maxChars: number,          // Optional: Max chars to process (default: 400000)
  maxTokens: number,         // Optional: Max tokens in summary (default: 16000)
  
  // Audio Options
  voiceId: string,           // Optional: ElevenLabs voice ID (default: Brian)
  voiceSettings: object,     // Optional: Voice customization settings
  
  // Browser Options
  headless: boolean,         // Optional: Run browser in headless mode (default: true)
  enableProxy: boolean,      // Optional: Enable proxy (default: false)
  proxyUrl: string,          // Optional: Proxy URL
  proxyUsername: string,     // Optional: Proxy username
  proxyPassword: string,     // Optional: Proxy password
  proxyPoolSize: number      // Optional: Number of proxies in pool (default: 36)
})

Methods

All methods return JSON objects with { success, ...data, error?, message? } format.

Processing Methods

  • processFile(filePath, asin?) - Process a PDF or EPUB file

    • Returns: { success, basename, markdown, files, archive, hasAudio, asin, costs, message, error? }
    • Example:
      const result = await forge.processFile('./book.pdf');
      if (result.success) {
        console.log('Archive:', result.archive);
        console.log('Costs:', result.costs);
      }
  • processWebPage(url, outputDir?) - Process a web page URL

    • Returns: { success, basename, dirName, markdown, files, directory, archive, hasAudio, url, title, costs, message, error? }
    • Example:
      const result = await forge.processWebPage('https://example.com/article');
      if (result.success) {
        console.log('Summary:', result.markdown.substring(0, 100));
      }

Search Methods

  • searchBookByTitle(title) - Search Amazon using Rainforest API

    • Returns: { success, results, count, query, message, error? }
    • Example:
      const result = await forge.searchBookByTitle('Clean Code');
      if (result.success) {
        console.log(`Found ${result.count} books`);
      }
  • searchAnnasArchive(query, options?) - Search Anna's Archive directly

    • Returns: { success, results, count, query, options, message, error? }
    • Example:
      const result = await forge.searchAnnasArchive('JavaScript', {
        maxResults: 10,
        format: 'pdf',
        sortBy: 'date'
      });
      if (result.success) {
        console.log(`Found ${result.count} results`);
      }
  • search1lib(query, options?) - Search 1lib.sk

    • Returns: { success, results, count, query, options, message, error? }

Download Methods

  • downloadFromAnnasArchive(asin, outputDir?, bookTitle?) - Download from Anna's Archive

    • Returns: { success, filepath, directory, asin, format, message, error? }
    • Example:
      const result = await forge.downloadFromAnnasArchive('B075HYVHWK', '.');
      if (result.success) {
        console.log('Downloaded to:', result.filepath);
      }
  • downloadFrom1lib(bookUrl, outputDir?, bookTitle?, downloadUrl?) - Download from 1lib.sk

    • Returns: { success, filepath, directory, title, format, message, error? }
  • search1libAndDownload(query, searchOptions?, outputDir?, selectCallback?) - Search and download in one session

    • Returns: { success, results, download, message, error? }

Generation Methods

  • generateSummary(pdfPath) - Generate AI summary from PDF

    • Returns: { success, markdown, length, method, chunks?, message, error? }
    • Methods: gpt5_pdf_upload, text_extraction_single, text_extraction_chunked
    • Example:
      const result = await forge.generateSummary('./book.pdf');
      if (result.success) {
        console.log(`Generated ${result.length} char summary using ${result.method}`);
      }
  • generateAudioScript(markdown) - Generate audio-friendly narration script

    • Returns: { success, script, length, message }
  • generateAudio(text, outputPath) - Generate audio using ElevenLabs TTS

    • Returns: { success, path, size, duration, message, error? }
  • generateOutputFiles(markdown, basename, outputDir) - Generate all output formats

    • Returns: { success, files: {...}, message }

Utility Methods

  • convertEpubToPdf(epubPath) - Convert EPUB to PDF

    • Returns: { success, pdfPath, originalPath, message, error? }
  • createBundle(files, archiveName) - Create tar.gz archive

    • Returns: { success, path, files, message, error? }
  • getCostSummary() - Get cost tracking information

    • Returns: { success, openai, elevenlabs, rainforest, total, breakdown }

Configuration

CLI Configuration (Recommended)

For CLI usage, run the setup command to configure your API keys:

summary setup

This saves your configuration to ~/.config/summary-forge/settings.json so you don't need to manage environment variables.

Environment Variables (Alternative)

For programmatic usage or if you prefer environment variables, create a .env file:

OPENAI_API_KEY=sk-your-key-here
RAINFOREST_API_KEY=your-key-here
ELEVENLABS_API_KEY=sk-your-key-here  # Optional: for audio generation
TWOCAPTCHA_API_KEY=your-key-here      # Optional: for CAPTCHA solving
BROWSERLESS_API_KEY=your-key-here     # Optional

# Browser Configuration
HEADLESS=true                          # Run browser in headless mode
ENABLE_PROXY=false                     # Enable proxy for browser requests
PROXY_URL=http://proxy.example.com    # Proxy URL (if enabled)
PROXY_USERNAME=username                # Proxy username (if enabled)
PROXY_PASSWORD=password                # Proxy password (if enabled)
PROXY_POOL_SIZE=36                     # Number of proxies in your pool (default: 36)

Or set them in your shell:

export OPENAI_API_KEY=sk-your-key-here
export RAINFOREST_API_KEY=your-key-here
export ELEVENLABS_API_KEY=sk-your-key-here  # Optional

Configuration Priority

When using the module programmatically, configuration is loaded in this order (highest priority first):

  1. Constructor options - Passed directly to new SummaryForge(options)
  2. Environment variables - From .env file or shell
  3. Config file - From ~/.config/summary-forge/settings.json (CLI only)
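
The precedence above can be illustrated with a small lookup function. This is a sketch rather than the module's loader: resolveSetting is a hypothetical name, and it assumes that keys in settings.json match the constructor option names.

```javascript
// Hypothetical sketch of the documented precedence:
// constructor options > environment variables > config file.
function resolveSetting(key, envName, { options = {}, env = {}, fileConfig = {} } = {}) {
  if (options[key] !== undefined) return options[key];
  if (env[envName] !== undefined && env[envName] !== '') return env[envName];
  return fileConfig[key];
}

// Plain objects stand in for the three sources.
const sources = {
  options: {},
  env: { OPENAI_API_KEY: 'sk-from-env' },
  fileConfig: { openaiApiKey: 'sk-from-file', rainforestApiKey: 'rf-from-file' },
};
const openaiKey = resolveSetting('openaiApiKey', 'OPENAI_API_KEY', sources);
const rainforestKey = resolveSetting('rainforestApiKey', 'RAINFOREST_API_KEY', sources);
console.log(openaiKey, rainforestKey);
```

Here the OpenAI key comes from the environment because no constructor option is set, while the Rainforest key falls through to the config file.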

Proxy Configuration (Recommended for Anna's Archive)

To avoid IP bans when downloading from Anna's Archive, configure a proxy during setup:

summary setup

When prompted:

  1. Enable proxy: Yes
  2. Enter proxy URL: http://your-proxy.com:8080
  3. Enter proxy username and password

Why use a proxy?

  • ✅ Avoids IP bans from Anna's Archive
  • ✅ USA-based proxies prevent geo-location issues
  • ✅ Works with both browser navigation and file downloads
  • ✅ Automatically applied to all download operations

Recommended Proxy Service:

We recommend Webshare.io for reliable, USA-based proxies:

  • 🌎 USA-based IPs (no geo-location issues)
  • ⚡ Fast and reliable
  • 💰 Affordable pricing with free tier
  • 🔒 HTTP/HTTPS/SOCKS5 support

Important: Use Static Proxies for Sticky Sessions

For Anna's Archive downloads, you need a static/direct proxy (not rotating) to maintain the same IP:

  1. In your Webshare dashboard, go to Proxy → List
  2. Copy a Static Proxy endpoint (not the rotating endpoint)
  3. Use the format: http://host:port (e.g., http://45.95.96.132:8080)
  4. Username format: dmdgluqz-US-{session_id} (session ID added automatically)

For each download, the tool picks a session ID between 1 and PROXY_POOL_SIZE to get a fresh IP, then keeps that IP throughout the 5-10 minute download process.

Proxy Pool Size Configuration:

Set PROXY_POOL_SIZE to match your Webshare plan (default: 36):

  • Free tier: 10 proxies → PROXY_POOL_SIZE=10
  • Starter plan: 25 proxies → PROXY_POOL_SIZE=25
  • Professional plan: 100 proxies → PROXY_POOL_SIZE=100
  • Enterprise plan: 250+ proxies → PROXY_POOL_SIZE=250

The tool will randomly select a session ID from 1 to your pool size, distributing load across all available proxies.
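
A minimal sketch of that selection, assuming the session ID is simply appended to the base username in the <username>-<session_id> format shown earlier. The function names are hypothetical and are not part of the module.

```javascript
// Hypothetical sketch: pick a session ID in [1, poolSize] and append it to the
// base proxy username.
function pickSessionId(poolSize = 36, random = Math.random) {
  if (!Number.isInteger(poolSize) || poolSize < 1) {
    throw new RangeError('poolSize must be a positive integer');
  }
  return Math.floor(random() * poolSize) + 1;
}

function withSession(baseUsername, sessionId) {
  return `${baseUsername}-${sessionId}`;
}

const sessionId = pickSessionId(10);
console.log(withSession('dmdgluqz-US', sessionId)); // e.g. dmdgluqz-US-7
```

Passing the random source as a parameter keeps the function easy to test; with a pool size of 10 the result always lies between 1 and 10 inclusive.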

Smart ISBN Detection:

When searching Anna's Archive, the tool automatically detects whether an identifier is a real ISBN or an Amazon ASIN:

  • Real ISBNs (10 or 13 numeric digits): Searches by ISBN for precise results
  • Amazon ASINs (alphanumeric): Searches by book title instead for better results
  • This ensures you get relevant search results even when Amazon returns proprietary ASINs instead of standard ISBNs
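
Read literally, the rule above amounts to a one-line check. The sketch below is an assumption about how such a check might look, not the module's implementation; classifyIdentifier is a hypothetical name.

```javascript
// Hypothetical sketch of the rule described above.
function classifyIdentifier(identifier) {
  const id = String(identifier).trim();
  // 10 or 13 numeric digits: treat as an ISBN.
  if (/^(\d{10}|\d{13})$/.test(id)) return 'isbn';
  // Anything else (for example alphanumeric Amazon IDs) is treated as an ASIN.
  return 'asin';
}

console.log(classifyIdentifier('9780134685991')); // isbn
console.log(classifyIdentifier('B075HYVHWK'));    // asin
```

Under this literal reading, an ISBN-10 that ends in the check character X would fall through to the ASIN branch and be searched by title.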

Note: Rotating proxies (p.webshare.io) don't support sticky sessions. Use individual static proxy IPs from your proxy list instead.

Testing your proxy:

node test-proxy.js <ASIN>

This will verify your proxy configuration by attempting to download a book.

Audio Generation

Audio generation is optional and requires an ElevenLabs API key. If the key is not provided, the tool will skip audio generation and only create text-based outputs.

Get ElevenLabs API Key: Sign up at https://try.elevenlabs.io/oh7kgotrpjnv for high-quality text-to-speech.

Features:

  • Uses ElevenLabs Turbo v2.5 model (optimized for audiobooks)
  • Default voice: Brian (best for technical content, customizable)
  • Automatically truncates long texts to fit API limits
  • Generates high-quality MP3 audio files
  • Natural, conversational narration style

Output

The tool generates:

  • <book_name>_summary.md - Markdown summary
  • <book_name>_summary.txt - Plain text summary
  • <book_name>_summary.pdf - PDF summary with table of contents
  • <book_name>_summary.epub - EPUB summary with clickable TOC
  • <book_name>_summary.mp3 - Audio summary (if ElevenLabs key provided)
  • <book_name>.pdf - Original or converted PDF
  • <book_name>.epub - Original EPUB (if input was EPUB)
  • <book_name>_bundle.tgz - Compressed archive containing all files
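
As a quick way to check a bundle against this list, the file names can be generated from the book name. This is a hypothetical helper based only on the names listed above; expectedOutputs and its option names are not part of the module.

```javascript
// Hypothetical helper listing the file names described above for a given book.
function expectedOutputs(bookName, { hasAudio = false, inputWasEpub = false } = {}) {
  const files = [
    `${bookName}_summary.md`,
    `${bookName}_summary.txt`,
    `${bookName}_summary.pdf`,
    `${bookName}_summary.epub`,
  ];
  if (hasAudio) files.push(`${bookName}_summary.mp3`); // only with an ElevenLabs key
  files.push(`${bookName}.pdf`);
  if (inputWasEpub) files.push(`${bookName}.epub`);    // only when the input was EPUB
  files.push(`${bookName}_bundle.tgz`);
  return files;
}

const outputs = expectedOutputs('clean_code', { hasAudio: true });
console.log(outputs);
```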

Example Workflow

# 1. Search for a book
summary search
# Enter: "A Philosophy of Software Design"
# Select from results, get ASIN

# 2. Download and process automatically
summary isbn B075HYVHWK
# Downloads, asks if you want to process
# Creates summary bundle automatically!

# Alternative: Process a local file
summary file ~/Downloads/book.epub

How It Works

  1. Input Processing: Accepts PDF files, EPUB files (converted to PDF), and web page URLs (saved as PDF)
  2. Smart Processing Strategy:
    • Small PDFs (<400k chars): Direct upload to OpenAI's vision API
    • Large PDFs (>400k chars): Intelligent chunking with synthesis
  3. AI Summarization: GPT-5 analyzes content with full formatting, tables, and diagrams
  4. Format Conversion: Uses Pandoc to convert the Markdown summary to PDF and EPUB
  5. Audio Generation: Optional TTS conversion using ElevenLabs
  6. Bundling: Creates a compressed archive with all generated files
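
The size-based choice in step 2 can be sketched as below. The strategy names are the method values listed for generateSummary. The function name and the handling of a PDF at exactly the threshold are assumptions, and the conditions under which text_extraction_single is used are not described here.

```javascript
// Hypothetical sketch of the size-based choice in step 2.
// 400,000 characters is the documented maxChars default.
function chooseStrategy(charCount, maxChars = 400000) {
  return charCount > maxChars ? 'text_extraction_chunked' : 'gpt5_pdf_upload';
}

console.log(chooseStrategy(120000));  // gpt5_pdf_upload
console.log(chooseStrategy(1245678)); // text_extraction_chunked
```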

Intelligent Chunking for Large PDFs

For PDFs exceeding 400,000 characters (typically 500+ pages), the tool automatically uses an intelligent chunking strategy:

How it works:

  1. Analysis: Calculates optimal chunk size based on PDF statistics
  2. Page-Based Chunking: Splits PDF into logical chunks (typically 50-150k chars each)
  3. Parallel Processing: Each chunk is summarized independently by GPT-5
  4. Intelligent Synthesis: All chunk summaries are combined into a cohesive final summary
  5. Quality Preservation: Maintains narrative flow and eliminates redundancy
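
Step 2 can be illustrated with a greedy grouping of pages. This is a sketch under stated assumptions, not the module's algorithm: chunkPages is a hypothetical name, the input is an array of per-page character counts, and a page is never split across chunks.

```javascript
// Hypothetical sketch of page-based chunking: pages are added to the current
// chunk until the next page would push it past maxChars.
function chunkPages(pageLengths, maxChars) {
  const chunks = [];
  let current = null;
  pageLengths.forEach((length, index) => {
    const page = index + 1; // pages are 1-based
    if (current && current.chars + length > maxChars) {
      chunks.push(current);
      current = null;
    }
    if (!current) current = { startPage: page, endPage: page, chars: 0 };
    current.endPage = page;
    current.chars += length;
  });
  if (current) chunks.push(current);
  return chunks;
}

// Ten pages of 2,500 characters each with a 6,000-character budget.
const chunks = chunkPages(new Array(10).fill(2500), 6000);
console.log(chunks.map((c) => `Pages ${c.startPage}-${c.endPage} (${c.chars} chars)`));
```

In this sketch a single page that is larger than the budget becomes a chunk of its own rather than being dropped or split.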

Benefits:

  • ✅ Complete Coverage: Processes entire books without truncation
  • ✅ High Quality: Each section gets full AI attention
  • ✅ Seamless Output: Final summary reads as a unified document
  • ✅ Cost Efficient: Optimizes token usage across multiple API calls
  • ✅ Automatic: No configuration needed - works transparently

Example Output:

📊 PDF Stats: 523 pages, 1,245,678 chars, ~311,420 tokens
📚 PDF is large - using intelligent chunking strategy
   This will process the ENTIRE 523-page PDF without truncation
📏 Using chunk size: 120,000 chars
📦 Created 11 chunks for processing
   Chunk 1: Pages 1-48 (119,234 chars)
   Chunk 2: Pages 49-95 (118,901 chars)
   ...
✅ All 11 chunks processed successfully
🔄 Synthesizing chunk summaries into final comprehensive summary...
✅ Final summary synthesized: 45,678 characters

Why Direct PDF Upload?

The tool prioritizes OpenAI's vision API for direct PDF upload when possible:

  • ✅ Better Quality: Preserves document formatting, tables, and diagrams
  • ✅ More Accurate: AI can see the actual PDF layout and structure
  • ✅ Better for Technical Books: Code examples and diagrams are preserved
  • ✅ Fallback Strategy: Automatically switches to intelligent chunking for large files

Testing

Summary Forge includes a comprehensive test suite using Vitest.

Run Tests

# Run all tests
pnpm test

# Run tests in watch mode
pnpm test:watch

# Run tests with coverage report
pnpm test:coverage

Test Coverage

The test suite includes:

  • ✅ 30+ passing tests
  • Constructor validation
  • Helper method tests
  • PDF upload functionality tests
  • API integration tests
  • Error handling tests
  • Edge case coverage
  • File operation tests

See test/summary-forge.test.js for the complete test suite.

Flashcard Generation

Summary Forge includes powerful flashcard generation capabilities for study and review.

Printable PDF Flashcards

Generate double-sided flashcard PDFs optimized for printing:

import { extractFlashcards, generateFlashcardsPDF } from '@profullstack/summary-forge-module/flashcards';
import fs from 'node:fs/promises';

// Read your markdown summary
const markdown = await fs.readFile('./book_summary.md', 'utf-8');

// Extract Q&A pairs
const extractResult = extractFlashcards(markdown, { maxCards: 50 });
console.log(`Extracted ${extractResult.count} flashcards`);

// Generate printable PDF
const pdfResult = await generateFlashcardsPDF(
  extractResult.flashcards,
  './flashcards.pdf',
  {
    title: 'JavaScript Fundamentals',
    branding: 'SummaryForge.com',
    cardWidth: 3.5,   // inches
    cardHeight: 2.5,  // inches
    fontSize: 11
  }
);

console.log(`PDF created: ${pdfResult.path}`);
console.log(`Total pages: ${pdfResult.pages}`);

Individual Flashcard Images

Generate individual PNG images for each flashcard, perfect for web applications:

import { extractFlashcards, generateFlashcardImages } from '@profullstack/summary-forge-module/flashcards';
import fs from 'node:fs/promises';

// Read your markdown summary
const markdown = await fs.readFile('./book_summary.md', 'utf-8');

// Extract Q&A pairs
const extractResult = extractFlashcards(markdown);

// Generate individual PNG images
const imageResult = await generateFlashcardImages(
  extractResult.flashcards,
  './flashcards',  // Output directory
  {
    title: 'JavaScript Fundamentals',
    branding: 'SummaryForge.com',
    width: 800,   // pixels
    height: 600,  // pixels
    fontSize: 24
  }
);

if (imageResult.success) {
  console.log(`Generated ${imageResult.images.length} images`);
  console.log('Files:', imageResult.images);
  // Output: ['./flashcards/q-001.png', './flashcards/a-001.png', ...]
}

Image Naming Convention:

  • q-001.png, q-002.png, etc. - Question cards
  • a-001.png, a-002.png, etc. - Answer cards
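
For a web app that needs to reference these files, the names can be reconstructed from the card count. The sketch below assumes three-digit zero padding and the question-then-answer ordering shown in the example output above; flashcardImageNames is a hypothetical helper, not part of the module.

```javascript
// Hypothetical sketch of the naming convention: q-NNN.png / a-NNN.png,
// zero-padded to three digits and interleaved question-then-answer.
function flashcardImageNames(count, dir = './flashcards') {
  const names = [];
  for (let i = 1; i <= count; i += 1) {
    const n = String(i).padStart(3, '0');
    names.push(`${dir}/q-${n}.png`, `${dir}/a-${n}.png`);
  }
  return names;
}

console.log(flashcardImageNames(2));
```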

Use Cases:

  • 🌐 Web-based flashcard applications
  • 📱 Mobile learning apps
  • 🎮 Interactive quiz games
  • 📊 Study progress tracking systems
  • 🔄 Spaced repetition software

Features:

  • ✅ Clean, professional design with book title
  • ✅ Automatic text wrapping for long content
  • ✅ Customizable dimensions and styling
  • ✅ SVG-based rendering for crisp quality
  • ✅ Works in Docker (no native dependencies)

Flashcard Extraction Formats

The extractFlashcards function supports multiple markdown formats:

1. Explicit Q&A Format:

**Q: What is a closure?**
A: A closure is a function that has access to variables in its outer scope.

2. Definition Lists:

**Closure**
: A function that has access to variables in its outer scope.

3. Question Headers:

### What is a closure?

A closure is a function that has access to variables in its outer scope.
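
To make the first format concrete, here is a minimal regex-based sketch that recognizes explicit Q&A pairs. It is not the module's extractFlashcards implementation: the function name and the question/answer field names are assumptions chosen for illustration, and the other two formats are not handled.

```javascript
// Hypothetical sketch, not the module's extractFlashcards implementation:
// pulls out pairs written in the explicit "**Q: ...**" / "A: ..." format.
function extractExplicitQA(markdown) {
  const pattern = /^\*\*Q:\s*(.+?)\*\*[ \t]*\r?\nA:\s*(.+)$/gm;
  const cards = [];
  for (const match of markdown.matchAll(pattern)) {
    cards.push({ question: match[1].trim(), answer: match[2].trim() });
  }
  return cards;
}

const sample = [
  '**Q: What is a closure?**',
  'A: A closure is a function that has access to variables in its outer scope.',
].join('\n');
console.log(extractExplicitQA(sample));
```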

Examples

See the examples/ directory for more usage examples.

Troubleshooting

Rate Limiting (1lib.sk)

If you encounter "Too many requests" errors from 1lib.sk:

Error Message:

Too many requests from your IP xxx.xxx.xxx.xxx
Please wait 10 seconds. [email protected]. Err #ipd1

Automatic Handling: The tool automatically detects rate limiting and:

  • ✅ Waits the requested time (usually 10 seconds)
  • ✅ Retries up to 3 times with exponential backoff
  • ✅ Adds a 2-second buffer to ensure rate limit has cleared

Manual Solutions:

  1. Wait a few minutes before trying again
  2. Use a different proxy session (the tool rotates through your proxy pool automatically)
  3. Switch to Anna's Archive: summary search "book title" --source anna
  4. Reduce concurrent requests if running multiple downloads

Note: The proxy pool helps distribute requests across different IPs, reducing rate limiting issues.
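
The automatic handling described above (wait the requested time, add a 2-second buffer, retry up to 3 times with exponential backoff) could be sketched as follows. How the tool actually combines the requested wait with the backoff is not documented here, so the doubling formula is an assumption, and the function names are hypothetical.

```javascript
// Hypothetical sketch: read the wait time from the error text, add a 2-second
// buffer, and double the delay on each further retry.
const MAX_RETRIES = 3;
const BUFFER_SECONDS = 2;

function parseWaitSeconds(message, fallbackSeconds = 10) {
  const match = /wait\s+(\d+)\s+seconds?/i.exec(message);
  return match ? Number(match[1]) : fallbackSeconds;
}

// attempt is 0 for the first retry, 1 for the second, and so on.
function retryDelayMs(message, attempt) {
  const base = parseWaitSeconds(message) + BUFFER_SECONDS;
  return base * 2 ** attempt * 1000;
}

const rateLimitMessage = 'Too many requests from your IP. Please wait 10 seconds.';
for (let attempt = 0; attempt < MAX_RETRIES; attempt += 1) {
  console.log(`retry ${attempt + 1}: wait ${retryDelayMs(rateLimitMessage, attempt)} ms`);
}
```

With a requested wait of 10 seconds, this sketch waits 12, 24, and 48 seconds before the first, second, and third retry.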

Download Button Not Found (1lib.sk)

If you encounter "Download button not found" errors when downloading from 1lib.sk:

  1. Check Debug Files: The tool automatically saves debug-book-page.html in the book's directory

    • Open this file to inspect the actual page structure
    • Look for download links or buttons that might have different selectors
  2. Review Error Output: The error message includes:

    • All selectors that were tried
    • List of links found on the page
    • Location of the debug HTML file
  3. Common Causes:

    • Z-Access/Library Access Page: Book page redirects to authentication page (most common)
    • Page structure changed (1lib.sk updates their site)
    • Book is deleted or unavailable
    • Session expired or cookies not maintained
    • Proxy issues preventing proper page load
  4. Solutions:

    • Recommended: Use Anna's Archive instead: summary search "book title" --source anna
    • Run a search on its own first (for example summary search "book title", or the search1lib method) to verify the book exists
    • Check if the book page loads correctly in a regular browser with the same proxy
    • Verify proxy configuration is working correctly
    • Try a different book from search results
  5. Known Issue - Z-Access Page: If you see links to library-access.sk or Z-Access page in the debug output, this means:

    • The book page requires authentication or special access
    • 1lib.sk's session management is blocking automated access
    • Workaround: Use Anna's Archive which has better automation support

Example Debug Output (Z-Access Issue):

❌ Download button not found on book page
   Debug HTML saved to: ./uploads/book_name/debug-book-page.html
   Found 6 links on page
   First 5 links:
   - https://library-access.sk (Z-Access page)
   - mailto:[email protected] ([email protected])
   - https://www.reddit.com/r/zlibrary (https://www.reddit.com/r/zlibrary)

Recommended Alternative:

# Use Anna's Archive instead (more reliable for automation)
summary search "prompt engineering" --source anna

IP Bans from Anna's Archive

If you're getting blocked by Anna's Archive:

  1. Enable proxy in your configuration:

    summary setup
  2. Use a USA-based proxy to avoid geo-location issues

  3. Test your proxy before downloading:

    node test-proxy.js B0BCTMXNVN
  4. Run browser in visible mode to debug:

    summary config --headless false

Proxy Configuration

The proxy is used for:

  • ✅ Browser navigation (Puppeteer)
  • ✅ File downloads (fetch with https-proxy-agent)
  • ✅ All HTTP requests to Anna's Archive

Supported proxy formats:

  • http://proxy.example.com:8080
  • https://proxy.example.com:8080
  • socks5://proxy.example.com:1080
  • http://proxy.example.com:8080-session-<SESSION_ID> (sticky session)

Recommended Service: Webshare.io - Reliable USA-based proxies with free tier available.

Webshare Sticky Sessions: Add -session-<YOUR_SESSION_ID> to your proxy URL to maintain the same IP:

http://p.webshare.io:80-session-myapp123

CAPTCHA Solving

When downloading from Anna's Archive, you may encounter CAPTCHAs. To automatically solve them:

  1. Sign up for 2Captcha: Get an API key at https://2captcha.com/?from=9630996
  2. Add to configuration:
    summary setup
  3. Enter your 2Captcha API key when prompted

The tool will automatically detect and solve CAPTCHAs during downloads, making the process fully automated.

Limitations

  • Maximum PDF file size: No practical limit (intelligent chunking handles any size)
  • GPT-5 uses default temperature of 1 (not configurable)
  • Requires external tools: Calibre, Pandoc, XeLaTeX
  • CAPTCHA solving requires 2captcha.com API key (optional)
  • Very large PDFs (1000+ pages) may incur higher API costs due to multiple chunk processing
  • Anna's Archive may block IPs without proxy configuration
  • Chunked processing uses text extraction (images/diagrams described in text only)

Roadmap

  • ISBN/ASIN lookup via Anna's Archive
  • Automatic download from Anna's Archive with CAPTCHA solving
  • Book title search via Rainforest API
  • CLI with interactive mode
  • ESM module for programmatic use
  • Audio generation with ElevenLabs TTS
  • Direct PDF upload to OpenAI vision API
  • EPUB format prioritization (open standard)
  • Support for more input formats (MOBI, AZW3)
  • Chunked processing for very large books (>100MB)
  • Custom summary templates
  • Web interface
  • Multiple voice options for audio
  • Audio chapter markers
  • Batch processing multiple books

License

ISC

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
