A Python web scraping tool that extracts talks from Church of Jesus Christ General Conference pages and generates formatted PDF documents.
NOTE: This tool was AI-generated and does not reflect my skills. I am a JavaScript developer, not a Python developer. I am making it public in case someone else would like a PDF of conference talks in a nice, simple format, as I do.
- Scrapes conference talks from churchofjesuschrist.org
- Extracts speaker names, talk titles, and full content
- Includes images from talks in the PDF (photos, artwork, etc.)
- PDF Bookmarks/Outline - Navigate easily between sessions and talks using the PDF reader's sidebar
- Generates professionally formatted PDF documents
- Includes cover page and session dividers
- Reusable for different conference years and sessions
- Saves intermediate JSON data for debugging
- Adjustable page size
- Python 3.7 or higher
- pip (Python package manager)
Run the setup script to automatically create a virtual environment and install dependencies:
./setup.sh
Then activate the virtual environment:
source venv/bin/activate
If you prefer to set up manually:
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
Or install packages directly:
pip install reportlab Pillow
Important: Make sure to activate the virtual environment before running the scripts:
source venv/bin/activate
Generate a PDF from a conference URL (note the quotes around the URL):
python generate_conference_pdf.py "https://www.churchofjesuschrist.org/study/general-conference/2025/04?lang=eng"
Important: Always wrap the URL in quotes to prevent shell expansion issues.
This will create a PDF file named 2025_April.pdf in the Output directory with automatic bookmarks for easy navigation.
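The default filename appears to follow from the year and month in the URL. The sketch below shows one way such a name could be derived. The `default_pdf_name` helper and the month mapping are assumptions made for illustration, not code taken from this tool.

```python
import re
from urllib.parse import urlparse

# Assumption: conference URLs end in /general-conference/<year>/<month>,
# where the month is 04 (April) or 10 (October).
MONTH_NAMES = {"04": "April", "10": "October"}


def default_pdf_name(url: str) -> str:
    """Return a name such as '2025_April.pdf' (hypothetical helper)."""
    # urlparse separates the path from the ?lang=eng query string.
    path = urlparse(url).path
    match = re.search(r"/general-conference/(\d{4})/(\d{2})/?$", path)
    if match is None:
        raise ValueError(f"Not a recognized conference URL: {url}")
    year, month = match.groups()
    if month not in MONTH_NAMES:
        raise ValueError(f"Unexpected conference month: {month}")
    return f"{year}_{MONTH_NAMES[month]}.pdf"


url = "https://www.churchofjesuschrist.org/study/general-conference/2025/04?lang=eng"
print(default_pdf_name(url))
```

A URL that does not match the assumed pattern raises `ValueError` rather than producing a misleading filename.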
New! See QUICK_START_BOOKMARKS.md to learn about the bookmark navigation feature.
To choose the output filename, pass it as a second argument:
python generate_conference_pdf.py "https://www.churchofjesuschrist.org/study/general-conference/2025/04?lang=eng" my_conference.pdf
To scrape conference data and save as JSON without generating a PDF:
python conference_scraper.py "https://www.churchofjesuschrist.org/study/general-conference/2025/04?lang=eng"
This creates a JSON file with all scraped data.
If you already have scraped data in JSON format:
python pdf_generator.py conference_data.json output.pdf
.
├── generate_conference_pdf.py  # Main script (scrape + generate PDF)
├── conference_scraper.py       # Web scraping module
├── pdf_generator.py            # PDF generation module
├── requirements.txt            # Python dependencies
├── README.md                   # This file
└── example/
    └── 2025_April.pdf          # Example output
- Web Scraping: The script uses the Church's API to fetch conference data
  - Retrieves the main conference page
  - Extracts links to individual talks
  - Fetches full content for each talk
  - Parses HTML to extract clean text and images
  - Downloads images from the Church's servers
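The link-extraction step can be illustrated with the standard library alone. This is a minimal sketch, not the code in `conference_scraper.py`: the `TalkLinkParser` class, the `extract_talk_links` function, and the assumed talk URL pattern (`/study/general-conference/<year>/<month>/<slug>`) are all hypothetical, and the sample HTML is invented.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: talk pages live one level below the conference index page.
TALK_HREF = re.compile(r"^/study/general-conference/\d{4}/\d{2}/[^/?#]+")


class TalkLinkParser(HTMLParser):
    """Collect unique talk links from <a href="..."> tags (hypothetical)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and TALK_HREF.match(href):
            full_url = urljoin(self.base_url, href)
            if full_url not in self.links:  # keep first occurrence only
                self.links.append(full_url)


def extract_talk_links(html, base_url="https://www.churchofjesuschrist.org"):
    parser = TalkLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Invented sample markup; real pages will differ.
sample = """
<a href="/study/general-conference/2025/04/example-talk?lang=eng">Example Talk</a>
<a href="/study/general-conference/2025/04?lang=eng">Conference index</a>
<a href="/study/scriptures?lang=eng">Scriptures</a>
"""
print(extract_talk_links(sample))
```

Only the first link matches the assumed pattern; the index page and unrelated pages are skipped.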
- PDF Generation: Creates a formatted PDF using ReportLab
  - Session divider pages for each conference session
  - Individual pages for each talk with speaker and title
  - Images embedded at their proper locations within talks
  - Image captions with titles/descriptions
  - Professional formatting and styling
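The session and talk layout above maps naturally onto a two-level PDF outline. The following is a rough sketch of that idea using ReportLab's low-level `canvas` API (`bookmarkPage` and `addOutlineEntry`). The data layout (`sessions`, `name`, `talks`, `title`, `speaker`) and both function names are assumptions for illustration; `pdf_generator.py` may structure things differently.

```python
def outline_entries(conference):
    """Flatten sessions and talks into (level, title, key) tuples.

    Assumes a hypothetical layout:
    {"sessions": [{"name": ..., "talks": [{"title": ..., "speaker": ...}]}]}
    """
    entries = []
    for s_idx, session in enumerate(conference["sessions"]):
        entries.append((0, session["name"], f"session-{s_idx}"))
        for t_idx, talk in enumerate(session["talks"]):
            title = f'{talk["title"]} ({talk["speaker"]})'
            entries.append((1, title, f"talk-{s_idx}-{t_idx}"))
    return entries


def write_outline_pdf(conference, path):
    """Write one placeholder page per entry, each with a sidebar bookmark."""
    # Imported here so outline_entries() works without ReportLab installed.
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    pdf = canvas.Canvas(path, pagesize=letter)
    for level, title, key in outline_entries(conference):
        pdf.bookmarkPage(key)                         # mark the current page
        pdf.addOutlineEntry(title, key, level=level)  # add the sidebar entry
        pdf.drawString(72, 720, title)                # placeholder page text
        pdf.showPage()
    pdf.save()


# Invented sample data.
sample = {
    "sessions": [
        {
            "name": "Saturday Morning Session",
            "talks": [{"title": "Example Talk", "speaker": "Example Speaker"}],
        }
    ]
}
for entry in outline_entries(sample):
    print(entry)
```

Calling `write_outline_pdf(sample, "outline_demo.pdf")` with ReportLab installed should produce a two-page file whose sidebar nests the talk under its session.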
python generate_conference_pdf.py "https://www.churchofjesuschrist.org/study/general-conference/2025/04?lang=eng"
python generate_conference_pdf.py "https://www.churchofjesuschrist.org/study/general-conference/2024/10?lang=eng"
The script generates two files:
- PDF File: Formatted document with all talks
  - Example: 2025_April.pdf
- JSON Data File: Raw scraped data (for debugging)
  - Example: 2025_April_data.json
If you get import errors, make sure all dependencies are installed:
pip install --upgrade reportlab
If scraping fails, check your internet connection and verify the conference URL is correct.
If PDF generation fails, check that you have write permissions in the output directory.
Edit pdf_generator.py and modify the _setup_custom_styles() method to change:
- Font sizes
- Colors
- Spacing
- Alignment
Edit conference_scraper.py to:
- Filter specific types of talks
- Extract additional metadata
- Modify text extraction logic
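As an illustration of the first customization, here is a small sketch of filtering talks after scraping. It assumes the scraped data has a `sessions` list whose items hold a `talks` list with `speaker` and `title` keys. That layout and the `filter_talks` name are hypothetical, so check the generated JSON file for the real field names.

```python
def filter_talks(conference, keep):
    """Return a copy of the data with only the talks where keep(talk) is true.

    Sessions left with no talks are dropped. The input is not modified.
    """
    filtered_sessions = []
    for session in conference["sessions"]:
        talks = [talk for talk in session["talks"] if keep(talk)]
        if talks:
            filtered_sessions.append({**session, "talks": talks})
    return {**conference, "sessions": filtered_sessions}


# Invented sample data using the assumed layout.
sample = {
    "sessions": [
        {
            "name": "Session A",
            "talks": [
                {"speaker": "Speaker One", "title": "First Talk"},
                {"speaker": "Speaker Two", "title": "Second Talk"},
            ],
        },
        {
            "name": "Session B",
            "talks": [{"speaker": "Speaker Two", "title": "Third Talk"}],
        },
    ]
}

only_one = filter_talks(sample, lambda talk: talk["speaker"] == "Speaker One")
print([s["name"] for s in only_one["sessions"]])
```

Passing the predicate as a function keeps the filter reusable for other criteria, such as matching on title.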
- The script respects the Church's website structure and uses their public API
- Scraping may take several minutes depending on the number of talks
- Generated PDFs match the format of the example in example/2025_April.pdf
This tool is provided as-is for personal use. Please respect the Church's copyright and terms of service when using scraped content.
For issues or questions, please check:
- Python version (3.7+)
- All dependencies installed
- Valid conference URL format
- Internet connection
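The first two checks in this list can be automated. The sketch below is a hypothetical helper, not part of this repository. It uses only the standard library and assumes the two dependencies named earlier, reportlab and Pillow (imported as `PIL`).

```python
import importlib.util
import sys

# Import name -> pip package name, per the install command in this README.
REQUIRED_MODULES = {"reportlab": "reportlab", "PIL": "Pillow"}


def check_environment(min_version=(3, 7)):
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    if sys.version_info < min_version:
        wanted = ".".join(str(part) for part in min_version)
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {wanted}+ required, found {found}")
    for module, package in REQUIRED_MODULES.items():
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(module) is None:
            problems.append(f"Missing package: {package} (pip install {package})")
    return problems


for problem in check_environment():
    print(problem)
```

Run it inside the activated virtual environment; no output means the Python version and both packages were found.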
- Bookmarks/Outline Feature - Navigate PDFs with hierarchical bookmarks
- Image Support - How images are handled in PDFs
- Footnotes Feature - Footnote extraction and formatting
- v1.2 (2025-10-05): PDF Bookmarks/Outline added
  - Hierarchical bookmarks for sessions and talks
  - Easy navigation in PDF readers
  - Verification script to check bookmarks
- v1.1 (2025-10-05): Image support added
  - Images from talks are now included in PDFs
  - Automatic image downloading and embedding
  - Image captions with titles/descriptions
  - Proper image positioning within talk content
- v1.0 (2025-10-05): Initial release
  - Web scraping functionality
  - PDF generation with formatting
  - Support for different conference years/sessions