rafa-rrayes/structured-ocr

Structured OCR

An app that uses LLMs to extract structured data from screenshots.

Download

Download the latest release to get the pre-built .dmg file for macOS.

Installation

Option 1: Download Pre-built App

  1. Download StructuredOCR.dmg from the latest release
  2. Open the DMG file and drag StructuredOCR to your Applications folder
  3. Launch the app and configure your Gemini API key via the menu bar interface

Option 2: Run from Source

  1. Install the required dependencies:

uv sync

  2. Install the modified version of rumps that supports text input:

https://github.com/rafa-rrayes/rumps

  3. Configure your Gemini API key (either method works):

    • Create a .env file in the project root:
      GEMINI_API_KEY=your_gemini_api_key_here

    • Or use the menu bar app to paste your API key from the clipboard

  4. Run the application:

uv run app.py
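
To illustrate the .env option above, here is a minimal sketch of how a GEMINI_API_KEY entry could be read without extra dependencies. The function names `parse_env_text` and `load_api_key` are hypothetical and are not taken from app.py; the app itself may load the key differently.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(env_path: str = ".env") -> str | None:
    """Return GEMINI_API_KEY from the process environment, else from the .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        return key
    path = Path(env_path)
    if path.is_file():
        return parse_env_text(path.read_text()).get("GEMINI_API_KEY")
    return None
```

In this sketch a key already exported in the shell takes precedence over the .env file, which is an assumption rather than documented behavior.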

Option 3: Build from Source

For a standalone macOS application that doesn't require Python to be installed:

  1. Build the application bundle:

./build.sh

  2. The app will be created at dist/StructuredOCR.app

  3. Copy it to your Applications folder:

cp -R dist/StructuredOCR.app /Applications/

  4. Open the app from Applications and configure your API key via the menu bar interface

Usage

  1. Start the application. It will run in the menu bar.
  2. Configure settings via the menu bar:
    • Set your Gemini API key (paste it from the clipboard or type it into the text field)
    • Choose your preferred model (default: gemini-flash-lite-latest)
    • Customize the hotkey (default: Ctrl + Shift + 4)
  3. Press your configured hotkey to capture a screenshot region
  4. The extracted text will be automatically copied to your clipboard
  5. Enable "Open at Login" from the menu to start the app automatically
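
The capture-and-copy flow in the steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the app's actual code: `parse_hotkey`, `capture_region`, and `copy_to_clipboard` are hypothetical names, and the sketch relies on the standard macOS `screencapture -i` and `pbcopy` command-line tools. The Gemini call between capture and copy is omitted.

```python
from __future__ import annotations

import os
import subprocess
import tempfile

# Modifier names recognised by this sketch (an assumption, not the app's list).
MODIFIERS = {"ctrl", "shift", "alt", "cmd"}


def parse_hotkey(spec: str) -> tuple[frozenset[str], str]:
    """Split a spec such as 'Ctrl + Shift + 4' into (modifiers, key)."""
    parts = [p.strip().lower() for p in spec.split("+")]
    parts = [p for p in parts if p]
    mods = frozenset(p for p in parts if p in MODIFIERS)
    keys = [p for p in parts if p not in MODIFIERS]
    if len(keys) != 1:
        raise ValueError(f"expected exactly one non-modifier key in {spec!r}")
    return mods, keys[0]


def capture_region() -> str | None:
    """Run an interactive region capture; return the PNG path, or None if cancelled."""
    fd, path = tempfile.mkstemp(suffix=".png")
    os.close(fd)
    subprocess.run(["screencapture", "-i", path], check=False)
    # If the user cancels, nothing is written and the file stays empty.
    if os.path.getsize(path) == 0:
        os.remove(path)
        return None
    return path


def copy_to_clipboard(text: str) -> None:
    """Place text on the macOS clipboard via pbcopy."""
    subprocess.run(["pbcopy"], input=text.encode("utf-8"), check=True)
```

For example, `parse_hotkey("Ctrl + Shift + 4")` yields the modifier set {ctrl, shift} and the key "4". The two subprocess helpers only work on macOS.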

Features

  • 🖼️ Interactive screenshot capture with crosshair selection
  • 🤖 AI-powered text extraction using Google's Gemini models
  • ⌨️ Customizable keyboard hotkeys
  • 📋 Automatic clipboard integration
  • 🔔 macOS native notifications
  • 🚀 Optional auto-start at login
  • 📦 Self-contained .app bundle (no Python installation required)

Building for Distribution

To create a DMG file for easy distribution:

# First build the app
./build.sh

# Install create-dmg (if not already installed)
brew install create-dmg

# Then create a DMG with custom layout
create-dmg \
  --volname "StructuredOCR" \
  --window-size 500 300 \
  --icon-size 120 \
  --icon "StructuredOCR.app" 100 150 \
  --app-drop-link 400 150 \
  StructuredOCR.dmg \
  dist/

Requirements

  • macOS 10.13 or later
  • Google Gemini API key
  • Accessibility permissions (macOS prompts you to grant these on first run)
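
As a quick way to check the macOS version requirement above, the following sketch compares the running system against 10.13 using the standard library's `platform.mac_ver()`. The helper name `meets_minimum` is hypothetical and is not part of the project.

```python
from __future__ import annotations

import platform


def meets_minimum(version: str, minimum: tuple[int, int] = (10, 13)) -> bool:
    """Return True if a 'major.minor[.patch]' version string is at least `minimum`."""
    parts = [int(p) for p in version.split(".")[:2] if p.isdigit()]
    if not parts:
        # Empty or unparsable string, e.g. when not running on macOS.
        return False
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts) >= minimum


if __name__ == "__main__":
    # platform.mac_ver()[0] is an empty string on non-macOS systems.
    current = platform.mac_ver()[0]
    print("macOS version OK" if meets_minimum(current) else "macOS 10.13 or later is required")
```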
