An app that uses LLMs to extract structured data from screenshots.
Download the latest release - Get the pre-built .dmg file for macOS.
- Download StructuredOCR.dmg from the latest release
- Open the DMG file and drag StructuredOCR to your Applications folder
- Launch the app and configure your Gemini API key via the menu bar interface
- Install the required dependencies:
  uv sync
  You also have to install a modified version of rumps that supports text input:
  https://github.com/rafa-rrayes/rumps
- Configure your Gemini API key (either method works):
  - Create a .env file in the project root:
    GEMINI_API_KEY=your_gemini_api_key_here
  - Or use the menu bar app to paste your API key from clipboard
- Run the application:
  uv run app.py

For a standalone macOS application that doesn't require Python to be installed:
- Build the application bundle:
  ./build.sh
- The app will be created at dist/StructuredOCR.app
- Copy it to your Applications folder:
  cp -r dist/StructuredOCR.app /Applications/
- Open the app from Applications and configure your API key via the menu bar interface
- Start the application. It will run in the menu bar.
- Configure settings via the menu bar:
  - Set your Gemini API key (paste from clipboard or use text field)
  - Choose your preferred model (default: gemini-flash-lite-latest)
  - Customize the hotkey (default: Ctrl + Shift + 4)
- Press your configured hotkey to capture a screenshot region
- The extracted text will be automatically copied to your clipboard
- Enable "Open at Login" from the menu to start the app automatically
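
The capture-to-clipboard flow described in the steps above could look roughly like the sketch below. This is not the app's actual code. It assumes the built-in macOS `screencapture` and `pbcopy` command-line tools, and `extract_text` is a hypothetical callable standing in for whatever sends the image to Gemini.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def capture_command(output_path: str) -> List[str]:
    """Build the screencapture call: -i selects a region interactively, -x mutes the shutter sound."""
    return ["screencapture", "-i", "-x", output_path]


def capture_and_copy(extract_text: Callable[[bytes], str]) -> Optional[str]:
    """Capture a region, run the (hypothetical) extractor on it, and copy the result.

    Returns the extracted text, or None if no screenshot file was produced.
    """
    with tempfile.TemporaryDirectory() as tmp:
        shot = Path(tmp) / "capture.png"
        subprocess.run(capture_command(str(shot)), check=False)
        # Assumption: a cancelled selection (Esc) leaves no file behind.
        if not shot.exists():
            return None
        text = extract_text(shot.read_bytes())
    # pbcopy reads stdin and places it on the macOS clipboard.
    subprocess.run(["pbcopy"], input=text.encode("utf-8"), check=True)
    return text
```

Passing the extractor in as a callable keeps the capture and clipboard steps independent of any particular model or SDK, which also makes them easy to exercise without an API key.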
- 🖼️ Interactive screenshot capture with crosshair selection
- 🤖 AI-powered text extraction using Google's Gemini models
- ⌨️ Customizable keyboard hotkeys
- 📋 Automatic clipboard integration
- 🔔 macOS native notifications
- 🚀 Optional auto-start at login
- 📦 Self-contained .app bundle (no Python installation required)
To create a DMG file for easy distribution:
# First build the app
./build.sh
# Install create-dmg (if not already installed)
brew install create-dmg
# Then create a DMG with custom layout
create-dmg \
--volname "StructuredOCR" \
--window-size 500 300 \
--icon-size 120 \
--icon "StructuredOCR.app" 100 150 \
--app-drop-link 400 150 \
StructuredOCR.dmg \
dist/

- macOS 10.13 or later
- Google Gemini API key
- Accessibility permissions (macOS prompts you to grant them on first run)
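
As an illustration of the first two requirements, the sketch below checks the macOS version and looks up the API key. It is a standard-library stand-in, not the app's own logic. The function names are hypothetical, and checking the `GEMINI_API_KEY` environment variable before the `.env` file is an assumption about lookup order.

```python
import os
import platform
from pathlib import Path
from typing import Optional, Tuple


def meets_minimum_macos(version: str, minimum: Tuple[int, int] = (10, 13)) -> bool:
    """Compare a dotted version such as '12.6.1' against the 10.13 minimum."""
    if not version:  # platform.mac_ver() returns '' on non-macOS systems
        return False
    parts = tuple(int(p) for p in version.split(".")[:2])
    return parts >= minimum


def read_api_key(env_file: Path = Path(".env")) -> Optional[str]:
    """Return GEMINI_API_KEY from the environment, falling back to a .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        return key
    if env_file.is_file():
        for line in env_file.read_text().splitlines():
            name, sep, value = line.strip().partition("=")
            if sep and name.strip() == "GEMINI_API_KEY":
                # Drop surrounding whitespace and optional quotes.
                return value.strip().strip("'\"") or None
    return None


if __name__ == "__main__":
    print("macOS OK:", meets_minimum_macos(platform.mac_ver()[0]))
    print("API key found:", read_api_key() is not None)
```

Accessibility permission is left out here because it has to be granted by the user in macOS settings and cannot be checked with the standard library alone.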
