OCR workspace for scanned PDFs and images. Extract text, clean it up, review page by page, organize documents, and export polished output.
Cloud or local OCR. Use Gemini, LM Studio, or Ollama. Autodetect installed local models, choose the default provider in settings, and keep working in one review workflow.
ocrAI is built for documents that need more than a raw OCR dump. It combines OCR extraction, cleanup, review, editing, organization, and export in a single workspace.
The app is especially useful for:
- scanned books and articles
- lecture notes and handwritten or mixed-source documents
- research PDFs that need cleanup before reading
- archives that need folder organization, labels, and reprocessing
- users who want local OCR with LM Studio or Ollama instead of sending files to a cloud provider
Settings: manage OCR behavior, reusable prompts, labels, and model configuration.
- OCR workflows for PDFs and images
- Gemini, LM Studio, and Ollama support
- local model autodetection for LM Studio and Ollama
- configurable host and port for both local providers
- shared OCR prompt rules for paragraph reconstruction, de-hyphenation, and multi-column reading order
- per-page and full-document reprocessing
- side-by-side page preview and cleaned transcription
- rich text editing
- clean document reconstruction instead of line-by-line OCR noise
- optional full-text search through OCR text and saved edits
- folders and nested navigation
- document labels
- read and unread state
- rename, move, and delete actions
- automatic AI labeling for new documents
- filters by label, status, date, and folder
- TXT
- HTML
- EPUB
- ZIP batch export
- login-protected workspace
- Redis-backed session handling
- filesystem persistence for processed document assets and metadata
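Some of the cleanup listed above (paragraph reconstruction, de-hyphenation) is delegated to the OCR model through the shared prompt rules. As a rough illustration of the transformation those rules describe, de-hyphenation rejoins words split across line breaks (a sketch only; the function name is hypothetical and this is not ocrAI's code):

```typescript
// Illustration only: ocrAI asks the OCR model to do this cleanup via prompt
// rules; this sketch shows the de-hyphenation transformation itself.
function dehyphenate(text: string): string {
  // Join a word split across a line break by a trailing hyphen.
  return text.replace(/(\w)-\n(\w)/g, "$1$2");
}
```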
ocrAI supports three OCR backends:
| Provider | Type | Notes |
|---|---|---|
| Gemini | Cloud | Best when you want a managed OCR model and already have a GEMINI_API_KEY. |
| LM Studio | Local | OpenAI-compatible local endpoint. Configure host and port, autodetect installed models, and choose one as the default OCR model. |
| Ollama | Local | Local chat API. Configure host and port, autodetect installed models, and select a vision-capable model for OCR tasks. |
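For the two local providers, autodetection can run against their standard model-listing endpoints: LM Studio's OpenAI-compatible `GET /v1/models` and Ollama's `GET /api/tags`. The sketch below (hypothetical names, not ocrAI's actual code) shows the URLs and response shapes involved:

```typescript
// Response shapes of the two model-listing endpoints.
interface OpenAIModelList { data: { id: string }[] }   // LM Studio: GET /v1/models
interface OllamaTagList { models: { name: string }[] } // Ollama: GET /api/tags

// Build the model-listing URL for a provider at the given host and port.
function modelsUrl(provider: "lmstudio" | "ollama", host: string, port: number): string {
  return provider === "lmstudio"
    ? `http://${host}:${port}/v1/models`
    : `http://${host}:${port}/api/tags`;
}

// Extract installed model names from a parsed JSON response body.
function parseModelNames(provider: "lmstudio" | "ollama", body: unknown): string[] {
  if (provider === "lmstudio") {
    return (body as OpenAIModelList).data.map((m) => m.id);
  }
  return (body as OllamaTagList).models.map((m) => m.name);
}
```

Usage would be along the lines of `fetch(modelsUrl("ollama", "127.0.0.1", 11434)).then(r => r.json()).then(b => parseModelNames("ollama", b))`.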
1. Open Settings.
2. Go to AI > Models.
3. Choose the active OCR provider.
4. For LM Studio or Ollama, enter host and port.
5. Click Autodetect to load installed local models.
6. Select the default OCR model for that provider.
7. Save OCR settings.
Default local endpoints:
- LM Studio: `127.0.0.1:1234`
- Ollama: `127.0.0.1:11434`
- React 18+
- TypeScript
- Vite
- Express 5
- Redis for session state and login protection
- @google/genai for Gemini OCR
- local provider integration for LM Studio and Ollama
- filesystem storage under data/
- Node.js 20+
- Redis
- a Gemini API key if you want to use the Gemini provider
- optional local OCR runtime: LM Studio or Ollama
Create a .env.local file in the project root:
```env
ADMIN_USERNAME=admin
ADMIN_PASSWORD=change-me
REDIS_URL=redis://localhost:6379

# Required only if you want to use Gemini
GEMINI_API_KEY=your-gemini-api-key

# Optional
PORT=5037
CORS_ORIGIN=http://localhost:5173
TRUST_PROXY=false
```

Notes:
- You can use ADMIN_PASSWORD_HASH instead of ADMIN_PASSWORD.
- If you plan to work only with LM Studio or Ollama, the Gemini key is optional.
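The relationship between the two password variables can be sketched as follows. Treating the hash as taking precedence when both are set is an assumption here, not documented behavior, and the names below are illustrative rather than ocrAI's actual code:

```typescript
// Assumption: a configured hash is preferred over a plaintext password.
type CredentialMode =
  | { kind: "hash"; value: string }
  | { kind: "plain"; value: string };

function resolveAdminCredential(env: Record<string, string | undefined>): CredentialMode {
  if (env.ADMIN_PASSWORD_HASH) return { kind: "hash", value: env.ADMIN_PASSWORD_HASH };
  if (env.ADMIN_PASSWORD) return { kind: "plain", value: env.ADMIN_PASSWORD };
  throw new Error("Set ADMIN_PASSWORD or ADMIN_PASSWORD_HASH");
}
```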
1. Install dependencies:

   ```bash
   npm install
   ```

2. Start Redis:

   ```bash
   docker run -d --name ocrai-redis -p 6379:6379 redis:alpine
   ```

   Or use your local Redis installation.

3. Start the backend API:

   ```bash
   node server.js
   ```

4. In a second terminal, start the frontend:

   ```bash
   npm run dev
   ```

5. Open http://localhost:5173.

Vite proxies /api requests to http://localhost:5037.
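The proxy behavior is already set up in the repository; a minimal equivalent in vite.config.ts would look like this (illustrative sketch, the actual file may differ):

```ts
// vite.config.ts — forward /api requests to the backend on 5037 during dev.
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    proxy: {
      "/api": "http://localhost:5037",
    },
  },
});
```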
Build the frontend and run the bundled server:

```bash
npm run build
npm start
```

The production server listens on 5037 by default.
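The default-port behavior (honor PORT when set, otherwise fall back to 5037) can be sketched as follows; the helper name is hypothetical, not the app's actual code:

```typescript
// Hypothetical sketch: resolve the listening port from the environment,
// falling back to ocrAI's documented default of 5037.
function resolvePort(env: Record<string, string | undefined>): number {
  const parsed = Number(env.PORT);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 5037;
}
```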
```bash
docker compose up --build
```

This starts:
- redis
- ocrAI on http://127.0.0.1:5039
```bash
docker build -t drakonis96/ocrai:local .
docker run -p 5037:5037 --env-file .env.local -v "$(pwd)/data:/app/data" drakonis96/ocrai:local
```

Or pull the published image:

```bash
docker pull drakonis96/ocrai:latest
```

Run the test suite:

```bash
npm test
```

Run a production build check:

```bash
npm run build
```

Runtime data is stored under data/.
Typical contents include:
- data/<docId>/metadata.json
- page images for each uploaded document
- generated page markdown
- persisted prompt, label, model, and OCR settings JSON files
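The layout above can be navigated with ordinary path joins. For example (illustrative helper, not ocrAI's code):

```typescript
import * as path from "node:path";

// Illustrative: resolve a document's metadata file following the
// data/<docId>/metadata.json layout described above.
function metadataPath(dataDir: string, docId: string): string {
  return path.join(dataDir, docId, "metadata.json");
}
```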
This makes the workspace easy to inspect and back up.
```
components/   React UI
services/     backend services and OCR integrations
utils/        storage and shared helpers
tests/        Vitest test suite
screenshots/  README screenshots
images/       branding assets
data/         runtime document storage
```
- Cannot log in: verify ADMIN_USERNAME and either ADMIN_PASSWORD or ADMIN_PASSWORD_HASH.
- Sessions fail or login is unstable: verify that Redis is running and that REDIS_URL is correct.
- Uploads or OCR requests fail with Gemini: verify GEMINI_API_KEY and model access.
- Local OCR autodetect fails: verify the provider is running and that host and port in Settings match the local server.
- Files are processed but the UI cannot load them: ensure data/ exists and is writable by the app.
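For the last item, a check along these lines (a sketch under assumed names, not the app's actual code) confirms that the data directory exists and is writable:

```typescript
import * as fs from "node:fs";

// Sketch: create the data directory if missing, then confirm it is writable.
function ensureWritableDir(dir: string): boolean {
  try {
    fs.mkdirSync(dir, { recursive: true }); // no-op if it already exists
    fs.accessSync(dir, fs.constants.W_OK);  // throws if not writable
    return true;
  } catch {
    return false;
  }
}
```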
This repository does not currently declare a license file.



