ocrAI Workspace

OCR workspace for scanned PDFs and images. Extract text, clean it up, review page by page, organize documents, and export polished output.

Cloud or local OCR. Use Gemini, LM Studio, or Ollama. Autodetect installed local models, choose the default provider in settings, and keep working in one review workflow.

Overview

ocrAI is built for documents that need more than a raw OCR dump. It combines OCR extraction, cleanup, review, editing, organization, and export in a single workspace.

The app is especially useful for:

scanned books and articles
lecture notes and handwritten or mixed-source documents
research PDFs that need cleanup before reading
archives that need folder organization, labels, and reprocessing
users who want local OCR with LM Studio or Ollama instead of sending files to a cloud provider

Screenshots


Dashboard
Browse your library, search documents, filter by label, status, date, or folder, and manage read state, renaming, moving, reprocessing, and deletion.

Reader + Editor
Review the original page beside the cleaned transcription, reprocess a page or the full document, and export the final result once it is ready.

Settings
Manage OCR behavior, reusable prompts, labels, and model configuration.

What It Does

OCR and extraction

OCR workflows for PDFs and images
Gemini, LM Studio, and Ollama support
local model autodetection for LM Studio and Ollama
configurable host and port for both local providers
shared OCR prompt rules for paragraph reconstruction, de-hyphenation, and multi-column reading order
per-page and full-document reprocessing
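
In ocrAI these cleanup rules are applied through the OCR prompt, so the model does the work. Purely to illustrate what de-hyphenation and paragraph reconstruction mean for raw OCR output, here is a minimal TypeScript sketch. The function names `dehyphenate` and `joinParagraphLines` are hypothetical and are not part of ocrAI.

```typescript
// Illustrative only: ocrAI delegates these rules to the OCR model via prompts.
// Both function names are hypothetical.

/** Rejoin words split across a line break, e.g. "implemen-\ntation". */
function dehyphenate(text: string): string {
  // Merge only when the next line starts with a lowercase letter, so hyphens
  // before capitalized words are left alone. Limitation: a genuine compound
  // such as "well-\nknown" would also be merged.
  return text.replace(/([A-Za-z])-\r?\n([a-z])/g, "$1$2");
}

/** Collapse single line breaks inside a paragraph; keep blank-line breaks. */
function joinParagraphLines(text: string): string {
  return text
    .split(/\r?\n\s*\r?\n/) // blank lines separate paragraphs
    .map((p) => p.replace(/\s*\r?\n\s*/g, " ").trim())
    .filter((p) => p.length > 0)
    .join("\n\n");
}

const raw = "OCR often breaks a para-\ngraph into short\nlines.\n\nNext paragraph.";
console.log(joinParagraphLines(dehyphenate(raw)));
```

A rule-based pass like this cannot tell a line-break hyphen from a real compound word, which is one reason to leave the decision to the model.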

Review and cleanup

side-by-side page preview and cleaned transcription
rich text editing
clean document reconstruction instead of line-by-line OCR noise
optional full-text search through OCR text and saved edits

Organization

folders and nested navigation
document labels
read and unread state
rename, move, and delete actions
automatic AI labeling for new documents
filters by label, status, date, and folder

Export

TXT
HTML
EPUB
PDF
ZIP batch export

Workspace and persistence

login-protected workspace
Redis-backed session handling
filesystem persistence for processed document assets and metadata

OCR Providers

ocrAI supports three OCR backends:

| Provider | Type | Notes |
| --- | --- | --- |
| `Gemini` | Cloud | Best when you want a managed OCR model and already have a `GEMINI_API_KEY`. |
| `LM Studio` | Local | OpenAI-compatible local endpoint. Configure host and port, autodetect installed models, and choose one as the default OCR model. |
| `Ollama` | Local | Local chat API. Configure host and port, autodetect installed models, and select a vision-capable model for OCR tasks. |
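
The two local providers speak different HTTP APIs. The sketch below shows how a single-page OCR request might be shaped for each one, assuming LM Studio's OpenAI-compatible `POST /v1/chat/completions` endpoint and Ollama's `POST /api/chat` endpoint. The function names, the `OcrRequest` type, and the PNG image format are illustrative assumptions, not ocrAI's actual code.

```typescript
// Sketch: shaping one OCR request per local provider.
// Function names and the OcrRequest type are illustrative assumptions.

interface OcrRequest {
  url: string;
  body: unknown;
}

/** LM Studio exposes an OpenAI-compatible chat completions endpoint. */
function buildLmStudioRequest(
  host: string,
  port: number,
  model: string,
  prompt: string,
  pngBase64: string,
): OcrRequest {
  return {
    url: `http://${host}:${port}/v1/chat/completions`,
    body: {
      model,
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: prompt },
            // OpenAI-style vision input: the image travels as a data URL.
            { type: "image_url", image_url: { url: `data:image/png;base64,${pngBase64}` } },
          ],
        },
      ],
    },
  };
}

/** Ollama's chat API takes raw base64 images on the message itself. */
function buildOllamaRequest(
  host: string,
  port: number,
  model: string,
  prompt: string,
  pngBase64: string,
): OcrRequest {
  return {
    url: `http://${host}:${port}/api/chat`,
    body: {
      model,
      stream: false, // ask for a single JSON response instead of a stream
      messages: [{ role: "user", content: prompt, images: [pngBase64] }],
    },
  };
}
```

Either request can be sent with `fetch(url, { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(body) })`. The transcription then arrives in `choices[0].message.content` for LM Studio and in `message.content` for Ollama.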

Local provider setup

Open Settings.
Go to AI > Models.
Choose the active OCR provider.
For LM Studio or Ollama, enter host and port.
Click Autodetect to load installed local models.
Select the default OCR model for that provider.
Save OCR settings.

Default local endpoints:

LM Studio: 127.0.0.1:1234
Ollama: 127.0.0.1:11434
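
To make the Autodetect step concrete, here is a minimal sketch of how installed models could be listed from these default endpoints. It assumes LM Studio answers `GET /v1/models` (OpenAI-compatible) and Ollama answers `GET /api/tags`. The helper names `modelsUrl`, `parseModelNames`, and `detectModels` are hypothetical, and ocrAI's own implementation may differ.

```typescript
// Sketch of model autodetection against the default local endpoints.
// Helper names are hypothetical; ocrAI's own implementation may differ.

type LocalProvider = "lmstudio" | "ollama";

const DEFAULT_PORTS: Record<LocalProvider, number> = { lmstudio: 1234, ollama: 11434 };

/** URL that lists installed models for a provider. */
function modelsUrl(
  provider: LocalProvider,
  host: string = "127.0.0.1",
  port: number = DEFAULT_PORTS[provider],
): string {
  const path = provider === "lmstudio" ? "/v1/models" : "/api/tags";
  return `http://${host}:${port}${path}`;
}

/** Extract model names from either response shape, ignoring malformed entries. */
function parseModelNames(provider: LocalProvider, json: unknown): string[] {
  const body = (json ?? {}) as Record<string, unknown>;
  // LM Studio: { data: [{ id }] }   Ollama: { models: [{ name }] }
  const list = provider === "lmstudio" ? body.data : body.models;
  if (!Array.isArray(list)) return [];
  const key = provider === "lmstudio" ? "id" : "name";
  return list
    .map((m) => (m as Record<string, unknown> | null)?.[key])
    .filter((n): n is string => typeof n === "string");
}

/** Query the provider and return the installed model names. */
async function detectModels(
  provider: LocalProvider,
  host?: string,
  port?: number,
): Promise<string[]> {
  const res = await fetch(modelsUrl(provider, host, port));
  if (!res.ok) throw new Error(`${provider} responded with HTTP ${res.status}`);
  return parseModelNames(provider, await res.json());
}
```

If `detectModels("ollama")` rejects with a connection error, the provider is most likely not running or is listening on a different host or port.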

Tech Stack

React 18 + TypeScript
Vite
Express 5
Redis for session state and login protection
@google/genai for Gemini OCR
local provider integration for LM Studio and Ollama
filesystem storage under data/

Local Development

Requirements

Node.js 20+
Redis
a Gemini API key if you want to use the Gemini provider
optional local OCR runtime: LM Studio or Ollama

Environment

Create a .env.local file in the project root:

```
ADMIN_USERNAME=admin
ADMIN_PASSWORD=change-me
REDIS_URL=redis://localhost:6379

# Required only if you want to use Gemini
GEMINI_API_KEY=your-gemini-api-key

# Optional
PORT=5037
CORS_ORIGIN=http://localhost:5173
TRUST_PROXY=false
```

Notes:

You can use ADMIN_PASSWORD_HASH instead of ADMIN_PASSWORD.
If you plan to work only with LM Studio or Ollama, the Gemini key is optional.
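
The rules above can be sketched as a small startup check. This is an illustration under stated assumptions, not ocrAI's actual code: `loadConfig` and `AppConfig` are hypothetical names, `REDIS_URL` is treated as required because it is not listed under the optional variables, and `TRUST_PROXY` is assumed to be enabled only by the literal string `true`.

```typescript
// Sketch: validating the documented environment variables at startup.
// loadConfig and AppConfig are hypothetical names, not ocrAI's actual code.

interface AppConfig {
  adminUsername: string;
  adminPassword?: string;
  adminPasswordHash?: string;
  redisUrl: string;
  geminiApiKey?: string; // only needed for the Gemini provider
  port: number;
  corsOrigin?: string;
  trustProxy: boolean;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AppConfig {
  const required = (name: string): string => {
    const value = env[name]?.trim();
    if (!value) throw new Error(`Missing required environment variable: ${name}`);
    return value;
  };

  // Either a plain password or a hash must be present.
  const adminPassword = env.ADMIN_PASSWORD || undefined;
  const adminPasswordHash = env.ADMIN_PASSWORD_HASH || undefined;
  if (!adminPassword && !adminPasswordHash) {
    throw new Error("Set ADMIN_PASSWORD or ADMIN_PASSWORD_HASH");
  }

  // 5037 is the documented default port.
  const port = Number(env.PORT ?? 5037);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    adminUsername: required("ADMIN_USERNAME"),
    adminPassword,
    adminPasswordHash,
    redisUrl: required("REDIS_URL"), // assumption: no built-in default
    geminiApiKey: env.GEMINI_API_KEY || undefined,
    port,
    corsOrigin: env.CORS_ORIGIN || undefined,
    trustProxy: env.TRUST_PROXY === "true", // assumption about the accepted value
  };
}
```

In a Node.js process this would be called once as `loadConfig(process.env)`, so a missing variable fails at startup rather than at the first login attempt.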

Start the app

Install dependencies:
```
npm install
```

Start Redis:

```
docker run -d --name ocrai-redis -p 6379:6379 redis:alpine
```

Or use your local Redis installation.

Start the backend API:
```
node server.js
```
In a second terminal, start the frontend:
```
npm run dev
```
Open http://localhost:5173.

Vite proxies /api requests to http://localhost:5037.
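
For reference, a Vite proxy rule along the following lines would produce the behavior described above. The project's actual `vite.config.ts` is not reproduced here, so treat this as a sketch: the `changeOrigin` option and the explicit dev server port are assumptions.

```typescript
// vite.config.ts (sketch, not the project's actual file)
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    port: 5173, // dev server URL opened in the browser
    proxy: {
      // Forward every /api/* request to the Express backend.
      "/api": {
        target: "http://localhost:5037",
        changeOrigin: true, // assumption: rewrite the Host header for the backend
      },
    },
  },
});
```

If you change `PORT` in `.env.local`, the proxy target has to point at the same port, otherwise API calls from the frontend will fail.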

Production Build

Build the frontend and run the bundled server:

```
npm run build
npm start
```

The production server listens on 5037 by default.

Docker

Run with Docker Compose

```
docker compose up --build
```

This starts:

redis
ocrAI on http://127.0.0.1:5039

Build and run manually

```
docker build -t drakonis96/ocrai:local .
docker run -p 5037:5037 --env-file .env.local -v "$(pwd)/data:/app/data" drakonis96/ocrai:local
```

Pull the published image

```
docker pull drakonis96/ocrai:latest
```

Testing

Run the test suite:

```
npm test
```

Run a production build check:

```
npm run build
```

Data Layout

Runtime data is stored under data/.

Typical contents include:

data/<docId>/metadata.json
page images for each uploaded document
generated page markdown
persisted prompt, label, model, and OCR settings JSON files

This makes the workspace easy to inspect and back up.
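
Because everything lives in plain files, a short script is enough to inspect the store. The sketch below lists every document folder under `data/` that contains a `metadata.json`. The function name `listStoredDocuments` is hypothetical, and since the metadata schema is not documented here, the parsed content is returned as `unknown`.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Sketch: list document ids that have a metadata.json under data/.
// listStoredDocuments is a hypothetical helper, not part of ocrAI.

interface StoredDocument {
  docId: string;
  metadata: unknown; // schema is not documented, so no fields are assumed
}

function listStoredDocuments(dataDir: string = "data"): StoredDocument[] {
  if (!fs.existsSync(dataDir)) return [];

  const docs: StoredDocument[] = [];
  for (const entry of fs.readdirSync(dataDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue; // skip plain files such as settings JSON

    const metadataPath = path.join(dataDir, entry.name, "metadata.json");
    if (!fs.existsSync(metadataPath)) continue;

    try {
      const metadata: unknown = JSON.parse(fs.readFileSync(metadataPath, "utf8"));
      docs.push({ docId: entry.name, metadata });
    } catch (err) {
      // Unreadable or malformed metadata: skip the folder rather than abort.
    }
  }
  return docs;
}
```

Calling `listStoredDocuments()` from the project root returns one entry per stored document, which can serve as a quick sanity check before or after a backup.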

Project Structure

```
components/    React UI
services/      backend services and OCR integrations
utils/         storage and shared helpers
tests/         Vitest test suite
screenshots/   README screenshots
images/        branding assets
data/          runtime document storage
```

Troubleshooting

Cannot log in: verify ADMIN_USERNAME and either ADMIN_PASSWORD or ADMIN_PASSWORD_HASH.
Sessions fail or login is unstable: verify that Redis is running and that REDIS_URL is correct.
Uploads or OCR requests fail with Gemini: verify GEMINI_API_KEY and model access.
Local OCR autodetect fails: verify the provider is running and that host and port in Settings match the local server.
Files are processed but the UI cannot load them: ensure data/ exists and is writable by the app.

License

This repository does not currently include a license file.
