Local-first voice-to-text dictation app. Runs in the system tray, records audio via a global hotkey, transcribes with your choice of speech recognition engine, and pastes the result into the active application.
All processing happens on your machine — no data leaves your computer unless you choose a cloud provider. API keys are stored securely in the OS keychain, never in plaintext files.
- Menu bar app — lives in the system tray, no dock icon
- Global hotkey — push-to-talk or toggle mode, any key combination (modifier, combo, or standalone key)
- 6 ASR engines — Whisper, Canary, Parakeet-TDT, Qwen3-ASR, Voxtral (all local, GPU-accelerated), plus any OpenAI-compatible cloud API
- 24 cloud presets — OpenAI, Groq, Cerebras, Gemini, Mistral, Fireworks, Together, DeepSeek, OpenRouter, xAI, SambaNova, Nebius, Anthropic, Deepgram, GitHub Copilot, Gemini ASR, Rev.ai, AssemblyAI, ElevenLabs, Cohere, Gladia, Speechmatics, Azure Speech, Google Speech
- Text cleanup pipeline — VAD silence trimming, hallucination filter, dictation commands, punctuation (BERT/PCS), grammar correction (T5), or full LLM cleanup (local/cloud)
- Floating pill — real-time spectrum visualization, recording/transcribing states, cancel support
- Model manager — parallel downloads with progress, pause/resume, benchmarks
- Paginated history — backend-driven search, infinite scroll, processing badges
- Bilingual UI — French and English
| Engine | Params | Size | Languages | GPU | Best WER | RTF |
|---|---|---|---|---|---|---|
| Whisper | 39M–1.55B | 75 MB–3.1 GB | 99 | Metal | 1.5% | 0.05–0.50 |
| Canary | 182M | 213 MB | 4 (FR/EN/DE/ES) | CoreML | 1.87% | 0.15 |
| Parakeet-TDT | 600M | 703 MB | 25 European | CoreML | 1.5% | 0.10 |
| Qwen3-ASR | 600M | 1.88 GB | 30 | Accelerate/AMX | 2.0% | 0.15 |
| Voxtral | 4.4B | 8.9 GB | 13 | Metal | 8.7% | 0.40 |
| Cloud (OpenAI API) | — | — | depends on provider | — | — | — |
Recommendations: Parakeet-TDT for best accuracy (1.5% WER, 25 languages). Whisper V3 Turbo for best balance (2.1% WER, 99 languages, 0.25 RTF). Whisper Tiny for speed (75 MB, 0.05 RTF).
| Engine | Params | Size | Languages | Capitalization | Speed |
|---|---|---|---|---|---|
| BERT Fullstop Large (ort) | 560M | 562 MB | 4 (FR/EN/DE/IT) | No | ~100ms |
| BERT Fullstop Base (Candle) | 280M | 1.1 GB | 5 (+ NL) | No | ~80ms |
| PCS 47 Languages (ort) | 230M | 233 MB | 47 | Yes (4 heads) | ~50ms |
Recommendation: PCS — smaller, faster, 47 languages, native capitalization.
| Engine | Params | Size | Languages | Speed |
|---|---|---|---|---|
| GEC T5 Small | 60M | 96 MB | 11 (multilingual) | ~200ms |
| T5 Spell FR | 220M | 276 MB | FR | ~500ms |
| FlanEC Base | 250M | 276 MB | EN | ~500ms |
| FlanEC Large | 800M | 821 MB | EN | ~1s |
All run via ONNX Runtime with CoreML, using autoregressive decoding with a repeat penalty and n-gram blocking.
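The two decoding constraints can be sketched as plain logit transforms. This is an illustrative std-only sketch, not the app's actual decoder; the token IDs and penalty value are made up for the example.

```rust
// Repeat penalty: tokens already generated get their logit reduced
// (applied once per occurrence in this sketch).
fn apply_repeat_penalty(logits: &mut [f32], generated: &[usize], penalty: f32) {
    for &tok in generated {
        if logits[tok] > 0.0 {
            logits[tok] /= penalty;
        } else {
            logits[tok] *= penalty;
        }
    }
}

// N-gram blocking: forbid any token that would recreate an n-gram
// already present in the generated sequence.
fn block_repeated_ngrams(logits: &mut [f32], generated: &[usize], n: usize) {
    if generated.len() + 1 < n {
        return;
    }
    let prefix = &generated[generated.len() - (n - 1)..];
    for start in 0..=generated.len().saturating_sub(n) {
        if &generated[start..start + n - 1] == prefix {
            logits[generated[start + n - 1]] = f32::NEG_INFINITY;
        }
    }
}

fn main() {
    let mut logits = vec![1.0_f32; 8];
    let generated = vec![3usize, 5, 3, 5];
    apply_repeat_penalty(&mut logits, &generated, 1.5);
    block_repeated_ngrams(&mut logits, &generated, 2); // bigram blocking
    // Token 3 would recreate the bigram (5, 3), so it is now blocked.
    println!("{:?}", logits);
}
```

Both transforms run on the logits before the argmax of each decoding step, which is what keeps small seq2seq correction models from looping on a phrase.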
| Model | Params | Size | Languages |
|---|---|---|---|
| Qwen3 0.6B | 0.6B | 484 MB | FR/EN/ES/DE |
| Gemma 3 1B | 1.0B | 806 MB | FR/EN/ES/DE |
| Qwen3 1.7B | 1.7B | 1.28 GB | FR/EN/ES/DE |
| Ministral 3B | 3.0B | 2.15 GB | FR/EN/ES/DE |
| Qwen3 4B | 4.0B | 2.50 GB | FR/EN/ES/DE |
All models are GGUF Q4-quantized and run via llama.cpp with Metal GPU acceleration. 11 models are available in total.
Mic (cpal) → WAV 16 kHz → VAD (Silero v6.2) → Trim silence → ASR
| Stage | Status | Description |
|---|---|---|
| VAD (Silero) | Done | Discards silent recordings, trims leading/trailing silence |
| Denoising | Planned | Hybrid approach: denoised for VAD boundaries, original for ASR |
| Device presets | Planned | Per-mic gain, noise gate, normalization |
See docs/AUDIO-PIPELINE.md for the full architecture.
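The trim step above can be approximated with a frame-level energy gate. Silero VAD does the real speech-boundary detection in the app; this std-only sketch stands in for it with a fixed threshold, and the frame size and threshold values are illustrative.

```rust
const FRAME: usize = 320; // 20 ms at 16 kHz mono

// A frame counts as speech if its mean energy exceeds the threshold.
fn frame_is_speech(frame: &[f32], threshold: f32) -> bool {
    let energy: f32 = frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
    energy > threshold
}

// Keep only the span from the first speech frame to the last one;
// an all-silent recording is discarded entirely.
fn trim_silence(samples: &[f32], threshold: f32) -> &[f32] {
    let frames: Vec<bool> = samples
        .chunks(FRAME)
        .map(|f| frame_is_speech(f, threshold))
        .collect();
    match (
        frames.iter().position(|&s| s),
        frames.iter().rposition(|&s| s),
    ) {
        (Some(a), Some(b)) => &samples[a * FRAME..((b + 1) * FRAME).min(samples.len())],
        _ => &[], // no speech at all
    }
}

fn main() {
    // 100 ms silence, 100 ms "speech", 100 ms silence
    let mut samples = vec![0.0_f32; 1600];
    samples.extend(std::iter::repeat(0.5).take(1600));
    samples.extend(std::iter::repeat(0.0).take(1600));
    let trimmed = trim_silence(&samples, 1e-4);
    println!("kept {} of {} samples", trimmed.len(), samples.len());
}
```

Trimming before ASR both shortens inference and removes the silent stretches where hallucination-prone models tend to invent text.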
ASR raw → Hallucination filter → Dictation commands → Disfluency removal → Punctuation → Spell-check → Correction/LLM → Finalize → ITN → Paste
| Stage | Status | Description |
|---|---|---|
| Hallucination filter | Done | 30+ regex patterns (FR/EN) |
| Dictation commands | Done | Voice commands → punctuation ("virgule" → ",") |
| Disfluency removal | Done | Strip fillers (euh, uh, um) — regex-based, FR/EN |
| Punctuation | Done | BERT or PCS token classification |
| Correction | Done | T5 encoder-decoder (grammar, spelling) |
| LLM cleanup | Done | Local (llama.cpp) or cloud (OpenAI/Anthropic) |
| ITN | Done | Inverse text normalization — numbers, ordinals, %, hours, currencies, units (FR/EN) |
| Finalize | Done | Spacing, capitalization |
See docs/TEXT-PIPELINE.md for the full architecture.
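The dictation-command stage can be sketched as a token-by-token substitution pass. The word list here is a small illustrative subset, not the app's full FR/EN command table, and the spacing rules are simplified.

```rust
// Replace spoken punctuation words with symbols, attaching each symbol
// to the preceding word without a leading space.
fn apply_dictation_commands(text: &str) -> String {
    let mut out = String::new();
    for word in text.split_whitespace() {
        let replacement = match word.to_lowercase().as_str() {
            "virgule" | "comma" => Some(","),  // FR / EN
            "point" | "period" => Some("."),   // FR / EN
            _ => None,
        };
        match replacement {
            Some(p) => out.push_str(p),
            None => {
                if !out.is_empty() {
                    out.push(' ');
                }
                out.push_str(word);
            }
        }
    }
    out
}

fn main() {
    println!("{}", apply_dictation_commands("bonjour virgule ça va point"));
    // → "bonjour, ça va."
}
```

Running this stage before the punctuation model means explicitly dictated punctuation always wins over predicted punctuation.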
- macOS 14.0+ (Apple Silicon)
- Rust (stable)
- Node.js 24+
- Xcode Command Line Tools (`xcode-select --install`)
```sh
./build.sh
open build/JonaWhisper.app
```

The build script installs npm dependencies automatically, then produces build/JonaWhisper.app and build/JonaWhisper.dmg. Tauri handles code signing automatically: if a Developer certificate is available (via APPLE_SIGNING_IDENTITY), the app is signed with hardened runtime and entitlements. Notarization is supported via the APPLE_ID, APPLE_PASSWORD, and APPLE_TEAM_ID environment variables.
For a debug build:

```sh
./build.sh debug
```

For development with hot reload:

```sh
npm install
npm run tauri dev
```

This starts the Vite dev server with hot reload for the frontend. Rust changes trigger a rebuild automatically. Note: npm install is only needed once for dev mode; build.sh handles it automatically for production builds.
On first launch, a setup wizard asks for three macOS permissions:
| Permission | Used for | macOS API |
|---|---|---|
| Microphone | Audio recording | AVCaptureDevice authorization |
| Accessibility | Paste simulation (Cmd+V via CGEvent) | AXIsProcessTrusted |
| Input Monitoring | Global hotkey detection (CGEvent tap) | TCC ListenEvent |
- Launch the app — it appears as a menu bar icon
- Press and hold the hotkey (default: Right Command) to record
- Release to transcribe — the text is pasted into the active app
- In toggle mode, press once to start, press again to stop
Cancel: Press Escape at any time to cancel recording or transcription.
Settings: Open from the tray menu to configure language, ASR model, text cleanup, hotkey, microphone, and cloud providers.
Models are downloaded and managed from within the app. All models are stored in ~/Library/Application Support/JonaWhisper/models/.
| Layer | Technologies |
|---|---|
| Framework | Tauri 2 |
| Backend | Rust |
| Frontend | Vue 3, TypeScript, Pinia, Tailwind CSS, shadcn-vue |
| Audio | cpal + hound (recording), rustfft (spectrum), CoreAudio FFI (ducking) |
| ASR | whisper-rs (Metal), ort + CoreML (Canary, Parakeet), qwen-asr (AMX), voxtral.c (Metal) |
| Text cleanup | ort + CoreML (BERT, PCS, T5 correction), candle (BERT Metal), llama-cpp-2 (local LLM Metal) |
| Icons | SDF (Signed Distance Field) in Rust, RGBA bitmaps — zero image dependencies |
| Hotkey | Raw CGEvent tap (CoreGraphics FFI) |
| Permissions | objc2 (AVFoundation, CoreGraphics, ApplicationServices) |
| i18n | vue-i18n (frontend), rust-i18n (backend) |
See docs/DEPENDENCIES.md for the full dependency list with rationale for each choice.
JonaWhisper follows a thin-orchestrator design. The main Tauri crate (src-tauri/src/) contains no engine logic or model definitions; it orchestrates infrastructure crates and independent engine crates via dynamic dispatch.
Workspace crates (26 total):
| Crate | Role |
|---|---|
| `jona-types` | Shared types (`ASREngine` trait, `ASRModel`, `EngineError`, `Preferences`, `Provider`) |
| `jona-engines` | Infrastructure: `EngineCatalog`, downloader, ort session builder, mel features |
| `jona-platform` | OS-specific code: hotkey (CGEvent tap), permissions, paste, audio devices, ducking |
| `jona-provider` + `jona-provider-*` (13) | Cloud provider backends (OpenAI-compatible, Anthropic, Deepgram, Copilot, Gemini ASR, Rev.ai, AssemblyAI, ElevenLabs, Cohere, Gladia, Speechmatics, Azure Speech, Google Speech) |
| `jona-engine-*` (11) | One crate per engine: whisper, canary, parakeet, qwen, voxtral, llama, bert, pcs, correction, spellcheck, lm |
Plug-and-play engines: each engine crate registers itself via inventory::submit! at link time. The main crate calls EngineCatalog::init_auto() at startup — no hardcoded engine list, no re-exports. Adding an engine = add a crate + cargo dependency, zero changes to the orchestrator.
Dynamic dispatch: all ASR and cleanup operations go through the ASREngine trait. The orchestrator resolves models by ID, obtains the engine, and calls transcribe() or cleanup() — it never knows which engine it is talking to.
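The registry-plus-trait-object pattern can be reduced to a std-only sketch. In the real app, engine crates register themselves via inventory::submit! at link time; here registration is explicit so the example stays self-contained, and the names (`AsrEngine`, `Catalog`, `EchoEngine`) are illustrative rather than the actual API.

```rust
use std::collections::HashMap;

// The trait every engine crate implements.
trait AsrEngine {
    fn id(&self) -> &'static str;
    fn transcribe(&self, audio: &[f32]) -> String;
}

// A stand-in engine for the example.
struct EchoEngine;
impl AsrEngine for EchoEngine {
    fn id(&self) -> &'static str { "echo" }
    fn transcribe(&self, audio: &[f32]) -> String {
        format!("{} samples transcribed", audio.len())
    }
}

// The orchestrator owns only boxed trait objects keyed by engine id.
struct Catalog {
    engines: HashMap<&'static str, Box<dyn AsrEngine>>,
}

impl Catalog {
    fn new() -> Self {
        Catalog { engines: HashMap::new() }
    }
    fn register(&mut self, engine: Box<dyn AsrEngine>) {
        self.engines.insert(engine.id(), engine);
    }
    // Resolve by id and dispatch dynamically: the caller never
    // learns the concrete engine type.
    fn transcribe(&self, engine_id: &str, audio: &[f32]) -> Option<String> {
        self.engines.get(engine_id).map(|e| e.transcribe(audio))
    }
}

fn main() {
    let mut catalog = Catalog::new();
    catalog.register(Box::new(EchoEngine));
    println!("{:?}", catalog.transcribe("echo", &[0.0; 160]));
}
```

The inventory crate removes the explicit `register` calls: each engine crate submits its constructor at link time and the catalog collects them at startup, which is what makes adding an engine a dependency-only change.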
```
src/                   Vue frontend
  views/               Pages (Panel, SetupWizard)
  sections/            Settings sections (10 sections)
  components/          UI components
  stores/              Pinia stores (app, history, settings, engines, downloads)
  utils/               Shared utilities (shortcut, formatting)
  i18n/                Translations (en.json, fr.json)
src-tauri/             Rust backend (thin orchestrator)
  src/
    lib.rs             Tauri setup & app lifecycle (10 modules)
    state.rs           App state & persistent preferences
    recording/         Recording lifecycle, transcription pipeline, thread spawning
    commands/          Tauri IPC commands (7 sub-modules: audio, engines, history, ...)
    audio.rs           cpal recording & FFT
    cleanup/           Text cleanup (bert, candle, pcs, t5, vad, llm, post_processor)
    platform/          macOS-specific (hotkey, permissions, paste, audio devices)
    ui/                Native UI (tray, pill overlay, SDF icons)
  voxtral-c/           Vendored voxtral.c sources (C + Metal)
crates/                Workspace crates (types, engines, platform, 13 provider crates, 11 engine crates)
docs/                  Technical documentation
  AUDIO-PIPELINE.md    Audio preprocessing architecture & roadmap
  TEXT-PIPELINE.md     Text postprocessing architecture & roadmap
  BENCHMARK.md         Full benchmark data (ASR, LLM, punctuation, correction)
build.sh               Build + package script (signing handled by Tauri)
```
See ARCHITECTURE.md for the full architecture guide with data flows, threading model, and module responsibilities.
See CONTRIBUTING.md for commit conventions, development setup, and how to submit PRs.
This project is licensed under the GNU General Public License v3.0 or later.