A comprehensive UI for a basic AI VTuber setup, built around modules that enable natural operation across modalities and apps.
Supported capabilities:
- Reading Twitch chat
- TTS (Chatterbox, SoVITS)
- Discord (navigation, channel reading, voice chat with both STT and TTS, user DMs, user memory)
- Memory (long-term memory in a FAISS database mirrored as plaintext for user interaction (suggestion: Obsidian.md), mid-term memory management (condensed session context), proactive memory insertion, passive memory access)
- Motion (integration with VTube Studio, motion presets, TTS motion insertion, direct triggers, idle motion, blending, variational motion)
- Browsing (undetected-chromedriver session, unrestricted navigation through tools, integration with the Vision module, DuckDuckGo search content retrieval)
- Vision (simple vision integration through a VLM API)
- Hearing (direct interaction with the local user, interruption on speech)
- Moderation (optional word blocking, filtering of speech and text command contents, LLM moderation if needed)
- Jailbreak (insert conversation turns at the start, replace rejection context)
Additional features include Streaming Mode, in which the model is constantly re-initialized so it can continue operating independently.
Specialized vision was never started and is just a placeholder. (It was meant for specialized tasks, such as playing games.)
This is the final version: I'm moving on to Orchestrator 2.0, a ground-up rewrite aimed at better scalability and easier integration.
- Multi-provider LLM router with automatic key rotation (Groq, Cerebras, Gemini) and model aliasing.
- Extensible module system covering chat ingestion, voice synthesis, motion control, memory, Discord automation, web browsing, computer vision, and speech detection.
- MCP-like tool system so models can trigger actions (`tool:Say`, `tool:Motion`, `tool:Visit`, `tool:See`, `tool:Recall`, etc.). All tools are listed in the Glossary tab.
- Electron dashboard with live feed, module controls, logging browser, motion designer, memory inspector, moderation settings, and glossary.
- Subtitles/overlay server (SSE) for piping feed messages into broadcast software.
- Persistent settings with migration support, automatic prompt/memory backups, and detailed feed logging for post-run analysis.
┌────────────────────────────┐ ┌──────────────────────┐
│ Electron UI (TypeScript) │ <─────> │ Python STDIO bridge │
│ tabs, overlays, logging │ │ (hub.bridge.stdio) │
└────────────┬──────────────┘ └─────────────┬────────┘
│ JSON-RPC │
┌────────────▼──────────────┐ ┌──────────────────────┐
│ Orchestrator core │ │ ModuleManager / │
│ character, runtime, feed │◄──────►│ modules.* │
│ provider routing, tools │ │ chat, voice, motion… │
└───────────────────────────┘ └──────────────────────┘
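The UI↔backend link in the diagram above is newline-delimited JSON-RPC over the bridge's stdin/stdout. A minimal sketch of encoding a request and decoding a reply (the `module.enable` method name and field shapes are illustrative, not the project's actual schema):

```python
import json

def make_request(req_id, method, params):
    """Serialize a JSON-RPC 2.0 request as one newline-delimited line."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": method,
        "params": params,
    }) + "\n"

def parse_response(line):
    """Decode a single response line; raise if it carries an error object."""
    msg = json.loads(line)
    if "error" in msg:
        raise RuntimeError(msg["error"].get("message", "RPC error"))
    return msg.get("result", {})

# Hypothetical round-trip against a canned reply:
req = make_request(1, "module.enable", {"name": "motion"})
fake_reply = '{"jsonrpc": "2.0", "id": 1, "result": {"enabled": true}}'
print(parse_response(fake_reply))  # {'enabled': True}
```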
Key parts:
- backend/ – Python package `hub` with the orchestrator core, module implementations, provider clients, settings, and bridge server.
- electron/ – Electron app (main + renderer) that talks to the backend over JSON-RPC, renders the UI, and serves overlay assets.
- keys/ – API credential files read by `KeyPool` (not tracked in git).
- logging/ & backup/ – runtime artefacts written by the application (feed logs, prompt snapshots, etc.).
- Python: 3.10+ (tested on 3.10). Recommended to manage with `pyenv`, `conda`, or `venv`.
- Node.js: 20+ (Electron 38).
- Build tools for native dependencies (FAISS, PyAudio/sounddevice) appropriate for your OS.
- Optional GPU stack:
- CUDA/cuDNN (for PyTorch + SoVITS).
- `sounddevice`, `pyaudio`, or platform-specific audio backends.
Tip: On Windows install the Visual Studio Build Tools and the latest NVIDIA Toolkit if you intend to use GPU-backed modules.
git clone https://github.com/your-org/VT-Orchestrator.git
cd VT-Orchestrator
cd backend
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate
pip install --upgrade pip wheel
pip install -r requirements.txt
pip install -r requirements-tts.txt # voice playback, optional
pip install -e .                     # install the hub package (pyproject)

Optional extras:
- `pip install torch torchvision torchaudio` (CUDA/MPS builds) for Chatterbox or SoVITS acceleration.
- `pip install sounddevice` for audio playback output selection.
- `pip install sentence-transformers faiss-cpu` if the default install skipped heavy packages.
- `pip install undetected-chromedriver pillow mss` for the browsing + vision modules.
The backend loads modules lazily—install only the packages you need. Missing dependencies will be logged as warnings and the affected module will stay disabled.
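The lazy-loading behaviour described above can be sketched with a small import guard; the `try_load` helper name is illustrative, not the hub's actual function:

```python
import importlib
import logging

logger = logging.getLogger("hub.modules")

def try_load(module_name):
    """Import an optional dependency; warn and return None if it is missing.

    Sketch of the "install only what you need" pattern: a missing package
    is logged as a warning and the caller keeps its module disabled.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        logger.warning("Optional dependency %s unavailable: %s", module_name, exc)
        return None

faiss = try_load("faiss")  # None if faiss-cpu was skipped at install time
if faiss is None:
    print("Memory module stays disabled")
```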
cd .. # back to repo root
npm install
npm run build   # compiles main + renderer TypeScript

During development:

npm run dev     # build then launch Electron

For a quick run without rebuilding:

npm run start

- Ensure your Python virtual environment is activated (for tests/CLI tools).
- Populate API keys (see Configuration & Secrets).
- From the repository root run `npm run dev`. The script compiles TypeScript, spawns Electron, and the Electron worker launches the Python backend via `backend/.venv` if present.
- The main window opens with the Feed tab. Modules can be toggled/enabled from the “Modules” tab. Logging, motion tuning, and memory administration live in dedicated tabs.
- An optional overlay/subtitle view is served at http://localhost:3033/subtitles.
The backend process can also be started manually (useful for debugging):
cd backend
python -m hub.bridge.stdio_server

KeyPool reads plaintext files from the keys/ directory (or HUB_KEYS_DIR). Each file contains one key per line:
| Provider | File | Notes |
|---|---|---|
| Groq | keys/groq_api.txt | Additional pools: groq_api-motion.txt, used by Motion voice annotations |
| Cerebras | keys/cerebras_api.txt | |
| Google Gemini | keys/gemini_api.txt | |

Keys rotate automatically when rate-limited/invalid. Keep per-provider keys on separate lines.
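The one-key-per-line layout makes loading trivial. A hypothetical sketch of the KeyPool loading step (the real class also rotates keys on rate limits; `load_keys` is not the actual API):

```python
import os
from pathlib import Path

def load_keys(provider, keys_dir=None):
    """Read one-key-per-line credential files like keys/groq_api.txt.

    Resolution order mirrors the docs: explicit dir, then HUB_KEYS_DIR,
    then the default keys/ directory. Blank lines are skipped.
    """
    root = Path(keys_dir or os.environ.get("HUB_KEYS_DIR", "keys"))
    path = root / f"{provider}_api.txt"
    if not path.exists():
        return []
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]
```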
- Default template: backend/hub/settings.json
- User overrides: backend/hub/settings.user.json (auto-created)
- Backups (system prompts, voice prompts): backup/
- Feed logging sessions: logging/
Set HUB_SETTINGS_FILE to redirect settings to a custom location.
- PYTHON: override the interpreter used by Electron’s bridge (otherwise .venv is auto-detected).
- HUB_KEYS_DIR: point to an alternative credentials directory.
- VTS_WS_URL or VTS_PORT: override the VTube Studio endpoint for the Motion module.
- NODE_BIN: force the Node binary used by the Discord module subprocess.
The Discord module persists its bot token via the helpers in backend/hub/modules/discord/auth.py. Tokens are stored in the module directory and picked up by the subprocess.
Modules live under backend/hub/modules. The orchestrator auto-discovers them and exposes enable/disable + configuration endpoints.
- orchestrator.py – session state, queue management, streaming/resume, wait/stop, and the tool execution loop.
- module_manager.py – discovery, lifecycle, autopush loop management, tool registration.
- provider_router.py – round-robin streaming across Groq/Cerebras/Gemini with failure detection and token accounting.
- model_registry.py – universal alias resolution (universal.text.fast, etc.).
- tool_manager – built-in tools (ChatLast, CommandHelp, Wait, Say, DebugAPI) and module-registered helpers.
- settings.py – persisted runtime + module configs with migrations, default injection, and volatile stripping (Motion presets).
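Alias resolution can be pictured as a first-match walk over an ordered provider list. A sketch under the assumption that each alias maps to (provider, model) candidates; the map entries below are invented, not the contents of config/models.json:

```python
# Illustrative alias map in the spirit of config/models.json (entries invented):
ALIASES = {
    "universal.text.fast": [
        ("groq", "llama-3.1-8b-instant"),
        ("cerebras", "llama3.1-8b"),
    ],
}

def resolve(alias, available):
    """Return the first (provider, model) pair whose provider is usable.

    `available` is the set of providers that currently have valid keys;
    candidates are tried in declaration order, matching round-robin intent.
    """
    for provider, model in ALIASES.get(alias, []):
        if provider in available:
            return provider, model
    raise LookupError(f"No provider available for alias {alias!r}")
```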
- Connects to Twitch IRC over TLS (no external deps).
- Maintains recent history + backlog for LLM injection.
- Autopush support: automatically submits curated chat excerpts when idle.
- Configurable channel, autopush cadence, backlog pruning.
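"Twitch IRC over TLS with no external deps" amounts to stdlib `ssl` + `socket` and a PRIVMSG parser. A minimal sketch (the real module also answers PING/PONG and keeps a backlog; function names here are illustrative):

```python
import socket
import ssl

def parse_privmsg(line):
    """Extract (nick, message) from a Twitch IRC PRIVMSG line, else None."""
    if " PRIVMSG " not in line:
        return None
    prefix, _, trailing = line.partition(" PRIVMSG ")
    nick = prefix.split("!", 1)[0].lstrip(":")
    message = trailing.partition(" :")[2]
    return nick, message

def connect(channel, nick, token):
    """Open a TLS connection to Twitch IRC and join a channel (stdlib only)."""
    ctx = ssl.create_default_context()
    sock = ctx.wrap_socket(
        socket.create_connection(("irc.chat.twitch.tv", 6697)),
        server_hostname="irc.chat.twitch.tv",
    )
    sock.sendall(f"PASS {token}\r\nNICK {nick}\r\nJOIN #{channel}\r\n".encode())
    return sock
```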
- Wraps Chatterbox TTS out of the box and can integrate with GPT-SoVITS when supplied with the custom forked weights/code expected by modules/voice/sovits (not bundled).
- Handles queuing, sentence chunking, playback to primary/secondary audio devices, optional subtitle emission, and integration with the Motion module for voice-driven gestures.
- playback.py links to Motion for “voice emotion” cues and handles Discord TTS queue bridging, throttle-on-speech, and autopush scheduling.
- runtime.py resolves devices, manages backend/model lifecycle, lazy-loads dependencies, and restarts on config changes.
- voicebank/ hosts reference WAVs; SoVITS weights live under modules/voice/sovits.
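The sentence-chunking step can be sketched as a split on sentence punctuation followed by greedy merging of short sentences, so the TTS queue gets natural-sounding units. A simplified sketch (the real playback code may chunk differently):

```python
import re

def chunk_sentences(text, max_len=200):
    """Split text into sentence-sized chunks for a TTS queue.

    Splits after ., !, or ?, then merges adjacent sentences while the
    combined chunk stays within max_len characters.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    chunks = []
    for sent in sentences:
        if chunks and len(chunks[-1]) + len(sent) + 1 <= max_len:
            chunks[-1] += " " + sent
        else:
            chunks.append(sent)
    return chunks
```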
- Controls VTube Studio via WebSocket (port 8012 by default).
- Supports manual preset triggering (tool:Motion("Preset")), idle animations, motion blending, and optional LLM-driven gestures tied to voice transcripts.
- Stores presets (presets.json), parameter metadata (parameters.json), and idle-state snapshots (idle_state.json).
- Uses a separate key pool (-motion suffix) for the micro-agent that decides when to fire motions.
- CLI helpers (backend/tools/vts_dump_params.py, vts_set_param.py) ease VTS configuration.
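VTube Studio's public API wraps every WebSocket message in a common envelope. A sketch of building a hotkey-trigger payload in that envelope, based on the documented VTube Studio API format (sending it, and the preceding auth handshake, are omitted):

```python
import json

def hotkey_request(hotkey_id, request_id="motion-1"):
    """Build a VTube Studio HotkeyTriggerRequest payload.

    Uses the standard VTubeStudioPublicAPI envelope; the requestID is an
    arbitrary correlation string chosen by the caller.
    """
    return json.dumps({
        "apiName": "VTubeStudioPublicAPI",
        "apiVersion": "1.0",
        "requestID": request_id,
        "messageType": "HotkeyTriggerRequest",
        "data": {"hotkeyID": hotkey_id},
    })
```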
- Local vector store using SentenceTransformers embeddings + FAISS (IndexFlatIP).
- Markdown export support, version tagging, manual + automatic memorization workflows, and mid-term session summaries.
- Tools exposed: tool:Recall("topic", k, "version"), tool:Memorize("text") / tool:Forget(...), tool:RecentMemories, tool:Memory.
- Configurable summarizer/proposer models, autopush proposals based on word counts, and markdown folder integration for knowledge bases.
- Storage files default to backend/hub/data/memory_*.
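IndexFlatIP is exact (brute-force) inner-product search; with L2-normalized embeddings, inner product equals cosine similarity. A numpy sketch of the same semantics, so the retrieval step is clear without installing FAISS:

```python
import numpy as np

def top_k(query, store, k=3):
    """Exact inner-product search, mirroring faiss.IndexFlatIP semantics.

    Both query and stored vectors are L2-normalized first, so the scores
    are cosine similarities; returns (indices, scores) best-first.
    """
    q = query / np.linalg.norm(query)
    db = store / np.linalg.norm(store, axis=1, keepdims=True)
    scores = db @ q
    idx = np.argsort(-scores)[:k]
    return idx, scores[idx]

# Three toy "memory" embeddings and a query close to the first one:
store = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
idx, scores = top_k(np.array([1.0, 0.1]), store, k=2)
```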
- Spawns bot.js (discord.js v14) with voice support via @discordjs/voice.
- Handles guild listing, channel joins, message send/receive, voice capture, DM notifications, and integrates with Memory (Discord-specific embeddings).
- Provides a broad suite of tools (tool:DiscordMsg, tool:ChannelRecent, tool:DiscordServers, tool:JoinVoice, tool:LeaveChannel, DM helpers, notification management).
- Maintains an autopush backlog (server + DM) and can auto-memorize Discord user interactions.
- Uses Node subprocess environment variables for intents, token, application/guild IDs.
- Records raw audio for transcripts and can feed it into the voice playback queue.
- Launches a visible Chrome via undetected-chromedriver.
- Tools: tool:Search, tool:Visit, tool:ScrollPage, tool:ViewPage.
- Fetches HTML search results via DuckDuckGo’s light endpoint; integrates with the Vision module to describe the current tab screenshot.
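Parsing DuckDuckGo's HTML results boils down to extracting the result links. A sketch under the assumption that result anchors carry a `result__a` class (true of DuckDuckGo's HTML endpoint at the time of writing, but treat the selector as fragile):

```python
import re

def parse_results(html):
    """Extract (title, url) pairs from DuckDuckGo-style HTML results.

    Matches anchors with class "result__a" and strips inner tags from
    titles; a real implementation should prefer an HTML parser.
    """
    pattern = re.compile(
        r'<a[^>]*class="result__a"[^>]*href="([^"]+)"[^>]*>(.*?)</a>',
        re.DOTALL,
    )
    return [(re.sub(r"<[^>]+>", "", title).strip(), url)
            for url, title in pattern.findall(html)]

sample = '<a class="result__a" href="https://example.com">Example <b>site</b></a>'
print(parse_results(sample))  # [('Example site', 'https://example.com')]
```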
- General Vision (vision_general.py):
  - Captures screenshots via mss/Pillow, resizes/compresses them to a target size, and sends them to the configured multimodal model (Groq Scout/Maverick or Gemini Flash/Pro).
  - Reports token usage back to orchestrator stats.
  - Exposes tool tool:See("prompt").
- Specialized Vision (vision_special.py):
  - Lightweight hint injector for UI-specific prompts (disabled by default).
- Captures microphone input using sounddevice, applies VAD (faster-whisper.vad), and transcribes segments with faster-whisper (distil-large-v3 by default).
- Detects user speech and triggers the orchestrator to pause/stop TTS playback to avoid overlap.
- Supports “express” cut-offs for fast speech detection and exposes input-level telemetry to the UI.
- Moderates outgoing tool commands (default: Say, DiscordMsg).
- Two-stage filter:
  - Stop-list that replaces offending sentences with [Filtered].
  - Optional LLM rewrite with a custom system prompt + provider preference.
- UI configurables include the command list, stop phrases, and alternative rewrite text.
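The stop-list stage can be sketched as a per-sentence check: any sentence containing a stop phrase is replaced wholesale. A minimal sketch (the `STOP_PHRASES` entries are placeholders; the second stage, LLM rewrite, is not shown):

```python
import re

STOP_PHRASES = {"badword"}  # configurable in the UI; placeholder entry

def filter_text(text, replacement="[Filtered]"):
    """First-stage moderation: replace any sentence containing a stop phrase.

    Splits on sentence punctuation and substitutes the whole offending
    sentence, matching the [Filtered] behaviour described above.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    cleaned = [
        replacement if any(p in s.lower() for p in STOP_PHRASES) else s
        for s in sentences
    ]
    return " ".join(cleaned)
```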
- Injects synthetic user/assistant turns at the start of a conversation to shape behaviour.
- Optional apology/refusal rewrite: replaces early “I’m sorry, but…” responses with user-defined text.
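The turn-injection idea above amounts to prepending scripted user/assistant exchanges before the real input. An illustrative sketch using the common chat-completion message format (not the module's actual code):

```python
def build_messages(system, seed_turns, user_input):
    """Prepend synthetic user/assistant turns before the real user input.

    seed_turns is a list of (user_msg, assistant_msg) pairs that shape the
    model's behaviour before it sees the genuine prompt.
    """
    messages = [{"role": "system", "content": system}]
    for user_msg, assistant_msg in seed_turns:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    messages.append({"role": "user", "content": user_input})
    return messages
```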
- Tool definitions rely on regex detection and the core tool manager, injecting system blocks back into the assembled output. Each module exposes a get_tools function that returns Tool instances.
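The regex-detection step can be sketched as a single pattern scan over the model's output. A simplified sketch that only handles the one-string-argument form (the real tool manager also handles multi-argument and argument-free calls):

```python
import re

# Matches e.g. tool:Say("Hello chat") — a deliberately narrow pattern.
TOOL_RE = re.compile(r'tool:(\w+)\(\s*"([^"]*)"\s*\)')

def find_tool_calls(output):
    """Detect tool:Name("arg") invocations in model output via regex.

    Returns (tool_name, argument) pairs in order of appearance.
    """
    return TOOL_RE.findall(output)

calls = find_tool_calls('Sure! tool:Say("Hello chat") then tool:Motion("Wave")')
```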
- PyQt-based experimental UI for Chatterbox TTS (kept for reference). Modern setups should rely on the new voice module.
- Main process (electron/main):
  - main.ts – creates the BrowserWindow.
  - ipc.ts – bridges renderer IPC to backend JSON-RPC, starts the overlay server (port 3033) and feed logger.
  - pythonBridge.ts – spawns the Python backend (hub.bridge.stdio_server), auto-detects .venv, and routes requests.
  - overlayServer.ts – static files + server-sent events for subtitles.
  - feedLogger.ts – session-based feed logging with aggregation.
- Renderer (electron/renderer):
  - renderer.ts orchestrates tabs and initial data loading.
  - feed.ts renders multi-column feed panels including module-specific views.
  - modules.ts provides the module toggle UI + config editors (Motion idle tuning, Memory paths, Moderation toggles, Vision model selection, etc.).
  - Dedicated tabs for Character (character.ts), Motion (motion.ts), Memory (memory.ts), Moderation (moderation.ts), Glossary (glossary.ts), and Logging (logging.ts).
  - Styles: styles.css; HTML scaffold: index.html.
- Overlay (electron/overlay/subtitles.html) – simple HTML/CSS page that subscribes to /events.
VT-Orchestrator/
├─ backend/
│ ├─ hub/
│ │ ├─ core/ # orchestrator, providers, settings, tools
│ │ ├─ modules/ # module implementations
│ │ ├─ bridge/ # stdio JSON-RPC server
│ │ ├─ config/models.json # model alias map
│ │ └─ settings*.json # defaults + user overrides
│ ├─ tools/ # VTS helper scripts
│ ├─ tests/ # pytest suite
│ ├─ pyproject.toml # backend package metadata
│ └─ requirements*.txt
├─ electron/
│ ├─ main/ # Electron main process code
│ ├─ renderer/ # UI logic
│ ├─ overlay/ # subtitle page
│ └─ tsconfig.*.json
├─ scripts/start-electron.js # launches Electron with correct env
├─ keys/ # place API keys here (ignored by git)
├─ logging/ # generated feed logs
├─ backup/ # prompt/memory backups
└─ README.md
Backend tests live under backend/tests. Activate the virtual environment first, then run:
cd backend
pytest

Tests cover module discovery, provider routing, settings migrations, voice-motion coordination, and orchestrator helpers. Additions should include targeted tests to keep regression safety high.
- Missing dependency warnings: check module-specific requirements (e.g., pip install pillow mss for Vision, pip install undetected-chromedriver for Browsing, pip install faster-whisper for Hearing).
- No keys available: verify keys/*.txt files exist and contain valid API keys. The UI exposes counts under the Stats tab.
- Motion module not connecting: confirm VTube Studio has the WebSocket API enabled (port 8012) and that vts_auth_token.txt exists. Use the helper scripts in backend/tools.
- Audio output not found: install sounddevice and ensure the chosen device index matches the OS audio device list. The Voice tab lists raw device info.
- Discord module crashes immediately: ensure the Node runtime has the necessary voice dependencies and the bot token/intents are configured. Review logs under logging/module-discord*.txt.
- Gemini requests fail: place the API key in keys/gemini_api.txt and confirm the selected vision model starts with gemini-.
- Re-running after config edits: delete problematic entries from settings.user.json or set HUB_SETTINGS_FILE to a clean path; the registry will fall back to defaults.
- Overlay not updating: confirm the Electron main process launched overlayServer (logged in console). Refresh http://localhost:3033/subtitles.
- Tweak module defaults via the UI and persist them to settings.user.json.
- Extend functionality by adding new modules: drop a package under backend/hub/modules, expose MODULE_CLASS, and the orchestrator will auto-discover it.
- Explore feed logs in logging/ to fine-tune prompts, moderation, and motion presets.
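A new module package therefore needs little more than a class and a MODULE_CLASS binding for auto-discovery. A hypothetical skeleton (the lifecycle hooks and Tool shape are illustrative; check an existing module for the exact interface the ModuleManager expects):

```python
class MyModule:
    """Skeleton for a drop-in module under backend/hub/modules.

    Method names here are placeholders for whatever lifecycle the
    ModuleManager actually calls.
    """

    name = "my_module"

    def start(self):
        print(f"{self.name} started")

    def stop(self):
        print(f"{self.name} stopped")

    def get_tools(self):
        # Return Tool instances; represented here as plain dicts.
        return [{"name": "MyTool", "handler": lambda arg: f"echo: {arg}"}]

MODULE_CLASS = MyModule  # what the auto-discovery looks for
```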
Happy streaming!