Anzhc/VT-Orchestrator

VT-Orchestrator

A comprehensive UI for a basic AI VTuber setup, with modules that let the model operate naturally across modalities and apps.

It supports:

  • Chat: reading Twitch chat.
  • TTS: Chatterbox and SoVITS.
  • Discord: navigation, channel reading, voice chat (both STT and TTS), user DMs, and user memory.
  • Memory: long-term memory in a FAISS database, with the content duplicated as plaintext so the user can read and edit it (suggestion: Obsidian.md); mid-term memory management (condensed session context); proactive memory insertion; passive memory access.
  • Motion: integration with VTube Studio, motion presets, TTS motion insertion, direct triggers, idle motion, blending, and variational motion.
  • Browsing: an undetected-chromedriver session, unrestricted navigation through tools, integration with the vision module, and DuckDuckGo search content retrieval.
  • Vision: simple vision integration through a VLM API.
  • Hearing: direct interaction with the local user, with interruption on speech.
  • Moderation: optionally block words, filter the contents of speech and text commands, and apply LLM moderation if needed.
  • Jailbreak: insert conversation turns at the start and replace rejection context.

Additional features include Streaming Mode, in which the model is continually re-initialized so that it can keep operating independently.

Specialized vision was never started and is just a placeholder. It was meant for specialized tasks, such as playing games.

As of now, this is the final version: I'm moving on to Orchestrator 2.0, rebuilt from the ground up for better scalability and ease of integration.

Feature Highlights

  • Multi-provider LLM router with automatic key rotation (Groq, Cerebras, Gemini) and model aliasing.
  • Extensible module system covering chat ingestion, voice synthesis, motion control, memory, Discord automation, web browsing, computer vision, and speech detection.
  • MCP-like tool system so models can trigger actions (tool:Say, tool:Motion, tool:Visit, tool:See, tool:Recall, etc.). All tools are listed in the Glossary tab.
  • Electron dashboard with live feed, module controls, logging browser, motion designer, memory inspector, moderation settings, and glossary.
  • Subtitles/overlay server (SSE) for piping feed messages into broadcast software.
  • Persistent settings with migration support, automatic prompt/memory backups, and detailed feed logging for post-run analysis.

Architecture Overview

┌────────────────────────────┐        ┌──────────────────────┐
│ Electron UI (TypeScript)  │ <─────> │ Python STDIO bridge  │
│ tabs, overlays, logging   │        │ (hub.bridge.stdio)    │
└────────────┬──────────────┘        └─────────────┬────────┘
             │ JSON-RPC                            │
┌────────────▼──────────────┐        ┌──────────────────────┐
│ Orchestrator core         │        │ ModuleManager /      │
│ character, runtime, feed  │◄──────►│ modules.*            │
│ provider routing, tools   │        │ chat, voice, motion… │
└───────────────────────────┘        └──────────────────────┘

Key parts:

  • backend/ – Python package hub with the orchestrator core, module implementations, provider clients, settings, and bridge server.
  • electron/ – Electron app (main + renderer) that talks to the backend over JSON-RPC, renders the UI, and serves overlay assets.
  • keys/ – API credential files read by KeyPool (not tracked in git).
  • logging/ & backup/ – runtime artefacts written by the application (feed logs, prompt snapshots, etc.).

Prerequisites

  • Python: 3.10+ (tested on 3.10). Recommended to manage with pyenv, conda, or venv.
  • Node.js: 20+ (Electron 38).
  • Build tools for native dependencies (FAISS, PyAudio/sounddevice) appropriate for your OS.
  • Optional GPU stack:
    • CUDA/cuDNN (for PyTorch + SoVITS).
    • sounddevice, pyaudio, or platform specific audio backends.

Tip: On Windows, install the Visual Studio Build Tools and the latest NVIDIA CUDA Toolkit if you intend to use GPU-backed modules.

Initial Setup

1. Clone and bootstrap

git clone https://github.com/Anzhc/VT-Orchestrator.git
cd VT-Orchestrator

2. Python backend (recommended virtual environment)

cd backend
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate

pip install --upgrade pip wheel
pip install -r requirements.txt
pip install -r requirements-tts.txt      # voice playback, optional
pip install -e .                         # install the hub package (pyproject)

Optional extras:

  • pip install torch torchvision torchaudio (CUDA/MPS builds) for Chatterbox or SoVITS acceleration.
  • pip install sounddevice for audio playback output selection.
  • pip install sentence-transformers faiss-cpu if the default install skipped heavy packages.
  • pip install undetected-chromedriver pillow mss for browsing + vision modules.

The backend loads modules lazily—install only the packages you need. Missing dependencies will be logged as warnings and the affected module will stay disabled.

3. Node/Electron frontend

cd ..                     # back to repo root
npm install
npm run build             # compiles main + renderer TypeScript

During development:

npm run dev               # build then launch Electron

For a quick run without rebuilding:

npm run start

Running the Hub

  1. Ensure your Python virtual environment is activated (for tests/CLI tools).
  2. Populate API keys (see Configuration & Secrets).
  3. From the repository root, run npm run dev. The script compiles TypeScript and spawns Electron; the Electron main process then launches the Python backend, using backend/.venv if present.
  4. The main window opens with the Feed tab. Modules can be toggled/enabled from the “Modules” tab. Logging, motion tuning, and memory administration live in dedicated tabs.
  5. Optional overlay/subtitle view is served at http://localhost:3033/subtitles.

The backend process can also be started manually (useful for debugging):

cd backend
python -m hub.bridge.stdio_server
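
When the backend is started by hand, requests have to be written to its standard input yourself. The sketch below shows one way to build and read such messages. It assumes JSON-RPC 2.0 with one JSON object per line; the README only says "JSON-RPC", so the version, the framing, and the method name used in the usage note are all assumptions to check against hub.bridge.stdio_server.

```python
import itertools
import json
from typing import Any, Optional

_ids = itertools.count(1)  # simple monotonically increasing request ids


def encode_request(method: str, params: Optional[dict] = None) -> str:
    """Build a JSON-RPC 2.0 request as a single newline-terminated line (assumed framing)."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params or {}}
    return json.dumps(message) + "\n"


def decode_response(line: str) -> Any:
    """Return the result of a JSON-RPC 2.0 response, or raise if it carries an error."""
    message = json.loads(line)
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "unknown error"))
    return message.get("result")
```

For example, encode_request("modules.list") produces a line that could be piped to the process, where "modules.list" is a hypothetical method name rather than one documented here.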

Configuration & Secrets

API Keys

KeyPool reads plaintext files from the keys/ directory (or HUB_KEYS_DIR). Each file contains one key per line:

| Provider      | File                  | Notes |
| ------------- | --------------------- | ----- |
| Groq          | keys/groq_api.txt     | Additional pools: groq_api-motion.txt (used by Motion voice annotations) |
| Cerebras      | keys/cerebras_api.txt |       |
| Google Gemini | keys/gemini_api.txt   |       |

Keys rotate automatically when rate-limited/invalid. Keep per-provider keys on separate lines.
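
To make the file format and rotation concrete, here is a minimal sketch of a key pool. It is not the project's KeyPool: SimpleKeyPool, parse_keys, and load_keys are hypothetical names, and the real class may track rate limits and invalid keys differently.

```python
import os
from pathlib import Path
from typing import List, Optional


def parse_keys(text: str) -> List[str]:
    """Return one key per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_keys(provider_file: str, keys_dir: Optional[str] = None) -> List[str]:
    """Read a key file from keys_dir, HUB_KEYS_DIR, or ./keys (in that order)."""
    base = Path(keys_dir or os.environ.get("HUB_KEYS_DIR", "keys"))
    return parse_keys((base / provider_file).read_text(encoding="utf-8"))


class SimpleKeyPool:
    """Hypothetical stand-in for KeyPool: moves to the next key on failure."""

    def __init__(self, keys: List[str]) -> None:
        if not keys:
            raise ValueError("no keys available")
        self._keys = list(keys)
        self._index = 0

    def current(self) -> str:
        return self._keys[self._index]

    def rotate(self) -> str:
        """Advance to the next key, e.g. after a rate-limit or auth error."""
        self._index = (self._index + 1) % len(self._keys)
        return self.current()
```

A caller would use pool.current() for each request and call pool.rotate() when the provider reports a rate limit or an invalid key.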

Settings persistence

  • Default template: backend/hub/settings.json
  • User overrides: backend/hub/settings.user.json (auto-created)
  • Backups (system prompts, voice prompts): backup/
  • Feed logging sessions: logging/

Set HUB_SETTINGS_FILE to redirect settings to a custom location.
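
The relationship between the default template and the user overrides can be pictured as a recursive merge. The sketch below is an assumption about how the two files combine, not the logic in settings.py; in particular, it treats HUB_SETTINGS_FILE as a replacement for the user overrides file, and the function names are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict


def deep_merge(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return defaults with overrides applied; nested dicts merge recursively."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_settings(defaults_path: str, user_path: str) -> Dict[str, Any]:
    """Load defaults, then layer user overrides on top if the file exists."""
    user_file = Path(os.environ.get("HUB_SETTINGS_FILE", user_path))
    defaults = json.loads(Path(defaults_path).read_text(encoding="utf-8"))
    overrides = json.loads(user_file.read_text(encoding="utf-8")) if user_file.exists() else {}
    return deep_merge(defaults, overrides)
```

With this approach, a key removed from the overrides file simply falls back to its value in the default template.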

Environment Variables

  • PYTHON: override interpreter used by Electron’s bridge (else .venv is auto-detected).
  • HUB_KEYS_DIR: point to an alternative credentials directory.
  • VTS_WS_URL or VTS_PORT: override VTube Studio endpoint for the Motion module.
  • NODE_BIN: force Node binary used by the Discord module subprocess.
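
As an illustration of how the VTube Studio overrides might be resolved, the sketch below prefers VTS_WS_URL, then VTS_PORT, then the default port of 8012 mentioned in the Motion section. The precedence order, the localhost host, and the function name are assumptions, not the module's documented behaviour.

```python
import os
from typing import Mapping, Optional

DEFAULT_VTS_PORT = 8012  # default port stated in this README


def resolve_vts_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the VTube Studio WebSocket URL from environment overrides."""
    env = os.environ if env is None else env
    url = env.get("VTS_WS_URL")
    if url:
        return url  # a full URL wins over a bare port (assumed precedence)
    port = int(env.get("VTS_PORT", DEFAULT_VTS_PORT))
    return f"ws://localhost:{port}"
```

Passing a plain dict instead of os.environ makes the behaviour easy to check without touching the real environment.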

Discord tokens

The Discord module persists its bot token through the helpers in backend/hub/modules/discord/auth.py. Tokens are stored in the module directory and picked up by the subprocess.

Module Catalog

Modules live under backend/hub/modules. The orchestrator auto-discovers them and exposes enable/disable + configuration endpoints.

Core runtime (backend/hub/core)

  • orchestrator.py – session state, queue management, streaming/resume, wait/stop, and tool execution loop.
  • module_manager.py – discovery, lifecycle, autopush loop management, tool registration.
  • provider_router.py – round-robin streaming across Groq/Cerebras/Gemini with failure detection and token accounting.
  • model_registry.py – universal alias resolution (universal.text.fast, etc.).
  • tool_manager – built-in tools (ChatLast, CommandHelp, Wait, Say, DebugAPI) and module-registered helpers.
  • settings.py – persisted runtime + module configs with migrations, default injection, and volatile stripping (Motion presets).

Chat (modules/chat.py)

  • Connects to Twitch IRC over TLS (no external deps).
  • Maintains recent history + backlog for LLM injection.
  • Autopush support: automatically submits curated chat excerpts when idle.
  • Configurable channel, autopush cadence, backlog pruning.
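
Because the module speaks IRC directly, most of the work is parsing raw lines. The sketch below shows how a PRIVMSG line and a server PING might be handled using only the standard library. It is not the code in modules/chat.py, the names are hypothetical, and lines carrying IRCv3 tags (a leading @...) are not handled.

```python
import re
from typing import NamedTuple, Optional

_PRIVMSG = re.compile(r"^:(?P<user>[^!]+)![^ ]+ PRIVMSG #(?P<channel>[^ ]+) :(?P<text>.*)$")


class ChatMessage(NamedTuple):
    user: str
    channel: str
    text: str


def parse_privmsg(line: str) -> Optional[ChatMessage]:
    """Parse a chat line of the form ':nick!user@host PRIVMSG #channel :text'."""
    match = _PRIVMSG.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return ChatMessage(match["user"], match["channel"], match["text"])


def pong_for(line: str) -> Optional[str]:
    """Return the PONG reply for a server PING, else None."""
    if line.startswith("PING "):
        return "PONG " + line[5:].rstrip("\r\n") + "\r\n"
    return None
```

Parsed messages could then be appended to a bounded history (for example a collections.deque with maxlen) to get the backlog pruning mentioned above.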

Voice (modules/voice/*)

  • Wraps Chatterbox TTS out of the box and can integrate with GPT-SoVITS when supplied with the custom forked weights/code expected by modules/voice/sovits (not bundled).
  • Handles queuing, sentence chunking, playback to primary/secondary audio devices, optional subtitle emission, and integration with the Motion module for voice-driven gestures.
  • playback.py links to Motion for “voice emotion” cues, handles Discord TTS queue bridging, throttle-on-speech, and autop scheduling.
  • runtime.py resolves devices, manages backend/model lifecycle, lazy loads dependencies, and restarts on config changes.
  • voicebank/ hosts reference WAVs; SoVITS weights live under modules/voice/sovits.
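
Sentence chunking keeps each TTS request short enough to start playback quickly. The following is a minimal sketch of the idea, assuming a simple punctuation-based splitter and a hypothetical max_chars limit; the voice module's actual chunking rules are not described in this README.

```python
import re
from typing import List

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_sentences(text: str, max_chars: int = 200) -> List[str]:
    """Split text on sentence boundaries and pack sentences into chunks."""
    chunks: List[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A single sentence longer than max_chars is kept whole here rather than being cut mid-sentence.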

Motion (modules/Motion)

  • Controls VTube Studio via WebSocket (port 8012 by default).
  • Supports manual preset triggering (tool:Motion("Preset")), idle animations, motion blending, and optional LLM-driven gestures tied to voice transcripts.
  • Stores presets (presets.json), parameter metadata (parameters.json), and idle state snapshots (idle_state.json).
  • Separate key pool (-motion suffix) for the micro-agent that decides when to fire motions.
  • CLI helpers (backend/tools/vts_dump_params.py, vts_set_param.py) ease VTS configuration.
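
Motion blending can be thought of as interpolating between two sets of parameter values. The sketch below uses plain linear interpolation, which is an assumption: the README does not say how the Motion module blends presets. The function name, the parameter names in the usage note, and the choice of 0.0 for parameters missing from one preset are illustrative only.

```python
from typing import Dict


def blend_presets(a: Dict[str, float], b: Dict[str, float], weight: float) -> Dict[str, float]:
    """Linearly interpolate from preset a (weight 0.0) to preset b (weight 1.0)."""
    weight = min(1.0, max(0.0, weight))  # clamp to the valid range
    blended = {}
    for name in a.keys() | b.keys():
        start = a.get(name, 0.0)  # a missing parameter is treated as 0.0 (assumption)
        end = b.get(name, 0.0)
        blended[name] = start + (end - start) * weight
    return blended
```

For instance, blending {"FaceAngleX": 0.0} toward {"FaceAngleX": 10.0} with weight 0.5 yields a value halfway between the two.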

Memory (modules/memory/*)

  • Local vector store using SentenceTransformers embeddings + FAISS (IndexFlatIP).
  • Markdown export support, version tagging, manual + automatic memorization workflows, and mid-term session summaries.
  • Tools exposed:
    • tool:Recall("topic", k, "version")
    • tool:Memorize("text") / tool:Forget(...)
    • tool:RecentMemories
    • tool:Memory
  • Configurable summarizer/proposer models, autop proposals based on word counts, and markdown folder integration for knowledge bases.
  • Storage files default to backend/hub/data/memory_*.
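
FAISS's IndexFlatIP performs exact search by inner product, which equals cosine similarity when the vectors are L2-normalized. The pure-Python sketch below reproduces that ranking on toy vectors so the idea is visible without installing FAISS or SentenceTransformers. It is a stand-in for illustration, not the memory module's code, and the two-dimensional "embeddings" are made up.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def search(index: List[Tuple[str, List[float]]], query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
    """Exact top-k search by inner product against stored (text, vector) pairs."""
    q = normalize(query)
    scored = [(text, sum(a * b for a, b in zip(vec, q))) for text, vec in index]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```

In a tool such as tool:Recall("topic", k, "version"), the k argument would play the same role as k here: the number of nearest memories to return.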

Discord (modules/discord/*)

  • Spawns bot.js (discord.js v14) with voice support via @discordjs/voice.
  • Handles guild listing, channel joins, message send/receive, voice capture, DM notifications, and integrates with Memory (Discord-specific embeddings).
  • Provides a broad suite of tools (tool:DiscordMsg, tool:ChannelRecent, tool:DiscordServers, tool:JoinVoice, tool:LeaveChannel, DM helpers, notification management).
  • Maintains autopush backlog (server + DM) and can auto-memorize Discord user interactions.
  • Uses Node subprocess environment variables for intents, token, application/guild IDs.
  • Records raw audio for transcripts and can feed into voice playback queue.

Browsing (modules/browsing.py)

  • Launches a visible Chrome via undetected-chromedriver.
  • Tools: tool:Search, tool:Visit, tool:ScrollPage, tool:ViewPage.
  • Fetches HTML search results via DuckDuckGo’s light endpoint; integrates with Vision module to describe the current tab screenshot.

Vision

  • General Vision (vision_general.py):
    • Captures screenshots via mss/Pillow, resizes/compresses to target size, and sends to configured multimodal model (Groq Scout/Maverick or Gemini Flash/Pro).
    • Reports token usage back to orchestrator stats.
    • Exposed tool tool:See("prompt").
  • Specialized Vision (vision_special.py):
    • Lightweight hint injector for UI-specific prompts (disabled by default).

Hearing (modules/hearing/module.py)

  • Captures microphone input using sounddevice, applies VAD (faster-whisper.vad), and transcribes segments with faster-whisper (distil-large-v3 by default).
  • Detects user speech and triggers orchestrator to pause/stop TTS playback to avoid overlap.
  • Supports “express” cut-offs for fast speech detection and exposes input level telemetry to the UI.

Moderation (modules/moderation.py)

  • Moderates outgoing tool commands (default: Say, DiscordMsg).
  • Two-stage filter:
    • Stop-list to replace offending sentences with [Filtered].
    • Optional LLM rewrite with custom system prompt + provider preference.
  • UI configurables include command list, stop phrases, alternative rewrite text.
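
The first filter stage, replacing offending sentences with [Filtered], might look like the sketch below. It assumes case-insensitive substring matching and a simple punctuation-based sentence split; apply_stop_list is a hypothetical name, and the module's real matching rules may differ.

```python
import re
from typing import Iterable

_SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def apply_stop_list(text: str, stop_phrases: Iterable[str], replacement: str = "[Filtered]") -> str:
    """Replace every sentence that contains a stop phrase (case-insensitive)."""
    phrases = [p.lower() for p in stop_phrases if p]
    cleaned = []
    for sentence in _SENTENCE_SPLIT.split(text.strip()):
        if any(p in sentence.lower() for p in phrases):
            cleaned.append(replacement)
        else:
            cleaned.append(sentence)
    return " ".join(cleaned)
```

Text that passes this stage unchanged could then be handed to the optional LLM rewrite as the second stage.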

Jailbreak (modules/jailbreak.py)

  • Injects synthetic user/assistant turns at the start of a conversation to shape behaviour.
  • Optional apology/refusal rewrite: replaces early “I’m sorry, but…” responses with user-defined text.

Browsing/Vision Tools (modules/*_tools.py)

  • Tool definitions rely on regex detection and the core tool manager, injecting system blocks back into the assembled output. Each module exposes a get_tools function that returns Tool instances.
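
Regex-based detection of tool calls in model output can be illustrated with a single pattern. This sketch is an assumption about the general approach, not the project's tool manager: the pattern and function name are hypothetical, and arguments containing parentheses are not supported.

```python
import re
from typing import List, Tuple

# Matches tool:Name or tool:Name(args), where args contain no parentheses.
_TOOL_CALL = re.compile(r"tool:(?P<name>[A-Za-z]\w*)(?:\((?P<args>[^()]*)\))?")


def find_tool_calls(text: str) -> List[Tuple[str, str]]:
    """Return (tool name, raw argument string) pairs in order of appearance."""
    return [(m["name"], (m["args"] or "").strip()) for m in _TOOL_CALL.finditer(text)]
```

The raw argument string is left unparsed on purpose, so each tool can decide how to interpret its own arguments.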

Legacy TTS (modules/tts_module.py)

  • PyQt-based experimental UI for Chatterbox TTS (kept for reference). Modern setups should rely on the new voice module.

Electron App

  • Main process (electron/main):
    • main.ts – creates the BrowserWindow.
    • ipc.ts – bridges renderer IPC to backend JSON-RPC, starts overlay server (port 3033) and feed logger.
    • pythonBridge.ts – spawns the Python backend (hub.bridge.stdio_server), auto-detects .venv, and routes requests.
    • overlayServer.ts – static file + server-sent events for subtitles.
    • feedLogger.ts – session-based feed logging with aggregation.
  • Renderer (electron/renderer):
    • renderer.ts orchestrates tabs and initial data loading.
    • feed.ts renders multi-column feed panels including module-specific views.
    • modules.ts provides module toggle UI + config editors (Motion idle tuning, Memory paths, Moderation toggles, Vision model selection, etc.).
    • Dedicated tabs for Character (character.ts), Motion (motion.ts), Memory (memory.ts), Moderation (moderation.ts), Glossary (glossary.ts), and Logging (logging.ts).
    • Styles: styles.css, HTML scaffold index.html.
  • Overlay (electron/overlay/subtitles.html) – simple HTML/CSS page that subscribes to /events.
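
Any client that understands server-sent events can consume the same /events stream as the subtitle page. The sketch below parses the standard SSE wire format (data: lines terminated by a blank line) from an iterable of text lines. The payload format used by the overlay server is not documented here, so the function returns the raw data strings and ignores other SSE fields and comments.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format strips a single leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            # A blank line ends the event; multi-line data is joined with newlines.
            yield "\n".join(buffer)
            buffer = []
```

The lines could come from any streaming HTTP response pointed at the overlay server on port 3033.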

Project Layout

VT-Orchestrator/
├─ backend/
│  ├─ hub/
│  │  ├─ core/               # orchestrator, providers, settings, tools
│  │  ├─ modules/            # module implementations
│  │  ├─ bridge/             # stdio JSON-RPC server
│  │  ├─ config/models.json  # model alias map
│  │  └─ settings*.json      # defaults + user overrides
│  ├─ tools/                 # VTS helper scripts
│  ├─ tests/                 # pytest suite
│  ├─ pyproject.toml         # backend package metadata
│  └─ requirements*.txt
├─ electron/
│  ├─ main/                  # Electron main process code
│  ├─ renderer/              # UI logic
│  ├─ overlay/               # subtitle page
│  └─ tsconfig.*.json
├─ scripts/start-electron.js # launches Electron with correct env
├─ keys/                     # place API keys here (ignored by git)
├─ logging/                  # generated feed logs
├─ backup/                   # prompt/memory backups
└─ README.md

Testing

Backend tests live under backend/tests. Activate the virtual environment first, then run:

cd backend
pytest

Tests cover module discovery, provider routing, settings migrations, voice-motion coordination, and orchestrator helpers. Additions should include targeted tests to keep regression safety high.

Troubleshooting & Tips

  • Missing dependency warnings: check module-specific requirements (e.g., pip install pillow mss for Vision, pip install undetected-chromedriver for Browsing, pip install faster-whisper for Hearing).
  • No keys available: verify keys/*.txt files exist and contain valid API keys. The UI exposes counts under the Stats tab.
  • Motion module not connecting: Confirm VTube Studio has the WebSocket API enabled (port 8012) and that vts_auth_token.txt exists. Use the helper scripts in backend/tools.
  • Audio output not found: Install sounddevice and ensure the chosen device index matches the OS audio device list. The Voice tab lists raw device info.
  • Discord module crashes immediately: Ensure the Node runtime has the necessary voice dependencies and the bot token/intents are configured. Review logs under logging/module-discord*.txt.
  • Gemini requests fail: Place the API key in keys/gemini_api.txt and confirm the selected vision model starts with gemini-.
  • Re-running after config edits: Delete problematic entries from settings.user.json or set HUB_SETTINGS_FILE to a clean path; the registry will fall back to defaults.
  • Overlay not updating: Confirm Electron main process launched overlayServer (logged in console). Refresh http://localhost:3033/subtitles.

Next Steps

  • Tweak module defaults via the UI and persist to settings.user.json.
  • Extend functionality by adding new modules: drop a package under backend/hub/modules, expose MODULE_CLASS, and the orchestrator will auto-discover it.
  • Explore feed logs in logging/ to fine-tune prompts, moderation, and motion presets.
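
The MODULE_CLASS convention for new modules can be pictured as a scan over imported modules for that attribute. The sketch below is a simplified guess at the idea, not module_manager.py itself: discover_module_classes is a hypothetical name, and the base class and lifecycle methods a real module must implement are not shown.

```python
from types import ModuleType
from typing import Dict, Iterable


def discover_module_classes(candidates: Iterable[ModuleType]) -> Dict[str, type]:
    """Map each module's name to its MODULE_CLASS, skipping modules without one."""
    found: Dict[str, type] = {}
    for module in candidates:
        cls = getattr(module, "MODULE_CLASS", None)
        if isinstance(cls, type):
            found[module.__name__] = cls
    return found
```

In practice the candidates would come from importing each package under backend/hub/modules, for example with importlib and pkgutil.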

Happy streaming!

About

An Orchestrator App for AI VTubers
