Conversation


@bon-zai bon-zai commented Nov 13, 2025

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (does not change functionality, e.g. code style improvements, linting)
  • Documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

Please delete options that are not relevant.

  • Unit Test
  • Test Script (please provide)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Made sure Checks passed

Implemented a full-stack real-time voice application featuring:

Backend (Node.js + TypeScript):
- WebSocket server for real-time bidirectional communication
- Voice pipeline: Whisper (STT) → Claude with MCP → Eleven Labs (TTS)
- MCP client manager with mem0 and Tavily integration
- Session-based audio processing and conversation history
- Comprehensive error handling and logging
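The STT → LLM → TTS flow described above can be sketched as a chain of async stages. This is an illustrative sketch, not the project's actual code: `runPipeline` and the stage names (`transcribe`, `complete`, `synthesize`) are hypothetical stand-ins for the real Whisper, Claude+MCP, and ElevenLabs calls.

```typescript
// Illustrative sketch of the voice pipeline: each stage is an async
// function and the pipeline simply chains them. The stage names are
// hypothetical stand-ins for the real Whisper STT, Claude+MCP, and
// ElevenLabs TTS integrations.
type Stage<I, O> = (input: I) => Promise<O>;

async function runPipeline(
  audio: Buffer,
  transcribe: Stage<Buffer, string>,  // Whisper: audio -> transcript
  complete: Stage<string, string>,    // Claude + MCP: transcript -> reply
  synthesize: Stage<string, Buffer>,  // ElevenLabs: reply -> speech audio
): Promise<{ transcript: string; reply: string; speech: Buffer }> {
  const transcript = await transcribe(audio);
  const reply = await complete(transcript);
  const speech = await synthesize(reply);
  return { transcript, reply, speech };
}
```

In the real handlers each stage would also stream partial results back over the WebSocket rather than awaiting the full value.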

Frontend (Next.js + React):
- Animated 3D orb with Three.js and React Three Fiber
- Blue particle effects when user speaks
- Pink particle effects when AI responds
- Real-time chat UI with speech bubble transcriptions
- WebSocket client with audio recording and playback
- Responsive design with Tailwind CSS

MCP Integration:
- mem0 server for persistent conversation memory
- Tavily server for web search capabilities
- User ID: mem0-zai-crew, Org: daddyholmes-default-org
- Automatic context retrieval and storage

Key Features:
- Real-time voice transcription with streaming updates
- Context-aware responses using conversation history
- Beautiful 3D animated orb visualization
- Production-ready with Docker support
- Comprehensive documentation (README + ARCHITECTURE)

Tech Stack:
- Backend: Node.js 20+, TypeScript, Express, ws
- Frontend: Next.js 15, React 19, Three.js
- AI: Claude Sonnet 4.5, OpenAI Whisper, Eleven Labs
- Memory: mem0 Enterprise Cloud
- MCP: @modelcontextprotocol/sdk

File Structure:
- voice-app/backend: Complete TypeScript backend
- voice-app/frontend: Next.js frontend with 3D graphics
- voice-app/ARCHITECTURE.md: Detailed system design
- voice-app/README.md: Setup and usage guide
- voice-app/docker-compose.yml: Docker configuration

This commit implements a complete dual-agent voice assistant system that allows
users to toggle between two different AI voice providers:

Backend Changes:
- Add Qwen 3 Omni real-time voice provider with WebSocket integration
- Implement voice agent switching in WebSocket handlers
- Update session management to track selected voice agent
- Add support for SET_VOICE_AGENT and VOICE_AGENT_CHANGED messages
- Route audio processing to appropriate agent based on selection
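The SET_VOICE_AGENT / VOICE_AGENT_CHANGED handshake can be sketched as below. The payload shapes and the idle-only guard are inferred from this description; the PR's actual types and handler names may differ.

```typescript
// Hedged sketch of the agent-switch messages. Field names are
// illustrative, not the PR's exact types.
type VoiceAgent = 'elevenlabs' | 'qwen';

interface SetVoiceAgentMessage {
  type: 'SET_VOICE_AGENT';
  payload: { agent: VoiceAgent };
}

interface VoiceAgentChangedMessage {
  type: 'VOICE_AGENT_CHANGED';
  payload: { agent: VoiceAgent };
}

interface Session {
  voiceAgent: VoiceAgent;
  status: 'idle' | 'processing' | 'speaking';
}

// Only honour a switch while the assistant is idle, as the feature
// list below notes; otherwise reject the request by returning null.
function handleSetVoiceAgent(
  session: Session,
  msg: SetVoiceAgentMessage,
): VoiceAgentChangedMessage | null {
  if (session.status !== 'idle') return null;
  session.voiceAgent = msg.payload.agent;
  return { type: 'VOICE_AGENT_CHANGED', payload: { agent: msg.payload.agent } };
}
```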

Frontend Changes:
- Create VoiceAgentToggle component for switching between agents
- Update VoiceAssistant to display current agent and support switching
- Extend useWebSocket hook with agent management capabilities
- Add visual feedback for active agent (blue for ElevenLabs, purple for Qwen)

Configuration:
- Add all required API keys to .env.example (ElevenLabs, Qwen, Azure, Tavily)
- Configure Qwen Omni settings (model, voice, endpoint)
- Set DEFAULT_VOICE_AGENT environment variable

Features:
- Real-time agent switching (when assistant is idle)
- Qwen 3 Omni emotional voice synthesis
- Support for 10+ languages
- Multiple voice options (Cherry, Ethan, Jennifer, Ryan, etc.)
- Seamless WebSocket communication for both agents

Documentation:
- Add comprehensive SETUP.md with usage instructions
- Document API rate limits and pricing
- Include troubleshooting guide
- Provide architecture diagrams

The implementation follows the Model Context Protocol and integrates with
existing MCP servers for extended capabilities.
Copilot AI review requested due to automatic review settings November 13, 2025 10:16
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copilot finished reviewing on behalf of bon-zai November 13, 2025 10:21

Copilot AI left a comment


Pull Request Overview

This PR introduces a comprehensive dual-agent AI voice assistant application with real-time WebSocket communication, 3D animated visualization, and MCP integration. The application supports two voice AI providers: ElevenLabs (with Claude AI and Whisper STT) and Qwen 3 Omni (Alibaba's multimodal voice AI).

Key Changes:

  • Full-stack voice application with Next.js frontend and Node.js backend
  • WebSocket-based real-time bidirectional audio streaming
  • Integration with Claude AI, Whisper, ElevenLabs, and Qwen Omni APIs
  • MCP (Model Context Protocol) support for mem0 memory and Tavily search
  • 3D animated orb visualization using Three.js/React Three Fiber

Reviewed Changes

Copilot reviewed 39 out of 39 changed files in this pull request and generated 13 comments.

Summary per file:

  • voice-app/frontend/package.json: Frontend dependencies including Next.js 15, React 19, Three.js for 3D visualization
  • voice-app/frontend/tsconfig.json: TypeScript configuration for Next.js with bundler module resolution
  • voice-app/frontend/lib/types.ts: Type definitions for WebSocket messages, voice agents, and chat data
  • voice-app/frontend/hooks/useWebSocket.ts: Custom hook for WebSocket connection and message handling
  • voice-app/frontend/hooks/useAudio.ts: Audio recording and playback functionality using the Web Audio API
  • voice-app/frontend/components/VoiceAssistant.tsx: Main component orchestrating voice interaction and UI state
  • voice-app/frontend/components/VoiceAgentToggle.tsx: UI component for switching between ElevenLabs and Qwen agents
  • voice-app/frontend/components/ChatUI.tsx: Chat interface displaying conversation history
  • voice-app/frontend/components/AnimatedOrb.tsx: 3D visualization with particles that change color based on speaker
  • voice-app/backend/package.json: Backend dependencies including Anthropic SDK, OpenAI, MCP SDK
  • voice-app/backend/tsconfig.json: TypeScript configuration for Node.js backend
  • voice-app/backend/src/types/index.ts: Backend type definitions matching frontend types
  • voice-app/backend/src/websocket/server.ts: WebSocket server setup with session management and heartbeat
  • voice-app/backend/src/websocket/handlers.ts: Message routing and processing for audio chunks and agent switching
  • voice-app/backend/src/voice/stt.ts: OpenAI Whisper integration for speech-to-text
  • voice-app/backend/src/voice/tts.ts: ElevenLabs integration for text-to-speech synthesis
  • voice-app/backend/src/voice/qwen-omni.ts: Qwen 3 Omni real-time voice provider with WebSocket connection
  • voice-app/backend/src/voice/pipeline.ts: Voice processing pipeline orchestrating STT, LLM, and TTS
  • voice-app/backend/src/llm/claude.ts: Claude AI integration with MCP tools for memory and search
  • voice-app/backend/src/mcp/client.ts: MCP client manager for connecting to MCP servers
  • voice-app/backend/src/mcp/servers.ts: Configuration for mem0 and Tavily MCP servers
  • voice-app/docker-compose.yml: Docker Compose setup for running frontend and backend services
  • voice-app/SETUP.md: Comprehensive setup guide with environment variables and troubleshooting
  • voice-app/README.md: Project overview with quick start instructions
  • voice-app/ARCHITECTURE.md: Detailed architecture documentation with data flow diagrams
Comments suppressed due to low confidence (1)

voice-app/frontend/hooks/useWebSocket.ts:31

  • Unused variable audioQueueRef.


async start(): Promise<void> {
  this.wss = new WSServer({
    port: this.port,
    maxPayload: Number(process.env.WS_MAX_PAYLOAD) || 10 * 1024 * 1024, // 10MB

Copilot AI Nov 13, 2025


Security concern: The WebSocket server accepts connections without any authentication or origin validation. In production, you should implement:

  1. Origin validation to prevent unauthorized cross-origin connections
  2. Authentication tokens or API keys
  3. Rate limiting per connection
  4. CORS restrictions

Consider adding a verifyClient callback to the WSServer configuration:

this.wss = new WSServer({
  port: this.port,
  maxPayload: Number(process.env.WS_MAX_PAYLOAD) || 10 * 1024 * 1024,
  verifyClient: (info) => {
    // Verify origin, authentication token, etc.
    const origin = info.origin;
    // Add your validation logic here
    return true; // or false to reject
  }
});
Suggested change

  maxPayload: Number(process.env.WS_MAX_PAYLOAD) || 10 * 1024 * 1024, // 10MB
  verifyClient: (info, done) => {
    // Origin validation
    const allowedOrigins = (process.env.WS_ALLOWED_ORIGINS || '')
      .split(',')
      .map(o => o.trim())
      .filter(o => o);
    const origin = info.origin;
    if (allowedOrigins.length > 0 && !allowedOrigins.includes(origin)) {
      logger.warn(`WebSocket connection rejected due to invalid origin: ${origin}`);
      return done(false, 403, 'Forbidden');
    }
    // Token validation (query param); avoid logging the token itself
    const url = require('url');
    const parsedUrl = url.parse(info.req.url || '', true);
    const token = parsedUrl.query && parsedUrl.query.token;
    const expectedToken = process.env.WS_AUTH_TOKEN;
    if (expectedToken && token !== expectedToken) {
      logger.warn('WebSocket connection rejected due to invalid token');
      return done(false, 401, 'Unauthorized');
    }
    // Passed all checks
    return done(true);
  }

Comment on lines +249 to +265
qwenProvider.on('response_text', (text: string) => {
  fullResponse += text;
  this.sendMessage(session.ws, WebSocketMessageType.AI_RESPONSE_TEXT, {
    text: fullResponse,
    messageId,
  });
});

qwenProvider.on('audio_chunk', (audioChunk: Buffer) => {
  this.sendMessage(session.ws, WebSocketMessageType.AI_RESPONSE_AUDIO, {
    audio: audioChunk.toString('base64'),
    messageId,
    isLast: false,
  });
});

qwenProvider.once('response_complete', () => {

Copilot AI Nov 13, 2025


Potential memory leak: Event listeners are registered on the qwenProvider using .on() for response_text and audio_chunk events, but these are never removed. If this method is called multiple times (e.g., user sends multiple messages), listeners will accumulate, causing memory leaks and potentially duplicate event handling.

Either:

  1. Use .once() instead of .on() if you only expect one event per request
  2. Remove the listeners after the response_complete event using .off() or .removeAllListeners()
  3. Create a new provider instance per request instead of reusing the same one

Example fix:

const handlers = {
  responseText: (text: string) => { /* ... */ },
  audioChunk: (audioChunk: Buffer) => { /* ... */ }
};

qwenProvider.on('response_text', handlers.responseText);
qwenProvider.on('audio_chunk', handlers.audioChunk);

qwenProvider.once('response_complete', () => {
  // Clean up listeners
  qwenProvider.off('response_text', handlers.responseText);
  qwenProvider.off('audio_chunk', handlers.audioChunk);
  // ... rest of the code
});
Suggested change

  // Store handler references for cleanup
  const responseTextHandler = (text: string) => {
    fullResponse += text;
    this.sendMessage(session.ws, WebSocketMessageType.AI_RESPONSE_TEXT, {
      text: fullResponse,
      messageId,
    });
  };
  const audioChunkHandler = (audioChunk: Buffer) => {
    this.sendMessage(session.ws, WebSocketMessageType.AI_RESPONSE_AUDIO, {
      audio: audioChunk.toString('base64'),
      messageId,
      isLast: false,
    });
  };

  qwenProvider.on('response_text', responseTextHandler);
  qwenProvider.on('audio_chunk', audioChunkHandler);

  qwenProvider.once('response_complete', () => {
    // Clean up listeners to prevent memory leaks
    qwenProvider.off('response_text', responseTextHandler);
    qwenProvider.off('audio_chunk', audioChunkHandler);

import { useAudioRecorder, useAudioPlayer } from '@/hooks/useAudio';
import { ChatMessage, AssistantStatus, VoiceAgent } from '@/lib/types';
import { Mic, MicOff } from 'lucide-react';
import { cn } from '@/lib/utils';

Copilot AI Nov 13, 2025


The cn utility function is imported from @/lib/utils but this file is missing from the repository. You need to create voice-app/frontend/lib/utils.ts with the following implementation:

import { type ClassValue, clsx } from 'clsx';
import { twMerge } from 'tailwind-merge';

export function cn(...inputs: ClassValue[]) {
  return twMerge(clsx(inputs));
}

This utility is commonly used for merging Tailwind CSS classes and is required by VoiceAssistant.tsx, VoiceAgentToggle.tsx, and ChatBubble.tsx.

}, [status, setVoiceAgent]);

const getAgentName = () => {
  return currentAgent === 'elevenlabs' ? 'ElevenLabs' : 'QWen 3 Omni';

Copilot AI Nov 13, 2025


Inconsistent naming: The product name is "Qwen" (all lowercase except the first letter), not "QWen". Throughout the codebase and documentation, you should use the official spelling "Qwen 3 Omni" or "Qwen2.5-Omni" for consistency.

Comment on lines +75 to +79
this.ws.on('error', (error) => {
  logger.error('QWen Omni WebSocket error:', error);
  this.emit('error', error);
  reject(error);
});

Copilot AI Nov 13, 2025


Potential race condition: if the socket emits multiple 'error' events in rapid succession, or an error fires after the connection has already opened, the promise's reject is called repeatedly. Consider adding a flag so the promise is resolved or rejected only once, and use a timeout to handle a connection that never opens.
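One way to implement that guard, sketched with a settled flag plus a timeout. This is illustrative, not the PR's code: `connectOnce` is a hypothetical helper, and any EventEmitter-like socket works.

```typescript
import { EventEmitter } from 'events';

// Sketch: settle the connect promise exactly once, whichever of
// 'open', 'error', or the timeout fires first. Later 'error' events
// still reach long-lived listeners but can no longer re-reject.
function connectOnce(socket: EventEmitter, timeoutMs = 10_000): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    let settled = false;
    const settle = (fn: () => void) => {
      if (settled) return; // ignore every signal after the first
      settled = true;
      clearTimeout(timer);
      fn();
    };
    const timer = setTimeout(
      () => settle(() => reject(new Error('connect timeout'))),
      timeoutMs,
    );
    socket.once('open', () => settle(() => resolve()));
    socket.on('error', (err: Error) => settle(() => reject(err)));
  });
}
```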

  throw new Error('QWen Omni provider not initialized');
}

const messageId = require('uuid').v4();

Copilot AI Nov 13, 2025


Using require() for importing uuid is inconsistent with the ES module imports used elsewhere in the file. Change this to:

import { v4 as uuidv4 } from 'uuid';

And then use uuidv4() instead of require('uuid').v4(). Note that uuid is already imported at the top of the pipeline.ts file this way.

  )}
>
  <Sparkles className="w-4 h-4" />
  <span>QWen 3 Omni</span>

Copilot AI Nov 13, 2025


Inconsistent naming: The product name is "Qwen" (all lowercase except the first letter), not "QWen". Throughout the codebase and documentation, you should use the official spelling "Qwen 3 Omni" or "Qwen2.5-Omni" for consistency.

Suggested change
<span>QWen 3 Omni</span>
<span>Qwen 3 Omni</span>

private ws: WebSocket | null = null;
private isConnected = false;
private sessionId: string | null = null;
private audioQueue: Buffer[] = [];

Copilot AI Nov 13, 2025


The audioQueue field is declared but never used anywhere in this class. Consider removing it or implementing the intended audio queue functionality if it was meant to buffer audio chunks.

Suggested change
private audioQueue: Buffer[] = [];

Comment on lines +174 to +197
<button
  onMouseDown={handleMouseDown}
  onMouseUp={handleMouseUp}
  onTouchStart={handleMouseDown}
  onTouchEnd={handleMouseUp}
  disabled={!isConnected || status === 'processing' || status === 'speaking'}
  className={cn(
    'relative group p-8 rounded-full transition-all duration-200',
    'disabled:opacity-50 disabled:cursor-not-allowed',
    isRecording
      ? 'bg-blue-500 shadow-lg shadow-blue-500/50 scale-110'
      : 'bg-gray-700 hover:bg-gray-600 hover:shadow-lg'
  )}
>
  {isRecording ? (
    <Mic className="w-12 h-12 text-white" />
  ) : (
    <MicOff className="w-12 h-12 text-gray-300" />
  )}

  <div className="absolute -bottom-12 left-1/2 -translate-x-1/2 whitespace-nowrap text-sm text-gray-400">
    {isRecording ? 'Release to send' : 'Hold to speak'}
  </div>
</button>

Copilot AI Nov 13, 2025


The voice recording button is missing accessibility attributes. Add aria-label and aria-pressed to improve screen reader support (role="button" is implicit on a native <button>, so it is not needed):

<button
  onMouseDown={handleMouseDown}
  onMouseUp={handleMouseUp}
  onTouchStart={handleMouseDown}
  onTouchEnd={handleMouseUp}
  disabled={!isConnected || status === 'processing' || status === 'speaking'}
  aria-label={isRecording ? 'Recording - Release to send' : 'Hold to record voice message'}
  aria-pressed={isRecording}
  className={cn(
    // ... rest of classes
  )}
>

WebSocketMessageType,
AudioChunkPayload,
SetVoiceAgentPayload,
VoiceAgent,

Copilot AI Nov 13, 2025


Unused import VoiceAgent.

Suggested change
VoiceAgent,
