Claude/voice application setup 011 cv57 d2by dc qc yu puw fye1 #3747
base: main
Conversation
Implemented a full-stack real-time voice application featuring:

Backend (Node.js + TypeScript):
- WebSocket server for real-time bidirectional communication
- Voice pipeline: Whisper (STT) → Claude with MCP → Eleven Labs (TTS)
- MCP client manager with mem0 and Tavily integration
- Session-based audio processing and conversation history
- Comprehensive error handling and logging

Frontend (Next.js + React):
- Animated 3D orb with Three.js and React Three Fiber
- Blue particle effects when the user speaks
- Pink particle effects when the AI responds
- Real-time chat UI with speech-bubble transcriptions
- WebSocket client with audio recording and playback
- Responsive design with Tailwind CSS

MCP Integration:
- mem0 server for persistent conversation memory
- Tavily server for web search capabilities
- User ID: mem0-zai-crew, Org: daddyholmes-default-org
- Automatic context retrieval and storage

Key Features:
- Real-time voice transcription with streaming updates
- Context-aware responses using conversation history
- 3D animated orb visualization
- Production-ready with Docker support
- Comprehensive documentation (README + ARCHITECTURE)

Tech Stack:
- Backend: Node.js 20+, TypeScript, Express, ws
- Frontend: Next.js 15, React 19, Three.js
- AI: Claude Sonnet 4.5, OpenAI Whisper, Eleven Labs
- Memory: mem0 Enterprise Cloud
- MCP: @modelcontextprotocol/sdk

File Structure:
- voice-app/backend: Complete TypeScript backend
- voice-app/frontend: Next.js frontend with 3D graphics
- voice-app/ARCHITECTURE.md: Detailed system design
- voice-app/README.md: Setup and usage guide
- voice-app/docker-compose.yml: Docker configuration
This commit implements a complete dual-agent voice assistant system that allows users to toggle between two different AI voice providers:

Backend Changes:
- Add QWen 3 Omni real-time voice provider with WebSocket integration
- Implement voice agent switching in WebSocket handlers
- Update session management to track the selected voice agent
- Add support for SET_VOICE_AGENT and VOICE_AGENT_CHANGED messages
- Route audio processing to the appropriate agent based on selection

Frontend Changes:
- Create VoiceAgentToggle component for switching between agents
- Update VoiceAssistant to display the current agent and support switching
- Extend useWebSocket hook with agent management capabilities
- Add visual feedback for the active agent (blue for ElevenLabs, purple for QWen)

Configuration:
- Add all required API keys to .env.example (ElevenLabs, QWen, Azure, Tavily)
- Configure QWen Omni settings (model, voice, endpoint)
- Set DEFAULT_VOICE_AGENT environment variable

Features:
- Real-time agent switching (when the assistant is idle)
- QWen 3 Omni emotional voice synthesis
- Support for 10+ languages
- Multiple voice options (Cherry, Ethan, Jennifer, Ryan, etc.)
- Seamless WebSocket communication for both agents

Documentation:
- Add comprehensive SETUP.md with usage instructions
- Document API rate limits and pricing
- Include troubleshooting guide
- Provide architecture diagrams

The implementation follows the Model Context Protocol and integrates with existing MCP servers for extended capabilities.
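The agent-switching flow described above can be sketched as a small message handler. The message type names (SET_VOICE_AGENT, VOICE_AGENT_CHANGED) come from the commit message; the `Session` shape and the `handleSetVoiceAgent` helper are illustrative assumptions, not the PR's actual code.

```typescript
type VoiceAgent = 'elevenlabs' | 'qwen';

// Hypothetical session shape; the real session also tracks ws, history, etc.
interface Session {
  voiceAgent: VoiceAgent;
  status: 'idle' | 'processing' | 'speaking';
}

interface WsMessage {
  type: 'SET_VOICE_AGENT' | 'VOICE_AGENT_CHANGED';
  payload: { agent: VoiceAgent };
}

// Switching is only honored while the assistant is idle, per the feature list;
// returns the acknowledgment message to send back, or null if rejected.
function handleSetVoiceAgent(session: Session, msg: WsMessage): WsMessage | null {
  if (msg.type !== 'SET_VOICE_AGENT' || session.status !== 'idle') return null;
  session.voiceAgent = msg.payload.agent;
  return { type: 'VOICE_AGENT_CHANGED', payload: { agent: session.voiceAgent } };
}
```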
Pull Request Overview
This PR introduces a comprehensive dual-agent AI voice assistant application with real-time WebSocket communication, 3D animated visualization, and MCP integration. The application supports two voice AI providers: ElevenLabs (with Claude AI and Whisper STT) and Qwen 3 Omni (Alibaba's multimodal voice AI).
Key Changes:
- Full-stack voice application with Next.js frontend and Node.js backend
- WebSocket-based real-time bidirectional audio streaming
- Integration with Claude AI, Whisper, ElevenLabs, and Qwen Omni APIs
- MCP (Model Context Protocol) support for mem0 memory and Tavily search
- 3D animated orb visualization using Three.js/React Three Fiber
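The STT → LLM → TTS pipeline summarized above can be sketched as three composed async stages. This is illustrative only: the injected stage functions are stand-ins for the PR's real Whisper, Claude, and ElevenLabs calls, and the function name is hypothetical.

```typescript
type Stage<I, O> = (input: I) => Promise<O>;

// Each stage is injected, so the pipeline itself stays provider-agnostic —
// the same shape the PR uses to route between ElevenLabs and Qwen backends.
async function runVoicePipeline(
  audio: Uint8Array,
  stt: Stage<Uint8Array, string>,   // e.g. Whisper: audio → transcript
  llm: Stage<string, string>,       // e.g. Claude (with MCP tools): text → reply
  tts: Stage<string, Uint8Array>,   // e.g. ElevenLabs: reply → speech audio
): Promise<{ transcript: string; reply: string; speech: Uint8Array }> {
  const transcript = await stt(audio);
  const reply = await llm(transcript);
  const speech = await tts(reply);
  return { transcript, reply, speech };
}
```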
Reviewed Changes
Copilot reviewed 39 out of 39 changed files in this pull request and generated 13 comments.
Summary per file:
| File | Description |
|---|---|
| voice-app/frontend/package.json | Frontend dependencies including Next.js 15, React 19, Three.js for 3D visualization |
| voice-app/frontend/tsconfig.json | TypeScript configuration for Next.js with bundler module resolution |
| voice-app/frontend/lib/types.ts | Type definitions for WebSocket messages, voice agents, and chat data |
| voice-app/frontend/hooks/useWebSocket.ts | Custom hook for WebSocket connection and message handling |
| voice-app/frontend/hooks/useAudio.ts | Audio recording and playback functionality using Web Audio API |
| voice-app/frontend/components/VoiceAssistant.tsx | Main component orchestrating voice interaction and UI state |
| voice-app/frontend/components/VoiceAgentToggle.tsx | UI component for switching between ElevenLabs and Qwen agents |
| voice-app/frontend/components/ChatUI.tsx | Chat interface displaying conversation history |
| voice-app/frontend/components/AnimatedOrb.tsx | 3D visualization with particles that change color based on speaker |
| voice-app/backend/package.json | Backend dependencies including Anthropic SDK, OpenAI, MCP SDK |
| voice-app/backend/tsconfig.json | TypeScript configuration for Node.js backend |
| voice-app/backend/src/types/index.ts | Backend type definitions matching frontend types |
| voice-app/backend/src/websocket/server.ts | WebSocket server setup with session management and heartbeat |
| voice-app/backend/src/websocket/handlers.ts | Message routing and processing for audio chunks and agent switching |
| voice-app/backend/src/voice/stt.ts | OpenAI Whisper integration for speech-to-text |
| voice-app/backend/src/voice/tts.ts | ElevenLabs integration for text-to-speech synthesis |
| voice-app/backend/src/voice/qwen-omni.ts | Qwen 3 Omni real-time voice provider with WebSocket connection |
| voice-app/backend/src/voice/pipeline.ts | Voice processing pipeline orchestrating STT, LLM, and TTS |
| voice-app/backend/src/llm/claude.ts | Claude AI integration with MCP tools for memory and search |
| voice-app/backend/src/mcp/client.ts | MCP client manager for connecting to MCP servers |
| voice-app/backend/src/mcp/servers.ts | Configuration for mem0 and Tavily MCP servers |
| voice-app/docker-compose.yml | Docker Compose setup for running frontend and backend services |
| voice-app/SETUP.md | Comprehensive setup guide with environment variables and troubleshooting |
| voice-app/README.md | Project overview with quick start instructions |
| voice-app/ARCHITECTURE.md | Detailed architecture documentation with data flow diagrams |
Comments suppressed due to low confidence (1)
- voice-app/frontend/hooks/useWebSocket.ts:31 — Unused variable `audioQueueRef`.
```typescript
async start(): Promise<void> {
  this.wss = new WSServer({
    port: this.port,
    maxPayload: Number(process.env.WS_MAX_PAYLOAD) || 10 * 1024 * 1024, // 10MB
```
Copilot AI · Nov 13, 2025
Security concern: The WebSocket server accepts connections without any authentication or origin validation. In production, you should implement:
- Origin validation to prevent unauthorized cross-origin connections
- Authentication tokens or API keys
- Rate limiting per connection
- CORS restrictions
Consider adding a `verifyClient` callback to the `WSServer` configuration:

```typescript
this.wss = new WSServer({
  port: this.port,
  maxPayload: Number(process.env.WS_MAX_PAYLOAD) || 10 * 1024 * 1024,
  verifyClient: (info) => {
    // Verify origin, authentication token, etc.
    const origin = info.origin;
    // Add your validation logic here
    return true; // or false to reject
  }
});
```

Suggested change:

```typescript
maxPayload: Number(process.env.WS_MAX_PAYLOAD) || 10 * 1024 * 1024, // 10MB
verifyClient: (info, done) => {
  // Origin validation
  const allowedOrigins = (process.env.WS_ALLOWED_ORIGINS || '').split(',').map(o => o.trim()).filter(o => o);
  const origin = info.origin;
  if (allowedOrigins.length > 0 && !allowedOrigins.includes(origin)) {
    logger.warn(`WebSocket connection rejected due to invalid origin: ${origin}`);
    return done(false, 403, 'Forbidden');
  }
  // Token validation (query param)
  const url = require('url');
  const parsedUrl = url.parse(info.req.url || '', true);
  const token = parsedUrl.query && parsedUrl.query.token;
  const expectedToken = process.env.WS_AUTH_TOKEN;
  if (expectedToken && token !== expectedToken) {
    logger.warn(`WebSocket connection rejected due to invalid token: ${token}`);
    return done(false, 401, 'Unauthorized');
  }
  // Passed all checks
  return done(true);
}
```
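For illustration, the origin check from the suggested change can be factored into a pure, testable helper. The `WS_ALLOWED_ORIGINS` comma-separated format follows the suggestion; the function name is hypothetical, not part of the PR.

```typescript
// Returns true when `origin` is on the allowlist. An empty allowlist means
// "accept any origin", matching the suggested verifyClient behavior above.
function isOriginAllowed(origin: string, allowedCsv: string): boolean {
  const allowed = allowedCsv
    .split(',')
    .map((o) => o.trim())
    .filter(Boolean);
  return allowed.length === 0 || allowed.includes(origin);
}
```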
```typescript
qwenProvider.on('response_text', (text: string) => {
  fullResponse += text;
  this.sendMessage(session.ws, WebSocketMessageType.AI_RESPONSE_TEXT, {
    text: fullResponse,
    messageId,
  });
});

qwenProvider.on('audio_chunk', (audioChunk: Buffer) => {
  this.sendMessage(session.ws, WebSocketMessageType.AI_RESPONSE_AUDIO, {
    audio: audioChunk.toString('base64'),
    messageId,
    isLast: false,
  });
});

qwenProvider.once('response_complete', () => {
```
Copilot AI · Nov 13, 2025
Potential memory leak: Event listeners are registered on the qwenProvider using .on() for response_text and audio_chunk events, but these are never removed. If this method is called multiple times (e.g., user sends multiple messages), listeners will accumulate, causing memory leaks and potentially duplicate event handling.
Either:
- Use `.once()` instead of `.on()` if you only expect one event per request
- Remove the listeners after the `response_complete` event using `.off()` or `.removeAllListeners()`
- Create a new provider instance per request instead of reusing the same one

Example fix:

```typescript
const handlers = {
  responseText: (text: string) => { /* ... */ },
  audioChunk: (audioChunk: Buffer) => { /* ... */ }
};
qwenProvider.on('response_text', handlers.responseText);
qwenProvider.on('audio_chunk', handlers.audioChunk);
qwenProvider.once('response_complete', () => {
  // Clean up listeners
  qwenProvider.off('response_text', handlers.responseText);
  qwenProvider.off('audio_chunk', handlers.audioChunk);
  // ... rest of the code
});
```

Suggested change:

```typescript
// Store handler references for cleanup
const responseTextHandler = (text: string) => {
  fullResponse += text;
  this.sendMessage(session.ws, WebSocketMessageType.AI_RESPONSE_TEXT, {
    text: fullResponse,
    messageId,
  });
};
const audioChunkHandler = (audioChunk: Buffer) => {
  this.sendMessage(session.ws, WebSocketMessageType.AI_RESPONSE_AUDIO, {
    audio: audioChunk.toString('base64'),
    messageId,
    isLast: false,
  });
};
qwenProvider.on('response_text', responseTextHandler);
qwenProvider.on('audio_chunk', audioChunkHandler);
qwenProvider.once('response_complete', () => {
  // Clean up listeners to prevent memory leaks
  qwenProvider.off('response_text', responseTextHandler);
  qwenProvider.off('audio_chunk', audioChunkHandler);
```
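The cleanup pattern can be demonstrated self-contained with Node's built-in `EventEmitter`. The `provider`, `chunk`, and `done` names here are illustrative, not the PR's actual events.

```typescript
import { EventEmitter } from 'node:events';

const provider = new EventEmitter();
const chunks: string[] = [];
const onChunk = (c: string) => { chunks.push(c); };

provider.on('chunk', onChunk);
provider.once('done', () => {
  // Remove the named handler once the response completes, so repeated
  // requests on the same provider don't accumulate listeners.
  provider.off('chunk', onChunk);
});

provider.emit('chunk', 'a');
provider.emit('done');
provider.emit('chunk', 'b'); // ignored: the listener was removed on 'done'
```

After this runs, `chunks` holds only `'a'` and no `'chunk'` listeners remain registered.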
```typescript
import { useAudioRecorder, useAudioPlayer } from '@/hooks/useAudio';
import { ChatMessage, AssistantStatus, VoiceAgent } from '@/lib/types';
import { Mic, MicOff } from 'lucide-react';
import { cn } from '@/lib/utils';
```
Copilot AI · Nov 13, 2025
The `cn` utility function is imported from `@/lib/utils`, but this file is missing from the repository. You need to create `voice-app/frontend/lib/utils.ts` with the following implementation:

```typescript
import { type ClassValue, clsx } from 'clsx';
import { twMerge } from 'tailwind-merge';

export function cn(...inputs: ClassValue[]) {
  return twMerge(clsx(inputs));
}
```

This utility is commonly used for merging Tailwind CSS classes and is required by VoiceAssistant.tsx, VoiceAgentToggle.tsx, and ChatBubble.tsx.
```typescript
}, [status, setVoiceAgent]);

const getAgentName = () => {
  return currentAgent === 'elevenlabs' ? 'ElevenLabs' : 'QWen 3 Omni';
```
Copilot AI · Nov 13, 2025
Inconsistent naming: The product name is "Qwen" (all lowercase except the first letter), not "QWen". Throughout the codebase and documentation, you should use the official spelling "Qwen 3 Omni" or "Qwen2.5-Omni" for consistency.
```typescript
this.ws.on('error', (error) => {
  logger.error('QWen Omni WebSocket error:', error);
  this.emit('error', error);
  reject(error);
});
```
Copilot AI · Nov 13, 2025
Potential race condition: If the WebSocket connection fails immediately after creation but before the 'error' handler is registered, or if multiple errors occur rapidly, the promise could be rejected multiple times. Consider adding a flag to ensure the promise is only resolved/rejected once, or use a timeout to handle connection that never opens.
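One way to implement the "only settle once" guard this comment suggests is a small wrapper around the promise executor; the `settleOnce` name and shape are illustrative, not the PR's code.

```typescript
// Wraps resolve/reject so that only the first settlement wins; later calls
// (e.g. a second 'error' event after the promise already rejected) are no-ops.
function settleOnce<T>(
  executor: (resolve: (v: T) => void, reject: (e: Error) => void) => void,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let settled = false;
    executor(
      (v) => { if (!settled) { settled = true; resolve(v); } },
      (e) => { if (!settled) { settled = true; reject(e); } },
    );
  });
}
```

In the connection code, the `open` handler would call the guarded resolve and the `error` handler the guarded reject; pairing this with a connection timeout covers sockets that never open.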
```typescript
  throw new Error('QWen Omni provider not initialized');
}

const messageId = require('uuid').v4();
```
Copilot AI · Nov 13, 2025
Using `require()` for importing uuid is inconsistent with the ES module imports used elsewhere in the file. Change this to:

```typescript
import { v4 as uuidv4 } from 'uuid';
```

Then use `uuidv4()` instead of `require('uuid').v4()`. Note that uuid is already imported this way at the top of pipeline.ts.
```tsx
  )}
>
  <Sparkles className="w-4 h-4" />
  <span>QWen 3 Omni</span>
```
Copilot AI · Nov 13, 2025
Inconsistent naming: The product name is "Qwen" (all lowercase except the first letter), not "QWen". Throughout the codebase and documentation, you should use the official spelling "Qwen 3 Omni" or "Qwen2.5-Omni" for consistency.
Suggested change:

```diff
- <span>QWen 3 Omni</span>
+ <span>Qwen 3 Omni</span>
```
```typescript
private ws: WebSocket | null = null;
private isConnected = false;
private sessionId: string | null = null;
private audioQueue: Buffer[] = [];
```
Copilot AI · Nov 13, 2025
The audioQueue field is declared but never used anywhere in this class. Consider removing it or implementing the intended audio queue functionality if it was meant to buffer audio chunks.
Suggested change:

```diff
- private audioQueue: Buffer[] = [];
```
```tsx
<button
  onMouseDown={handleMouseDown}
  onMouseUp={handleMouseUp}
  onTouchStart={handleMouseDown}
  onTouchEnd={handleMouseUp}
  disabled={!isConnected || status === 'processing' || status === 'speaking'}
  className={cn(
    'relative group p-8 rounded-full transition-all duration-200',
    'disabled:opacity-50 disabled:cursor-not-allowed',
    isRecording
      ? 'bg-blue-500 shadow-lg shadow-blue-500/50 scale-110'
      : 'bg-gray-700 hover:bg-gray-600 hover:shadow-lg'
  )}
>
  {isRecording ? (
    <Mic className="w-12 h-12 text-white" />
  ) : (
    <MicOff className="w-12 h-12 text-gray-300" />
  )}

  <div className="absolute -bottom-12 left-1/2 -translate-x-1/2 whitespace-nowrap text-sm text-gray-400">
    {isRecording ? 'Release to send' : 'Hold to speak'}
  </div>
</button>
```
Copilot AI · Nov 13, 2025
The voice recording button is missing accessibility attributes. Add `aria-label` and `aria-pressed` attributes to improve screen reader support:

```tsx
<button
  onMouseDown={handleMouseDown}
  onMouseUp={handleMouseUp}
  onTouchStart={handleMouseDown}
  onTouchEnd={handleMouseUp}
  disabled={!isConnected || status === 'processing' || status === 'speaking'}
  aria-label={isRecording ? 'Recording - Release to send' : 'Hold to record voice message'}
  aria-pressed={isRecording}
  className={cn(
    // ... rest of classes
  )}
>
```

```typescript
WebSocketMessageType,
AudioChunkPayload,
SetVoiceAgentPayload,
VoiceAgent,
```
Copilot AI · Nov 13, 2025
Unused import VoiceAgent.
Suggested change:

```diff
- VoiceAgent,
```
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.
Please delete options that are not relevant.
Checklist:
Maintainer Checklist