Conversation


@bon-zai bon-zai commented Nov 13, 2025

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (does not change functionality, e.g. code style improvements, linting)
  • Documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

Please delete options that are not relevant.

  • Unit Test
  • Test Script (please provide)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Made sure Checks passed

Implemented a full-stack real-time voice application featuring:

Backend (Node.js + TypeScript):
- WebSocket server for real-time bidirectional communication
- Voice pipeline: Whisper (STT) → Claude with MCP → Eleven Labs (TTS)
- MCP client manager with mem0 and Tavily integration
- Session-based audio processing and conversation history
- Comprehensive error handling and logging
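The STT → LLM → TTS flow described above can be sketched as a chain of async stages. This is an illustrative sketch, not the project's actual code: `runPipeline` and the stage names (`transcribe`, `complete`, `synthesize`) are hypothetical stand-ins for the real Whisper, Claude+MCP, and ElevenLabs calls.

```typescript
// Illustrative sketch of the voice pipeline: each stage is an async
// function and the pipeline simply chains them. The stage names are
// hypothetical stand-ins for the real Whisper STT, Claude+MCP, and
// ElevenLabs TTS integrations.
type Stage<I, O> = (input: I) => Promise<O>;

async function runPipeline(
  audio: Buffer,
  transcribe: Stage<Buffer, string>,  // Whisper: audio -> transcript
  complete: Stage<string, string>,    // Claude + MCP: transcript -> reply
  synthesize: Stage<string, Buffer>,  // ElevenLabs: reply -> speech audio
): Promise<{ transcript: string; reply: string; speech: Buffer }> {
  const transcript = await transcribe(audio);
  const reply = await complete(transcript);
  const speech = await synthesize(reply);
  return { transcript, reply, speech };
}
```

In the real handlers each stage would also stream partial results back over the WebSocket rather than awaiting the full value.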

Frontend (Next.js + React):
- Animated 3D orb with Three.js and React Three Fiber
- Blue particle effects when user speaks
- Pink particle effects when AI responds
- Real-time chat UI with speech bubble transcriptions
- WebSocket client with audio recording and playback
- Responsive design with Tailwind CSS

MCP Integration:
- mem0 server for persistent conversation memory
- Tavily server for web search capabilities
- User ID: mem0-zai-crew, Org: daddyholmes-default-org
- Automatic context retrieval and storage

Key Features:
- Real-time voice transcription with streaming updates
- Context-aware responses using conversation history
- Beautiful 3D animated orb visualization
- Production-ready with Docker support
- Comprehensive documentation (README + ARCHITECTURE)

Tech Stack:
- Backend: Node.js 20+, TypeScript, Express, ws
- Frontend: Next.js 15, React 19, Three.js
- AI: Claude Sonnet 4.5, OpenAI Whisper, Eleven Labs
- Memory: mem0 Enterprise Cloud
- MCP: @modelcontextprotocol/sdk

File Structure:
- voice-app/backend: Complete TypeScript backend
- voice-app/frontend: Next.js frontend with 3D graphics
- voice-app/ARCHITECTURE.md: Detailed system design
- voice-app/README.md: Setup and usage guide
- voice-app/docker-compose.yml: Docker configuration

This commit implements a complete dual-agent voice assistant system that allows
users to toggle between two different AI voice providers:

Backend Changes:
- Add Qwen 3 Omni real-time voice provider with WebSocket integration
- Implement voice agent switching in WebSocket handlers
- Update session management to track selected voice agent
- Add support for SET_VOICE_AGENT and VOICE_AGENT_CHANGED messages
- Route audio processing to appropriate agent based on selection
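The SET_VOICE_AGENT / VOICE_AGENT_CHANGED handshake can be sketched as below. The payload shapes and the idle-only guard are inferred from this description; the PR's actual types and handler names may differ.

```typescript
// Hedged sketch of the agent-switch messages. Field names are
// illustrative, not the PR's exact types.
type VoiceAgent = 'elevenlabs' | 'qwen';

interface SetVoiceAgentMessage {
  type: 'SET_VOICE_AGENT';
  payload: { agent: VoiceAgent };
}

interface VoiceAgentChangedMessage {
  type: 'VOICE_AGENT_CHANGED';
  payload: { agent: VoiceAgent };
}

interface Session {
  voiceAgent: VoiceAgent;
  status: 'idle' | 'processing' | 'speaking';
}

// Only honour a switch while the assistant is idle, as the feature
// list below notes; otherwise reject the request by returning null.
function handleSetVoiceAgent(
  session: Session,
  msg: SetVoiceAgentMessage,
): VoiceAgentChangedMessage | null {
  if (session.status !== 'idle') return null;
  session.voiceAgent = msg.payload.agent;
  return { type: 'VOICE_AGENT_CHANGED', payload: { agent: msg.payload.agent } };
}
```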

Frontend Changes:
- Create VoiceAgentToggle component for switching between agents
- Update VoiceAssistant to display current agent and support switching
- Extend useWebSocket hook with agent management capabilities
- Add visual feedback for active agent (blue for ElevenLabs, purple for Qwen)

Configuration:
- Add all required API keys to .env.example (ElevenLabs, Qwen, Azure, Tavily)
- Configure Qwen Omni settings (model, voice, endpoint)
- Set DEFAULT_VOICE_AGENT environment variable

Features:
- Real-time agent switching (when assistant is idle)
- Qwen 3 Omni emotional voice synthesis
- Support for 10+ languages
- Multiple voice options (Cherry, Ethan, Jennifer, Ryan, etc.)
- Seamless WebSocket communication for both agents

Documentation:
- Add comprehensive SETUP.md with usage instructions
- Document API rate limits and pricing
- Include troubleshooting guide
- Provide architecture diagrams

The implementation follows the Model Context Protocol and integrates with
existing MCP servers for extended capabilities.
Copilot AI review requested due to automatic review settings November 13, 2025 10:16
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copilot finished reviewing on behalf of bon-zai November 13, 2025 10:21

Copilot AI left a comment


Pull Request Overview

This PR introduces a comprehensive dual-agent AI voice assistant application with real-time WebSocket communication, 3D animated visualization, and MCP integration. The application supports two voice AI providers: ElevenLabs (with Claude AI and Whisper STT) and Qwen 3 Omni (Alibaba's multimodal voice AI).

Key Changes:

  • Full-stack voice application with Next.js frontend and Node.js backend
  • WebSocket-based real-time bidirectional audio streaming
  • Integration with Claude AI, Whisper, ElevenLabs, and Qwen Omni APIs
  • MCP (Model Context Protocol) support for mem0 memory and Tavily search
  • 3D animated orb visualization using Three.js/React Three Fiber

Reviewed Changes

Copilot reviewed 39 out of 39 changed files in this pull request and generated 13 comments.

Summary per file:

  • voice-app/frontend/package.json: Frontend dependencies including Next.js 15, React 19, Three.js for 3D visualization
  • voice-app/frontend/tsconfig.json: TypeScript configuration for Next.js with bundler module resolution
  • voice-app/frontend/lib/types.ts: Type definitions for WebSocket messages, voice agents, and chat data
  • voice-app/frontend/hooks/useWebSocket.ts: Custom hook for WebSocket connection and message handling
  • voice-app/frontend/hooks/useAudio.ts: Audio recording and playback functionality using the Web Audio API
  • voice-app/frontend/components/VoiceAssistant.tsx: Main component orchestrating voice interaction and UI state
  • voice-app/frontend/components/VoiceAgentToggle.tsx: UI component for switching between ElevenLabs and Qwen agents
  • voice-app/frontend/components/ChatUI.tsx: Chat interface displaying conversation history
  • voice-app/frontend/components/AnimatedOrb.tsx: 3D visualization with particles that change color based on speaker
  • voice-app/backend/package.json: Backend dependencies including Anthropic SDK, OpenAI, MCP SDK
  • voice-app/backend/tsconfig.json: TypeScript configuration for Node.js backend
  • voice-app/backend/src/types/index.ts: Backend type definitions matching frontend types
  • voice-app/backend/src/websocket/server.ts: WebSocket server setup with session management and heartbeat
  • voice-app/backend/src/websocket/handlers.ts: Message routing and processing for audio chunks and agent switching
  • voice-app/backend/src/voice/stt.ts: OpenAI Whisper integration for speech-to-text
  • voice-app/backend/src/voice/tts.ts: ElevenLabs integration for text-to-speech synthesis
  • voice-app/backend/src/voice/qwen-omni.ts: Qwen 3 Omni real-time voice provider with WebSocket connection
  • voice-app/backend/src/voice/pipeline.ts: Voice processing pipeline orchestrating STT, LLM, and TTS
  • voice-app/backend/src/llm/claude.ts: Claude AI integration with MCP tools for memory and search
  • voice-app/backend/src/mcp/client.ts: MCP client manager for connecting to MCP servers
  • voice-app/backend/src/mcp/servers.ts: Configuration for mem0 and Tavily MCP servers
  • voice-app/docker-compose.yml: Docker Compose setup for running frontend and backend services
  • voice-app/SETUP.md: Comprehensive setup guide with environment variables and troubleshooting
  • voice-app/README.md: Project overview with quick start instructions
  • voice-app/ARCHITECTURE.md: Detailed architecture documentation with data flow diagrams
Comments suppressed due to low confidence (1)

voice-app/frontend/hooks/useWebSocket.ts:31

  • Unused variable audioQueueRef.


async start(): Promise<void> {
  this.wss = new WSServer({
    port: this.port,
    maxPayload: Number(process.env.WS_MAX_PAYLOAD) || 10 * 1024 * 1024, // 10MB

Copilot AI Nov 13, 2025


Security concern: The WebSocket server accepts connections without any authentication or origin validation. In production, you should implement:

  1. Origin validation to prevent unauthorized cross-origin connections
  2. Authentication tokens or API keys
  3. Rate limiting per connection
  4. CORS restrictions

Consider adding a verifyClient callback to the WSServer configuration:

this.wss = new WSServer({
  port: this.port,
  maxPayload: Number(process.env.WS_MAX_PAYLOAD) || 10 * 1024 * 1024,
  verifyClient: (info) => {
    // Verify origin, authentication token, etc.
    const origin = info.origin;
    // Add your validation logic here
    return true; // or false to reject
  }
});
Suggested change

  maxPayload: Number(process.env.WS_MAX_PAYLOAD) || 10 * 1024 * 1024, // 10MB
  verifyClient: (info, done) => {
    // Origin validation
    const allowedOrigins = (process.env.WS_ALLOWED_ORIGINS || '')
      .split(',')
      .map(o => o.trim())
      .filter(o => o);
    const origin = info.origin;
    if (allowedOrigins.length > 0 && !allowedOrigins.includes(origin)) {
      logger.warn(`WebSocket connection rejected due to invalid origin: ${origin}`);
      return done(false, 403, 'Forbidden');
    }
    // Token validation (query param); avoid logging the token itself
    const url = require('url');
    const parsedUrl = url.parse(info.req.url || '', true);
    const token = parsedUrl.query && parsedUrl.query.token;
    const expectedToken = process.env.WS_AUTH_TOKEN;
    if (expectedToken && token !== expectedToken) {
      logger.warn('WebSocket connection rejected due to invalid token');
      return done(false, 401, 'Unauthorized');
    }
    // Passed all checks
    return done(true);
  }

Comment on lines +249 to +265
qwenProvider.on('response_text', (text: string) => {
  fullResponse += text;
  this.sendMessage(session.ws, WebSocketMessageType.AI_RESPONSE_TEXT, {
    text: fullResponse,
    messageId,
  });
});

qwenProvider.on('audio_chunk', (audioChunk: Buffer) => {
  this.sendMessage(session.ws, WebSocketMessageType.AI_RESPONSE_AUDIO, {
    audio: audioChunk.toString('base64'),
    messageId,
    isLast: false,
  });
});

qwenProvider.once('response_complete', () => {

Copilot AI Nov 13, 2025


Potential memory leak: Event listeners are registered on the qwenProvider using .on() for response_text and audio_chunk events, but these are never removed. If this method is called multiple times (e.g., user sends multiple messages), listeners will accumulate, causing memory leaks and potentially duplicate event handling.

Either:

  1. Use .once() instead of .on() if you only expect one event per request
  2. Remove the listeners after the response_complete event using .off() or .removeAllListeners()
  3. Create a new provider instance per request instead of reusing the same one

Example fix:

const handlers = {
  responseText: (text: string) => { /* ... */ },
  audioChunk: (audioChunk: Buffer) => { /* ... */ }
};

qwenProvider.on('response_text', handlers.responseText);
qwenProvider.on('audio_chunk', handlers.audioChunk);

qwenProvider.once('response_complete', () => {
  // Clean up listeners
  qwenProvider.off('response_text', handlers.responseText);
  qwenProvider.off('audio_chunk', handlers.audioChunk);
  // ... rest of the code
});
Suggested change

  // Store handler references for cleanup
  const responseTextHandler = (text: string) => {
    fullResponse += text;
    this.sendMessage(session.ws, WebSocketMessageType.AI_RESPONSE_TEXT, {
      text: fullResponse,
      messageId,
    });
  };
  const audioChunkHandler = (audioChunk: Buffer) => {
    this.sendMessage(session.ws, WebSocketMessageType.AI_RESPONSE_AUDIO, {
      audio: audioChunk.toString('base64'),
      messageId,
      isLast: false,
    });
  };

  qwenProvider.on('response_text', responseTextHandler);
  qwenProvider.on('audio_chunk', audioChunkHandler);

  qwenProvider.once('response_complete', () => {
    // Clean up listeners to prevent memory leaks
    qwenProvider.off('response_text', responseTextHandler);
    qwenProvider.off('audio_chunk', audioChunkHandler);

import { useAudioRecorder, useAudioPlayer } from '@/hooks/useAudio';
import { ChatMessage, AssistantStatus, VoiceAgent } from '@/lib/types';
import { Mic, MicOff } from 'lucide-react';
import { cn } from '@/lib/utils';

Copilot AI Nov 13, 2025


The cn utility function is imported from @/lib/utils but this file is missing from the repository. You need to create voice-app/frontend/lib/utils.ts with the following implementation:

import { type ClassValue, clsx } from 'clsx';
import { twMerge } from 'tailwind-merge';

export function cn(...inputs: ClassValue[]) {
  return twMerge(clsx(inputs));
}

This utility is commonly used for merging Tailwind CSS classes and is required by VoiceAssistant.tsx, VoiceAgentToggle.tsx, and ChatBubble.tsx.

}, [status, setVoiceAgent]);

const getAgentName = () => {
  return currentAgent === 'elevenlabs' ? 'ElevenLabs' : 'QWen 3 Omni';

Copilot AI Nov 13, 2025


Inconsistent naming: The product name is "Qwen" (all lowercase except the first letter), not "QWen". Throughout the codebase and documentation, you should use the official spelling "Qwen 3 Omni" or "Qwen2.5-Omni" for consistency.

Comment on lines +75 to +79
this.ws.on('error', (error) => {
  logger.error('QWen Omni WebSocket error:', error);
  this.emit('error', error);
  reject(error);
});

Copilot AI Nov 13, 2025


Potential race condition: if the socket emits multiple 'error' events in rapid succession, or an error fires after the connection has already opened, the promise's reject is called repeatedly. Consider adding a flag so the promise is resolved or rejected only once, and use a timeout to handle a connection that never opens.
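One way to implement that guard, sketched with a settled flag plus a timeout. This is illustrative, not the PR's code: `connectOnce` is a hypothetical helper, and any EventEmitter-like socket works.

```typescript
import { EventEmitter } from 'events';

// Sketch: settle the connect promise exactly once, whichever of
// 'open', 'error', or the timeout fires first. Later 'error' events
// still reach long-lived listeners but can no longer re-reject.
function connectOnce(socket: EventEmitter, timeoutMs = 10_000): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    let settled = false;
    const settle = (fn: () => void) => {
      if (settled) return; // ignore every signal after the first
      settled = true;
      clearTimeout(timer);
      fn();
    };
    const timer = setTimeout(
      () => settle(() => reject(new Error('connect timeout'))),
      timeoutMs,
    );
    socket.once('open', () => settle(() => resolve()));
    socket.on('error', (err: Error) => settle(() => reject(err)));
  });
}
```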

  throw new Error('QWen Omni provider not initialized');
}

const messageId = require('uuid').v4();

Copilot AI Nov 13, 2025


Using require() for importing uuid is inconsistent with the ES module imports used elsewhere in the file. Change this to:

import { v4 as uuidv4 } from 'uuid';

And then use uuidv4() instead of require('uuid').v4(). Note that uuid is already imported at the top of the pipeline.ts file this way.

  )}
>
  <Sparkles className="w-4 h-4" />
  <span>QWen 3 Omni</span>

Copilot AI Nov 13, 2025


Inconsistent naming: The product name is "Qwen" (all lowercase except the first letter), not "QWen". Throughout the codebase and documentation, you should use the official spelling "Qwen 3 Omni" or "Qwen2.5-Omni" for consistency.

Suggested change
<span>QWen 3 Omni</span>
<span>Qwen 3 Omni</span>

private ws: WebSocket | null = null;
private isConnected = false;
private sessionId: string | null = null;
private audioQueue: Buffer[] = [];

Copilot AI Nov 13, 2025


The audioQueue field is declared but never used anywhere in this class. Consider removing it or implementing the intended audio queue functionality if it was meant to buffer audio chunks.

Suggested change
private audioQueue: Buffer[] = [];

Comment on lines +174 to +197
<button
  onMouseDown={handleMouseDown}
  onMouseUp={handleMouseUp}
  onTouchStart={handleMouseDown}
  onTouchEnd={handleMouseUp}
  disabled={!isConnected || status === 'processing' || status === 'speaking'}
  className={cn(
    'relative group p-8 rounded-full transition-all duration-200',
    'disabled:opacity-50 disabled:cursor-not-allowed',
    isRecording
      ? 'bg-blue-500 shadow-lg shadow-blue-500/50 scale-110'
      : 'bg-gray-700 hover:bg-gray-600 hover:shadow-lg'
  )}
>
  {isRecording ? (
    <Mic className="w-12 h-12 text-white" />
  ) : (
    <MicOff className="w-12 h-12 text-gray-300" />
  )}

  <div className="absolute -bottom-12 left-1/2 -translate-x-1/2 whitespace-nowrap text-sm text-gray-400">
    {isRecording ? 'Release to send' : 'Hold to speak'}
  </div>
</button>

Copilot AI Nov 13, 2025


The voice recording button is missing accessibility attributes. Add aria-label and aria-pressed to improve screen reader support (role="button" is implicit on a native <button>, so it is not needed):

<button
  onMouseDown={handleMouseDown}
  onMouseUp={handleMouseUp}
  onTouchStart={handleMouseDown}
  onTouchEnd={handleMouseUp}
  disabled={!isConnected || status === 'processing' || status === 'speaking'}
  aria-label={isRecording ? 'Recording - Release to send' : 'Hold to record voice message'}
  aria-pressed={isRecording}
  className={cn(
    // ... rest of classes
  )}
>

WebSocketMessageType,
AudioChunkPayload,
SetVoiceAgentPayload,
VoiceAgent,

Copilot AI Nov 13, 2025


Unused import VoiceAgent.

Suggested change
VoiceAgent,
