-
Notifications
You must be signed in to change notification settings - Fork 86
feat: optimize local audio loopback and chat client config & Add client_interrupt support #254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Warning Rate limit exceeded@jackshen310 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 0 minutes and 58 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (5)
WalkthroughThis update introduces "press-to-talk" (client_interrupt) mode for real-time WebSocket chat, adds form and config support for conversation mode selection, and implements local audio loopback enhancements. The changes include UI updates, SDK API adjustments, new and updated types, improved resource cleanup, and related documentation and test updates. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant UI (Chat Page)
participant WsChatClient
participant WavStreamPlayer
participant LocalLoopback
User->>UI (Chat Page): Presses and holds Voice Button
UI (Chat Page)->>WsChatClient: startRecord()
WsChatClient->>WavStreamPlayer: (if client_interrupt) interrupt()
WsChatClient->>Recorder: getRawMediaStream()
WsChatClient->>WavStreamPlayer: setMediaStream()
WsChatClient->>LocalLoopback: connect() / start()
note right of LocalLoopback: Local audio loopback enabled
User->>UI (Chat Page): Releases Voice Button or after 60s
UI (Chat Page)->>WsChatClient: stopRecord()
WsChatClient->>Recorder: destroy()
WsChatClient->>WebSocket: send input_audio_completed
User->>UI (Chat Page): Changes conversation mode in settings
UI (Chat Page)->>Config: setTurnDetection(mode)
UI (Chat Page)->>WsChatClient: Update config and re-init as needed
Suggested reviewers
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
shenxiaojie.316 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
bb4480e to
c22c496
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🔭 Outside diff range comments (1)
examples/realtime-websocket/src/components/settings/index.tsx (1)
9-27:⚠️ Potential issueFix duplicate interface declaration
The
FormConfiginterface is declared twice, which will cause a TypeScript error. Instead of redefining the interface, use TypeScript's interface merging feature by removing the first declaration.Apply this diff to fix the duplicate interface declaration:
-// 定义配置项接口 -interface FormConfig { - name: string; - label: string; - required?: boolean; - message?: string; -} - // 定义配置项类型 type FormConfigType = 'input' | 'select'; -// 扩展配置项接口 +// 定义配置项接口 interface FormConfig { name: string; label: string; required?: boolean; message?: string; type?: FormConfigType; options?: { label: string; value: string }[]; }
🧹 Nitpick comments (9)
common/changes/@coze/uniapp-api/feat-realtime-websocket_2025-05-29-12-14.json (1)
5-5: Clarify the error description in the comment field.The comment "package.json error" is too vague and doesn't provide meaningful information about what specific error was fixed. Consider providing more descriptive details about the actual issue that was resolved.
- "comment": "package.json error", + "comment": "Fix package.json exports configuration error",common/changes/@coze/uniapp-api/feat-realtime-websocket_2025-05-29-12-31.json (1)
5-5: Clarify the error description and consider consolidating related patches.Similar to the other changelog file, the comment "package.json error" is too vague. Additionally, having multiple patch entries with the same generic error message suggests these fixes might be related and could potentially be consolidated.
Consider providing more specific error descriptions and evaluate if these separate patch entries are necessary or if they should be consolidated into a single patch.
examples/realtime-console/src/pages/main/index.tsx (1)
173-173: Clean up commented code or restore functionality.The commented
handleMessage(eventName)call should either be removed entirely if no longer needed, or uncommented if the functionality is still required.If this interrupt message handling is no longer needed due to refactoring, remove the commented line entirely:
- // handleMessage(eventName);If the functionality is still needed, ensure it's properly implemented through the new interrupt handling mechanism.
packages/coze-uniapp/src/ws-tool/chat/index.ts (1)
188-189: Update error message to reflect new mode name.The error message still references "Client VAD mode" but should be updated to reflect the new
client_interruptmode for clarity.- if (this.turnDetection === 'client_interrupt') { - throw new Error('Client VAD mode does not support setAudioEnable'); + if (this.turnDetection === 'client_interrupt') { + throw new Error('client_interrupt mode does not support setAudioEnable');packages/coze-js/src/ws-tools/wavtools/lib/wav_stream_player.ts (1)
287-314: Well-structured cleanup method with room for error handling.The
destroy()method properly releases all resources in the correct order. Consider adding try-catch blocks to ensure cleanup continues even if individual operations fail.async destroy(): Promise<void> { // Stop any audio that's playing if (this.streamNode) { - this.streamNode.disconnect(); - this.streamNode = null; + try { + this.streamNode.disconnect(); + } catch (error) { + console.warn('Error disconnecting streamNode:', error); + } finally { + this.streamNode = null; + } } // Clean up local lookback if (this.localLookback) { - this.localLookback.cleanup(); - this.localLookback = undefined; + try { + this.localLookback.cleanup(); + } catch (error) { + console.warn('Error cleaning up localLookback:', error); + } finally { + this.localLookback = undefined; + } } // Close audio context if (this.context) { - await this.context.close(); - this.context = null; + try { + await this.context.close(); + } catch (error) { + console.warn('Error closing audio context:', error); + } finally { + this.context = null; + } } // Reset all state this.trackSampleOffsets = {}; this.interruptedTrackIds = {}; this.isPaused = false; }examples/realtime-websocket/src/pages/chat/receive-message.tsx (1)
135-163: Nice UI improvements for message display.The scrollable container and distinct message styling enhance the user experience. Consider adding auto-scroll to bottom when new messages arrive.
Would you like me to provide code to automatically scroll to the bottom when new messages are added? This is a common UX pattern for chat applications.
packages/coze-js/src/ws-tools/chat/index.ts (2)
23-26: Consider improving mobile detection logicThe current implementation only checks window width, which may not accurately detect all mobile devices. Tablets can have widths > 768px but should still be considered mobile devices for audio handling purposes.
Consider using a more comprehensive mobile detection approach:
function isMobileView() { - return window.innerWidth <= 768; // 通常移动端宽度 <= 768px + // Check for touch capability and user agent for better mobile detection + const isTouchDevice = 'ontouchstart' in window || navigator.maxTouchPoints > 0; + const isMobileUserAgent = /Android|webOS|iPhone|iPad|iPod|BlackBerry|IEMobile|Opera Mini/i.test(navigator.userAgent); + const isSmallScreen = window.innerWidth <= 768; + + return isTouchDevice || isMobileUserAgent || isSmallScreen; }
38-41: Remove unnecessary optional chaining on wavStreamPlayerThe
wavStreamPlayeris always initialized in the constructor, so the optional chaining (?.) is redundant throughout the code.Apply this diff to remove unnecessary optional chaining:
- this.wavStreamPlayer?.setSampleRate( + this.wavStreamPlayer.setSampleRate( event.data?.output_audio?.pcm_config?.sample_rate || 24000, ); - this.wavStreamPlayer?.setDefaultFormat( + this.wavStreamPlayer.setDefaultFormat( (event.data?.output_audio?.codec as AudioFormat) || 'pcm', );- await this.wavStreamPlayer?.destroy(); + await this.wavStreamPlayer.destroy();Also applies to: 170-175, 195-195
packages/coze-js/src/ws-tools/wavtools/lib/local-lookback.ts (1)
419-422: Remove redundant type annotationTypeScript can infer the type from the
eventListenersarray definition.- this.eventListeners.forEach(({ element, event, handler }: { element: EventTarget; event: string; handler: EventListener }) => { + this.eventListeners.forEach(({ element, event, handler }) => { element.removeEventListener(event, handler); });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
common/config/subspaces/default/pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (28)
common/changes/@coze/api/feat-realtime-websocket_2025-05-29-10-15.json(1 hunks)common/changes/@coze/api/feat-realtime-websocket_2025-05-29-13-59.json(1 hunks)common/changes/@coze/uniapp-api/feat-realtime-websocket_2025-05-29-10-15.json(1 hunks)common/changes/@coze/uniapp-api/feat-realtime-websocket_2025-05-29-12-14.json(1 hunks)common/changes/@coze/uniapp-api/feat-realtime-websocket_2025-05-29-12-31.json(1 hunks)common/changes/@coze/uniapp-api/feat-realtime-websocket_2025-05-29-13-59.json(1 hunks)examples/coze-js-uniapp/package.json(1 hunks)examples/realtime-console/src/pages/main/index.tsx(2 hunks)examples/realtime-console/src/pages/main/setting-form.tsx(1 hunks)examples/realtime-websocket/src/components/settings/index.tsx(5 hunks)examples/realtime-websocket/src/pages/chat/index.css(1 hunks)examples/realtime-websocket/src/pages/chat/index.tsx(11 hunks)examples/realtime-websocket/src/pages/chat/operation.tsx(1 hunks)examples/realtime-websocket/src/pages/chat/receive-message.tsx(4 hunks)examples/realtime-websocket/src/utils/config.ts(1 hunks)packages/coze-js/README.md(1 hunks)packages/coze-js/README.zh-CN.md(1 hunks)packages/coze-js/package.json(1 hunks)packages/coze-js/src/resources/websockets/types.ts(2 hunks)packages/coze-js/src/ws-tools/chat/base.ts(4 hunks)packages/coze-js/src/ws-tools/chat/index.ts(7 hunks)packages/coze-js/src/ws-tools/recorder/pcm-recorder.ts(4 hunks)packages/coze-js/src/ws-tools/types.ts(1 hunks)packages/coze-js/src/ws-tools/wavtools/lib/local-lookback.ts(9 hunks)packages/coze-js/src/ws-tools/wavtools/lib/wav_stream_player.ts(12 hunks)packages/coze-uniapp/README.md(2 hunks)packages/coze-uniapp/package.json(3 hunks)packages/coze-uniapp/src/ws-tool/chat/index.ts(5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
packages/coze-js/src/ws-tools/chat/base.ts (1)
packages/coze-js/src/ws-tools/wavtools/lib/wav_stream_player.ts (1)
WavStreamPlayer(14-315)
packages/coze-uniapp/src/ws-tool/chat/index.ts (1)
packages/coze-js/src/resources/websockets/types.ts (1)
TurnDetectionType(518-518)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Node.js v20 (ubuntu-latest)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (40)
common/changes/@coze/uniapp-api/feat-realtime-websocket_2025-05-29-10-15.json (1)
1-11: Changelog metadata is correct.This JSON file appropriately documents the minor update for
@coze/uniapp-apiwith the intended comment and author email. No issues found.common/changes/@coze/api/feat-realtime-websocket_2025-05-29-10-15.json (1)
1-11: Changelog metadata is correct.This JSON file accurately captures the minor update for
@coze/apioptimizing local audio loopback and chat client config. No issues detected.packages/coze-js/README.md (1)
292-297: Ensure interrupt call is awaited.The added line
await client.interrupt();correctly reflects the asynchronous nature of the newclient_interruptmode. This aligns with the SDK changes requiringinterrupt()to return a promise.packages/coze-js/package.json (1)
1-4: Version bump is consistent with new features.The package version has been updated from
1.2.1-beta.15to1.2.1-beta.17, matching the released changelog entries for both optimization andclient_interruptsupport.common/changes/@coze/api/feat-realtime-websocket_2025-05-29-13-59.json (1)
1-11: Changelog metadata is correct.This JSON correctly documents the minor addition of
client_interruptsupport for@coze/api. No further changes needed.common/changes/@coze/uniapp-api/feat-realtime-websocket_2025-05-29-13-59.json (1)
1-11: LGTM! Properly documents the client_interrupt feature addition.The changelog entry correctly classifies the client_interrupt support as a "minor" version change, which is appropriate for a new feature addition. The comment is clear and aligns with the PR objectives.
examples/realtime-websocket/src/utils/config.ts (1)
11-12: LGTM! Clean implementation following established patterns.The new
getTurnDetectionmethod is consistent with the existing configuration pattern and provides a sensible default value of'server_vad'. This supports the new turn detection modes mentioned in the PR objectives.examples/coze-js-uniapp/package.json (1)
58-59: LGTM! Appropriate dependencies for WebSocket functionality.The addition of
axiosandreconnecting-websocketdependencies aligns well with the PR's focus on enhancing WebSocket communication and client interrupt support. Both are well-maintained packages with recent versions.packages/coze-uniapp/README.md (2)
12-14: LGTM! Consistent dependency documentation.The installation instructions correctly include the new
axiosandreconnecting-websocketdependencies that were added to the package.json, maintaining consistency between code and documentation.
328-329: LGTM! Important setup instructions added.The addition of the
npm run run-preinstallstep with emphasis on running it first is valuable for users. This ensures proper project setup before running the development servers.examples/realtime-websocket/src/pages/chat/operation.tsx (1)
43-45: Excellent improvement for resource cleanup.Making
handleDisconnectasync and awaiting thedisconnect()call ensures proper cleanup of audio resources (includingwavStreamPlayer) before clearing the client reference. This prevents potential resource leaks and ensures orderly shutdown of WebSocket and audio components.packages/coze-uniapp/package.json (3)
3-3: Version bump looks appropriate.The version increment from
0.2.1-beta.4to0.2.1-beta.5follows semantic versioning for beta releases and aligns with the new features being introduced.
47-47: Confirm removal of Chinese README is intentional.The Chinese README file has been removed from the package distribution. Please verify this is intentional and not an oversight.
75-79: Well-structured exports configuration.The addition of the
exportsfield properly maps the root export andws-toolssubpath to their corresponding built files, improving module resolution for consumers of this package.packages/coze-js/src/ws-tools/chat/base.ts (3)
3-3: Good optimization with type-only import.Converting to a type-only import for
WavStreamPlayerreduces bundle size when the class is not instantiated, which aligns with makingwavStreamPlayeroptional.
25-25: Well-designed optional property pattern.Making
wavStreamPlayeroptional allows for conditional instantiation based on the turn detection mode, improving resource efficiency when audio playback is not needed.
204-204: Proper safe access with optional chaining.The use of optional chaining (
?.) forwavStreamPlayermethod calls prevents runtime errors when the player is not instantiated, ensuring robust operation across different usage modes.Also applies to: 228-228
packages/coze-js/src/resources/websockets/types.ts (1)
224-226: LGTM! Good addition of flexible parameters support.The addition of the optional
parametersproperty toChatConfigprovides good flexibility for custom input parameters in chat flows. The use ofRecord<string, any>with proper ESLint disable comment is appropriate for this use case.packages/coze-uniapp/src/ws-tool/chat/index.ts (3)
1-5: LGTM! Proper import of new type.The import statement correctly includes the new
TurnDetectionTypefrom the API package.
21-21: LGTM! Correct type usage.The
turnDetectionproperty correctly uses the newTurnDetectionTypeinstead of the hardcoded string union.
48-48: LGTM! Consistent logic updates for new turn detection mode.The conditions correctly check for
'client_interrupt'instead of the previous'client_vad'mode, maintaining consistent behavior with the new turn detection type.Also applies to: 158-158, 188-188
packages/coze-js/src/ws-tools/recorder/pcm-recorder.ts (3)
180-182: LGTM! Simplified and more focused validation.The simplified validation in the
record()method correctly focuses on checking foraudioTrackpresence, which is the essential requirement for recording functionality.
357-368: LGTM! Good implementation of on-demand stream creation.The
getRawMediaStream()method correctly creates a freshMediaStreamfrom the currentaudioTrackon demand. This approach provides better resource management compared to maintaining a persistent stream reference.The method properly:
- Checks for
audioTrackexistence- Returns
undefinedwhen no track is available- Creates a new
MediaStreamwith the raw track
288-304: LGTM! Well-structured cleanup process.The updated comments in the
destroy()method clearly document the cleanup steps, making the resource management process more transparent and maintainable.examples/realtime-websocket/src/pages/chat/index.css (1)
1-93: LGTM! Comprehensive and well-structured voice recording UI styles.The CSS implementation provides excellent styling for the voice recording interface with several strong points:
Strengths:
- Clear visual hierarchy with distinct states (normal, recording, muted)
- Smooth transitions and animations enhance user experience
- Proper accessibility considerations (cursor pointer, user-select: none)
- Consistent color scheme and spacing
- Well-organized structure with logical grouping
Good practices implemented:
- Semantic class naming (
.voice-button,.recording-status, etc.)- Responsive design considerations (percentage-based widths)
- Visual feedback for different states (active, inactive, muted)
- Smooth animations with appropriate timing
The implementation effectively supports the new "press-to-talk" functionality introduced in the chat interface.
packages/coze-js/src/ws-tools/wavtools/lib/wav_stream_player.ts (3)
18-18: LGTM! Property rename improves code clarity.The rename from
streamtostreamNodebetter describes that this property holds anAudioWorkletNodeinstance. The addition oflocalLookbackStreamprovides a clear separation of concerns.Also applies to: 32-32, 47-47
136-141: Proper handling of stop event for both scenarios.The logic correctly handles stopping local lookback when it exists, and disconnects the stream node in the standard case. Good separation of concerns.
256-256: Simplified stream management.The direct assignment approach is cleaner and aligns with the new connection flow where the stream is passed during the
connect()call.examples/realtime-websocket/src/pages/chat/receive-message.tsx (2)
15-21: Good structured data model for chat messages.The
ChatMessageinterface provides better type safety and allows for richer message metadata. The migration from string array to structured messages is a solid improvement.Also applies to: 27-27
66-75: Proper handling of user transcript completion event.The implementation correctly extracts the transcript content and creates a user message with appropriate metadata.
examples/realtime-websocket/src/pages/chat/index.tsx (6)
74-84: Well-structured state management for press-to-talk feature.The state variables properly track all aspects of the recording interaction including duration, cancellation, and touch position.
37-57: Clean configuration helper function.Good abstraction that makes the configuration flexible based on the turn detection mode.
314-340: Well-implemented recording completion with validation.Good UX consideration with the 1-second minimum duration check and user feedback via success message.
234-273: Comprehensive cross-device event handling.Excellent implementation supporting both mouse and touch events with proper gesture detection for cancellation. The 50px threshold provides good UX.
367-381: Proper resource cleanup implementation.Both cleanup effects correctly handle their resources - clearing the timer and disconnecting the client to prevent memory leaks.
475-518:❓ Verification inconclusive
Rich UI feedback for recording interaction.
The implementation provides excellent visual feedback with duration display, progress bar, and contextual tips. Ensure all CSS classes (
voice-button,recording,recording-status, etc.) are properly defined in the imported./index.cssfile.
🏁 Script executed:
#!/bin/bash # Description: Verify that the CSS classes used in the voice button UI are defined # Check if index.css exists and contains the required classes fd "index.css" --exec grep -l "voice-button\|recording-status\|recording-progress" {} \;Length of output: 146
#!/bin/bash # Verify that all expected CSS classes are defined in index.css css_file="examples/realtime-websocket/src/pages/chat/index.css" classes=( "voice-button" "recording" "recording-status" "recording-time" "recording-progress-container" "recording-progress" "recording-tip" "cancel-tip" ) echo "Checking CSS definitions in $css_file" for cls in "${classes[@]}"; do if grep -q "\.${cls}" "$css_file"; then echo "✔ Found .$cls" else echo "✘ Missing .$cls" fi done
#!/bin/bash # Verify that all expected CSS classes are defined in index.css css_file="examples/realtime-websocket/src/pages/chat/index.css" classes=( "voice-button" "recording" "recording-status" "recording-time" "recording-progress-container" "recording-progress" "recording-tip" "cancel-tip" ) echo "Checking CSS definitions in $css_file" for cls in "${classes[@]}"; do if grep -q "\.${cls}" "$css_file"; then echo "✔ Found .$cls" else echo "✘ Missing .$cls" fi done
Verify CSS class definitions for voice recording UI
Please confirm that the following CSS classes are defined in
examples/realtime-websocket/src/pages/chat/index.css:
- .voice-button
- .recording
- .recording-status
- .recording-time
- .recording-progress-container
- .recording-progress
- .recording-tip
- .cancel-tip
If any are missing, add the corresponding rules to ensure the UI renders as intended.
examples/realtime-websocket/src/components/settings/index.tsx (1)
67-77: LGTM! Well-implemented conversation mode selectionThe implementation correctly adds support for selecting between server-side VAD (自由对话模式) and client-side interrupt (按键说话模式) modes. The conditional rendering of Select vs Input components and the default value handling are properly implemented.
Also applies to: 96-106, 133-133, 141-141
packages/coze-js/src/ws-tools/chat/index.ts (1)
54-134: Well-implemented client_interrupt mode supportThe implementation correctly handles the press-to-talk functionality:
- Guards against duplicate recording sessions
- Properly interrupts playback when starting recording in client_interrupt mode
- Provides clear error messaging when incompatible methods are called
- Correctly manages recording lifecycle with start/stop methods
Also applies to: 209-211
packages/coze-js/src/ws-tools/wavtools/lib/local-lookback.ts (2)
379-386: Reconsider stopping shared media stream tracksThe
mediaStreamis provided externally through theconnect()method and might be used by other components. Stopping its tracks in cleanup could cause issues for other parts of the application.Consider removing the media stream cleanup or adding a parameter to control this behavior:
- // Cleanup media stream - if (this.mediaStream) { - // Stop all tracks in the media stream - this.mediaStream.getTracks().forEach(track => { - track.stop(); - }); - this.mediaStream = undefined; - } + // Clear reference to media stream without stopping tracks + // as it might be used by other components + this.mediaStream = undefined;Alternatively, add a parameter to control whether to stop the tracks:
- cleanup(): void { + cleanup(stopMediaStream: boolean = false): void { this._debug('Cleaning up LocalLookback resources'); // ... other cleanup code ... - // Cleanup media stream - if (this.mediaStream) { + // Cleanup media stream if requested + if (this.mediaStream && stopMediaStream) { // Stop all tracks in the media stream this.mediaStream.getTracks().forEach(track => { track.stop(); }); - this.mediaStream = undefined; } + this.mediaStream = undefined;
4-8: Excellent lifecycle management implementationThe additions provide comprehensive lifecycle management for the local audio loopback:
- Clear separation of concerns with
connect(),start(),stop(), andcleanup()methods- Proper state checking and error handling
- Async/await properly used for SDP handling
- Thorough resource cleanup (with the exceptions noted above)
Also applies to: 89-138, 188-207, 345-425
packages/coze-js/src/ws-tools/wavtools/lib/wav_stream_player.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
common/changes/@coze/api/feat-realtime-websocket_2025-06-02-01-01.json(1 hunks)packages/coze-js/src/ws-tools/chat/index.ts(7 hunks)packages/coze-js/src/ws-tools/wavtools/index.ts(1 hunks)packages/coze-js/test/ws-tools/chat.spec.ts(4 hunks)packages/coze-js/test/ws-tools/recorder/pcm-recorder.spec.ts(3 hunks)
✅ Files skipped from review due to trivial changes (2)
- common/changes/@coze/api/feat-realtime-websocket_2025-06-02-01-01.json
- packages/coze-js/src/ws-tools/wavtools/index.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/coze-js/src/ws-tools/chat/index.ts
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Node.js v20 (ubuntu-latest)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (8)
packages/coze-js/test/ws-tools/chat.spec.ts (2)
107-107: LGTM! Environment detection mock added.The
isBrowserExtensionmock returningfalseis appropriate for test environments and aligns with the new environment detection utility mentioned in the AI summary.
40-40:✅ Verification successful
Verify the API change is consistent across the codebase.
The mock has been updated from
getMediaStreamtogetRawMediaStream, which aligns with the AI summary indicating this change was made in the actualPcmRecorderimplementation.Please verify that all usages of the old
getMediaStreammethod have been updated consistently:
🏁 Script executed:
#!/bin/bash # Description: Verify that getMediaStream has been completely replaced with getRawMediaStream # Expected: No remaining references to getMediaStream in PcmRecorder-related code echo "Searching for any remaining getMediaStream references..." rg "getMediaStream" --type ts --type js -A 2 -B 2 echo "Verifying getRawMediaStream is used consistently..." rg "getRawMediaStream" --type ts --type js -A 2 -B 2Length of output: 6726
API rename verified: no remaining
getMediaStream()referencesI ran a targeted search for the exact
getMediaStream()invocation (usingrg "getMediaStream\(") and found zero matches in PcmRecorder-related code. All production code and tests now:
- Define and mock
getRawMediaStream()in
packages/coze-js/src/ws-tools/recorder/pcm-recorder.tspackages/coze-js/test/ws-tools/chat.spec.ts- Invoke
getRawMediaStream()in
packages/coze-js/src/ws-tools/chat/index.ts- Assert against
getRawMediaStream()in recorder testsThe only remaining
getMediaStreamTrackoccurrences are on theaudioTrackobject (part of the MediaStreamTrack API), not the renamed method. Everything looks consistent.packages/coze-js/test/ws-tools/recorder/pcm-recorder.spec.ts (6)
47-64: LGTM! Improved mock formatting and consistency.The
AudioContextmock has been reformatted with consistent arrow functions, proper indentation, and trailing commas. This improves code readability while maintaining the same functionality.
103-111: LGTM! Consistent mock implementation pattern.The mock for
pcm-audio-processorhas been updated with consistent arrow function syntax and trailing commas, improving code consistency across the test file.
116-124: LGTM! Consistent mock implementation pattern.The
OpusAudioProcessormock follows the same improved formatting pattern as other mocks in the file.
129-137: LGTM! Consistent mock implementation pattern.The
WavAudioProcessormock maintains consistency with the formatting improvements applied to other processor mocks.
140-152: LGTM! Consistent mock implementation pattern.The
pcm-worklet-processormock has been reformatted for consistency while preserving functionality.
331-331: LGTM! Formatting improvements throughout test cases.The various formatting improvements including trailing commas, consistent spacing, and better line breaks enhance code readability without affecting test functionality.
Also applies to: 337-337, 340-340, 343-343, 345-345, 350-350, 353-353, 357-357, 361-361, 365-365, 368-368, 372-372, 376-376, 379-379, 382-382, 385-385, 388-388, 391-391, 393-393, 397-397, 404-404, 408-408, 413-413, 415-418, 419-419, 425-425
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
packages/coze-js/src/ws-tools/wavtools/lib/wav_stream_player.ts (1)
66-68: The undefinedlocalLoopbackStreamissue from the previous review remains unaddressed.The
connect()method still passesthis.localLoopbackStreamtolocalLoopback.connect()without checking if it's defined. IfsetMediaStream()hasn't been called beforeconnect()is invoked (which happens whenadd16BitPCM()or other methods trigger_start()), this will passundefinedto the loopback connection.The previous review comment's suggested fix should still be implemented:
if(this.enableLocalLoopback) { + if (!this.localLoopbackStream) { + throw new Error( + 'Local loopback enabled but no MediaStream provided. Call setMediaStream() before playback.' + ); + } await this.localLoopback?.connect(this.context, this.localLoopbackStream); }
🧹 Nitpick comments (5)
packages/coze-js/src/ws-tools/chat/index.ts (3)
20-23: Consider translating the comment to English for consistency.function isMobileView() { - return window.innerWidth <= 768; // 通常移动端宽度 <= 768px + return window.innerWidth <= 768; // Mobile viewport width is typically <= 768px }
51-71: Translate comments and error messages to English for consistency.The implementation logic is correct, but consider translating Chinese text for better international collaboration.
async startRecord() { if (this.recorder.getStatus() === 'recording') { console.warn('Recorder is already recording'); return; } - // 如果是客户端判停,需要先取消当前的播放 + // If in client interrupt mode, cancel current playback first if (this.turnDetection === 'client_interrupt') { this.interrupt(); } // 1. start recorder await this.recorder.start(this.inputAudioCodec); - // 获取原始麦克风输入用于本地回环 + // Get raw microphone input for local loopback const rawMediaStream = this.recorder.getRawMediaStream(); if (!rawMediaStream) { - throw new Error('无法获取原始麦克风输入'); + throw new Error('Failed to get raw microphone input stream'); } this.wavStreamPlayer?.setMediaStream(rawMediaStream);
121-132: Consider using uuid for consistency with other WebSocket messages.The implementation is correct, but for consistency with other WebSocket messages in the codebase, consider using uuid instead of Date.now().
stopRecord() { if (this.recorder.getStatus() !== 'recording') { console.warn('Recorder is not recording'); return; } this.recorder.destroy(); this.ws?.send({ - id: Date.now().toString(), + id: uuid(), event_type: WebsocketsEventType.INPUT_AUDIO_BUFFER_COMPLETE, }); }packages/coze-js/src/ws-tools/wavtools/lib/local-loopback.ts (2)
4-9: Translate documentation to English and consider ICE server configuration.The lifecycle management documentation is helpful but should be in English. Also, the empty ICE servers array might cause connectivity issues in certain network environments.
* Local audio loopback implementation using WebRTC peer connections * to create a local audio communication channel. - * 完整的音频回环生命周期管理: - * connect() - 建立初始连接 - * start() - 开始音频回环 - * stop() - 暂停音频回环 - * cleanup() - 完全清理所有资源 + * Complete audio loopback lifecycle management: + * connect() - Establish initial connection + * start() - Start audio loopback + * stop() - Pause audio loopback + * cleanup() - Clean up all resources completely */Consider adding default STUN servers for better connectivity across different network configurations:
async connect(context: AudioContext, stream?: MediaStream) { const servers = { - iceServers: [], + iceServers: [ + { urls: 'stun:stun.l.google.com:19302' } + ], iceCandidatePoolSize: 1 } as RTCConfiguration;Also applies to: 50-54
346-425: Excellent cleanup implementation! Consider translating inline comments.The cleanup method is comprehensive and properly releases all resources to prevent memory leaks. Great attention to detail in stopping tracks, removing listeners, and cleaning up DOM elements.
Translate the Chinese comments for consistency:
if (this.pc1) { - // 1. 关闭所有轨道(摄像头/麦克风) + // 1. Close all tracks (camera/microphone) this.pc1.getSenders().forEach(sender => { - if (sender.track) sender.track.stop(); // 停止媒体轨道 + if (sender.track) sender.track.stop(); // Stop media track }); - // 2. 移除所有事件监听器(避免内存泄漏) + // 2. Remove all event listeners (avoid memory leaks) this.pc1.onicecandidate = null;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
common/changes/@coze/api/feat-realtime-websocket_2025-06-02-01-19.json(1 hunks)packages/coze-js/src/ws-tools/chat/index.ts(7 hunks)packages/coze-js/src/ws-tools/types.ts(1 hunks)packages/coze-js/src/ws-tools/wavtools/lib/local-loopback.ts(8 hunks)packages/coze-js/src/ws-tools/wavtools/lib/wav_stream_player.ts(11 hunks)packages/coze-js/test/ws-tools/chat.spec.ts(4 hunks)packages/coze-js/test/ws-tools/speech.spec.ts(6 hunks)
✅ Files skipped from review due to trivial changes (1)
- common/changes/@coze/api/feat-realtime-websocket_2025-06-02-01-19.json
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/coze-js/test/ws-tools/chat.spec.ts
- packages/coze-js/src/ws-tools/types.ts
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Node.js v20 (ubuntu-latest)
🔇 Additional comments (7)
packages/coze-js/test/ws-tools/speech.spec.ts (1)
265-267: LGTM! Good formatting improvements and typo fix.The changes improve code readability by:
- Formatting long base64 strings across multiple lines
- Fixing the typo from "Lookback" to "Loopback"
- Adding the
setMediaStreammethod to align with the implementation changesAlso applies to: 294-296, 308-308, 313-315, 338-340, 354-356, 452-454, 472-474
packages/coze-js/src/ws-tools/wavtools/lib/local-loopback.ts (1)
187-207: Good conversion to async/await pattern.The conversion of
gotDescription1andgotDescription2to async methods with proper await usage improves code readability and error handling.packages/coze-js/src/ws-tools/wavtools/lib/wav_stream_player.ts (5)
136-142: Improved resource cleanup logic in message handler.The updated message handler now properly delegates stopping to the
localLoopbackwhen available, which provides better encapsulation than directly disconnecting the node. This ensures the loopback can handle its own cleanup logic.
150-156: Consistent loopback integration in start logic.The start method correctly uses the new
localLoopback.start()method when loopback is enabled, maintaining proper separation of concerns between the stream player and loopback functionality.
18-18: Consistent property rename fromstreamtostreamNode.The rename from
streamtostreamNodethroughout the class provides better clarity about the property's actual type (AudioWorkletNode). All references have been consistently updated.Also applies to: 47-47, 118-118, 141-141, 156-156, 174-174, 206-206, 220-220, 224-224
287-314: Excellent addition of resource cleanup method.The new
destroy()method provides comprehensive cleanup of all resources:
- Properly disconnects and nullifies the audio worklet node
- Delegates loopback cleanup to the
localLoopback.cleanup()method- Closes the audio context asynchronously
- Resets all internal state
This follows best practices for resource management and prevents memory leaks.
255-257: Simplified media stream assignment.The
setMediaStream()method now directly assigns the stream tolocalLoopbackStreaminstead of calling a method on the loopback instance. This is cleaner and allows the loopback to receive the stream during itsconnect()call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (6)
examples/realtime-websocket/src/pages/chat/index.tsx (6)
37-57: Consider adding proper TypeScript typing for better type safety.The
getChatUpdateConfigfunction parameter lacks specific typing, and the return type could be more explicit.-const getChatUpdateConfig = (turnDetectionType: string) => ({ +const getChatUpdateConfig = (turnDetectionType: 'server_vad' | 'client_interrupt') => ({ data: { input_audio: { format: 'pcm', codec: 'pcm', sample_rate: 44100, }, output_audio: { codec: 'pcm', pcm_config: { sample_rate: 24000, }, voice_id: config.getVoiceId(), }, turn_detection: { type: turnDetectionType, }, need_play_prologue: true, }, -}); +} as const);
74-85: Extract magic numbers to named constants for better maintainability.The hardcoded values for recording duration and slide threshold should be extracted to named constants.
+// Recording configuration constants +const MAX_RECORDING_TIME_SECONDS = 60; +const MIN_RECORDING_TIME_SECONDS = 1; +const CANCEL_SLIDE_THRESHOLD_PX = 50; // 按键说话状态 const [isPressRecording, setIsPressRecording] = useState(false); const [recordingDuration, setRecordingDuration] = useState(0); const recordTimer = useRef<NodeJS.Timeout | null>(null); -const maxRecordingTime = 60; // 最大录音时长(秒) const [isCancelRecording, setIsCancelRecording] = useState(false); const startTouchY = useRef<number>(0); const [isMuted, setIsMuted] = useState(false);
234-273: Improve event handler type safety and extract slide threshold constant.The event handlers mix mouse and touch events which could benefit from better typing, and the magic number should be a constant.
+const CANCEL_SLIDE_THRESHOLD_PX = 50; // 处理按住说话按钮 const handleVoiceButtonMouseDown = ( - e: React.MouseEvent | React.TouchEvent, + e: React.MouseEvent<HTMLDivElement> | React.TouchEvent<HTMLDivElement>, ) => { if ( isConnected && clientRef.current && turnDetection === 'client_interrupt' ) { startPressRecord(e); } }; const handleVoiceButtonMouseMove = ( - e: React.MouseEvent | React.TouchEvent, + e: React.MouseEvent<HTMLDivElement> | React.TouchEvent<HTMLDivElement>, ) => { if (isPressRecording && startTouchY.current) { - // 上滑超过50px则取消发送 + // 上滑超过阈值则取消发送 const clientY = 'touches' in e ? e.touches[0].clientY : (e as React.MouseEvent).clientY; - if (clientY < startTouchY.current - 50) { + if (clientY < startTouchY.current - CANCEL_SLIDE_THRESHOLD_PX) { setIsCancelRecording(true); } else { setIsCancelRecording(false); } } };
332-335: Extract minimum recording duration to a named constant.The hardcoded value should be a named constant for better maintainability.
+const MIN_RECORDING_TIME_SECONDS = 1; -// 如果录音时间太短(小于1秒),视为无效 -if (recordingDuration < 1) { +// 如果录音时间太短,视为无效 +if (recordingDuration < MIN_RECORDING_TIME_SECONDS) { cancelPressRecord(); return; }
504-516: Extract time formatting logic to a reusable helper function.The time formatting logic could be extracted for better reusability and testability.
+const formatRecordingTime = (seconds: number): string => { + const minutes = Math.floor(seconds / 60).toString().padStart(2, '0'); + const secs = (seconds % 60).toString().padStart(2, '0'); + return `${minutes}:${secs}`; +}; <div className="recording-time"> - {Math.floor(recordingDuration / 60) - .toString() - .padStart(2, '0')} - :{(recordingDuration % 60).toString().padStart(2, '0')} + {formatRecordingTime(recordingDuration)} </div>
512-514: Use the named constant for maximum recording time.The progress calculation should use the same constant defined earlier.
style={{ - width: `${(recordingDuration / maxRecordingTime) * 100}%`, + width: `${(recordingDuration / MAX_RECORDING_TIME_SECONDS) * 100}%`, }}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
common/changes/@coze/api/feat-realtime-websocket_2025-06-02-01-24.json(1 hunks)examples/realtime-websocket/src/pages/chat/index.tsx(11 hunks)packages/coze-js/src/ws-tools/wavtools/index.ts(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- common/changes/@coze/api/feat-realtime-websocket_2025-06-02-01-24.json
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/coze-js/src/ws-tools/wavtools/index.ts
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Node.js v20 (ubuntu-latest)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
examples/realtime-websocket/src/pages/chat/index.tsx (2)
276-319: LGTM! Error handling and cleanup are properly implemented.The function correctly handles the recording setup, timer management, and error scenarios with proper cleanup. The past review comment about timer cleanup has been addressed.
564-564: Good integration of dynamic configuration based on turn detection mode.The default value properly uses the helper function to generate appropriate configuration based on the current turn detection setting.
9acffb4 to
95b7ac6
Compare
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Documentation
Chores
Style
Tests