@MrMingYang (Contributor) commented Nov 26, 2025

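1. Added a destroyPlayer method to the WsSpeechClient class for external callers, so the player can be released manually after playback stops. This avoids the problem on mobile devices where, after WsSpeechClient has been invoked several times, browser restrictions prevent audio from playing.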

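2. The WsTranscriptionClient class now accepts a "transcriptionUpdateData" parameter for configuring speech recognition, such as the language. For example, setting the language to zh can improve recognition accuracy. For the available parameters, see the data configuration of "transcriptions.update" in the official documentation linked below.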

Link: https://www.coze.cn/open/docs/developer_guides/asr_event

Summary by CodeRabbit

  • New Features

    • Improved audio player management with explicit resource cleanup
    • Publicly exposed audio configuration for advanced usage
    • Support for custom voice-recognition and customizable transcription update data
  • Bug Fixes

    • Ensure playback timeouts are cleared and connections closed when destroying the player


coderabbitai bot commented Nov 26, 2025

Walkthrough

Exports AudioConfig, adds WsSpeechClient.destroyPlayer for playback teardown, centralizes transcription initial payload construction with a getInitialUpdateData helper and exposes transcriptionUpdateData option, and adds changelog entries for @coze/api reflecting these changes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| AudioConfig Export: packages/coze-js/src/resources/websockets/types.ts | Made AudioConfig an exported interface (public API surface). |
| WsSpeechClient Resource Cleanup: packages/coze-js/src/ws-tools/speech/index.ts | Added async destroyPlayer() to clear playback timeout, close the WebSocket, destroy wavStreamPlayer, and reset playback state (totalDuration, playbackStartTime, playbackPauseTime, elapsedBeforePause, audioDeltaList). |
| Transcription Initialization & Options: packages/coze-js/src/ws-tools/transcription/index.ts, packages/coze-js/src/ws-tools/types.ts | Extracted initial transcription payload construction into getInitialUpdateData(sampleRate) and added transcriptionUpdateData?: TranscriptionsUpdateEvent['data'] to WsTranscriptionClientOptions; imports adjusted accordingly. |
| Changelog Entries: common/changes/@coze/api/myang-voice-ext-interface_2025-11-26-05-40.json, common/changes/@coze/api/myang-voice-ext-interface_2025-11-27-01-27.json | Added minor and patch changelog records for @coze/api documenting player cleanup and related behavior notes. |

Sequence Diagram(s)

sequenceDiagram
  rect rgb(230,240,255)
    participant App
    participant WsSpeechClient
    participant WavPlayer
    participant WebSocket
  end

  App->>WsSpeechClient: call destroyPlayer()
  WsSpeechClient->>WsSpeechClient: clear playbackTimeout
  WsSpeechClient->>WebSocket: close()
  WebSocket-->>WsSpeechClient: closed
  WsSpeechClient->>WavPlayer: destroy()
  WavPlayer-->>WsSpeechClient: destroyed
  WsSpeechClient->>WsSpeechClient: reset playback state (durations, times, lists)
  WsSpeechClient-->>App: Promise resolved
sequenceDiagram
  participant WsTranscriptionClient
  participant Config

  WsTranscriptionClient->>WsTranscriptionClient: getInitialUpdateData(sampleRate)
  WsTranscriptionClient->>Config: read transcriptionUpdateData (optional)
  alt custom input_audio present
    WsTranscriptionClient->>WsTranscriptionClient: merge custom input_audio with defaults
  else no custom input_audio
    WsTranscriptionClient->>WsTranscriptionClient: build default input_audio using sampleRate
  end
  WsTranscriptionClient-->>WebSocket: send TRANSCRIPTIONS_UPDATE with merged payload

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Check destroyPlayer covers all edge cases (concurrent calls, no-op when already destroyed).
  • Validate getInitialUpdateData correctly merges custom transcriptionUpdateData and preserves backward compatibility.
  • Confirm AudioConfig export doesn't break consumers or cause type collisions.

Suggested reviewers

  • DingGao-Devin

Poem

🐰 I hopped through code at break of day,
Cleared timeouts, closed the socket away,
I nudged configs, merged them neat,
Reset the times and cleaned the beat,
Player rests now — give a carrot, hooray! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly reflects the main changes: adding destroyPlayer functionality to WsSpeechClient and custom transcriptionUpdateData support to WsTranscriptionClient for voice/speech features. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8269555 and 1dcd9dc.

📒 Files selected for processing (2)
  • common/changes/@coze/api/myang-voice-ext-interface_2025-11-27-01-27.json (1 hunks)
  • packages/coze-js/src/ws-tools/speech/index.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • common/changes/@coze/api/myang-voice-ext-interface_2025-11-27-01-27.json
🔇 Additional comments (1)
packages/coze-js/src/ws-tools/speech/index.ts (1)

189-211: LGTM! Previous feedback has been addressed.

The destroyPlayer implementation correctly addresses the concerns raised in the previous review. The method now:

  1. Clears any pending playbackTimeout before destroying the player (lines 196-199)
  2. Closes the WebSocket to stop incoming messages (line 202)
  3. Safely destroys the wavStreamPlayer (line 205)
  4. Resets all playback state to ensure a clean teardown (lines 206-210)

The sequence of operations is correct and prevents the race conditions mentioned in the earlier review (timeout firing after destruction, WebSocket messages arriving after cleanup).
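
The teardown order described above can be sketched roughly as follows. This is not the code from the PR: `SpeechTeardownSketch`, `SocketLike`, `PlayerLike`, and `schedulePlaybackEnd` are hypothetical stand-ins, and the shared in-flight promise is one assumed way to handle the concurrent-call case raised in the review checklist.

```typescript
// Hypothetical stand-ins for the real WebSocket and wav player;
// only the methods used by the teardown are modeled here.
interface SocketLike {
  close(): void;
}
interface PlayerLike {
  destroy(): Promise<void>;
}

class SpeechTeardownSketch {
  // Playback state named after the fields listed in the walkthrough.
  totalDuration = 0;
  playbackStartTime: number | null = null;
  playbackPauseTime: number | null = null;
  elapsedBeforePause = 0;
  audioDeltaList: string[] = [];

  private playbackTimeout: ReturnType<typeof setTimeout> | null = null;
  private destroying: Promise<void> | null = null;

  constructor(
    private ws: SocketLike | null,
    private player: PlayerLike | null,
  ) {}

  // Hypothetical helper that arms the timeout the teardown must clear.
  schedulePlaybackEnd(ms: number, onEnd: () => void): void {
    this.playbackTimeout = setTimeout(onEnd, ms);
  }

  // Concurrent callers share one in-flight teardown (an assumption of this sketch).
  destroyPlayer(): Promise<void> {
    if (this.destroying === null) {
      this.destroying = this.runTeardown();
    }
    return this.destroying;
  }

  private async runTeardown(): Promise<void> {
    try {
      await this.teardown();
    } finally {
      this.destroying = null;
    }
  }

  private async teardown(): Promise<void> {
    // 1. Clear any pending timeout so it cannot fire after destruction.
    if (this.playbackTimeout !== null) {
      clearTimeout(this.playbackTimeout);
      this.playbackTimeout = null;
    }
    // 2. Close the socket so no further messages arrive.
    this.ws?.close();
    this.ws = null;
    // 3. Destroy the player; a second call finds null and does nothing.
    const player = this.player;
    this.player = null;
    await player?.destroy();
    // 4. Reset playback state.
    this.totalDuration = 0;
    this.playbackStartTime = null;
    this.playbackPauseTime = null;
    this.elapsedBeforePause = 0;
    this.audioDeltaList = [];
  }
}
```

Nulling the socket and player references before awaiting means a later call is a no-op rather than a double close.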


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/coze-js/src/ws-tools/transcription/index.ts (1)

28-61: The merge logic is correct and handles all cases appropriately.

The getInitialUpdateData method properly constructs the initial transcription update payload by:

  1. Establishing sensible defaults based on the recorder's sample rate
  2. Allowing custom configuration to override defaults
  3. Ensuring input_audio is always present in the payload

The implementation correctly handles the three cases: no custom data, custom data without input_audio, and custom data with partial input_audio fields.

The three conditional branches could be simplified to a single return statement:

-  if (!customUpdateData) {
-    return {
-      input_audio: defaultInputAudio,
-    };
-  }
-
-  if (!customUpdateData.input_audio) {
-    return {
-      ...customUpdateData,
-      input_audio: defaultInputAudio,
-    };
-  }
-
-  return {
-    ...customUpdateData,
-    input_audio: {
-      ...defaultInputAudio,
-      ...customUpdateData.input_audio,
-    },
-  };
+  return {
+    ...customUpdateData,
+    input_audio: {
+      ...defaultInputAudio,
+      ...customUpdateData?.input_audio,
+    },
+  };

This leverages the fact that spreading undefined inside an object literal is a no-op in JavaScript, producing identical behavior with less code.
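
The equivalence claimed in this nitpick can be checked with a small self-contained sketch. The types `InputAudioSketch` and `UpdateDataSketch`, the `extra` field, and the default values in `defaultInputAudio` are assumptions made for illustration; they are not the SDK's real payload types or defaults.

```typescript
// Hypothetical, trimmed-down payload types; the real ones live in the SDK.
interface InputAudioSketch {
  format?: string;
  sample_rate?: number;
  channel?: number;
}
interface UpdateDataSketch {
  input_audio?: InputAudioSketch;
  extra?: Record<string, unknown>; // stand-in for any other payload fields
}

// Assumed defaults, for illustration only.
function defaultInputAudio(sampleRate: number): InputAudioSketch {
  return { format: "pcm", sample_rate: sampleRate, channel: 1 };
}

// The three-branch shape from the removed side of the diff.
function mergeBranching(
  sampleRate: number,
  custom?: UpdateDataSketch,
): UpdateDataSketch {
  const defaults = defaultInputAudio(sampleRate);
  if (!custom) {
    return { input_audio: defaults };
  }
  if (!custom.input_audio) {
    return { ...custom, input_audio: defaults };
  }
  return { ...custom, input_audio: { ...defaults, ...custom.input_audio } };
}

// The single-return shape from the added side of the diff.
function mergeSingle(
  sampleRate: number,
  custom?: UpdateDataSketch,
): UpdateDataSketch {
  return {
    ...custom, // spreading undefined contributes no properties
    input_audio: {
      ...defaultInputAudio(sampleRate),
      ...custom?.input_audio, // likewise when input_audio is absent
    },
  };
}
```

Both functions return the same object for no custom data, custom data without input_audio, and custom data with a partial input_audio, which are the three cases the review lists.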

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad4c087 and 8269555.

📒 Files selected for processing (5)
  • common/changes/@coze/api/myang-voice-ext-interface_2025-11-26-05-40.json (1 hunks)
  • packages/coze-js/src/resources/websockets/types.ts (1 hunks)
  • packages/coze-js/src/ws-tools/speech/index.ts (1 hunks)
  • packages/coze-js/src/ws-tools/transcription/index.ts (2 hunks)
  • packages/coze-js/src/ws-tools/types.ts (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
packages/coze-js/src/ws-tools/types.ts (1)
packages/coze-js/src/resources/websockets/types.ts (1)
  • TranscriptionsUpdateEvent (315-320)
packages/coze-js/src/resources/websockets/types.ts (1)
examples/realtime-websocket/src/components/audio-config/index.tsx (1)
  • AudioConfig (34-288)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Node.js v20 (ubuntu-latest)
🔇 Additional comments (4)
packages/coze-js/src/resources/websockets/types.ts (1)

196-207: LGTM! Appropriate public API exposure.

Exporting the AudioConfig interface is a reasonable enhancement that allows external consumers to properly type audio configuration objects when using the WebSocket APIs.

common/changes/@coze/api/myang-voice-ext-interface_2025-11-26-05-40.json (1)

1-11: LGTM! Changelog entry is properly formatted.

The minor version bump is appropriate for the addition of new public methods and configuration options without breaking existing APIs.

packages/coze-js/src/ws-tools/types.ts (2)

1-1: LGTM! Proper import for the new type reference.

The TranscriptionsUpdateEvent import is necessary to support the type annotation for the new transcriptionUpdateData field.


432-436: LGTM! Well-documented optional configuration field.

The transcriptionUpdateData field:

  • Uses the correct type via indexed access (TranscriptionsUpdateEvent['data'])
  • Is appropriately optional, maintaining backward compatibility
  • Has clear bilingual documentation
  • Aligns with the implementation in transcription/index.ts
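
As a rough illustration of the indexed-access typing mentioned above, the sketch below derives an option type from an event type's data member. `TranscriptionsUpdateEventSketch`, `OptionsSketch`, and `hasCustomInputAudio` are hypothetical, simplified names; the real TranscriptionsUpdateEvent is defined in packages/coze-js/src/resources/websockets/types.ts and may have a different shape.

```typescript
// Hypothetical, simplified event shape used only for this sketch.
interface TranscriptionsUpdateEventSketch {
  id: string;
  event_type: "transcriptions.update";
  data?: {
    input_audio?: {
      format?: string;
      sample_rate?: number;
    };
  };
}

// Indexed access keeps the option type in sync with the event's data type,
// so a change to the event definition flows through to the options.
interface OptionsSketch {
  transcriptionUpdateData?: TranscriptionsUpdateEventSketch["data"];
}

// Hypothetical helper: reports whether the caller supplied any input_audio override.
function hasCustomInputAudio(options: OptionsSketch): boolean {
  return options.transcriptionUpdateData?.input_audio !== undefined;
}

// Because the field is optional, existing callers that pass nothing still type-check.
const legacyOptions: OptionsSketch = {};
const customOptions: OptionsSketch = {
  transcriptionUpdateData: { input_audio: { sample_rate: 16000 } },
};
```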

@MrMingYang MrMingYang marked this pull request as draft November 27, 2025 01:17
@MrMingYang MrMingYang closed this Nov 27, 2025
@MrMingYang MrMingYang reopened this Nov 27, 2025

CLAassistant commented Nov 27, 2025

CLA assistant check
All committers have signed the CLA.

@MrMingYang MrMingYang marked this pull request as ready for review November 27, 2025 01:36
@chenyuliang-star chenyuliang-star added this pull request to the merge queue Nov 27, 2025
Merged via the queue into coze-dev:main with commit 6c26367 Nov 27, 2025
3 checks passed
4 participants