feat(speech/transcription): add player release to speech playback and support custom speech-recognition configuration #324

MrMingYang · 2025-11-26T06:05:37Z

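1. Added a `destroyPlayer` method to the `WsSpeechClient` class for external use. After playback stops, callers can release the player manually. This avoids an issue on mobile where, after `WsSpeechClient` has been used several times, browser restrictions prevent sound from playing.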

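2. `WsTranscriptionClient` now accepts a speech-recognition configuration parameter, `transcriptionUpdateData`. It can set options such as the language; setting it to `zh` can improve recognition accuracy. For the available parameters, see the `data` configuration of the `transcriptions.update` event in the official documentation.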

Link: https://www.coze.cn/open/docs/developer_guides/asr_event
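Item 1 can be illustrated with a small TypeScript sketch. It relies only on what this PR states: `destroyPlayer()` is async and releases the player. The `DestroyablePlayer` interface and the `SinglePlayerGuard` helper are hypothetical names invented for this example and are not part of `@coze/api`. The idea is to keep at most one live player by releasing the previous client's player before a new one takes over.

```typescript
// Only destroyPlayer() comes from this PR; everything else is a hypothetical sketch.
interface DestroyablePlayer {
  destroyPlayer(): Promise<void>;
}

// Hypothetical helper: keeps at most one live player at a time.
class SinglePlayerGuard<T extends DestroyablePlayer> {
  private current: T | null = null;

  get active(): T | null {
    return this.current;
  }

  // Make `next` the active player and release the previous one, if any.
  async replace(next: T): Promise<T> {
    const previous = this.current;
    // Swap the reference before awaiting so overlapping calls see the update.
    this.current = next;
    if (previous !== null && previous !== next) {
      await previous.destroyPlayer();
    }
    return next;
  }

  // Release the active player, e.g. when the speech UI is closed.
  async release(): Promise<void> {
    const previous = this.current;
    this.current = null;
    await previous?.destroyPlayer();
  }
}
```

In an app, each new `WsSpeechClient` would be passed through `replace`, and `release` would run when the speech view closes. Swapping the reference before awaiting means overlapping `replace` calls each release the player that was current when they ran.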

Summary by CodeRabbit

New Features
- Improved audio player management with explicit resource cleanup
- Publicly exposed audio configuration for advanced usage
- Support for custom speech-recognition configuration via customizable transcription update data
Bug Fixes
- Ensure playback timeouts are cleared and connections closed when destroying the player

coderabbitai · 2025-11-26T06:05:46Z

Walkthrough

Exports `AudioConfig`, adds `WsSpeechClient.destroyPlayer` for playback teardown, centralizes construction of the initial transcription payload in a `getInitialUpdateData` helper, exposes a `transcriptionUpdateData` option, and adds changelog entries for `@coze/api` reflecting these changes.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| AudioConfig Export | `packages/coze-js/src/resources/websockets/types.ts` | Made `AudioConfig` an exported interface (public API surface). |
| WsSpeechClient Resource Cleanup | `packages/coze-js/src/ws-tools/speech/index.ts` | Added `async destroyPlayer()` to clear the playback timeout, close the WebSocket, destroy `wavStreamPlayer`, and reset playback state (`totalDuration`, `playbackStartTime`, `playbackPauseTime`, `elapsedBeforePause`, `audioDeltaList`). |
| Transcription Initialization & Options | `packages/coze-js/src/ws-tools/transcription/index.ts`, `packages/coze-js/src/ws-tools/types.ts` | Extracted initial transcription payload construction into `getInitialUpdateData(sampleRate)` and added `transcriptionUpdateData?: TranscriptionsUpdateEvent['data']` to `WsTranscriptionClientOptions`; imports adjusted accordingly. |
| Changelog Entries | `common/changes/@coze/api/myang-voice-ext-interface_2025-11-26-05-40.json`, `common/changes/@coze/api/myang-voice-ext-interface_2025-11-27-01-27.json` | Added minor and patch changelog records for `@coze/api` documenting player cleanup and related behavior notes. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant App
  participant WsSpeechClient
  participant WavPlayer
  participant WebSocket

  rect rgb(230,240,255)
  App->>WsSpeechClient: call destroyPlayer()
  WsSpeechClient->>WsSpeechClient: clear playbackTimeout
  WsSpeechClient->>WebSocket: close()
  WebSocket-->>WsSpeechClient: closed
  WsSpeechClient->>WavPlayer: destroy()
  WavPlayer-->>WsSpeechClient: destroyed
  WsSpeechClient->>WsSpeechClient: reset playback state (durations, times, lists)
  WsSpeechClient-->>App: Promise resolved
  end
```

```mermaid
sequenceDiagram
  participant WsTranscriptionClient
  participant Config
  participant WebSocket

  WsTranscriptionClient->>WsTranscriptionClient: getInitialUpdateData(sampleRate)
  WsTranscriptionClient->>Config: read transcriptionUpdateData (optional)
  alt custom input_audio present
    WsTranscriptionClient->>WsTranscriptionClient: merge custom input_audio with defaults
  else no custom input_audio
    WsTranscriptionClient->>WsTranscriptionClient: build default input_audio using sampleRate
  end
  WsTranscriptionClient-->>WebSocket: send TRANSCRIPTIONS_UPDATE with merged payload
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

- Check that `destroyPlayer` covers all edge cases (concurrent calls, no-op when already destroyed).
- Validate that `getInitialUpdateData` correctly merges custom `transcriptionUpdateData` and preserves backward compatibility.
- Confirm that the `AudioConfig` export doesn't break consumers or cause type collisions.

Possibly related PRs

- fix(speech): implement audioDeltaList for sequential audio processing #232: modifies WsSpeechClient audio playback internals and interacts with the audioDeltaList behavior referenced by this PR.
- fix(coze/api): fix speech bug #141: adjusts WsSpeechClient playback/disconnect timing; overlaps with the destroyPlayer teardown logic.
- feat(api): Add playback timing and state management #122: implements playback timing/pause/resume logic related to the state reset by destroyPlayer.

Suggested reviewers

DingGao-Devin

Poem

🐰 I hopped through code at break of day,
Cleared timeouts, closed the socket away,
I nudged configs, merged them neat,
Reset the times and cleaned the beat,
Player rests now — give a carrot, hooray! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly reflects the main changes: adding destroyPlayer functionality to WsSpeechClient and custom transcriptionUpdateData support to WsTranscriptionClient for voice/speech features. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8269555 and 1dcd9dc.

📒 Files selected for processing (2)

common/changes/@coze/api/myang-voice-ext-interface_2025-11-27-01-27.json (1 hunks)
packages/coze-js/src/ws-tools/speech/index.ts (1 hunks)

✅ Files skipped from review due to trivial changes (1)

common/changes/@coze/api/myang-voice-ext-interface_2025-11-27-01-27.json

🔇 Additional comments (1)

packages/coze-js/src/ws-tools/speech/index.ts (1)

189-211: LGTM! Previous feedback has been addressed.

The destroyPlayer implementation correctly addresses the concerns raised in the previous review. The method now:

- Clears any pending `playbackTimeout` before destroying the player (lines 196-199)
- Closes the WebSocket to stop incoming messages (line 202)
- Safely destroys the `wavStreamPlayer` (line 205)
- Resets all playback state to ensure a clean teardown (lines 206-210)

The sequence of operations is correct and prevents the race conditions mentioned in the earlier review (timeout firing after destruction, WebSocket messages arriving after cleanup).
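The teardown order described in this review can be sketched as follows. This is a minimal TypeScript sketch, not the merged code: `ClosableSocket`, `StreamPlayer`, and `SpeechPlaybackState` are hypothetical stand-ins, the player's `destroy()` is assumed to return a promise, and the initial values and element type of the state fields are guesses. The sketch also detaches each reference before releasing it so that repeated or concurrent calls become no-ops, one of the edge cases listed under the estimated review effort; whether the merged implementation does this is not stated in the review.

```typescript
// Hypothetical structural stand-ins; the real types in
// packages/coze-js/src/ws-tools/speech/index.ts may differ.
interface ClosableSocket {
  close(): void;
}
interface StreamPlayer {
  destroy(): Promise<void>; // assumed async
}

class SpeechPlaybackState {
  ws: ClosableSocket | null = null;
  wavStreamPlayer: StreamPlayer | null = null;
  playbackTimeout: ReturnType<typeof setTimeout> | null = null;
  totalDuration = 0;
  playbackStartTime: number | null = null;
  playbackPauseTime: number | null = null;
  elapsedBeforePause = 0;
  audioDeltaList: string[] = []; // element type assumed

  async destroyPlayer(): Promise<void> {
    // 1. Cancel any pending timer so it cannot fire against a destroyed player.
    if (this.playbackTimeout !== null) {
      clearTimeout(this.playbackTimeout);
      this.playbackTimeout = null;
    }
    // 2. Close the socket so no further audio deltas arrive mid-teardown.
    const ws = this.ws;
    this.ws = null;
    ws?.close();
    // 3. Release the player; detaching the reference first turns a second
    //    call into a no-op instead of destroying the same player twice.
    const player = this.wavStreamPlayer;
    this.wavStreamPlayer = null;
    await player?.destroy();
    // 4. Reset playback bookkeeping to a clean slate.
    this.totalDuration = 0;
    this.playbackStartTime = null;
    this.playbackPauseTime = null;
    this.elapsedBeforePause = 0;
    this.audioDeltaList = [];
  }
}
```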

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

packages/coze-js/src/ws-tools/transcription/index.ts (1)
28-61: The merge logic is correct and handles all cases appropriately.

The getInitialUpdateData method properly constructs the initial transcription update payload by:

- Establishing sensible defaults based on the recorder's sample rate
- Allowing custom configuration to override defaults
- Ensuring input_audio is always present in the payload

The implementation correctly handles the three cases: no custom data, custom data without input_audio, and custom data with partial input_audio fields.

The three conditional branches could be simplified to a single return statement:
```diff
-  if (!customUpdateData) {
-    return {
-      input_audio: defaultInputAudio,
-    };
-  }
-
-  if (!customUpdateData.input_audio) {
-    return {
-      ...customUpdateData,
-      input_audio: defaultInputAudio,
-    };
-  }
-
-  return {
-    ...customUpdateData,
-    input_audio: {
-      ...defaultInputAudio,
-      ...customUpdateData.input_audio,
-    },
-  };
+  return {
+    ...customUpdateData,
+    input_audio: {
+      ...defaultInputAudio,
+      ...customUpdateData?.input_audio,
+    },
+  };
```
This leverages the fact that spreading undefined is a no-op in JavaScript, producing identical behavior with less code.
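For reference, a self-contained version of the helper with the suggested single-return body might look like this. It is written as a free function that takes the custom data as a parameter, whereas the PR implements `getInitialUpdateData(sampleRate)` as a method that reads the client's options. The local types and the default `format`, `codec`, `channel`, and `bit_depth` values are assumptions for illustration, not taken from the Coze API.

```typescript
// Hypothetical local types approximating TranscriptionsUpdateEvent['data'].
interface InputAudio {
  format?: string;
  codec?: string;
  sample_rate?: number;
  channel?: number;
  bit_depth?: number;
}

interface TranscriptionUpdateData {
  input_audio?: InputAudio;
  [key: string]: unknown; // other transcription settings pass through untouched
}

function getInitialUpdateData(
  sampleRate: number,
  customUpdateData?: TranscriptionUpdateData,
): TranscriptionUpdateData {
  // Defaults derived from the recorder's sample rate (other values assumed).
  const defaultInputAudio: InputAudio = {
    format: 'pcm',
    codec: 'pcm',
    sample_rate: sampleRate,
    channel: 1,
    bit_depth: 16,
  };
  // Spreading undefined is a no-op, so one expression covers all three cases:
  // no custom data, custom data without input_audio, and partial input_audio.
  return {
    ...customUpdateData,
    input_audio: {
      ...defaultInputAudio,
      ...customUpdateData?.input_audio,
    },
  };
}
```

Passing a partial `input_audio` such as `{ sample_rate: 24000 }` overrides only that field; the remaining fields fall back to the defaults.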

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad4c087 and 8269555.

📒 Files selected for processing (5)

common/changes/@coze/api/myang-voice-ext-interface_2025-11-26-05-40.json (1 hunks)
packages/coze-js/src/resources/websockets/types.ts (1 hunks)
packages/coze-js/src/ws-tools/speech/index.ts (1 hunks)
packages/coze-js/src/ws-tools/transcription/index.ts (2 hunks)
packages/coze-js/src/ws-tools/types.ts (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

packages/coze-js/src/ws-tools/types.ts (1)

packages/coze-js/src/resources/websockets/types.ts (1)

TranscriptionsUpdateEvent (315-320)

packages/coze-js/src/resources/websockets/types.ts (1)

examples/realtime-websocket/src/components/audio-config/index.tsx (1)

AudioConfig (34-288)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Node.js v20 (ubuntu-latest)

🔇 Additional comments (4)

packages/coze-js/src/resources/websockets/types.ts (1)

196-207: LGTM! Appropriate public API exposure.

Exporting the AudioConfig interface is a reasonable enhancement that allows external consumers to properly type audio configuration objects when using the WebSocket APIs.

common/changes/@coze/api/myang-voice-ext-interface_2025-11-26-05-40.json (1)

1-11: LGTM! Changelog entry is properly formatted.

The minor version bump is appropriate for the addition of new public methods and configuration options without breaking existing APIs.

packages/coze-js/src/ws-tools/types.ts (2)

1-1: LGTM! Proper import for the new type reference.

The TranscriptionsUpdateEvent import is necessary to support the type annotation for the new transcriptionUpdateData field.

432-436: LGTM! Well-documented optional configuration field.

The transcriptionUpdateData field:

- Uses the correct type via indexed access (TranscriptionsUpdateEvent['data'])
- Is appropriately optional, maintaining backward compatibility
- Has clear bilingual documentation
- Aligns with the implementation in transcription/index.ts
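The indexed-access typing described above, combined with the language example from the PR description, can be sketched as follows. The event interface is a simplified hypothetical stand-in for the real `TranscriptionsUpdateEvent` in `resources/websockets/types.ts`. The `asr_config.user_language` field is an assumption about how the language is expressed in the `transcriptions.update` payload; check the asr_event documentation linked in the PR for the actual field names.

```typescript
// Hypothetical, simplified stand-in for TranscriptionsUpdateEvent.
interface TranscriptionsUpdateEvent {
  id?: string;
  event_type: 'transcriptions.update';
  data?: {
    input_audio?: { sample_rate?: number; codec?: string };
    // Assumed field for the recognition language; verify against the docs.
    asr_config?: { user_language?: string };
  };
}

// Indexed access keeps the option in lockstep with the event payload type:
// if the event's `data` shape changes, this option changes with it.
interface TranscriptionClientOptionsSketch {
  token: string;
  transcriptionUpdateData?: TranscriptionsUpdateEvent['data'];
}

// Setting the language to zh, as suggested in the PR description.
const options: TranscriptionClientOptionsSketch = {
  token: 'YOUR_TOKEN',
  transcriptionUpdateData: {
    asr_config: { user_language: 'zh' },
  },
};

// Omitting the option stays valid, which is what keeps the change backward compatible.
const minimal: TranscriptionClientOptionsSketch = { token: 'YOUR_TOKEN' };
```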

packages/coze-js/src/ws-tools/speech/index.ts

CLAassistant · 2025-11-27T01:28:08Z

All committers have signed the CLA.

feat(speech/transcription): add player release to speech playback and support custom speech-recognition configuration

8269555

coderabbitai bot reviewed Nov 26, 2025

packages/coze-js/src/ws-tools/speech/index.ts (resolved)

MrMingYang marked this pull request as draft November 27, 2025 01:17

MrMingYang closed this Nov 27, 2025

MrMingYang reopened this Nov 27, 2025

fix: Clear timeout and close WebSocket before destroying player

1dcd9dc

MrMingYang marked this pull request as ready for review November 27, 2025 01:36

chenyuliang-star approved these changes Nov 27, 2025

chenyuliang-star added this pull request to the merge queue Nov 27, 2025

Merged via the queue into coze-dev:main with commit 6c26367 Nov 27, 2025
3 checks passed

4 participants
