feat(speech/transcription): add a method to release the speech playback player and support custom speech recognition configuration #324
Walkthrough: Exports `AudioConfig`, adds `WsSpeechClient.destroyPlayer` for playback teardown, centralizes construction of the initial transcription payload in a `getInitialUpdateData` helper and exposes a `transcriptionUpdateData` option, and adds changelog entries.
Sequence Diagram(s)
sequenceDiagram
rect rgb(230,240,255)
participant App
participant WsSpeechClient
participant WavPlayer
participant WebSocket
end
App->>WsSpeechClient: call destroyPlayer()
WsSpeechClient->>WsSpeechClient: clear playbackTimeout
WsSpeechClient->>WebSocket: close()
WebSocket-->>WsSpeechClient: closed
WsSpeechClient->>WavPlayer: destroy()
WavPlayer-->>WsSpeechClient: destroyed
WsSpeechClient->>WsSpeechClient: reset playback state (durations, times, lists)
WsSpeechClient-->>App: Promise resolved
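The teardown order in the diagram above (clear the timeout, close the socket, destroy the player, reset state) can be sketched as follows. This is a minimal, self-contained illustration, not the actual `WsSpeechClient` source: `PlayerLike`, `SocketLike`, `SpeechClientSketch`, and the state fields (`totalDuration`, `playbackStartTime`, `pendingChunks`) are hypothetical stand-ins for the real `WavPlayer`, `WebSocket`, and playback bookkeeping.

```typescript
// Hypothetical minimal interfaces standing in for WavPlayer and WebSocket.
interface PlayerLike {
  destroy(): Promise<void>;
}
interface SocketLike {
  close(): void;
}

// Hypothetical client illustrating the destroyPlayer() teardown order.
class SpeechClientSketch {
  playbackTimeout: ReturnType<typeof setTimeout> | null = null;
  ws: SocketLike | null;
  player: PlayerLike | null;
  // Hypothetical playback bookkeeping that is reset on teardown.
  totalDuration = 0;
  playbackStartTime: number | null = null;
  pendingChunks: ArrayBuffer[] = [];

  constructor(ws: SocketLike, player: PlayerLike) {
    this.ws = ws;
    this.player = player;
  }

  async destroyPlayer(): Promise<void> {
    // 1. Cancel any scheduled playback callback.
    if (this.playbackTimeout !== null) {
      clearTimeout(this.playbackTimeout);
      this.playbackTimeout = null;
    }
    // 2. Close the socket so no further audio arrives.
    this.ws?.close();
    this.ws = null;
    // 3. Release the player and wait for it to finish cleaning up.
    await this.player?.destroy();
    this.player = null;
    // 4. Reset playback state so a fresh player can be created later.
    this.totalDuration = 0;
    this.playbackStartTime = null;
    this.pendingChunks = [];
  }
}
```

Closing the socket before destroying the player means no new audio chunks can be queued into a player that is being torn down; awaiting `destroy()` lets the caller know the audio resources are actually released before it creates another client.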
sequenceDiagram
participant WsTranscriptionClient
participant Config
WsTranscriptionClient->>WsTranscriptionClient: getInitialUpdateData(sampleRate)
WsTranscriptionClient->>Config: read transcriptionUpdateData (optional)
alt custom input_audio present
WsTranscriptionClient->>WsTranscriptionClient: merge custom input_audio with defaults
else no custom input_audio
WsTranscriptionClient->>WsTranscriptionClient: build default input_audio using sampleRate
end
WsTranscriptionClient-->>WebSocket: send TRANSCRIPTIONS_UPDATE with merged payload
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🔇 Additional comments (1)
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/coze-js/src/ws-tools/transcription/index.ts (1)
28-61: The merge logic is correct and handles all cases appropriately.

The `getInitialUpdateData` method properly constructs the initial transcription update payload by:
- Establishing sensible defaults based on the recorder's sample rate
- Allowing custom configuration to override defaults
- Ensuring `input_audio` is always present in the payload

The implementation correctly handles the three cases: no custom data, custom data without `input_audio`, and custom data with partial `input_audio` fields.

The three conditional branches could be simplified to a single return statement:

```diff
-    if (!customUpdateData) {
-      return {
-        input_audio: defaultInputAudio,
-      };
-    }
-
-    if (!customUpdateData.input_audio) {
-      return {
-        ...customUpdateData,
-        input_audio: defaultInputAudio,
-      };
-    }
-
-    return {
-      ...customUpdateData,
-      input_audio: {
-        ...defaultInputAudio,
-        ...customUpdateData.input_audio,
-      },
-    };
+    return {
+      ...customUpdateData,
+      input_audio: {
+        ...defaultInputAudio,
+        ...customUpdateData?.input_audio,
+      },
+    };
```

This leverages the fact that spreading `undefined` in an object literal is a no-op in JavaScript, producing identical behavior with less code.
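To check the claim that the single-return form behaves identically, the sketch below implements both versions side by side. The `InputAudio` and `UpdateData` shapes, the default field values in `defaultInputAudio`, and the `language` key used in the test are hypothetical simplifications; the real payload type is `TranscriptionsUpdateEvent['data']`, which may have different fields.

```typescript
// Hypothetical, simplified payload shapes (the real type may differ).
interface InputAudio {
  format?: string;
  codec?: string;
  sample_rate?: number;
  channel?: number;
  bit_depth?: number;
}
interface UpdateData {
  input_audio?: InputAudio;
  [key: string]: unknown;
}

// Defaults derived from the recorder's sample rate; the other values are assumptions.
function defaultInputAudio(sampleRate: number): InputAudio {
  return { format: "pcm", codec: "pcm", sample_rate: sampleRate, channel: 1, bit_depth: 16 };
}

// Three-branch version, mirroring the structure in the PR.
function mergeBranched(sampleRate: number, custom?: UpdateData): UpdateData {
  const defaults = defaultInputAudio(sampleRate);
  if (!custom) return { input_audio: defaults };
  if (!custom.input_audio) return { ...custom, input_audio: defaults };
  return { ...custom, input_audio: { ...defaults, ...custom.input_audio } };
}

// Single-return version suggested in the review: spreading undefined adds no keys.
function mergeSimplified(sampleRate: number, custom?: UpdateData): UpdateData {
  return {
    ...custom,
    input_audio: { ...defaultInputAudio(sampleRate), ...custom?.input_audio },
  };
}
```

Because later spreads overwrite earlier keys, any `input_audio` field the caller supplies wins over the corresponding default, while unspecified fields fall back to the defaults.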
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- common/changes/@coze/api/myang-voice-ext-interface_2025-11-26-05-40.json (1 hunk)
- packages/coze-js/src/resources/websockets/types.ts (1 hunk)
- packages/coze-js/src/ws-tools/speech/index.ts (1 hunk)
- packages/coze-js/src/ws-tools/transcription/index.ts (2 hunks)
- packages/coze-js/src/ws-tools/types.ts (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
packages/coze-js/src/ws-tools/types.ts (1)
packages/coze-js/src/resources/websockets/types.ts (1)
TranscriptionsUpdateEvent (315-320)
packages/coze-js/src/resources/websockets/types.ts (1)
examples/realtime-websocket/src/components/audio-config/index.tsx (1)
AudioConfig (34-288)
⏰ Context from checks skipped due to timeout of 90000 ms (1)
- GitHub Check: Node.js v20 (ubuntu-latest)
🔇 Additional comments (4)
packages/coze-js/src/resources/websockets/types.ts (1)
196-207: LGTM! Appropriate public API exposure. Exporting the `AudioConfig` interface is a reasonable enhancement that allows external consumers to properly type audio configuration objects when using the WebSocket APIs.
common/changes/@coze/api/myang-voice-ext-interface_2025-11-26-05-40.json (1)
1-11: LGTM! Changelog entry is properly formatted. The minor version bump is appropriate for the addition of new public methods and configuration options without breaking existing APIs.
packages/coze-js/src/ws-tools/types.ts (2)
1-1: LGTM! Proper import for the new type reference. The `TranscriptionsUpdateEvent` import is necessary to support the type annotation for the new `transcriptionUpdateData` field.
432-436: LGTM! Well-documented optional configuration field. The `transcriptionUpdateData` field:
- Uses the correct type via indexed access (`TranscriptionsUpdateEvent['data']`)
- Is appropriately optional, maintaining backward compatibility
- Has clear bilingual documentation
- Aligns with the implementation in `transcription/index.ts`
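The indexed-access pattern praised above can be illustrated with a trimmed-down sketch. `TranscriptionsUpdateEventSketch`, its `event_type` and `data` fields, and `TranscriptionClientOptionsSketch` are hypothetical stand-ins; the real `TranscriptionsUpdateEvent` in `packages/coze-js/src/resources/websockets/types.ts` defines its own fields.

```typescript
// Hypothetical, trimmed-down event type; the real definition differs.
interface TranscriptionsUpdateEventSketch {
  event_type: "transcriptions.update";
  data?: {
    input_audio?: { sample_rate?: number; channel?: number };
  };
}

// Indexed access reuses the event's payload type instead of duplicating it,
// so the option stays in sync if the event definition changes.
type TranscriptionUpdateData = TranscriptionsUpdateEventSketch["data"];

// Hypothetical client options mirroring the new optional field.
interface TranscriptionClientOptionsSketch {
  transcriptionUpdateData?: TranscriptionUpdateData;
}

// Callers may override part of the payload...
const withCustomRate: TranscriptionClientOptionsSketch = {
  transcriptionUpdateData: { input_audio: { sample_rate: 16000 } },
};
// ...or omit the field entirely, keeping existing code unchanged.
const withoutCustom: TranscriptionClientOptionsSketch = {};
```

Making the field optional is what preserves backward compatibility: existing callers that never pass it still type-check.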
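1. Added a `destroyPlayer` method to the `WsSpeechClient` class for external use. After playback stops, callers can release the player manually. This avoids an issue on mobile devices where, after `WsSpeechClient` has been used several times, browser restrictions prevent any further audio from playing.
2. `WsTranscriptionClient` now accepts a `transcriptionUpdateData` parameter for configuring speech recognition, for example the recognition language; setting a specific language such as zh can improve recognition accuracy. For the available parameters, see the `data` configuration of the `transcriptions.update` event in the official documentation.
Link: https://www.coze.cn/open/docs/developer_guides/asr_event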
Summary by CodeRabbit
New Features
Bug Fixes