feat(coze-js-web): Add text-to-speech support #97
Walkthrough
This pull request revises the ChatX module. The configuration file now returns a different default API URL. The ChatX component now imports Flex and updates its speech handling: the speech methods are replaced with transcription methods, and a new footer with a speech toggle is introduced. The use-ws-api hook gains new types and functions to manage a dedicated speech WebSocket connection, covering initialization, messaging, and closure.
Sequence Diagram(s)
sequenceDiagram
participant U as User
participant C as ChatX Component
participant WS as useWsAPI Hook
participant S as Speech WebSocket
U->>C: Clicks speech toggle button
C->>WS: Calls startTranscriptions()
WS->>S: Initializes speech connection (initSpeechWs)
S-->>WS: Sends onopen/onmessage events
WS-->>C: Updates speech status via getIsSpeech
C->>U: Reflects new speech state in UI
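As a minimal sketch of the toggle step in this flow, assuming startTranscriptions and stopTranscriptions return promises (the review below awaits them), a helper can commit the new on/off state only after the async call succeeds. toggleSpeech and SpeechToggleDeps are hypothetical names, not part of the PR.

```typescript
// Hypothetical helper (not part of the PR): commits a new "speech on/off"
// state only after the matching async start/stop call has resolved.
type SpeechToggleDeps = {
  start: () => Promise<void>; // e.g. a wrapper around startTranscriptions()
  stop: () => Promise<void>; // e.g. a wrapper around stopTranscriptions()
  onError?: (action: 'start' | 'stop', error: unknown) => void;
};

async function toggleSpeech(current: boolean, deps: SpeechToggleDeps): Promise<boolean> {
  const action: 'start' | 'stop' = current ? 'stop' : 'start';
  try {
    if (current) {
      await deps.stop();
    } else {
      await deps.start();
    }
    return !current; // success: flip the state
  } catch (error) {
    deps.onError?.(action, error);
    return current; // failure: keep the previous state so the UI stays truthful
  }
}
```

In a click handler this might be wired as setSpeech(await toggleSpeech(speech, { start: startTranscriptions, stop: stopTranscriptions })); keeping the state change after the await avoids showing an active toggle when the WebSocket never opened.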
Branch updated from 3cbf239 to a71ee51.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.
Actionable comments posted: 5
🧹 Nitpick comments (13)
examples/chat-sdk-web/tsconfig.node.json (2)
3-7: Consider relocating the build info file. Storing build info in node_modules/.tmp may lead to unnecessary rebuilds, since this directory is typically cleaned during package operations. Consider moving it to a more persistent location, such as .tsbuildinfo in the project root.
- "tsBuildInfoFile": "./node_modules/.tmp/tsconfig.node.tsbuildinfo",
+ "tsBuildInfoFile": "./.tsbuildinfo",
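9-9: Consider using the JSONC extension for the TypeScript config. The file contains comments, which the JSON standard does not allow. Consider renaming the file to tsconfig.node.jsonc to indicate that it uses JSONC, which supports comments. (The TypeScript compiler itself accepts comments in tsconfig files, so the errors below come from Biome parsing the file as strict JSON.)
Also applies to: 16-16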
🧰 Tools
🪛 Biome (1.9.4)
[error] 9-9: JSON standard does not allow comments. (parse)
examples/chat-sdk-web/tsconfig.app.json (3)
3-3: Consider relocating the build info file. Storing the build info file in node_modules/.tmp might cause issues, because this directory is typically excluded from version control and cleaned during package installations. Consider moving it to a more stable location, such as .tsbuildinfo in the project root or a dedicated .build directory.
- "tsBuildInfoFile": "./node_modules/.tmp/tsconfig.app.tsbuildinfo",
+ "tsBuildInfoFile": "./.build/tsconfig.app.tsbuildinfo",
18-23: Consider additional type-checking options. While the current linting configuration is good, consider adding these options to further enhance type safety:
  /* Linting */
  "strict": true,
  "noUnusedLocals": true,
  "noUnusedParameters": true,
  "noFallthroughCasesInSwitch": true,
- "noUncheckedSideEffectImports": true
+ "noUncheckedSideEffectImports": true,
+ "noImplicitReturns": true,
+ "noImplicitOverride": true
🧰 Tools
🪛 Biome (1.9.4)
[error] 18-18: JSON standard does not allow comments. (parse)
[error] 19-23: End of file expected. Use an array for a sequence of values: [1, 2] (parse); reported four times on each of lines 19 through 22 and three times on line 23.
25-25: Consider including test files and type declarations. The current include configuration only covers the src directory. Consider explicitly including test files and type declarations:
- "include": ["src"]
+ "include": ["src", "src/**/*.ts", "src/**/*.tsx", "tests/**/*.ts", "tests/**/*.tsx"]
Note that "src" already matches every .ts, .tsx, and .d.ts file under that directory, so only the tests/** patterns add coverage.
🧰 Tools
🪛 Biome (1.9.4)
[error] 25-26: End of file expected. Use an array for a sequence of values: [1, 2] (parse); reported three times on line 25 and once on lines 25-26.
examples/chat-sdk-web/src/App.css (2)
8-19: Consider removing the premature will-change optimization. The will-change property should only be used as a last resort for performance issues, not as a preventive measure. Since only the filter is animated on hover, this optimization is likely unnecessary.
  .logo {
    height: 6em;
    padding: 1.5em;
-   will-change: filter;
    transition: filter 300ms;
  }
30-34: Consider improving selector specificity and animation control. Two suggestions for improvement:
- Replace the nth-of-type selector with a more specific class for better maintainability
- Add a way to pause the infinite animation on hover for better user experience
  @media (prefers-reduced-motion: no-preference) {
-   a:nth-of-type(2) .logo {
+   .logo.animate-spin {
      animation: logo-spin infinite 20s linear;
    }
+   .logo.animate-spin:hover {
+     animation-play-state: paused;
+   }
  }
examples/coze-js-web/src/pages/chat-x/use-ws-api.ts (2)
241-289: Initialize the speech WebSocket (minor logging fix). Overall, the setup logic is sound. However, the console log on line 269 says "[transcriptions] ws message" although this is the speech WebSocket code path. Consider updating the log statement for clarity.
- console.log('[transcriptions] ws message', data);
+ console.log('[speech] ws message', data);
293-298: Consider providing a close code for the speech WebSocket. Currently, the code calls speechRef.current.close(); without a specific code or reason. Using a standard code (e.g., 1000) and a short reason string can ease debugging and keep closures consistent:
- speechRef.current.close();
+ speechRef.current.close(1000, 'Normal Closure');
examples/coze-js-web/src/pages/chat-x/index.tsx (1)
89-102: Integration of the new speech/transcription methods. Destructuring startChat, stopChat, startTranscriptions, stopTranscriptions, ... from the updated hook ensures proper usage. The inline callback for TRANSCRIPTIONS_MESSAGE_UPDATE looks correct for updating local state.
Consider also handling potential error events in this block for better resilience.
examples/chat-sdk-web/src/main.tsx (1)
6-10: Consider adding a null check for the root element. The non-null assertion operator (!) assumes the root element always exists. Consider adding a null check for better error handling:
-createRoot(document.getElementById('root')!).render(
+const rootElement = document.getElementById('root');
+if (!rootElement) {
+  throw new Error('Failed to find root element');
+}
+createRoot(rootElement).render(
   <StrictMode>
     <App />
   </StrictMode>,
 )
examples/chat-sdk-web/eslint.config.js (1)
7-28: Consider adding accessibility-related ESLint rules. Since this project involves text-to-speech functionality, consider adding eslint-plugin-jsx-a11y to enforce accessibility best practices. Flat config does not accept legacy eslintrc-style preset strings such as 'plugin:jsx-a11y/recommended', so use the plugin's flat preset (jsxA11y.flatConfigs.recommended, available from eslint-plugin-jsx-a11y 6.10.0), which also registers the plugin:
 import js from '@eslint/js'
 import globals from 'globals'
 import reactHooks from 'eslint-plugin-react-hooks'
 import reactRefresh from 'eslint-plugin-react-refresh'
+import jsxA11y from 'eslint-plugin-jsx-a11y'
 import tseslint from 'typescript-eslint'

 export default tseslint.config(
   { ignores: ['dist'] },
   {
-    extends: [js.configs.recommended, ...tseslint.configs.recommended],
+    extends: [
+      js.configs.recommended,
+      ...tseslint.configs.recommended,
+      jsxA11y.flatConfigs.recommended,
+    ],
     files: ['**/*.{ts,tsx}'],
     languageOptions: {
       ecmaVersion: 2020,
       globals: globals.browser,
     },
     plugins: {
       'react-hooks': reactHooks,
       'react-refresh': reactRefresh,
     },
     // ... rest of the config
   }
 )
Don't forget to add the plugin to package.json:
 {
   "devDependencies": {
+    "eslint-plugin-jsx-a11y": "^6.10.0"
   }
 }
examples/chat-sdk-web/src/index.css (1)
1-14: Consider using CSS custom properties for better maintainability. The root variables could be organized better for theme customization.
 :root {
+  /* Typography */
   font-family: Inter, system-ui, Avenir, Helvetica, Arial, sans-serif;
   line-height: 1.5;
   font-weight: 400;
+  /* Colors */
+  --text-primary: rgba(255, 255, 255, 0.87);
+  --bg-primary: #242424;
   color-scheme: light dark;
-  color: rgba(255, 255, 255, 0.87);
-  background-color: #242424;
+  color: var(--text-primary);
+  background-color: var(--bg-primary);
   font-synthesis: none;
   text-rendering: optimizeLegibility;
   -webkit-font-smoothing: antialiased;
   -moz-osx-font-smoothing: grayscale;
 }
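Whether colors like these (and the #888 versus #666 choice for .read-the-docs discussed below) meet WCAG AA can be checked with the WCAG 2.x relative-luminance and contrast-ratio formulas. The following is a small sketch for opaque hex colors only; the function names are illustrative.

```typescript
// WCAG 2.x: linearize one 8-bit sRGB channel.
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of an opaque "#rgb" or "#rrggbb" color.
function relativeLuminance(hex: string): number {
  const m = /^#?([0-9a-f]{3}|[0-9a-f]{6})$/i.exec(hex);
  if (!m) throw new Error(`Unsupported color: ${hex}`);
  let h = m[1];
  if (h.length === 3) h = h.split('').map((ch) => ch + ch).join(''); // expand shorthand
  const [r, g, b] = [0, 2, 4].map((i) => channelToLinear(parseInt(h.slice(i, i + 2), 16)));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio (lighter + 0.05) / (darker + 0.05); order of arguments does not matter.
function contrastRatio(fg: string, bg: string): number {
  const l1 = relativeLuminance(fg);
  const l2 = relativeLuminance(bg);
  const [hi, lo] = l1 >= l2 ? [l1, l2] : [l2, l1];
  return (hi + 0.05) / (lo + 0.05);
}
```

By these formulas, #888 is roughly 3.5:1 and #666 roughly 5.7:1 against white, but against #242424 they are roughly 4.4:1 and 2.7:1 respectively, while AA for normal-size text requires 4.5:1.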
🛑 Comments failed to post (5)
examples/chat-sdk-web/src/App.css (1)
36-42: 🛠️ Refactor suggestion
Ensure sufficient color contrast for accessibility.
The text color #888 might not provide sufficient contrast with the background for optimal readability. Consider using a darker shade to meet WCAG 2.1 Level AA contrast requirements. Note that a darker shade only helps on a light background: against the dark default background (#242424, set in index.css), #666 has lower contrast than #888, so the color should be chosen per color scheme.
 .read-the-docs {
-  color: #888;
+  color: #666;
 }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
.card {
  padding: 2em;
}

.read-the-docs {
  color: #666;
}
examples/chat-sdk-web/src/App.tsx (3)
6-33: ⚠️ Potential issue
Integrate text-to-speech functionality.
Based on the PR objectives, this component should include text-to-speech support.
Here's a suggested integration. Would you like me to provide a complete implementation?
import { useState, useCallback } from 'react'
import { useSpeech } from '@coze/chat-sdk' // Assuming this hook exists

function App() {
  const [count, setCount] = useState(0)
  const { speak, speaking, supported } = useSpeech()

  const handleSpeak = useCallback((text: string) => {
    if (supported) {
      speak(text)
    }
  }, [speak, supported])

  // ... rest of the component
}
12-17: 🛠️ Refactor suggestion
Add missing accessibility attributes to links and images.
External links should indicate they open in a new window, and images should have meaningful alt text:
- <a href="https://vite.dev" target="_blank">
+ <a href="https://vite.dev" target="_blank" rel="noopener noreferrer" aria-label="Visit Vite documentation (opens in new window)">
    <img src={viteLogo} className="logo" alt="Vite logo" />
  </a>
- <a href="https://react.dev" target="_blank">
+ <a href="https://react.dev" target="_blank" rel="noopener noreferrer" aria-label="Visit React documentation (opens in new window)">
    <img src={reactLogo} className="logo react" alt="React logo" />
  </a>
21-23: 🛠️ Refactor suggestion
Add ARIA attributes to interactive elements.
The button should have an aria-label for better screen reader support:
- <button onClick={() => setCount((count) => count + 1)}>
+ <button
+   onClick={() => setCount((count) => count + 1)}
+   aria-label={`Increment count, current value is ${count}`}
+ >
    count is {count}
  </button>
examples/chat-sdk-web/package.json (1)
22-27: 💡 Verification agent
🧩 Analysis chain
Consider using stable versions for critical dependencies.
Several dependency versions should be checked for beta or unstable releases, which could lead to issues:
- eslint v9.14.0 (a stable release)
- eslint-plugin-react-hooks (beta version)
- typescript v5.5.3 (a stable release, older than the latest stable 5.7.3)
🌐 Web query:
What is the latest stable version of TypeScript?
💡 Result:
The latest stable version of TypeScript is 5.7.3, released in January 2025[8][9]. This version includes performance optimizations, stability improvements, and maintains compatibility with recent ECMAScript standards like ES2024[7].
Key details:
- Released via npm with the command npm install -D typescript[1][9]
- Current beta version is 5.8, which is available for testing[7]
- Recent versions have focused on:
- Improved type inference patterns[4]
- Enhanced editor reliability[4]
- Better ESM module support[3][7]
- Memory/performance optimizations[1][4]
For production use, TypeScript recommends staying on stable releases like 5.7.x unless specifically needing experimental features from beta versions[7][9]. The TypeScript team maintains a regular release cycle with approximately quarterly updates[4][7].
Citations:
- 1: https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/
- 2: https://github.com/microsoft/typescript/releases
- 3: https://devblogs.microsoft.com/typescript/announcing-typescript-5-4/
- 4: https://effectivetypescript.com/2024/07/02/ts-55/
- 5: https://en.wikipedia.org/wiki/TypeScript
- 6: https://www.npmjs.com/package/typescript/v/4.8.4
- 7: https://www.typescriptlang.org
- 8: https://www.npmjs.com/package/typescript
- 9: https://www.typescriptlang.org/download/
Update Critical Dependency Versions to Stable Releases
Some dependency versions should be reviewed because prerelease builds can introduce risk:
- eslint: "9.14.0" is a stable release, so no change is needed for stability.
- eslint-plugin-react-hooks: This dependency uses a beta release ("5.1.0-beta-26f2496093-20240514"); consider switching to a stable version.
- typescript: The dependency is declared as "^5.5.3", a stable range that already allows the latest stable version, 5.7.3. Refreshing the lockfile (or raising the lower bound) is enough to pick up its performance optimizations and stability improvements.
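The prerelease check can be automated. As a hedged sketch (a hypothetical helper, not part of the PR or of npm), a dependency spec is flagged when its version carries a SemVer prerelease tag, that is, a hyphen suffix directly after MAJOR.MINOR.PATCH; only simple single-version specs are handled.

```typescript
// Hypothetical audit helper: flags dependency specs that pin a SemVer prerelease.
// Per SemVer 2.0.0, a prerelease is a hyphen-suffixed tag after MAJOR.MINOR.PATCH
// (e.g. 5.1.0-beta-...); "+build" metadata alone is not a prerelease.
// Compound ranges such as ">=1.0.0 <2.0.0" are out of scope.
function isPrereleaseSpec(spec: string): boolean {
  // Strip one leading range operator (^, ~, >=, <=, >, <, =) and an optional "v".
  const version = spec.trim().replace(/^(?:\^|~|>=|<=|>|<|=)?\s*v?/, '');
  return /^\d+\.\d+\.\d+-[0-9A-Za-z.-]+/.test(version);
}

// Returns the names of all dependencies whose spec pins a prerelease.
function findPrereleaseDeps(deps: Record<string, string>): string[] {
  return Object.entries(deps)
    .filter(([, spec]) => isPrereleaseSpec(spec))
    .map(([name]) => name);
}
```

For the three packages listed above, only eslint-plugin-react-hooks would be flagged.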
Actionable comments posted: 1
🧹 Nitpick comments (4)
examples/coze-js-web/src/pages/chat-x/use-ws-api.ts (4)
241-289: Consider adding error recovery and reconnection logic. The WebSocket initialization looks good, but there is no automatic reconnection strategy if the connection fails or drops.
Consider adding a reconnection mechanism:
  const initSpeechWs = useCallback(
    async (onMessage?: (data: CreateSpeechWsRes) => void) => {
      closeSpeechWs();
      if (!clientRef.current) {
        return;
      }
+     const maxRetries = 3;
+     let retryCount = 0;
+
+     const connect = async (): Promise<WebSocketAPI<CreateSpeechWsReq, CreateSpeechWsRes>> => {
        const ws = await clientRef.current.websockets.audio.speech.create();
        return new Promise<WebSocketAPI<CreateSpeechWsReq, CreateSpeechWsRes>>(
          (resolve, reject) => {
            ws.onopen = () => {
              console.log('[speech] ws open');
              resolve(ws);
            };
            ws.onerror = async (error, event) => {
              console.error('[speech] WebSocket error', error, event);
              ws.close();
-             reject(error);
+             if (retryCount < maxRetries) {
+               retryCount++;
+               console.log(`[speech] Retrying connection (${retryCount}/${maxRetries})...`);
+               try {
+                 const newWs = await connect();
+                 resolve(newWs);
+               } catch (retryError) {
+                 reject(retryError);
+               }
+             } else {
+               reject(error);
+             }
            };
            // ... rest of the implementation
          }
        );
+     };
+
+     return connect();
    },
    [],
  );
Caveat: once the promise has settled, later resolve() or reject() calls are no-ops, so if onerror fires after the socket has opened, this retry opens a second connection that nothing holds a reference to. Restricting retries to the connection phase (before onopen) and adding a short delay between attempts avoids leaking sockets and hammering the service.
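A retry can also be factored out of the socket callbacks entirely, so that each attempt is a fresh call, delays grow exponentially, and no retry can start after a connection has been returned. retryWithBackoff is a hypothetical helper; its connect argument would wrap the existing clientRef.current.websockets.audio.speech.create() call together with the onopen/onerror promise from initSpeechWs.

```typescript
// Hypothetical helper: retries an async connect function with exponential backoff.
// Each attempt is a fresh call, so a socket from a failed attempt is never reused,
// and once connect() resolves no further attempts are made.
async function retryWithBackoff<T>(
  connect: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await connect();
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) {
        // Delays of baseDelayMs, 2x, 4x, ... between attempts.
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError; // all attempts failed: surface the last error
}
```

Usage inside initSpeechWs might look like const ws = await retryWithBackoff(() => openSpeechSocket(), 3, 500), where openSpeechSocket is a hypothetical wrapper that resolves on onopen and rejects on onerror. The injectable sleep keeps the helper testable without real delays.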
300-309: Consider simplifying the audio decoding. The current loop is correct and already allocates only one buffer per message; Uint8Array.from expresses the same conversion more compactly (avoid wrapping it in a second new Uint8Array plus set(), which would allocate and copy twice):
  const handleAudioMessage = useCallback(async (message: string) => {
-   const decodedContent = atob(message);
-   const arrayBuffer = new ArrayBuffer(decodedContent.length);
-   const view = new Uint8Array(arrayBuffer);
-   for (let i = 0; i < decodedContent.length; i++) {
-     view[i] = decodedContent.charCodeAt(i);
-   }
+   const binaryString = atob(message);
+   const bytes = Uint8Array.from(binaryString, c => c.charCodeAt(0));

-   await wavStreamPlayer.add16BitPCM(arrayBuffer, trackId.current);
+   await wavStreamPlayer.add16BitPCM(bytes.buffer, trackId.current);
  }, []);
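For reference, the decoding step on its own, plus an explicit view of the decoded bytes as 16-bit samples. The name add16BitPCM suggests 16-bit PCM, but the byte order is not stated in the PR; this sketch assumes signed little-endian samples, and both function names are hypothetical.

```typescript
// Decode base64 audio into bytes with a single allocation.
// atob throws on malformed base64, so callers may want a try/catch.
function base64ToBytes(base64: string): Uint8Array {
  const binary = atob(base64); // each char is one byte (0-255)
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

// Interpret bytes as signed 16-bit little-endian PCM samples (assumed format).
function bytesToPcm16(bytes: Uint8Array): Int16Array {
  if (bytes.byteLength % 2 !== 0) {
    throw new Error(`PCM16 payload must have an even byte length, got ${bytes.byteLength}`);
  }
  // Respect byteOffset so subarray views decode correctly.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Int16Array(bytes.byteLength / 2);
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true); // true = little-endian
  }
  return samples;
}
```

Reading through a DataView makes the byte order explicit instead of depending on the platform's endianness, which a plain new Int16Array(buffer) view would do.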
474-474: Remove or update the documentation link. The link to internal documentation should be removed or replaced with public documentation.
- // See https://bytedance.larkoffice.com/docx/Uv6Wd8GTjoEex3xyq4YcxDnRnkc#HrVcdpW4Oo3A25xCQIDcGBlknOc
+ // For API documentation, see: <add public documentation link>
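504-504: Consider caching the speech status. The getIsSpeech function checks the stream status on every call. Consider caching this value.
+const [isSpeechActive, setIsSpeechActive] = useState(false);
+
+useEffect(() => {
+  const streamStatus = wavStreamPlayer.stream !== null;
+  if (streamStatus !== isSpeechActive) {
+    setIsSpeechActive(streamStatus);
+  }
+}, [wavStreamPlayer.stream]);
+
-const getIsSpeech = useCallback(() => wavStreamPlayer.stream !== null, []);
+const getIsSpeech = useCallback(() => isSpeechActive, [isSpeechActive]);
Caveat: wavStreamPlayer.stream is not React state, so changing it does not trigger a re-render, and this effect only re-checks it when the component re-renders for some other reason. The cached value can therefore be stale, whereas the current check reads the live value and is only a property comparison.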
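If a cached flag is still wanted, the value has to be pushed to React whenever playback starts or stops, rather than read from an effect. A hedged sketch: a tiny external store whose subscribe/getSnapshot pair matches the arguments of React 18's useSyncExternalStore. createFlagStore, and the places that would call set(), are hypothetical.

```typescript
// Minimal observable boolean whose shape matches useSyncExternalStore:
// subscribe(listener) returns an unsubscribe function; getSnapshot() returns the value.
type Listener = () => void;

function createFlagStore(initial = false) {
  let value = initial;
  const listeners = new Set<Listener>();
  return {
    getSnapshot: (): boolean => value,
    subscribe: (listener: Listener): (() => void) => {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
    set: (next: boolean): void => {
      if (next === value) return; // skip redundant notifications
      value = next;
      listeners.forEach((notify) => notify());
    },
  };
}
```

A component would then read the flag with useSyncExternalStore(speechStore.subscribe, speechStore.getSnapshot) and re-render whenever set() changes it, for example when speech playback starts or ends.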
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- examples/coze-js-web/src/pages/chat-x/config.ts (1 hunks)
- examples/coze-js-web/src/pages/chat-x/index.tsx (4 hunks)
- examples/coze-js-web/src/pages/chat-x/use-ws-api.ts (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- examples/coze-js-web/src/pages/chat-x/config.ts
- examples/coze-js-web/src/pages/chat-x/index.tsx
🔇 Additional comments (2)
examples/coze-js-web/src/pages/chat-x/use-ws-api.ts (2)
8-9: LGTM! Good modular approach to WebSocket types and audio tools. The addition of speech-specific types and WavStreamPlayer shows good separation of concerns.
Also applies to: 16-16
33-36: LGTM! Well-structured WebSocket reference. The speechRef follows the same pattern as the other WebSocket references, maintaining consistency.
Branch updated from a71ee51 to 5aa122c.
Actionable comments posted: 0
🧹 Nitpick comments (4)
examples/coze-js-web/src/pages/chat-x/index.tsx (2)
247-263: Enhance the accessibility of the speech toggle button. While the implementation is clean, the button could benefit from accessibility improvements.
Apply this diff to improve accessibility:
  <Button
    size="small"
    type="text"
    icon={<PhoneOutlined />}
+   aria-label={getIsSpeech() ? "Stop speech" : "Start speech"}
+   title={getIsSpeech() ? "Stop speech" : "Start speech"}
    style={{ marginInlineEnd: 'auto' }}
    onClick={() => {
      if (getIsSpeech()) {
        stopSpeech();
      } else {
        startSpeech(msg);
      }
    }}
  />
392-402: Add error handling for transcription operations. The speech state handling should include error handling to provide better user feedback.
Apply this diff to add error handling:
  onRecordingChange: async nextSpeech => {
    if (nextSpeech) {
      lastContentRef.current = content;
      message.info('Start Voice to Text');
-     await startTranscriptions();
+     try {
+       await startTranscriptions();
+     } catch (error) {
+       message.error('Failed to start voice to text');
+       setSpeech(false);
+       return;
+     }
    } else {
      message.info('Stop Voice to Text');
-     await stopTranscriptions();
+     try {
+       await stopTranscriptions();
+     } catch (error) {
+       message.error('Failed to stop voice to text');
+     }
    }
    setSpeech(nextSpeech);
  },
examples/coze-js-web/src/pages/chat-x/use-ws-api.ts (2)
300-309: Add input validation and error handling for audio message processing. The audio message handling should include validation and error handling to prevent potential issues.
Apply this diff to improve error handling:
  const handleAudioMessage = useCallback(async (message: string) => {
+   if (!message) {
+     console.warn('[audio] Empty message received');
+     return;
+   }
+   try {
      const decodedContent = atob(message);
      const arrayBuffer = new ArrayBuffer(decodedContent.length);
      const view = new Uint8Array(arrayBuffer);
      for (let i = 0; i < decodedContent.length; i++) {
        view[i] = decodedContent.charCodeAt(i);
      }
      await wavStreamPlayer.add16BitPCM(arrayBuffer, trackId.current);
+   } catch (error) {
+     console.error('[audio] Failed to process audio message:', error);
+   }
  }, []);
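293-298: Add a cleanup timeout for WebSocket closure. To ensure proper cleanup, add a timeout that forces the connection closed if closing takes too long.
Apply this diff to add a cleanup timeout:
  const closeSpeechWs = () => {
+   const forceCloseTimeout = setTimeout(() => {
+     if (speechRef.current) {
+       console.warn('[speech] Force closing WebSocket after timeout');
+       speechRef.current.close(1000, 'force close');
+       speechRef.current = null;
+     }
+   }, 5000);
    if (speechRef.current && speechRef.current.readyState === WebSocket.OPEN) {
      speechRef.current.close();
+     clearTimeout(forceCloseTimeout);
    }
    speechRef.current = null;
  };
Caveat: as written, the timeout can never take effect. speechRef.current is set to null synchronously right after the timer is scheduled, so the check inside the callback always fails, and calling close() again on a socket that is already closing has no effect anyway. Waiting for the close event, with a timeout as a fallback, is a more useful shape if callers need to know when cleanup has finished.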
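When a caller needs to know that the previous connection has actually closed (initSpeechWs, for example, calls closeSpeechWs() before opening a new socket), the close can be awaited with a timeout as a fallback. This is a sketch under assumptions: ClosableSocket is a hypothetical minimal interface, and whether the PR's WebSocketAPI objects expose an onclose handler property like the onopen/onerror ones the code already sets is not confirmed here.

```typescript
// Hypothetical minimal socket shape (assumption, see lead-in).
interface ClosableSocket {
  readyState: number;
  onclose: ((...args: unknown[]) => void) | null;
  close(code?: number, reason?: string): void;
}

// Numeric readyState values defined by the WHATWG WebSocket standard.
const CONNECTING = 0;
const OPEN = 1;

type CloseOutcome = 'closed' | 'timeout' | 'noop';

function closeAndWait(ws: ClosableSocket | null, timeoutMs = 5000): Promise<CloseOutcome> {
  // Nothing to do if there is no socket or it is already closing/closed.
  if (!ws || (ws.readyState !== CONNECTING && ws.readyState !== OPEN)) {
    return Promise.resolve('noop');
  }
  const socket: ClosableSocket = ws; // narrowed reference for use in callbacks
  return new Promise<CloseOutcome>((resolve) => {
    const previous = socket.onclose;
    const timer = setTimeout(() => {
      socket.onclose = previous; // stop waiting; the caller can drop its reference
      resolve('timeout');
    }, timeoutMs);
    socket.onclose = (...args: unknown[]) => {
      clearTimeout(timer);
      socket.onclose = previous;
      previous?.(...args); // keep any handler that was already installed
      resolve('closed');
    };
    socket.close(1000, 'Normal Closure');
  });
}
```

The caller can then await closeAndWait(speechRef.current) before creating the next socket, and treat 'timeout' as "give up on the old connection" rather than closing it a second time.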
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- examples/coze-js-web/src/pages/chat-x/config.ts (1 hunks)
- examples/coze-js-web/src/pages/chat-x/index.tsx (4 hunks)
- examples/coze-js-web/src/pages/chat-x/use-ws-api.ts (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/coze-js-web/src/pages/chat-x/config.ts
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Node.js v20 (ubuntu-latest)
🔇 Additional comments (2)
examples/coze-js-web/src/pages/chat-x/index.tsx (1)
3-11: LGTM! Clean imports and hook usage. The imports are properly organized, and the hook destructuring includes all the methods needed for speech functionality.
Also applies to: 89-98
examples/coze-js-web/src/pages/chat-x/use-ws-api.ts (1)
461-498: Add rate limiting for speech requests. The startSpeech function should implement rate limiting to prevent overwhelming the speech service.
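One simple way to do this is a sliding-window limiter checked before each startSpeech call. This is a hypothetical sketch; the limit, the window, and the createRateLimiter name are illustrative, and the clock is injectable so the behavior can be tested deterministically.

```typescript
// Sliding-window rate limiter: allows at most `limit` acquisitions per `windowMs`.
function createRateLimiter(limit: number, windowMs: number, now: () => number = Date.now) {
  const timestamps: number[] = []; // acquisition times inside the current window, oldest first
  return function tryAcquire(): boolean {
    const t = now();
    // Drop acquisitions that have fallen out of the window.
    while (timestamps.length > 0 && t - timestamps[0] >= windowMs) {
      timestamps.shift();
    }
    if (timestamps.length >= limit) return false; // over the limit: reject
    timestamps.push(t);
    return true;
  };
}
```

For example, const tryAcquire = createRateLimiter(3, 10_000) allows at most three speech requests per ten seconds; a click handler would return early (and perhaps show a message) when tryAcquire() returns false.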
Codecov Report
All modified and coverable lines are covered by tests ✅
@@ Coverage Diff @@
## main #97 +/- ##
==========================================
+ Coverage 91.82% 92.71% +0.89%
==========================================
Files 117 52 -65
Lines 5469 1923 -3546
Branches 1106 344 -762
==========================================
- Hits 5022 1783 -3239
+ Misses 447 140 -307
… handling (#97) Co-authored-by: shenxiaojie.316 <[email protected]>