Add Support For Gemini TTS #81

groxaxo · 2025-11-13T01:39:22Z

Speech endpoints for Gemini TTS Flash and Pro are integrated into the project.

Summary by CodeRabbit

New Features
- Added a Text-to-Speech endpoint to convert text to audio with multiple voice options, model aliases, and output formats (MP3, Opus, AAC, FLAC, WAV, PCM). Response headers preserve CORS. Playback speed parameter not yet available.
Documentation
- Added comprehensive TTS docs with usage examples, model/voice mappings, parameter guidance, and response format notes.

Co-authored-by: groxaxo <[email protected]>

…tion Add OpenAI speech API endpoint with Gemini TTS backend

netlify · 2025-11-13T01:39:27Z

✅ Deploy Preview for gemini-pro ready!

Name	Link
🔨 Latest commit	`8f637ab`
🔍 Latest deploy log	https://app.netlify.com/projects/gemini-pro/deploys/6915a145731d8400079db6c0
😎 Deploy Preview	https://deploy-preview-81--gemini-pro.netlify.app

coderabbitai · 2025-11-13T01:39:30Z

Walkthrough

Adds Text-to-Speech docs for /v1/audio/speech and implements a new /audio/speech POST handler in the worker that maps OpenAI-like models/voices to Gemini TTS, calls Gemini generateContent, converts returned audio to requested formats (mp3/opus/aac/flac/wav/pcm), and returns audio with CORS.

Changes

Cohort / File(s)	Summary
Documentation `README.md`	Added TTS documentation describing the `/v1/audio/speech` endpoint with example usage, model mappings (tts-1, tts-1-hd and Gemini model names), voice mappings, parameter guidance (`input`, `voice`, `response_format`), and note that `speed` is not yet implemented.
Speech Synthesis Implementation `src/worker.mjs`	Added `handleSpeech(req, apiKey)` and routed POST `/audio/speech` to it. Introduced `DEFAULT_SPEECH_MODEL = "gemini-2.5-flash-preview-tts"` and `VOICE_MAP` (alloy→Puck, echo→Charon, fable→Kore, onyx→Fenrir, nova→Aoede, shimmer→Aoede). Implements Gemini `generateContent` request, validates `input` and `voice`, parses base64/PCM responses, supports output formats (mp3, opus, aac, flac, wav, pcm), adds `convertPCMToWAV(pcmData)` helper, and preserves CORS while returning audio or errors.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Worker as /audio/speech Handler
    participant Gemini as Gemini API
    participant Converter as Format Converter

    Client->>Worker: POST /audio/speech (model, input, voice, response_format)
    Worker->>Worker: Map model → Gemini TTS\nMap voice → Gemini voice\nValidate input & voice
    Worker->>Gemini: generateContent (speech synthesis request)
    alt Success
        Gemini-->>Worker: Base64-encoded audio or PCM payload
        Worker->>Converter: Convert/unwrap to requested format (mp3/opus/aac/flac/wav/pcm)
        Converter-->>Worker: Audio bytes
        Worker-->>Client: 200 OK (audio bytes, Content-Type, CORS)
    else API Error
        Gemini-->>Worker: Error response
        Worker-->>Client: Error response (preserve CORS)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Verify correctness of base64 decoding and PCM handling.
Inspect convertPCMToWAV for header/sample calculations.
Confirm model/voice mapping fallbacks and validation logic.
Check error propagation and CORS header preservation.

Poem

🐰 I found a WAV beneath the hill,

Mapped voices, hummed a tiny trill.
Bytes and base64 danced in a row,
From text to tone the rabbit knows—
Hop, play, and let the audio flow! 🎶

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Add Support For Gemini TTS' directly and clearly describes the main change: implementing Gemini Text-to-Speech functionality via a new /audio/speech endpoint with model/voice mappings.


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

README.md (1)

230-246: Consider fixing markdown list indentation for consistency.

The nested list items under audio/speech use 6-space indentation instead of the expected 4 spaces, which is inconsistent with markdown best practices.

Apply this diff to fix the indentation:

 - [x] `audio/speech` (Text-to-Speech)
   <details>
 
-  - [x] `model`
-      - `tts-1` => `gemini-2.5-flash-preview-tts`
-      - `tts-1-hd` => `gemini-2.5-pro-preview-tts`
-      - Can also specify Gemini model names directly
-  - [x] `input` (required)
-  - [x] `voice` (required)
-      - Supported: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`
-      - Maps to Gemini voices: Puck, Charon, Kore, Fenrir, Aoede
-  - [x] `response_format`
-      - Supported: `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`
-      - Default: `mp3`
-  - [ ] `speed` (not yet implemented)
+    - [x] `model`
+        - `tts-1` => `gemini-2.5-flash-preview-tts`
+        - `tts-1-hd` => `gemini-2.5-pro-preview-tts`
+        - Can also specify Gemini model names directly
+    - [x] `input` (required)
+    - [x] `voice` (required)
+        - Supported: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`
+        - Maps to Gemini voices: Puck, Charon, Kore, Fenrir, Aoede
+    - [x] `response_format`
+        - Supported: `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`
+        - Default: `mp3`
+    - [ ] `speed` (not yet implemented)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe22245 and d6d63aa.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (2)

README.md (2 hunks)
src/worker.mjs (2 hunks)

🧰 Additional context used

🪛 Gitleaks (8.29.0)

README.md

[high] 157-158: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.

(curl-auth-header)

🪛 markdownlint-cli2 (0.18.1)

README.md

234-234: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

235-235: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

236-236: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

239-239: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

240-240: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

242-242: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

243-243: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

🔇 Additional comments (6)

README.md (1)

149-181: Excellent documentation for the new TTS feature.

The documentation clearly explains the endpoint usage, model mappings, and voice mappings. The example is helpful and comprehensive.

Note: The Gitleaks warning about the authorization token on line 157-158 is a false positive—this is example documentation with a placeholder value.

src/worker.mjs (5)

34-37: LGTM! Route handler follows existing patterns.

The new /audio/speech endpoint is correctly integrated into the routing logic, consistent with other endpoints in terms of method assertion and error handling.

149-160: Voice mapping is well-defined with clear documentation.

The mapping between OpenAI and Gemini voices is sensible and well-commented. Note that both nova and shimmer map to the same Gemini voice (Aoede), which is acceptable given Gemini's available voice options.
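For readers without the diff open, the mapping described above can be sketched roughly as follows. The table entries are the ones listed in the walkthrough; the helper name `resolveVoice` and its null-on-unknown behavior are assumptions for illustration, not code taken from `src/worker.mjs`.

```javascript
// Voice table as listed in the walkthrough (nova and shimmer share Aoede).
const VOICE_MAP = {
  alloy: "Puck",
  echo: "Charon",
  fable: "Kore",
  onyx: "Fenrir",
  nova: "Aoede",
  shimmer: "Aoede",
};

// Hypothetical helper: returns the Gemini voice name, or null when the
// OpenAI-style voice is missing or unknown so the caller can answer with an error.
function resolveVoice(voice) {
  if (typeof voice !== "string" || voice.length === 0) return null;
  const key = voice.toLowerCase();
  // Own-property check so inherited keys such as "constructor" are rejected.
  return Object.prototype.hasOwnProperty.call(VOICE_MAP, key)
    ? VOICE_MAP[key]
    : null;
}
```

Whether unknown voices should be rejected or fall back to a default is a design choice; this sketch rejects them, in line with the validation mentioned in the walkthrough.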

161-189: Model mapping and input validation are correctly implemented.

The model name mapping handles all expected cases (OpenAI model names, Gemini model names, and defaults), and required fields are properly validated with appropriate error messages.
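A minimal sketch of the three cases mentioned above (OpenAI alias, Gemini name, default). The alias targets and `DEFAULT_SPEECH_MODEL` come from the walkthrough and README; the helper name `resolveSpeechModel` and the `gemini-` prefix test for pass-through names are assumptions, since the actual detection logic is not shown in this thread.

```javascript
const DEFAULT_SPEECH_MODEL = "gemini-2.5-flash-preview-tts";

// OpenAI-style aliases as documented in the README section of this PR.
const MODEL_ALIASES = {
  "tts-1": "gemini-2.5-flash-preview-tts",
  "tts-1-hd": "gemini-2.5-pro-preview-tts",
};

// Hypothetical helper: map an incoming `model` value to a Gemini TTS model.
function resolveSpeechModel(model) {
  if (typeof model !== "string" || model.length === 0) {
    return DEFAULT_SPEECH_MODEL;
  }
  if (Object.prototype.hasOwnProperty.call(MODEL_ALIASES, model)) {
    return MODEL_ALIASES[model];
  }
  // Assumption: anything starting with "gemini-" is passed through unchanged.
  if (model.startsWith("gemini-")) return model;
  return DEFAULT_SPEECH_MODEL;
}
```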

191-233: Gemini API integration looks correct.

The request construction properly configures audio response with voice settings, and error handling preserves CORS headers while logging errors. The response validation ensures audio data is present before proceeding.
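As a rough illustration of that request/response handling, the sketch below builds a `generateContent` body for audio output and decodes the base64 payload. The field names follow Google's published Gemini request format as I understand it and are not copied from this PR's diff; `buildTtsRequestBody` and `extractAudioBytes` are hypothetical helper names.

```javascript
// Hypothetical helper: request body asking Gemini for audio with a prebuilt voice.
function buildTtsRequestBody(input, geminiVoice) {
  return {
    contents: [{ parts: [{ text: input }] }],
    generationConfig: {
      responseModalities: ["AUDIO"],
      speechConfig: {
        voiceConfig: { prebuiltVoiceConfig: { voiceName: geminiVoice } },
      },
    },
  };
}

// Hypothetical helper: return the decoded audio bytes, or null when the
// response carries no inline audio data.
function extractAudioBytes(geminiResponse) {
  const parts = geminiResponse?.candidates?.[0]?.content?.parts ?? [];
  const audioPart = parts.find((part) => part?.inlineData?.data);
  if (!audioPart) return null;
  // atob is a global in Workers, browsers, and recent Node versions.
  const binary = atob(audioPart.inlineData.data);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```

Returning null instead of throwing lets the handler choose its own error response, which makes it easier to keep CORS headers on the failure path.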

277-307: Hardcoded audio parameters are correct for Gemini's TTS output.

Gemini TTS API uses 24000 Hz sample rate, mono (1 channel), and 16-bit PCM encoding (LINEAR16) by default, which matches the implementation exactly. The parameters do not need to be made configurable or extracted from metadata for standard speech generation use cases.
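For reference when checking the header arithmetic, here is a minimal sketch of wrapping raw PCM in a 44-byte WAV header with the parameters named above (24000 Hz, mono, 16-bit). It is an illustrative reimplementation, not the PR's `convertPCMToWAV`; the extra default parameters are an assumption added for clarity, since the source signature takes only `pcmData`.

```javascript
// Sketch: prepend a canonical 44-byte RIFF/WAVE header to raw PCM bytes.
// pcmData is expected to be a Uint8Array of little-endian samples.
function convertPCMToWAV(pcmData, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const wav = new Uint8Array(44 + pcmData.length);
  const view = new DataView(wav.buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcmData.length, true); // file size minus first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                 // fmt chunk size for PCM
  view.setUint16(20, 1, true);                  // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcmData.length, true);     // size of the sample data
  wav.set(pcmData, 44);
  return wav;
}
```

With the defaults, `blockAlign` is 2 and `byteRate` is 48000, which are the values to look for when inspecting the header calculations.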

src/worker.mjs

Co-authored-by: groxaxo <[email protected]>

Fix TTS audio format handling - limit to WAV/PCM, reject unsupported formats

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

README.md (1)
157-158: Clarify the authorization header in the curl example.

The curl example shows Authorization: Bearer YOUR_GEMINI_API_KEY. While this is a placeholder, it may be clearer to use angle brackets (e.g., <YOUR_GEMINI_API_KEY>) to explicitly indicate it's a placeholder and should not be copied literally. This avoids potential confusion or accidental exposure if users copy the example without substitution.
 curl https://your-endpoint.com/v1/audio/speech \
-  -H "Authorization: Bearer YOUR_GEMINI_API_KEY" \
+  -H "Authorization: Bearer <YOUR_GEMINI_API_KEY>" \

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d6d63aa and 033b40a.

📒 Files selected for processing (2)

README.md (2 hunks)
src/worker.mjs (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/worker.mjs

🧰 Additional context used

🪛 Gitleaks (8.29.0)

README.md

[high] 157-158: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.

(curl-auth-header)

🪛 markdownlint-cli2 (0.18.1)

README.md

234-234: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

235-235: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

236-236: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

239-239: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

240-240: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

242-242: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

243-243: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

244-244: Unordered list indentation
Expected: 4; Actual: 6

(MD007, ul-indent)

🔇 Additional comments (1)

README.md (1)

150-181: Comprehensive TTS documentation—verify alignment with implementation.

The new TTS section provides clear documentation with model mappings, voice mappings, and usage examples. The parameter detail list (lines 231–247) and the note about PCM format (line 244) and external conversion tools are helpful for users.

Please verify that:

The model names (gemini-2.5-flash-preview-tts and gemini-2.5-pro-preview-tts) match those used in src/worker.mjs.

The voice mappings (alloy→Puck, echo→Charon, etc.) match the VOICE_MAP implementation.

The supported response formats (wav, pcm) and their behavior match the actual implementation.

README.md

Co-authored-by: groxaxo <[email protected]>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

README.md (1)

239-240: Clarify voice mapping for shimmer in the supported endpoints section.

The list shows 6 OpenAI voices but only 5 Gemini voice names. While the TTS section (lines 175-180) makes it explicit that both nova and shimmer map to Aoede, this could be clearer in the supported endpoints section. Consider adding a note: "Maps to Gemini voices: Puck, Charon, Kore, Fenrir, Aoede (nova and shimmer both map to Aoede)".

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 033b40a and 8f637ab.

📒 Files selected for processing (1)

README.md (2 hunks)

🧰 Additional context used

🪛 Gitleaks (8.29.0)

README.md

[high] 157-158: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.

(curl-auth-header)

🔇 Additional comments (2)

README.md (2)
157-158: Consider using environment variables for the Authorization header example.

The Gitleaks security scanner flagged the bare Authorization: Bearer YOUR_GEMINI_API_KEY header in the curl example. While this uses a placeholder, it's a best practice to avoid showing credential patterns directly in command examples, even with placeholders. Consider documenting the use of environment variables instead.

Example improvement:
curl https://your-endpoint.com/v1/audio/speech \
  -H "Authorization: Bearer $GEMINI_API_KEY" \
  ...
This makes it clearer that the key should never be hardcoded in commands.
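The same idea from a script, as a hedged sketch: the key is passed in from the environment rather than written into the source. `buildSpeechRequest` is a hypothetical helper, `https://your-endpoint.com` is the placeholder host from the README example, and the `wav` default for `response_format` is an assumption.

```javascript
// Hypothetical helper: assemble the URL and fetch options for /v1/audio/speech.
function buildSpeechRequest(baseUrl, apiKey, options) {
  const { model = "tts-1", input, voice, response_format = "wav" } = options;
  if (!apiKey) {
    throw new Error("API key is missing; set it in the environment");
  }
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/audio/speech`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, input, voice, response_format }),
    },
  };
}

// Usage sketch (Node 18+ or any runtime with fetch):
//   const { url, init } = buildSpeechRequest(
//     "https://your-endpoint.com",
//     process.env.GEMINI_API_KEY,
//     { input: "Hello", voice: "alloy" },
//   );
//   const res = await fetch(url, init);
```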

242-244: Verify audio format support discrepancy.

The documentation states that "For mp3, opus, aac, or flac, use external conversion tools like ffmpeg," implying these formats are not natively supported. However, the AI summary indicates the implementation includes "converts returned audio to requested formats (mp3/opus/aac/flac/wav/pcm)."

This is a critical discrepancy—if the implementation supports these formats, the documentation should be updated to reflect that. If only wav/pcm are supported, the implementation summary should be corrected.
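If the later commit ("limit to WAV/PCM, reject unsupported formats") reflects the current behavior, the validation could look roughly like the sketch below. This is an assumption-laden illustration, not the PR's code: the helper name `resolveResponseFormat`, the `wav` fallback, the 400 status, and the PCM content type string are all hypothetical.

```javascript
// Assumed content types; the PCM value in particular is a guess and should be
// checked against what the worker actually returns.
const SUPPORTED_FORMATS = {
  wav: "audio/wav",
  pcm: "audio/L16;rate=24000;channels=1",
};

// Hypothetical helper: validate `response_format` and pick a Content-Type.
function resolveResponseFormat(requested, fallback = "wav") {
  const format = String(requested ?? fallback).toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(SUPPORTED_FORMATS, format)) {
    return {
      ok: false,
      status: 400,
      message: `Unsupported response_format "${format}". Supported: wav, pcm.`,
    };
  }
  return { ok: true, format, contentType: SUPPORTED_FORMATS[format] };
}
```

Rejecting `mp3`, `opus`, `aac`, and `flac` explicitly, rather than silently returning WAV bytes under another label, would keep the documentation and the endpoint consistent with each other.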

Copilot AI and others added 5 commits November 12, 2025 09:38

Initial plan

7e6d9bc

Initial analysis: Add OpenAI speech API endpoint linked to Gemini TTS

9607ee2

Co-authored-by: groxaxo <[email protected]>

Add OpenAI speech API endpoint with Gemini TTS integration

cd05894

Co-authored-by: groxaxo <[email protected]>

Add TTS usage documentation and examples to README

05d7366

Co-authored-by: groxaxo <[email protected]>

Merge pull request #1 from groxaxo/copilot/add-openai-api-speech-func…

d6d63aa

…tion Add OpenAI speech API endpoint with Gemini TTS backend

deno-deploy bot deployed to Preview November 13, 2025 01:39 View deployment

coderabbitai bot reviewed Nov 13, 2025

src/worker.mjs (comment resolved)

Copilot AI and others added 3 commits November 13, 2025 01:48

Initial plan

f2d39aa

Fix audio format conversion - limit to WAV and PCM only

7515a78

Co-authored-by: groxaxo <[email protected]>

Merge pull request #2 from groxaxo/copilot/fix-audio-format-conversion

033b40a

Fix TTS audio format handling - limit to WAV/PCM, reject unsupported formats

deno-deploy bot deployed to Preview November 13, 2025 02:57 View deployment

coderabbitai bot reviewed Nov 13, 2025

README.md (comment outdated, resolved)

Copilot AI and others added 3 commits November 13, 2025 09:07

Initial plan

2786f6d

Fix markdown list indentation violations in README.md

2aa8d07

Co-authored-by: groxaxo <[email protected]>

Merge pull request #3 from groxaxo/copilot/fix-markdown-list-indentation

8f637ab

deno-deploy bot deployed to Preview November 13, 2025 09:13 View deployment

coderabbitai bot reviewed Nov 13, 2025
