const audioBuffer = await response.arrayBuffer();
const audioBase64 = Buffer.from(audioBuffer).toString("base64");

const contentType = outputFormat.startsWith("mp3") ? "audio/mpeg" : "audio/wav";
Suggested change:
- const contentType = outputFormat.startsWith("mp3") ? "audio/mpeg" : "audio/wav";
+ let contentType: string;
+ if (outputFormat.startsWith("mp3")) {
+   contentType = "audio/mpeg";
+ } else if (outputFormat.startsWith("pcm")) {
+   contentType = "audio/L16";
+ } else {
+   contentType = "audio/mpeg"; // fallback
+ }
The content type mapping for audio formats is incorrect. PCM formats (pcm_16000, pcm_22050, pcm_24000, pcm_44100) are being labeled as "audio/wav" when they should use a different MIME type like "audio/L16" or "audio/x-raw".
Analysis
Incorrect MIME type mapping for ElevenLabs PCM audio formats
What fails: textToSpeechStep() in plugins/elevenlabs/steps/text-to-speech.ts maps the PCM output formats (pcm_16000, pcm_22050, pcm_24000, pcm_44100) to the MIME type audio/wav, which is wrong for raw PCM data.
How to reproduce:
// Call textToSpeechStep with a PCM output format
const result = await textToSpeechStep({
  voiceId: "your-voice-id",
  text: "Hello world",
  outputFormat: "pcm_16000" // or pcm_22050, pcm_24000, pcm_44100
});
// result.contentType will be "audio/wav" but should be "audio/L16"

Result: The function returns contentType: "audio/wav" for all PCM formats. However, ElevenLabs' PCM formats return raw S16LE (16-bit signed little-endian) PCM audio data without WAV container headers.
Expected:
- MP3 formats should use audio/mpeg ✓ (already correct)
- PCM formats should use audio/L16 per RFC 2586 and industry standards for raw L16 PCM
- The MIME type audio/wav is incorrect because WAV is a container format with RIFF headers, while ElevenLabs PCM returns headerless raw PCM bytes
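
The expected mapping could be pulled into a small helper. The following is a minimal sketch, not code from the PR: the function name contentTypeForOutputFormat is hypothetical, and appending a rate parameter parsed from the format name is an assumption based on the audio/L16;rate=... form cited in the references.

```typescript
// Hypothetical helper (not part of the PR): maps an ElevenLabs output format
// string such as "mp3_44100_128" or "pcm_16000" to a MIME type.
function contentTypeForOutputFormat(outputFormat: string): string {
  if (outputFormat.startsWith("mp3")) {
    return "audio/mpeg";
  }

  // Assumption: PCM format names look like "pcm_<sampleRate>".
  const pcm = /^pcm_(\d+)$/.exec(outputFormat);
  if (pcm) {
    // Raw 16-bit linear PCM. The rate parameter tells the consumer the sample
    // rate, which a headerless stream cannot carry itself.
    // Caveat: RFC 2586 describes L16 in network (big-endian) byte order, while
    // the data here is S16LE, so consumers should not infer endianness from
    // the MIME type alone.
    return `audio/L16;rate=${pcm[1]}`;
  }

  // Fallback mirrors the suggested change above.
  return "audio/mpeg";
}
```

Whether to include the rate parameter is a design choice; the suggested change returns a bare audio/L16, which is simpler but leaves the sample rate to be communicated separately.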
References:
- RFC 2586 - The Audio/L16 MIME content type defines audio/L16 as the standard MIME type for raw 16-bit linear PCM audio
- AWS Project Lakechain ElevenLabs documentation explicitly maps PCM formats to audio/L16
- Nuance Mix audio formats documentation uses audio/L16;rate=... for raw PCM audio
- ElevenLabs API documentation states PCM formats are "PCM format (S16LE)" which is consistent with L16
Impact: Downstream systems that validate or process the audio based on the MIME type will incorrectly treat the raw PCM data as a WAV file with container headers, potentially causing decoding failures or incorrect audio processing.
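
To illustrate why the two types are not interchangeable, here is a sketch of what a real WAV payload needs that raw PCM lacks: a 44-byte RIFF/WAVE header in front of the samples. The helper names wrapPcmAsWav and hasRiffWaveHeader are hypothetical, and mono 16-bit little-endian input is an assumption.

```typescript
// Hypothetical helper: prepends a canonical 44-byte RIFF/WAVE header to
// headerless 16-bit little-endian PCM. Assumes mono unless told otherwise.
function wrapPcmAsWav(pcm: Uint8Array, sampleRate: number, channels = 1): Uint8Array {
  const bitsPerSample = 16;
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;

  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      out[offset + i] = text.charCodeAt(i);
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk body
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true); // size of the sample data
  out.set(pcm, 44);
  return out;
}

// Hypothetical check a downstream consumer might run before trusting audio/wav.
function hasRiffWaveHeader(bytes: Uint8Array): boolean {
  if (bytes.length < 12) {
    return false;
  }
  const matches = (offset: number, text: string): boolean => {
    for (let i = 0; i < text.length; i++) {
      if (bytes[offset + i] !== text.charCodeAt(i)) {
        return false;
      }
    }
    return true;
  };
  return matches(0, "RIFF") && matches(8, "WAVE");
}
```

A consumer that receives raw PCM labeled audio/wav would fail a check like hasRiffWaveHeader, or worse, interpret the first 44 bytes of samples as header fields.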