Open
Changes from 4 commits
Commits
26 commits
6c6fb5e
Add online serving to Stable Audio Diffusion TTS
ekagra-ranjan Feb 6, 2026
cdec68a
make sr model specifc
ekagra-ranjan Feb 6, 2026
ea38d5a
Merge branch 'main' of https://github.com/vllm-project/vllm-omni into…
ekagra-ranjan Feb 6, 2026
c3dad34
lint
ekagra-ranjan Feb 6, 2026
4b5d63e
save model_type in OmniBase so it can be referred easily. Reuse model…
ekagra-ranjan Feb 7, 2026
fa8fa80
update test
ekagra-ranjan Feb 7, 2026
8063b3c
Apply suggestion from @Copilot
ekagra-ranjan Feb 16, 2026
3722a9d
fix doc
ekagra-ranjan Feb 16, 2026
23235aa
fix import
ekagra-ranjan Feb 16, 2026
dfb4873
fix hint
ekagra-ranjan Feb 16, 2026
fce33d3
conflict
ekagra-ranjan Feb 16, 2026
b86dd42
fix comment
ekagra-ranjan Feb 16, 2026
e6c8cd4
fix
ekagra-ranjan Feb 16, 2026
90837b7
fix comment
ekagra-ranjan Feb 16, 2026
049ec17
add test
ekagra-ranjan Feb 17, 2026
071fb8e
add docs
ekagra-ranjan Feb 17, 2026
cf0a6c6
remove debug
ekagra-ranjan Feb 17, 2026
9b548e7
fix doc
ekagra-ranjan Feb 17, 2026
9e345ae
resolve conflict
ekagra-ranjan Feb 24, 2026
7f8380e
fix conflict
ekagra-ranjan Feb 24, 2026
88a1ad7
fix conflict
ekagra-ranjan Feb 26, 2026
296cba3
Merge branch 'main' into er-stable-audio-online
ekagra-ranjan Feb 28, 2026
6192a9d
Merge branch 'main' into er-stable-audio-online
ekagra-ranjan Mar 2, 2026
29e4fb2
resolve comments
ekagra-ranjan Mar 3, 2026
3b3e310
Merge branch 'main' of https://github.com/vllm-project/vllm-omni into…
ekagra-ranjan Mar 3, 2026
400afa3
Merge branch 'main' into er-stable-audio-online
hsliuustc0106 Mar 4, 2026
234 changes: 234 additions & 0 deletions examples/online_serving/stable_audio/README.md
@@ -0,0 +1,234 @@
# Stable Audio Online Serving

Generate audio from text prompts using Stable Audio models via an OpenAI-compatible API endpoint.

## Features

- **OpenAI-compatible API**: Requests go to the `/v1/audio/speech` endpoint
- **Flexible control**: Adjust audio length, guidance scale, and number of inference steps
- **Quality control**: Use negative prompts to steer away from unwanted characteristics
- **Reproducible**: Set a random seed to get repeatable output

## Quick Start

### 1. Start the Server

```bash
vllm-omni serve stabilityai/stable-audio-open-1.0 \
  --host 0.0.0.0 \
  --port 8000 \
  --gpu-memory-utilization 0.9 \
  --trust-remote-code \
  --enforce-eager \
  --omni
```

### 2. Generate Audio

#### Using curl

```bash
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The sound of a cat purring",
    "audio_length": 10.0
  }' --output cat.wav
```

#### Using Python Client

```bash
python stable_audio_client.py \
  --text "The sound of a cat purring" \
  --audio_length 10.0 \
  --output cat.wav
```

#### Using Bash Script

```bash
bash curl_examples.sh
```

## API Reference

### Endpoint

```
POST /v1/audio/speech
```

### Request Body

```json
{
  "input": "Text description of the audio",
  "audio_length": 10.0,
  "audio_start": 0.0,
  "negative_prompt": "Low quality",
  "guidance_scale": 7.0,
  "num_inference_steps": 100,
  "seed": 42,
  "response_format": "wav"
}
```

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `input` | string | **required** | Text prompt describing the audio to generate |
| `audio_length` | float | 10.0 | Audio duration in seconds (max ~47s for stable-audio-open-1.0) |
| `audio_start` | float | 0.0 | Audio start time in seconds |
| `negative_prompt` | string | null | Text describing what to avoid in generation |
| `guidance_scale` | float | 7.0 | Classifier-free guidance scale (higher = more adherence to prompt) |
| `num_inference_steps` | int | 100 | Number of denoising steps (higher = better quality, slower) |
| `seed` | int | null | Random seed for reproducibility |
| `response_format` | string | "wav" | Output format: wav, mp3, flac, pcm |
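
The table above can be turned into a small request-body builder. The following is a minimal sketch, not the code in `stable_audio_client.py`: the helper name `build_speech_payload` and the choice to omit unset optional fields (so the server applies its own defaults) are assumptions made for illustration.

```python
from typing import Any, Optional

# Output formats listed in the parameter table.
ALLOWED_FORMATS = {"wav", "mp3", "flac", "pcm"}


def build_speech_payload(
    text: str,
    audio_length: Optional[float] = None,
    audio_start: Optional[float] = None,
    negative_prompt: Optional[str] = None,
    guidance_scale: Optional[float] = None,
    num_inference_steps: Optional[int] = None,
    seed: Optional[int] = None,
    response_format: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build the JSON body for POST /v1/audio/speech."""
    if not text:
        raise ValueError("'input' is required and must be non-empty")
    if response_format is not None and response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")

    optional = {
        "audio_length": audio_length,
        "audio_start": audio_start,
        "negative_prompt": negative_prompt,
        "guidance_scale": guidance_scale,
        "num_inference_steps": num_inference_steps,
        "seed": seed,
        "response_format": response_format,
    }
    payload: dict[str, Any] = {"input": text}
    # Leave out anything the caller did not set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Calling `build_speech_payload("A dog barking", audio_length=5.0)` yields the same body as the "Custom Duration" curl example.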

### Response

Returns audio data in the requested format (default: WAV).
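
Since the response body is the raw audio, a client only needs to write the bytes to disk. The sketch below uses the Python standard library; the function names `make_speech_request` and `generate_to_file`, the base URL handling, and the 600-second timeout are illustrative assumptions, not part of the example client.

```python
import json
import urllib.request


def make_speech_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/audio/speech."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_to_file(
    base_url: str, payload: dict, out_path: str, timeout_s: float = 600.0
) -> int:
    """Send the request and save the returned audio; returns bytes written.

    urlopen raises urllib.error.HTTPError on non-2xx responses, so an error
    body is never written to out_path.
    """
    request = make_speech_request(base_url, payload)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With the server from the Quick Start running, `generate_to_file("http://localhost:8000", {"input": "The sound of ocean waves"}, "ocean.wav")` would mirror the basic curl example.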

## Usage Examples

### Basic Generation

```bash
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The sound of ocean waves"
  }' --output ocean.wav
```

### Custom Duration

```bash
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "A dog barking",
    "audio_length": 5.0
  }' --output dog_5s.wav
```

### High Quality with Negative Prompt

```bash
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "A piano playing a gentle melody",
    "audio_length": 10.0,
    "negative_prompt": "Low quality, distorted, noisy",
    "guidance_scale": 8.0,
    "num_inference_steps": 150
  }' --output piano_hq.wav
```

### Reproducible Generation

```bash
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Thunder and rain sounds",
    "audio_length": 15.0,
    "seed": 42
  }' --output thunder.wav
```
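
One way to check that a fixed seed really reproduces the output is to send the same request twice and compare digests of the two responses. This is a minimal sketch with hypothetical helper names; it assumes byte-identical output on the same server, hardware, and software versions, which may not hold across different GPUs or builds.

```python
import hashlib


def audio_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of an audio payload."""
    return hashlib.sha256(data).hexdigest()


def is_reproducible(first: bytes, second: bytes) -> bool:
    """True when two responses to the same seeded request are byte-identical."""
    return audio_digest(first) == audio_digest(second)
```

If the digests differ even with the same seed, comparing the decoded samples within a small tolerance may be a more forgiving check.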

### Quick Generation (Fewer Steps)

For faster generation with slightly lower quality:

```bash
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Birds chirping in a forest",
    "audio_length": 8.0,
    "num_inference_steps": 50
  }' --output birds_quick.wav
```

## Python Client Examples

### Simple Generation

```bash
python stable_audio_client.py \
  --text "The sound of a cat purring"
```

### Custom Parameters

```bash
python stable_audio_client.py \
  --text "Thunder and rain" \
  --audio_length 15.0 \
  --negative_prompt "Low quality" \
  --guidance_scale 7.0 \
  --num_inference_steps 100 \
  --seed 42 \
  --output thunder.wav
```

### Different Output Format

```bash
python stable_audio_client.py \
  --text "Guitar playing" \
  --response_format mp3 \
  --output guitar.mp3
```

## Tips

1. **Audio Length**: Keep under 47 seconds for `stable-audio-open-1.0`
2. **Quality vs Speed**:
   - 50 steps: Fast, decent quality
   - 100 steps: Good balance (default)
   - 150+ steps: High quality, slower
3. **Guidance Scale**:
   - Lower (3-5): More creative/varied
   - Default (7): Good balance
   - Higher (10+): More literal to prompt
4. **Negative Prompts**: Use to avoid "Low quality", "distorted", "noisy", etc.
5. **Seeds**: Use the same seed to get reproducible results
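
The tips above can be captured as named presets so callers do not repeat the numbers. This is only a sketch: the preset names, the `apply_preset` helper, and treating 47 seconds as a hard limit (the README gives it as approximate) are assumptions for illustration.

```python
# Approximate limit for stable-audio-open-1.0, taken from the tips above.
MAX_AUDIO_LENGTH_S = 47.0

# Hypothetical presets mirroring the step counts in the tips.
PRESETS = {
    "preview": {"num_inference_steps": 50},
    "balanced": {"num_inference_steps": 100, "guidance_scale": 7.0},
    "high_quality": {
        "num_inference_steps": 150,
        "guidance_scale": 7.0,
        "negative_prompt": "Low quality, distorted, noisy",
    },
}


def apply_preset(prompt: str, preset: str = "balanced", audio_length: float = 10.0) -> dict:
    """Return a request body for the given prompt using a named preset."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    if not 0 < audio_length <= MAX_AUDIO_LENGTH_S:
        raise ValueError(f"audio_length must be in (0, {MAX_AUDIO_LENGTH_S}] seconds")
    return {"input": prompt, "audio_length": audio_length, **PRESETS[preset]}
```

For example, `apply_preset("Birds chirping in a forest", "preview", 8.0)` produces the same body as the "Quick Generation" curl example.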

## Performance

| Inference Steps | Quality | Speed | Use Case |
|----------------|---------|-------|----------|
| 50 | Good | Fast | Quick previews |
| 100 (default) | Very Good | Medium | Production |
| 150+ | Excellent | Slow | Final/critical audio |

## Troubleshooting

### Server not responding
- Check if server is running: `curl http://localhost:8000/health`
- Check server logs for errors
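
Model loading can take a while, so a script that starts the server and immediately sends requests may fail. A minimal sketch of polling the `/health` endpoint until it answers is shown below; the function names, the 120-second deadline, and the assumption that a healthy server returns HTTP 200 are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe_health(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a GET to the health URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status.
        return False


def wait_until_healthy(
    url: str,
    deadline_s: float = 120.0,
    interval_s: float = 2.0,
    probe: Callable[[str], bool] = probe_health,
) -> bool:
    """Poll the health URL until it succeeds or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

A typical call would be `wait_until_healthy("http://localhost:8000/health")` before sending the first generation request.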

### Audio quality issues
- Increase `num_inference_steps` (e.g., 150)
- Add negative prompts: `"Low quality, distorted, noisy"`
- Increase `guidance_scale` for more prompt adherence

### Generation timeout
- Reduce `num_inference_steps`
- Reduce `audio_length`
- Check GPU memory with `nvidia-smi`

### Wrong audio length
- Ensure `audio_length` is within model limits (~47s max)
- Adjust `audio_start` if trimming is needed
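
To confirm what the server actually returned, the duration can be read from the WAV header. The sketch below uses Python's standard `wave` module and hypothetical helper names. It assumes the response is a PCM WAV file with a correct frame count in its header; other encodings (for example float WAV, or `mp3`/`flac` responses) would need a different reader.

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of a PCM WAV payload in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def length_matches(data: bytes, requested_s: float, tolerance_s: float = 0.1) -> bool:
    """True when the WAV duration is within tolerance of the requested length."""
    return abs(wav_duration_seconds(data) - requested_s) <= tolerance_s
```

For a saved file, `length_matches(open("dog_5s.wav", "rb").read(), 5.0)` would check the "Custom Duration" example.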

## See Also

- [Offline Inference Example](../../offline_inference/text_to_audio/README.md)
- [Stable Audio Model Card](https://huggingface.co/stabilityai/stable-audio-open-1.0)
- [vLLM-Omni Documentation](https://github.com/vllm-project/vllm-omni)
54 changes: 54 additions & 0 deletions examples/online_serving/stable_audio/curl_examples.sh
@@ -0,0 +1,54 @@
#!/bin/bash
# Examples for using Stable Audio with curl via /v1/audio/speech endpoint

# Example 1: Simple request with default parameters
echo "Example 1: Simple request with default parameters"
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The sound of an audience clapping and cheering in a stadium"
  }' --output stadium.wav

# Example 2: Request with custom audio_length
echo "Example 2: Custom audio length (5 seconds)"
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The sound of a dog barking",
    "audio_length": 5.0
  }' --output dog_5s.wav

# Example 3: Request with negative prompt for quality control
echo "Example 3: With negative prompt"
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "A piano playing a gentle melody",
    "audio_length": 10.0,
    "negative_prompt": "Low quality, distorted, noisy"
  }' --output piano.wav

# Example 4: Full control with all parameters
echo "Example 4: Full control (custom length, guidance, steps, seed)"
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Thunder and rain sounds",
    "audio_length": 15.0,
    "negative_prompt": "Low quality",
    "guidance_scale": 7.0,
    "num_inference_steps": 100,
    "seed": 42
  }' --output thunder_rain.wav

# Example 5: Quick generation with fewer steps (faster but lower quality)
echo "Example 5: Quick generation (fewer steps)"
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Ocean waves crashing on a beach",
    "audio_length": 8.0,
    "num_inference_steps": 50
  }' --output ocean.wav

echo "All examples completed!"