Runs TTS models with ONNX Runtime.
-
Supported Models:
-
End-to-End Processing:
- The solution includes internal STFT/ISTFT processing.
- Input: reference audio + text
- Output: generated speech
- The solution includes internal
-
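To illustrate what internal STFT/ISTFT processing covers, here is a minimal pure-Python sketch of a short-time Fourier transform and its inverse using overlap-add. It is not the project's implementation (there the transform is part of the exported model); the window, `n_fft=16`, `hop=4`, and all function names are assumptions chosen for readability, and the naive DFT is only practical for tiny inputs.

```python
import cmath
import math


def hann(n_fft):
    # Periodic Hann window (assumed choice; the source does not name a window).
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def dft(frame):
    # Naive O(n^2) discrete Fourier transform of one frame.
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spec):
    # Naive inverse DFT; the input frames are real, so keep the real part.
    n = len(spec)
    return [
        (sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def stft(signal, n_fft=16, hop=4):
    # Slice the signal into overlapping windowed frames and transform each one.
    win = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + i] * win[i] for i in range(n_fft)]
        frames.append(dft(frame))
    return frames


def istft(frames, n_fft=16, hop=4):
    # Overlap-add the inverse-transformed frames, then divide by the summed
    # squared window so the analysis and synthesis windows cancel out.
    win = hann(n_fft)
    length = n_fft + hop * (len(frames) - 1)
    out = [0.0] * length
    norm = [0.0] * length
    for idx, spec in enumerate(frames):
        frame = idft(spec)
        start = idx * hop
        for i in range(n_fft):
            out[start + i] += frame[i] * win[i]
            norm[start + i] += win[i] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


# Round trip on a short sine wave (hypothetical test signal).
signal = [math.sin(2 * math.pi * 3 * n / 64) for n in range(64)]
restored = istft(stft(signal))
```

The very first sample cannot be recovered in this sketch because the Hann window is zero there and no other frame covers it; every other sample is reconstructed up to floating-point error.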
Optimize:
- The key model components run 100% of their operators on the GPU.
-
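As a rough sketch of how GPU execution is typically requested in ONNX Runtime, the snippet below builds an ordered execution-provider list and passes it to `InferenceSession`. The preference for `CUDAExecutionProvider`, the helper names, and the model path are assumptions for illustration; the source does not state which provider the GPU rows in the benchmark used.

```python
# Assumed preference order: try CUDA first, fall back to CPU.
PREFERRED = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available, preferred=PREFERRED):
    # Keep only the preferred providers this ONNX Runtime build offers,
    # preserving the preference order; default to CPU if none match.
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]


def make_session(model_path):
    # Imported lazily so the helper above works without onnxruntime installed.
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    session = ort.InferenceSession(model_path, providers=providers)
    # get_providers() reports the providers registered on this session.
    print("providers in use:", session.get_providers())
    return session


# Hypothetical usage; "model.onnx" is a placeholder path:
# session = make_session("model.onnx")
```

Listing a GPU provider first does not by itself guarantee that every operator runs on the GPU: ONNX Runtime assigns any node the GPU provider cannot handle to the next provider in the list, so a model has to be built from supported operators to reach full GPU placement.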
Resources:

| OS | Device | Backend | Model | Time Cost in Seconds (reference audio: 6 s / generates approximately 15 words of speech) | RTF |
|---|---|---|---|---|---|
| Ubuntu-24.04 | Laptop | CPU i7-1165G7 | F5-TTS F32 | 180 (NFE=32) | 60 |
| Ubuntu-24.04 | Laptop | GPU MX150 | F5-TTS F32 | 62 (NFE=32) | 21 |
| Ubuntu-24.04 | Laptop | CPU i7-1165G7 | IndexTTS F32 | 18 | 6 |
| Ubuntu-24.04 | Laptop | GPU MX150 | BigVGAN V2 24khz_100band_256x F16 | 4.6 (input mel = (1, 100, 512)) | 1.53 |
| Ubuntu-24.04 | Laptop | CPU i7-1165G7 | KaniTTS Q8F32 | 4.2 | 1.4 |
| Ubuntu-24.04 | Laptop | CPU i7-1165G7 | KaniTTS Q4F32 | 2.6 | 0.87 |
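The real-time factor (RTF) is conventionally the processing time divided by the duration of the audio produced, so values above 1 mean slower than real time. The sketch below recomputes the RTF column under one assumption: the source does not state the output duration, and the 3.0 s used here is only inferred from the ratio between the two table columns.

```python
# Assumed output duration in seconds, inferred from the table ratios
# (time cost / RTF is about 3 in every row); not stated in the source.
ASSUMED_OUTPUT_S = 3.0


def real_time_factor(time_cost_s, audio_s):
    # RTF = processing time / duration of generated audio.
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return time_cost_s / audio_s


# Time costs copied from the table above, keyed by (model, backend).
time_costs = {
    ("F5-TTS F32", "CPU"): 180,
    ("F5-TTS F32", "GPU"): 62,
    ("IndexTTS F32", "CPU"): 18,
    ("BigVGAN V2 F16", "GPU"): 4.6,
    ("KaniTTS Q8F32", "CPU"): 4.2,
    ("KaniTTS Q4F32", "CPU"): 2.6,
}

rtf = {key: round(real_time_factor(t, ASSUMED_OUTPUT_S), 2) for key, t in time_costs.items()}
for (model, backend), value in rtf.items():
    print(f"{model} on {backend}: RTF {value}")
```

Under that assumption the computed values match the table, with the F5-TTS GPU row (about 20.67) appearing there rounded to 21.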
- Beam Search
- VoxCPM