[Model] Add Stable Audio Open support for text-to-audio generation by linyueqian · Pull Request #331 · vllm-project/vllm-omni

linyueqian · 2025-12-16T08:04:46Z


Purpose

Add support for Stable Audio Open (stabilityai/stable-audio-open-1.0) for text-to-audio generation in vLLM-Omni.

Test Plan

python examples/offline_inference/text_to_audio/text_to_audio.py --model stabilityai/stable-audio-open-1.0 --prompt "The sound of a dog barking" --output dog_barking.wav

Test Result

dog_barking.wav


Signed-off-by: linyueqian <[email protected]>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.


chatgpt-codex-connector · 2025-12-16T08:14:10Z

vllm_omni/diffusion/models/stable_audio/pipeline_stable_audio.py

+        num_inference_steps = req.num_inference_steps or num_inference_steps
+        guidance_scale = req.guidance_scale if req.guidance_scale > 1.0 else guidance_scale
+


Honor guidance_scale ≤1 from requests

guidance_scale from the request is only applied when it exceeds 1.0; otherwise the pipeline falls back to the default argument (7.0). This prevents callers from disabling classifier-free guidance or using a lower scale (e.g., requesting 0 or 1 via Omni.generate), because the model will always run CFG at scale 7 regardless of what was requested, making unconditional/low-guidance Stable Audio generation impossible.
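The follow-up commit e9fea87 ("track whether guidance_scale was provided") suggests the fix distinguishes an omitted value from an explicit one. Below is a minimal sketch of that idea, assuming omitted request fields arrive as None. resolve_sampling_params and apply_cfg are hypothetical helpers, not the pipeline's actual code, and the step default is a placeholder. The guidance combination is the standard classifier-free form uncond + w * (cond - uncond). At w = 1 it reduces to the conditional prediction, which is why diffusers-style pipelines only run the unconditional branch when w > 1.

```python
from typing import Optional, Sequence


def resolve_sampling_params(
    req_steps: Optional[int],
    req_guidance: Optional[float],
    default_steps: int = 100,  # placeholder default, not taken from the PR
    default_guidance: float = 7.0,  # default quoted in the review comment
) -> tuple[int, float, bool]:
    """Hypothetical helper: prefer request values whenever they were provided.

    Checking "is not None" (instead of "or" / "> 1.0") lets a caller explicitly
    request guidance_scale=1.0 or 0.0 without being overridden by the default.
    """
    steps = req_steps if req_steps is not None else default_steps
    guidance = req_guidance if req_guidance is not None else default_guidance
    # diffusers-style convention: run classifier-free guidance only when scale > 1.
    do_cfg = guidance > 1.0
    return steps, guidance, do_cfg


def apply_cfg(uncond: Sequence[float], cond: Sequence[float], scale: float) -> list[float]:
    """Standard CFG combination, element-wise: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


print(resolve_sampling_params(None, None))  # (100, 7.0, True)
print(resolve_sampling_params(50, 1.0))  # (50, 1.0, False)
print(apply_cfg([0.0, 1.0], [1.0, 3.0], 1.0))  # [1.0, 3.0]
```

The important design choice is that 0.0 and 1.0 are legitimate requested values, so they must not be treated as "missing".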


hsliuustc0106 · 2025-12-19T23:27:58Z

add test ci
add online serving
add streaming test after output stream is supported #367


linyueqian · 2025-12-22T07:41:24Z

> add test ci
> add online serving
> add streaming test after output stream is supported #367

@hsliuustc0106 For online serving of Stable Audio, is a simple FastAPI wrapper around OmniDiffusion the right approach, or is there existing/planned infrastructure I should use?


hsliuustc0106 · 2025-12-23T13:54:22Z

> > add test ci
> > add online serving
> > add streaming test after output stream is supported #367
>
> @hsliuustc0106 For online serving of Stable Audio, is a simple FastAPI wrapper around OmniDiffusion the right approach, or is there existing/planned infrastructure I should use?

@david6666666 I noticed you are working on video online serving; is it possible to support audio output as well? #437

david6666666 · 2025-12-24T01:15:11Z

> is it possible to support audio output as well?

I think we need to open another PR to enable online audio serving, after this text-to-audio model is supported.
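Whatever form online audio serving takes, an endpoint has to turn the generated waveform into bytes it can return, much as the offline test plan writes dog_barking.wav to disk. Below is a minimal, dependency-free sketch using Python's standard wave module. It assumes float samples in [-1, 1] laid out as one sequence per channel. encode_wav is a hypothetical helper, and the 44100 Hz default is an assumption that should be replaced by the model's actual sampling rate.

```python
import io
import struct
import wave
from typing import Sequence


def encode_wav(channels: Sequence[Sequence[float]], sample_rate: int = 44100) -> bytes:
    """Hypothetical helper: encode float audio (one sequence per channel) as 16-bit PCM WAV."""
    if not channels:
        raise ValueError("need at least one channel")
    n_frames = len(channels[0])
    if any(len(ch) != n_frames for ch in channels):
        raise ValueError("all channels must have the same length")

    frames = bytearray()
    for i in range(n_frames):
        for ch in channels:  # WAV stores samples interleaved: L, R, L, R, ...
            x = max(-1.0, min(1.0, float(ch[i])))  # clip to [-1, 1]
            frames += struct.pack("<h", int(round(x * 32767)))  # little-endian int16

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:  # closing the writer leaves buf open
        wf.setnchannels(len(channels))
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
    return buf.getvalue()
```

For example, encode_wav([[0.0, 0.5, -0.5], [0.0, 0.25, -1.0]]) yields a 56-byte stereo file: a 44-byte header plus 3 frames of 2 channels by 2 bytes each. A real server would more likely encode a tensor with a vectorized library, but the byte layout is the same.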


hsliuustc0106

add supported models


hsliuustc0106

lgtm

linyueqian · 2025-12-31T04:54:49Z

@hsliuustc0106 Is this PR ready to be merged now?

hsliuustc0106 · 2026-01-04T14:06:29Z

Please fix CI.

hsliuustc0106 · 2026-01-04T14:06:41Z

@Bounty-hunter PTAL

hsliuustc0106 · 2026-01-05T01:39:43Z

Please fix CI.


princepride · 2026-01-05T03:53:08Z

@ZJY0516 I checked my code, and it doesn't seem to be my problem. I only added a special-case check at the end of is_diffusion_model to determine whether a model is a diffusion model:

vllm-omni/vllm_omni/diffusion/utils/hf_utils.py, lines 77 to 79 in ad3011d:

    # Bagel is not a diffusers pipeline (no model_index.json), but is still a
    # diffusion-style model in vllm-omni. Detect it via config.json.
    return _looks_like_bagel(model_name)
However, I noticed that the model used for testing, stabilityai/stable-audio-open-1.0, is not directly accessible: it is gated, and access has to be requested first.
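For context on the check described above: diffusers pipelines ship a model_index.json at the repository root, so a local directory can be classified by looking for that file. A model-specific fallback then covers diffusion-style models that are not diffusers pipelines, which is what the Bagel check does via config.json. Below is a minimal local-only sketch. looks_like_diffusion_model and its fallback parameter are hypothetical. This is not the actual is_diffusion_model in hf_utils.py, which may also resolve Hub repositories.

```python
import json
import os
import tempfile
from typing import Callable, Optional


def looks_like_diffusion_model(
    model_dir: str,
    fallback: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Hypothetical local-only check modeled on the description above."""
    index_path = os.path.join(model_dir, "model_index.json")
    if os.path.isfile(index_path):
        with open(index_path, encoding="utf-8") as f:
            index = json.load(f)
        # diffusers records the pipeline class name under "_class_name".
        if isinstance(index, dict) and "_class_name" in index:
            return True
    # Not a diffusers pipeline: defer to a model-specific detector, if any
    # (the real code does this for Bagel by inspecting config.json).
    return fallback(model_dir) if fallback is not None else False


# Usage: a temporary directory that mimics a diffusers repository layout.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "model_index.json"), "w", encoding="utf-8") as f:
        json.dump({"_class_name": "StableAudioPipeline"}, f)
    print(looks_like_diffusion_model(d))  # True
```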

ZJY0516 · 2026-01-05T04:03:34Z

> But I noticed that the model stabilityai/stable-audio-open-1.0 used for testing isn't directly accessible; an application is required.

Thanks. I found this too. This works fine on my local machine.

ZJY0516 · 2026-01-05T04:05:47Z

I noticed some warnings. Could you please solve this? @linyueqian

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565

/home/zjy/code/vllm-omni/.venv/lib/python3.12/site-packages/torch/nn/utils/weight_norm.py:144: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
  WeightNorm.apply(module, name, dim)

/home/zjy/code/vllm-omni/.venv/lib/python3.12/site-packages/torchsde/_brownian/brownian_interval.py:608: UserWarning: Should have tb<=t1 but got tb=500.00006103515625 and t1=500.0.
  warnings.warn(f"Should have {tb_name}<=t1 but got {tb_name}={tb} and t1={self._end}.")
/home/zjy/code/vllm-omni/.venv/lib/python3.12/site-packages/torchsde/_brownian/brownian_interval.py:599: UserWarning: Should have ta>=t0 but got ta=0.29999998211860657 and t0=0.3.
  warnings.warn(f"Should have ta>=t0 but got ta={ta} and t0={self._start}.")
/home/zjy/code/vllm-omni/.venv/lib/python3.12/site-packages/torchsde/_brownian/brownian_interval.py:599: UserWarning: Should have ta>=t0 but got ta=0.0 and t0=0.3.
  warnings.warn(f"Should have ta>=t0 but got ta={ta} and t0={self._start}.")
/home/zjy/code/vllm-omni/.venv/lib/python3.12/site-packages/torchsde/_brownian/brownian_interval.py:602: UserWarning: Should have tb>=t0 but got tb=0.29999998211860657 and t0=0.3.
  warnings.warn(f"Should have {tb_name}>=t0 but got {tb_name}={tb} and t0={self._start}.")


linyueqian · 2026-01-05T04:40:41Z

> I noticed some warnings. Could you please solve this? @linyueqian


I've fixed the first warning (the T5Tokenizer legacy-behaviour notice) by switching to T5TokenizerFast in this commit. I think the remaining warnings come from third-party dependencies (torch's deprecated weight_norm and torchsde's Brownian interval checks) and cannot be fixed here.
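If the remaining messages are judged to be noise in the example script, one option is to filter only those specific warnings rather than all warnings. This PR does not do this. Below is a minimal sketch using the standard warnings module. The message patterns are taken from the log in this thread, and silence_known_third_party_warnings is a hypothetical helper.

```python
import warnings

# Patterns are matched with re.match (case-insensitive) against the start of each message.
_KNOWN_NOISE = [
    # torch: weight_norm deprecation emitted while building weight-normed layers.
    (r"`torch\.nn\.utils\.weight_norm` is deprecated", FutureWarning),
    # torchsde: Brownian interval bound checks (the "Should have ..." lines in the log).
    (r"Should have t[ab](<=t1|>=t0) but got", UserWarning),
]


def silence_known_third_party_warnings() -> None:
    """Hypothetical helper: ignore only the specific third-party warnings listed above."""
    for pattern, category in _KNOWN_NOISE:
        warnings.filterwarnings("ignore", message=pattern, category=category)
```

Scoping the filters by message and category keeps unrelated warnings visible. Silencing a whole category would hide genuinely useful signals.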


linyueqian · 2026-01-05T07:37:23Z

@hsliuustc0106 I think the CI issue is fixed here.

hsliuustc0106 · 2026-01-07T09:00:32Z

> @hsliuustc0106 I think the CI issue is fixed here.

Thanks for contributing the first text-to-audio model!

hsliuustc0106 · 2026-01-07T09:01:43Z

Please remember to submit a PR to vllm-project/recipes.

…llm-project#331) Signed-off-by: linyueqian <[email protected]> Signed-off-by: Yueqian Lin <[email protected]> Co-authored-by: Hongsheng Liu <[email protected]>

linyueqian · 2026-01-09T11:13:51Z

> Please remember to submit a PR to vllm-project/recipes.

vllm-project/recipes#193


add stable audio

a6ea3e5

Signed-off-by: linyueqian <[email protected]>

linyueqian requested a review from hsliuustc0106 as a code owner December 16, 2025 08:04

chatgpt-codex-connector bot reviewed Dec 16, 2025


david6666666 linked an issue Dec 16, 2025 that may be closed by this pull request

[New Model]: stabilityai/stable-audio-open-1.0 #324

Closed

1 task

david6666666 mentioned this pull request Dec 16, 2025

[RFC]: DiT model and feature support enhancement #85

Closed

58 tasks

hsliuustc0106 requested review from Gaohan123 and SamitHuang December 19, 2025 23:26

linyueqian added 2 commits December 22, 2025 02:33

add test ci

f3b3d89

Signed-off-by: linyueqian <[email protected]>

track whether guidance_scale was provided

e9fea87

Signed-off-by: linyueqian <[email protected]>

Merge branch 'main' into feat/stable-audio-open

1bc3754

Signed-off-by: Yueqian Lin <[email protected]>

linyueqian and others added 5 commits December 24, 2025 23:14

Merge branch 'vllm-project:main' into feat/stable-audio-open

4ff06b9

Merge branch 'main' into feat/stable-audio-open

107154a

Signed-off-by: Yueqian Lin <[email protected]>

fix pre-commit

3cafb2a

Signed-off-by: linyueqian <[email protected]>

fix: add missing closing parenthesis in registry.py

86f68e6

Signed-off-by: linyueqian <[email protected]>

Merge branch 'main' into feat/stable-audio-open

5bf507c

hsliuustc0106 reviewed Dec 27, 2025


update supported model

3f36de6

Signed-off-by: linyueqian <[email protected]>

hsliuustc0106 added the ready label to trigger buildkite CI label Dec 28, 2025

hsliuustc0106 approved these changes Dec 28, 2025


hsliuustc0106 enabled auto-merge (squash) December 28, 2025 02:06

Merge branch 'main' into feat/stable-audio-open

b56dfde

Merge branch 'main' into feat/stable-audio-open

a3c54b5

linyueqian and others added 3 commits January 4, 2026 18:43

Merge branch 'main' into feat/stable-audio-open

7fea674

Retry CI after model access granted

c0b5b6b

Signed-off-by: linyueqian <[email protected]>

Use random weights model for stable audio CI test

b3d282b

Signed-off-by: linyueqian <[email protected]>

auto-merge was automatically disabled January 5, 2026 03:37
Head branch was pushed to by a user without write access

Merge branch 'main' into feat/stable-audio-open

d40e650

Fix stable audio CI test with random weights model

babb068

Signed-off-by: linyueqian <[email protected]>

linyueqian added 2 commits January 5, 2026 00:02

Increase audio test timeout to 20 minutes

af66705

Signed-off-by: linyueqian <[email protected]>

Add torchsde dependency for Stable Audio scheduler

9c40203

Signed-off-by: linyueqian <[email protected]>

hsliuustc0106 approved these changes Jan 7, 2026


hsliuustc0106 merged commit 0cb7ec6 into vllm-project:main Jan 7, 2026
7 checks passed

david6666666 mentioned this pull request Jan 9, 2026

[Feature]: vLLM-Omni model owner JiusiServe/vllm-omni#25

Open

17 tasks

david6666666 mentioned this pull request Jan 16, 2026

vLLM-Omni Model Support #808

Open

57 tasks

ekagra-ranjan mentioned this pull request Feb 6, 2026

Add online serving to Stable Audio Diffusion and introduce v1/audio/generate endpoint #1255

Open

5 tasks



5 participants
