[Model] Add Stable Audio Open support for text-to-audio generation#331

Merged
hsliuustc0106 merged 19 commits into vllm-project:main from linyueqian:feat/stable-audio-open
Jan 7, 2026
Conversation

@linyueqian
Contributor

@linyueqian linyueqian commented Dec 16, 2025


Purpose

Add support for Stable Audio Open (stabilityai/stable-audio-open-1.0) for text-to-audio generation in vLLM-Omni.

Test Plan

python examples/offline_inference/text_to_audio/text_to_audio.py --model stabilityai/stable-audio-open-1.0 --prompt "The sound of a dog barking" --output dog_barking.wav

Test Result

dog_barking.wav




@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 390 to 392
num_inference_steps = req.num_inference_steps or num_inference_steps
guidance_scale = req.guidance_scale if req.guidance_scale > 1.0 else guidance_scale

P1: Honor guidance_scale ≤ 1 from requests

guidance_scale from the request is only applied when it exceeds 1.0; otherwise the pipeline falls back to the default argument (7.0). This prevents callers from disabling classifier-free guidance or using a lower scale (e.g., requesting 0 or 1 via Omni.generate), because the model always runs CFG at scale 7 regardless of what was requested, making unconditional/low-guidance Stable Audio generation impossible.

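One way to address this (a sketch, not the code eventually merged; resolve_guidance_scale is a hypothetical helper) is to key the fallback on the parameter being absent rather than on its value:

```python
def resolve_guidance_scale(requested, default=7.0):
    """Pick the guidance scale for a generation request.

    Falling back to the default only when the caller did not supply a
    value (None) lets requests legitimately ask for scales <= 1.0,
    e.g. to disable classifier-free guidance entirely.
    """
    return default if requested is None else requested
```

Comparing against 1.0, as in the snippet above, silently discards low-but-valid scales; an explicit None check preserves them.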

@david6666666 david6666666 linked an issue Dec 16, 2025 that may be closed by this pull request
@hsliuustc0106
Collaborator

  • add test CI
  • add online serving
  • add streaming test after output stream is supported #367

@linyueqian
Contributor Author

add test CI
add online serving
add streaming test after output stream is supported #367

@hsliuustc0106 For online serving of Stable Audio, is a simple FastAPI wrapper around OmniDiffusion the right approach, or is there existing/planned infrastructure I should use?

@hsliuustc0106
Collaborator

add test CI
add online serving
add streaming test after output stream is supported #367

@hsliuustc0106 For online serving of Stable Audio, is a simple FastAPI wrapper around OmniDiffusion the right approach, or is there existing/planned infrastructure I should use?

@david6666666 I notice you are working on the video online serving, is it possible to support audio output as well? #437

@david6666666
Collaborator

david6666666 commented Dec 24, 2025

is it possible to support audio output as well?

I think we need to open another PR to enable online audio serving, after this text-to-audio model is supported.

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

add supported models

@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Dec 28, 2025
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

lgtm

@hsliuustc0106 hsliuustc0106 enabled auto-merge (squash) December 28, 2025 02:06
@linyueqian
Contributor Author

@hsliuustc0106 is this PR ready to be merged now?

@hsliuustc0106
Collaborator

fix ci please

@hsliuustc0106
Collaborator

@Bounty-hunter PTAL

@hsliuustc0106
Collaborator

fix ci please

auto-merge was automatically disabled January 5, 2026 03:37

Head branch was pushed to by a user without write access

@princepride
Collaborator

@ZJY0516 I checked my code, and it doesn't seem to be my problem. I only added a special-case check at the end of is_diffusion_model to determine whether a model is a diffusion model:

# Bagel is not a diffusers pipeline (no model_index.json), but is still a
# diffusion-style model in vllm-omni. Detect it via config.json.
return _looks_like_bagel(model_name)

But I noticed that the model stabilityai/stable-audio-open-1.0 used for testing isn't directly accessible; access has to be requested first.
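The detection described above could look roughly like this (a sketch assuming a local model directory; the helper name and config field mirror the comment, but the merged implementation may differ):

```python
import json
from pathlib import Path


def looks_like_bagel(model_dir: str) -> bool:
    """Heuristically detect a Bagel-style diffusion model.

    Diffusers pipelines ship a model_index.json; Bagel does not, so fall
    back to reading config.json and checking the declared model_type.
    """
    root = Path(model_dir)
    if (root / "model_index.json").exists():
        return False  # a regular diffusers pipeline, not Bagel
    config_path = root / "config.json"
    if not config_path.exists():
        return False
    config = json.loads(config_path.read_text())
    return "bagel" in str(config.get("model_type", "")).lower()
```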

@ZJY0516
Collaborator

ZJY0516 commented Jan 5, 2026

But I noticed that the model stabilityai/stable-audio-open-1.0 used for testing isn't a directly accessible model, an application is required.

Thanks. I found this too. This works fine on my local machine.

@ZJY0516
Collaborator

ZJY0516 commented Jan 5, 2026

I noticed some warnings. Could you please solve this? @linyueqian

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565

/home/zjy/code/vllm-omni/.venv/lib/python3.12/site-packages/torch/nn/utils/weight_norm.py:144: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
  WeightNorm.apply(module, name, dim)

/home/zjy/code/vllm-omni/.venv/lib/python3.12/site-packages/torchsde/_brownian/brownian_interval.py:608: UserWarning: Should have tb<=t1 but got tb=500.00006103515625 and t1=500.0.
  warnings.warn(f"Should have {tb_name}<=t1 but got {tb_name}={tb} and t1={self._end}.")
/home/zjy/code/vllm-omni/.venv/lib/python3.12/site-packages/torchsde/_brownian/brownian_interval.py:599: UserWarning: Should have ta>=t0 but got ta=0.29999998211860657 and t0=0.3.
  warnings.warn(f"Should have ta>=t0 but got ta={ta} and t0={self._start}.")
/home/zjy/code/vllm-omni/.venv/lib/python3.12/site-packages/torchsde/_brownian/brownian_interval.py:599: UserWarning: Should have ta>=t0 but got ta=0.0 and t0=0.3.
  warnings.warn(f"Should have ta>=t0 but got ta={ta} and t0={self._start}.")
/home/zjy/code/vllm-omni/.venv/lib/python3.12/site-packages/torchsde/_brownian/brownian_interval.py:602: UserWarning: Should have tb>=t0 but got tb=0.29999998211860657 and t0=0.3.
  warnings.warn(f"Should have {tb_name}>=t0 but got {tb_name}={tb} and t0={self._start}.")

@linyueqian
Contributor Author

I noticed some warnings. Could you please solve this? @linyueqian


I've fixed the first warning (T5Tokenizer legacy) by switching to T5TokenizerFast in this commit. And I think the remaining warnings are from third-party dependencies that cannot be fixed here.
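If the remaining third-party warnings ever become noisy in CI, they could be silenced at the call site with the standard warnings machinery (a sketch; the message patterns are taken from the log above, and the helper name is made up):

```python
import warnings


def silence_known_thirdparty_warnings() -> None:
    # torch.nn.utils.weight_norm deprecation (FutureWarning from torch)
    warnings.filterwarnings(
        "ignore",
        message=r"`torch\.nn\.utils\.weight_norm` is deprecated",
        category=FutureWarning,
    )
    # torchsde Brownian-interval boundary chatter (UserWarning)
    warnings.filterwarnings(
        "ignore",
        message=r"Should have t[ab]",
        category=UserWarning,
    )
```

warnings.filterwarnings matches the message pattern against the start of the warning text, so these filters catch only the specific warnings quoted above and leave everything else visible.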

@linyueqian
Contributor Author

@hsliuustc0106 I think the CI issue is fixed now.

@hsliuustc0106
Collaborator

@hsliuustc0106 I think the CI issue is fixed now.

Thanks for contributing the first text-to-audio model!

@hsliuustc0106 hsliuustc0106 merged commit 0cb7ec6 into vllm-project:main Jan 7, 2026
7 checks passed
@hsliuustc0106
Collaborator

please remember to submit a PR to vllm-project/recipe

Shirley125 pushed a commit to Shirley125/vllm-omni that referenced this pull request Jan 9, 2026
@linyueqian
Contributor Author

please remember to submit a PR to vllm-project/recipe

vllm-project/recipes#193

princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026
sniper35 pushed a commit to sniper35/vllm-omni that referenced this pull request Jan 10, 2026
ZJY0516 pushed a commit to LawJarp-A/vllm-omni that referenced this pull request Jan 10, 2026
@david6666666 david6666666 mentioned this pull request Jan 16, 2026

Labels

ready label to trigger buildkite CI


Development

Successfully merging this pull request may close these issues.

[New Model]: stabilityai/stable-audio-open-1.0

5 participants