[Platform] Add supports_torch_inductor interface #1108

Merged
ZJY0516 merged 3 commits into vllm-project:main from gcanlin:eager-default
Jan 30, 2026

Conversation

gcanlin (Contributor) commented Jan 30, 2026

Purpose

More and more NPU users are hitting torch.compile failures. Until we support a multi-hardware torch.compile backend, we need to disable it by default on such platforms for a better user experience.

In the current code, we set enforce_eager = False by default:

class OmniDiffusionConfig:
    ...
    enforce_eager: bool = False
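
For reference, a minimal sketch of the platform hook this PR adds: per the review comment below, the base method in vllm_omni/platforms/interface.py raises NotImplementedError, while the concrete class names used here are assumptions for illustration.

class OmniPlatform:
    def supports_torch_inductor(self) -> bool:
        # Base interface: concrete platforms must override this
        # (the default raises, per vllm_omni/platforms/interface.py).
        raise NotImplementedError

class CudaOmniPlatform(OmniPlatform):  # hypothetical class name
    def supports_torch_inductor(self) -> bool:
        return True  # inductor is the default torch.compile backend on CUDA

class NpuOmniPlatform(OmniPlatform):  # hypothetical class name
    def supports_torch_inductor(self) -> bool:
        return False  # torch.compile/inductor is not supported here yet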

Test Plan

With the enforce_eager parameter disabled (the default behaviour), users no longer hit the error on NPU; instead, the log emits a WARNING and compilation is skipped.

python text_to_image.py \
  --model Tongyi-MAI/Z-Image-Turbo \
  --prompt "a cup of coffee on the table" \
  --seed 42 \
  --cfg_scale 4.0 \
  --num_images_per_prompt 1 \
  --num_inference_steps 50 \
  --height 1024 \
  --width 1024 \
  --output outputs/coffee.png

Test Result

[Stage-0] INFO 01-30 08:38:47 [diffusers_loader.py:227] Loading weights took 6.47 seconds
[Stage-0] INFO 01-30 08:38:47 [diffusion_model_runner.py:100] Model loading took 53.7445 GiB and 13.569574 seconds
[Stage-0] INFO 01-30 08:38:47 [diffusion_model_runner.py:105] Model runner: Model loaded successfully.
[Stage-0] WARNING 01-30 08:38:47 [diffusion_model_runner.py:134] Model runner: Platform npu does not support torch inductor, skipping torch.compile.
[Stage-0] INFO 01-30 08:38:47 [cache_dit_backend.py:379] Enabling cache-dit on transformer: Fn=1, Bn=0, W=4,
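
The skipped-compile warning above corresponds to a guard along these lines (a sketch only: maybe_compile and the logger wiring are assumptions, while supports_torch_inductor() and dynamic=True come from this PR's diff):

import logging

import torch

from vllm_omni.platforms import current_omni_platform

logger = logging.getLogger(__name__)


def maybe_compile(model: torch.nn.Module, enforce_eager: bool) -> torch.nn.Module:
    # Sketch of the guard in the diffusion model runner; exact code may differ.
    if enforce_eager:
        return model  # user explicitly asked for eager execution
    if not current_omni_platform.supports_torch_inductor():
        logger.warning("Platform does not support torch inductor, skipping torch.compile.")
        return model
    return torch.compile(model, dynamic=True)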

gcanlin (Contributor, Author) commented Jan 30, 2026

cc @ZJY0516 @faaany @tjtanaa

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e2b032dbd4

Code referenced by this review (hunks from the new platform check around the torch.compile call):

    dynamic=True,

from vllm_omni.platforms import current_omni_platform

if current_omni_platform.supports_torch_inductor():

P1: Guard against UnspecifiedOmniPlatform missing method

When no platform plugin is detected, current_omni_platform is an UnspecifiedOmniPlatform (see resolve_current_omni_platform_cls_qualname in vllm_omni/platforms/__init__.py), and that class does not override supports_torch_inductor (the base method in vllm_omni/platforms/interface.py raises NotImplementedError). With the new call here, any CPU/unknown environment running with enforce_eager=False will now crash during initialization instead of just skipping compilation. Consider providing a default implementation on UnspecifiedOmniPlatform (e.g., False) or guarding the call before invoking the method.
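
A minimal sketch of the suggested fix, assuming the fallback class subclasses the base interface from the sketch earlier in this page (the override and the call-site guard below are hypothetical):

from vllm_omni.platforms import current_omni_platform

class UnspecifiedOmniPlatform(OmniPlatform):  # base class name as in the sketch above
    def supports_torch_inductor(self) -> bool:
        # No platform plugin detected: skip torch.compile rather than crash.
        return False

# Or guard at the call site instead:
try:
    use_inductor = current_omni_platform.supports_torch_inductor()
except NotImplementedError:
    use_inductor = False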

david6666666 added this to the v0.14.0 milestone Jan 30, 2026
ZJY0516 added the ready label (label to trigger buildkite CI) Jan 30, 2026
ZJY0516 (Collaborator) commented Jan 30, 2026

Please also verify that this change does not affect CUDA platform behavior.

Signed-off-by: gcanlin <[email protected]>
gcanlin (Contributor, Author) commented Jan 30, 2026

> Please also verify that this change does not affect CUDA platform behavior.

CUDA: compiled by default.

python text_to_image.py \
  --model Tongyi-MAI/Z-Image-Turbo \
  --prompt "a cup of coffee on the table" \
  --seed 42 \
  --cfg_scale 4.0 \
  --num_images_per_prompt 1 \
  --num_inference_steps 50 \
  --height 1024 \
  --width 1024 \
  --output outputs/coffee.png
[Stage-0] INFO 01-30 13:40:23 [diffusers_loader.py:227] Loading weights took 67.10 seconds
[Stage-0] INFO 01-30 13:40:24 [diffusion_model_runner.py:101] Model loading took 19.1516 GiB and 91.199319 seconds
[Stage-0] INFO 01-30 13:40:24 [diffusion_model_runner.py:106] Model runner: Model loaded successfully.
[Stage-0] INFO 01-30 13:40:24 [diffusion_model_runner.py:129] Model runner: Model compiled with torch.compile.
[Stage-0] INFO 01-30 13:40:24 [diffusion_model_runner.py:144] Model runner: Initialization complete.
[Stage-0] INFO 01-30 13:40:24 [manager.py:90] Initializing DiffusionLoRAManager: device=cuda:0, dtype=torch.bfloat16, max_cached_adapters=1, static_lora_path=None

ZJY0516 enabled auto-merge (squash) January 30, 2026 13:48
faaany (Contributor) commented Jan 30, 2026

hsliuustc0106 added the Hardware Plugin label (support different hardware beyond cuda) Jan 30, 2026
ZJY0516 merged commit 4eeea68 into vllm-project:main Jan 30, 2026
7 checks passed
dongbo910220 pushed a commit to dongbo910220/vllm-omni that referenced this pull request Feb 1, 2026

Labels

Hardware Plugin (support different hardware beyond cuda), ready (label to trigger buildkite CI)

5 participants