
[Bugfix] Add Fake VllmConfig on NPU and add slicing/tiling args in Qwen-Image#145

Merged
Gaohan123 merged 10 commits into vllm-project:main from gcanlin:npu-bugfix
Dec 1, 2025

Conversation

@gcanlin
Contributor

@gcanlin gcanlin commented Dec 1, 2025


Purpose

  • In vllm-ascend v0.11.0rc2, some call sites use vllm_config.model_config.hf_config.to_dict() without checking whether vllm_config is None.
  • Use slicing and tiling to avoid OOM on NPU.
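The null-guard idea behind the first bullet can be sketched as follows. This is an illustrative stand-in, not the actual vllm-ascend code: FakePretrainedConfig, FakeVllmConfig, and make_vllm_config are hypothetical names, and the real classes carry far more state.

```python
# Sketch: if no real VllmConfig is available on the NPU worker, fall back
# to a minimal fake so call sites like
# vllm_config.model_config.hf_config.to_dict() do not crash on None.

class FakePretrainedConfig:
    """Stand-in for a transformers PretrainedConfig."""

    def to_dict(self):
        return {}


class FakeModelConfig:
    def __init__(self):
        self.hf_config = FakePretrainedConfig()


class FakeVllmConfig:
    def __init__(self):
        self.model_config = FakeModelConfig()


def make_vllm_config(real_config=None):
    # Guard: only use the real config when it is actually present.
    return real_config if real_config is not None else FakeVllmConfig()


cfg = make_vllm_config(None)
print(cfg.model_config.hf_config.to_dict())  # {} instead of an AttributeError
```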

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


Signed-off-by: gcanlin <[email protected]>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 60 to 62
return "fake"

vllm_config = VllmConfig(model_config=ModelConfig(hf_config=FakePretrainedConfig()))


P1: ModelConfig instantiated without the required model

NPU worker initialization now builds ModelConfig(hf_config=FakePretrainedConfig()) without supplying the mandatory model path. ModelConfig’s constructor/post-init requires a model identifier and will raise a TypeError before set_current_vllm_config runs, so the NPU worker process crashes during startup and diffusion on NPU never initializes.
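The failure mode Codex describes can be illustrated with a stand-in constructor. DemoModelConfig below is hypothetical, not the real vllm ModelConfig, but it mirrors the shape of the problem: a required positional model identifier that the reviewed snippet never supplies.

```python
# Illustrative stand-in for a config class whose constructor requires a
# positional `model` identifier, as vllm's ModelConfig does. Passing only
# hf_config then fails before any worker setup can run.

class DemoModelConfig:
    def __init__(self, model, hf_config=None):
        self.model = model
        self.hf_config = hf_config


try:
    DemoModelConfig(hf_config=object())  # mirrors ModelConfig(hf_config=...)
except TypeError as exc:
    print("constructor rejected the call:", exc)
```

Supplying a placeholder model string alongside the fake hf_config avoids the crash, which is the direction the fix takes.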

Useful? React with 👍 / 👎.

@gcanlin
Contributor Author

gcanlin commented Dec 1, 2025

@Gaohan123 @ZJY0516 Please also take a look. Thanks! I think we could also consider making use_slicing and use_tiling common env vars, so that if someone hits an OOM issue on any device, they can set them to avoid OOM, though possibly at some performance cost.

@gcanlin
Contributor Author

gcanlin commented Dec 1, 2025

NPU, Qwen-Image:

(screenshot)

NPU, Qwen2.5-Omni-7B:

(screenshot)

@ZJY0516
Collaborator

ZJY0516 commented Dec 1, 2025

@Gaohan123 @ZJY0516 Please also take a look. Thanks! I think we could also consider making use_slicing and use_tiling common env vars, so that if someone hits an OOM issue on any device, they can set them to avoid OOM, though possibly at some performance cost.

Could you please use env vars in this PR and update the related docs?

Comment on lines 29 to 30
model.vae.use_slicing = True
model.vae.use_tiling = True
Contributor


Using slicing will reduce performance somewhat, but Qwen-Image will OOM on an A2 machine without it, so we add this for NPU.

@gcanlin @ZJY0516 BTW, maybe we can add some conditions on enabling this? Not sure if slicing is also needed when inferencing high-resolution images on GPU.

Collaborator


I think we can take it as an input argument for model initialization. Both GPU and NPU (or other hardware) can use it. Later, for disaggregated serving, it can be an argument for the diffusion stage.
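The init-argument alternative discussed here could be sketched as below. DiffusionModelStub and VaeStub are hypothetical stand-ins, not vllm-omni classes; the point is that slicing/tiling become explicit, device-agnostic constructor parameters, whose defaults could later be made conditional (e.g. on by default for NPU, or for high resolutions on GPU).

```python
# Sketch: expose VAE slicing/tiling as model-initialization arguments so
# any backend (GPU, NPU, ...) can opt in without device-specific env vars.

class VaeStub:
    def __init__(self):
        self.use_slicing = False
        self.use_tiling = False


class DiffusionModelStub:
    def __init__(self, use_slicing=False, use_tiling=False):
        self.vae = VaeStub()
        self.vae.use_slicing = use_slicing
        self.vae.use_tiling = use_tiling


# An NPU deployment opts in; GPU keeps the defaults unless needed.
npu_model = DiffusionModelStub(use_slicing=True, use_tiling=True)
```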

@MengqingCao
Contributor

Could you please use env vars in this PR and update the related docs?

IMO, adding an env var is better only if GPU also needs this; we don't want to introduce an NPU-specific env var unless there is no other way to address it. WDYT?

@ZJY0516
Collaborator

ZJY0516 commented Dec 1, 2025

Could you please use env vars in this PR and update the related docs?

IMO, adding an env var is better only if GPU also needs this; we don't want to introduce an NPU-specific env var unless there is no other way to address it. WDYT?

I think it's also needed by GPU. cc @SamitHuang

@gcanlin
Contributor Author

gcanlin commented Dec 1, 2025

Also cc @Gaohan123 @hsliuustc0106 @ywang96: it seems this needs to introduce the env module, which affects the user-facing side.

@gcanlin gcanlin changed the title [Bugfix] Add Fake VllmConfig and slicing/tiling in Qwen-Image in NPU [Bugfix] Add Fake VllmConfig on NPU and add slicing/tiling args in Qwen-Image Dec 1, 2025
Signed-off-by: gcanlin <[email protected]>
Signed-off-by: gcanlin <[email protected]>
Signed-off-by: gcanlin <[email protected]>
Collaborator

@Gaohan123 Gaohan123 left a comment


LGTM, thanks!

@Gaohan123 Gaohan123 enabled auto-merge (squash) December 1, 2025 15:40
@Gaohan123 Gaohan123 removed the request for review from hsliuustc0106 December 1, 2025 15:50
@Yikun
Member

Yikun commented Dec 1, 2025

@ywang96 Please make sure this PR merged before first RC, thanks

@Gaohan123 Gaohan123 merged commit 7e60c49 into vllm-project:main Dec 1, 2025
4 checks passed
LawJarp-A pushed a commit to LawJarp-A/vllm-omni that referenced this pull request Dec 12, 2025