
[Bugfix] Add Fake VllmConfig on NPU and add slicing/tiling args in Qwen-Image#145

Merged
Gaohan123 merged 10 commits into vllm-project:main from gcanlin:npu-bugfix
Dec 1, 2025

Conversation

@gcanlin
Contributor

@gcanlin gcanlin commented Dec 1, 2025


Purpose

  • In vllm-ascend v0.11.0rc2, some call sites use vllm_config.model_config.hf_config.to_dict() without checking whether vllm_config is None.
  • Use slicing and tiling to avoid OOM on NPU.
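The null-guard idea behind the first bullet can be sketched as follows. This is an illustrative stand-in, not the actual vllm-ascend code: FakePretrainedConfig, FakeVllmConfig, and make_vllm_config are hypothetical names, and the real classes carry far more state.

```python
# Sketch: if no real VllmConfig is available on the NPU worker, fall back
# to a minimal fake so call sites like
# vllm_config.model_config.hf_config.to_dict() do not crash on None.

class FakePretrainedConfig:
    """Stand-in for a transformers PretrainedConfig."""

    def to_dict(self):
        return {}


class FakeModelConfig:
    def __init__(self):
        self.hf_config = FakePretrainedConfig()


class FakeVllmConfig:
    def __init__(self):
        self.model_config = FakeModelConfig()


def make_vllm_config(real_config=None):
    # Guard: only use the real config when it is actually present.
    return real_config if real_config is not None else FakeVllmConfig()


cfg = make_vllm_config(None)
print(cfg.model_config.hf_config.to_dict())  # {} instead of an AttributeError
```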

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


Signed-off-by: gcanlin <[email protected]>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 60 to 62
return "fake"

vllm_config = VllmConfig(model_config=ModelConfig(hf_config=FakePretrainedConfig()))


P1: ModelConfig instantiated without the required model

NPU worker initialization now builds ModelConfig(hf_config=FakePretrainedConfig()) without supplying the mandatory model path. ModelConfig’s constructor/post-init requires a model identifier and will raise a TypeError before set_current_vllm_config runs, so the NPU worker process crashes during startup and diffusion on NPU never initializes.
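The failure mode Codex describes can be illustrated with a stand-in constructor. DemoModelConfig below is hypothetical, not the real vllm ModelConfig, but it mirrors the shape of the problem: a required positional model identifier that the reviewed snippet never supplies.

```python
# Illustrative stand-in for a config class whose constructor requires a
# positional `model` identifier, as vllm's ModelConfig does. Passing only
# hf_config then fails before any worker setup can run.

class DemoModelConfig:
    def __init__(self, model, hf_config=None):
        self.model = model
        self.hf_config = hf_config


try:
    DemoModelConfig(hf_config=object())  # mirrors ModelConfig(hf_config=...)
except TypeError as exc:
    print("constructor rejected the call:", exc)
```

Supplying a placeholder model string alongside the fake hf_config avoids the crash, which is the direction the fix takes.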

Useful? React with 👍 / 👎.

@gcanlin
Contributor Author

gcanlin commented Dec 1, 2025

@Gaohan123 @ZJY0516 Please also take a look. Thanks! I think we could also consider making use_slicing and use_tiling common env vars, so that if someone hits an OOM issue on any device, they can set them to avoid OOM, though possibly at some performance cost.

@gcanlin
Contributor Author

gcanlin commented Dec 1, 2025

NPU, Qwen-Image:

(screenshot)

NPU, Qwen2.5-Omni-7B:

(screenshot)

@ZJY0516
Collaborator

ZJY0516 commented Dec 1, 2025

@Gaohan123 @ZJY0516 Please also take a look. Thanks! I think we could also consider making use_slicing and use_tiling common env vars, so that if someone hits an OOM issue on any device, they can set them to avoid OOM, though possibly at some performance cost.

Could you please use env vars in this PR and update the related docs?

Comment on lines 29 to 30
model.vae.use_slicing = True
model.vae.use_tiling = True
Contributor


Using slicing will reduce performance somewhat, but Qwen-Image will OOM on an A2 machine without it, so we add this for NPU.

@gcanlin @ZJY0516 BTW, maybe we can add some conditions on enabling this? Not sure if slicing is also needed when inferencing high-resolution images on GPU.

Collaborator


I think we can take it as an input argument for model initialization. Both GPU and NPU (or other hardware) can use it. Later, for disaggregated serving, it can be an argument for the diffusion stage.
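The init-argument alternative discussed here could be sketched as below. DiffusionModelStub and VaeStub are hypothetical stand-ins, not vllm-omni classes; the point is that slicing/tiling become explicit, device-agnostic constructor parameters, whose defaults could later be made conditional (e.g. on by default for NPU, or for high resolutions on GPU).

```python
# Sketch: expose VAE slicing/tiling as model-initialization arguments so
# any backend (GPU, NPU, ...) can opt in without device-specific env vars.

class VaeStub:
    def __init__(self):
        self.use_slicing = False
        self.use_tiling = False


class DiffusionModelStub:
    def __init__(self, use_slicing=False, use_tiling=False):
        self.vae = VaeStub()
        self.vae.use_slicing = use_slicing
        self.vae.use_tiling = use_tiling


# An NPU deployment opts in; GPU keeps the defaults unless needed.
npu_model = DiffusionModelStub(use_slicing=True, use_tiling=True)
```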

@MengqingCao
Contributor

Could you please use env vars in this PR and update the related docs?

IMO, adding an env var is better only if GPU also needs this; we don't want to introduce an NPU-specific env var unless there is no other way to address it. WDYT?

@ZJY0516
Collaborator

ZJY0516 commented Dec 1, 2025

Could you please use env vars in this PR and update the related docs?

IMO, adding an env var is better only if GPU also needs this; we don't want to introduce an NPU-specific env var unless there is no other way to address it. WDYT?

I think it's also needed by GPU. cc @SamitHuang

@gcanlin
Contributor Author

gcanlin commented Dec 1, 2025

Also cc @Gaohan123 @hsliuustc0106 @ywang96: it seems this needs to introduce the env module, which affects the user-facing side.

@gcanlin gcanlin changed the title [Bugfix] Add Fake VllmConfig and slicing/tiling in Qwen-Image in NPU [Bugfix] Add Fake VllmConfig on NPU and add slicing/tiling args in Qwen-Image Dec 1, 2025
Signed-off-by: gcanlin <[email protected]>
Signed-off-by: gcanlin <[email protected]>
Signed-off-by: gcanlin <[email protected]>
Collaborator

@Gaohan123 Gaohan123 left a comment


LGTM, thanks!

@Gaohan123 Gaohan123 enabled auto-merge (squash) December 1, 2025 15:40
@Gaohan123 Gaohan123 removed the request for review from hsliuustc0106 December 1, 2025 15:50
@Yikun
Member

Yikun commented Dec 1, 2025

@ywang96 Please make sure this PR merged before first RC, thanks

@Gaohan123 Gaohan123 merged commit 7e60c49 into vllm-project:main Dec 1, 2025
4 checks passed
LawJarp-A pushed a commit to LawJarp-A/vllm-omni that referenced this pull request Dec 12, 2025