[Diffusion] Refactor diffusion models weights loading#157

Merged
Isotr0py merged 19 commits into vllm-project:main from Isotr0py:refactor-diffusion-weight-loader
Dec 6, 2025

Conversation


@Isotr0py Isotr0py commented Dec 2, 2025


Purpose

  • This PR introduces DiffusersPipelineLoader to load weights from the subfolders of a standard diffusers model repo.
  • This PR also reuses AutoWeightsLoader from the main repo to load weights recursively.

Test Plan

python examples/offline_inference/qwen_image/text_to_image.py
python examples/offline_inference/qwen_image/text_to_image.py --model Tongyi-MAI/Z-Image-Turbo --num_inference_steps 10

Test Result

Both the Z-Image and Qwen-Image pipelines still work.



@Isotr0py Isotr0py changed the title [Diffusion] Refactor Qwen-image weight loader [Diffusion] Refactor diffusion models weights loading Dec 3, 2025
Signed-off-by: Isotr0py <[email protected]>
@Isotr0py Isotr0py marked this pull request as ready for review December 3, 2025 17:06

Comment on lines +154 to +162
self.weights_sources = [
    DiffusersPipelineLoader.ComponentSource(
        model_or_path=od_config.model,
        subfolder="transformer",
        revision=None,
        prefix="transformer.",
        fall_back_to_pt=True,
    )
]
@Isotr0py (Member Author)
I think we should also unify the text_encoder, vae, and other submodules' weight loading along these lines in follow-up PRs.

Otherwise, AutoModel.from_pretrained has already loaded their weights at the model initialization stage here, which makes weight-based parallelism integration harder to handle.
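As a rough illustration of that unification, a sketch along these lines could build one source per pipeline component. This is purely hypothetical: `build_weights_sources` and the plain-dict fields are illustrative stand-ins for the PR's DiffusersPipelineLoader.ComponentSource, not actual vllm-omni API.

```python
# Hypothetical sketch: register every diffusers pipeline component as a
# weight source, instead of letting AutoModel.from_pretrained eagerly load
# text_encoder/vae weights at model-init time. Plain dicts stand in for
# DiffusersPipelineLoader.ComponentSource so the sketch is self-contained.
def build_weights_sources(model_or_path: str, subfolders: list[str]) -> list[dict]:
    return [
        {
            "model_or_path": model_or_path,
            "subfolder": sub,
            "revision": None,
            "prefix": f"{sub}.",      # maps checkpoint keys onto module names
            "fall_back_to_pt": True,  # fall back to .bin if no safetensors
        }
        for sub in subfolders
    ]

sources = build_weights_sources("some/model", ["transformer", "text_encoder", "vae"])
```

With every component registered this way, a single recursive loader can own all weight movement, which is what makes parallelism-aware loading tractable.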

Collaborator:
Agreed

@ZJY0516 (Collaborator):

LGTM. Does this PR support modelscope weight loading?


@Isotr0py

Isotr0py commented Dec 4, 2025

Does this PR support modelscope weight loading?

Yes, it works with the modelscope patch omni_snapshot_download at the entrypoint level:

def omni_snapshot_download(model_id) -> str:
    # TODO: this is just a workaround to quickly use modelscope; we should support
    # modelscope in the weight loading feature instead of using `snapshot_download`
    if os.environ.get("VLLM_USE_MODELSCOPE", False):
        from modelscope.hub.snapshot_download import snapshot_download
        return snapshot_download(model_id)
    else:
        return _dummy_snapshot_download(model_id)

It still can't work without this patch, because some vllm-omni-specific weight utility functions lack a modelscope workaround:

def download_weights_from_hf_specific(
    model_name_or_path: str,
    cache_dir: Optional[str],
    allow_patterns: list[str],
    revision: Optional[str] = None,
    ignore_patterns: Optional[Union[str, list[str]]] = None,
) -> str:
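A hub-agnostic fallback for such utilities could be sketched as below. This is a hypothetical helper (`snapshot_download_compat` is not real vllm-omni API); the download callables are injected so the sketch runs without huggingface_hub or modelscope installed. Note the explicit string comparison: a plain `os.environ.get("VLLM_USE_MODELSCOPE", False)` as in the patch above treats any non-empty string, including "0" or "False", as enabled.

```python
import os

def snapshot_download_compat(model_id, hf_download, ms_download):
    # Hypothetical dispatcher: in practice hf_download/ms_download would be
    # huggingface_hub.snapshot_download and
    # modelscope.hub.snapshot_download. Only "1"/"true" (case-insensitive)
    # enable the ModelScope path.
    use_ms = os.environ.get("VLLM_USE_MODELSCOPE", "").lower() in ("1", "true")
    return ms_download(model_id) if use_ms else hf_download(model_id)
```

Routing every download through one such dispatcher would let the entrypoint-level patch be dropped entirely.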

@ZJY0516

ZJY0516 commented Dec 4, 2025


Do you know whether vLLM already provides similar modelscope-compatible utilities? Also cc @MengqingCao

Signed-off-by: Isotr0py <[email protected]>
Collaborator:

It would be good to show the memory usage and time cost of model loading, similar to vLLM:

(EngineCore_DP0 pid=2964220) INFO 12-04 06:28:57 [default_loader.py:267] Loading weights took 3.21 seconds
(EngineCore_DP0 pid=2964220) INFO 12-04 06:28:57 [gpu_model_runner.py:2653] Model loading took 1.6714 GiB and 5.854078 seconds

@Isotr0py (Member Author):

Done in 22f0f84

INFO 12-04 17:45:34 [parallel_state.py:1208] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO:vllm_omni.diffusion.worker.gpu_worker:Worker 0: Initialized device and distributed environment.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00,  1.58s/it]
WARNING 12-04 17:45:39 [__init__.py:755] Current vLLM config is not set.
Loading safetensors checkpoint shards:   0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  33% Completed | 1/3 [00:03<00:06,  3.48s/it]
Loading safetensors checkpoint shards:  67% Completed | 2/3 [00:06<00:03,  3.34s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:08<00:00,  2.50s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:08<00:00,  2.74s/it]

INFO:vllm_omni.diffusion.model_loader.diffusers_loader:Loading weights took 8.44 seconds
INFO:vllm_omni.diffusion.worker.gpu_worker:Model loading took 19.2180 GiB and 14.825355 seconds
INFO:vllm_omni.diffusion.worker.gpu_worker:Worker 0: Model loaded successfully.
INFO:vllm_omni.diffusion.worker.gpu_worker:Worker 0: Scheduler loop started.
INFO:vllm_omni.diffusion.worker.gpu_worker:Worker 0 ready to receive requests via shared memory
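The wall-clock/memory report in those log lines can be produced by a small wrapper like this sketch. It is hypothetical (`timed_load` is not the PR's actual implementation); the memory probe is injected, e.g. torch.cuda.memory_allocated in practice, so the sketch runs without a GPU.

```python
import time

def timed_load(load_fn, mem_bytes_fn):
    # Hypothetical sketch of the "Model loading took X GiB and Y seconds"
    # report: measure the wall time of load_fn and the change in allocated
    # bytes reported by mem_bytes_fn around it.
    mem_before = mem_bytes_fn()
    start = time.perf_counter()
    model = load_fn()
    elapsed = time.perf_counter() - start
    used_gib = (mem_bytes_fn() - mem_before) / (1 << 30)
    print(f"Model loading took {used_gib:.4f} GiB and {elapsed:.6f} seconds")
    return model, used_gib
```

Measuring the allocation delta rather than the absolute value keeps the number meaningful when other tensors are already resident on the device.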

Signed-off-by: Isotr0py <[email protected]>
@ZJY0516 (Collaborator):
LGTM, generally

@Isotr0py Isotr0py enabled auto-merge (squash) December 6, 2025 07:25
@Isotr0py Isotr0py merged commit 059f45e into vllm-project:main Dec 6, 2025
4 checks passed
@Isotr0py Isotr0py deleted the refactor-diffusion-weight-loader branch December 6, 2025 11:15
LawJarp-A pushed a commit to LawJarp-A/vllm-omni that referenced this pull request Dec 12, 2025
LawJarp-A pushed a commit to LawJarp-A/vllm-omni that referenced this pull request Dec 12, 2025
faaany pushed a commit to faaany/vllm-omni that referenced this pull request Dec 19, 2025
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026