[Wan2.2] Optimize memory usage with conditional transformer loading #980
SamitHuang merged 18 commits into vllm-project:main
Conversation
Signed-off-by: Lin, Fanli <[email protected]>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 56e9f566ee
Please help review this PR, thanks a lot! @linyueqian @hsliuustc0106
Signed-off-by: Lin, Fanli <[email protected]>
There is an e2e test in
```python
# Select model based on timestep and boundary_ratio
if boundary_timestep is not None and t < boundary_timestep and self.transformer_2 is not None:
    current_model = self.transformer_2
```
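For context, the selection logic around this hunk can be sketched as follows. This is illustrative only: the `boundary_timestep` derivation, the `num_train_timesteps=1000` default, and the function name are assumptions based on the discussion, not the exact pipeline code.

```python
# Sketch of per-timestep DiT selection in a Wan2.2-style two-stage pipeline.
# `transformer` handles high noise (t >= boundary), `transformer_2` low noise.

def select_model(t, boundary_ratio, transformer, transformer_2,
                 num_train_timesteps=1000):
    """Pick the DiT for timestep t (names are hypothetical)."""
    boundary_timestep = (
        boundary_ratio * num_train_timesteps
        if boundary_ratio is not None else None
    )
    # Select model based on timestep and boundary_ratio
    if boundary_timestep is not None and t < boundary_timestep \
            and transformer_2 is not None:
        return transformer_2
    return transformer

# boundary_ratio=1.0: every t < 1000, so transformer_2 is always chosen.
assert select_model(500, 1.0, "high_noise", "low_noise") == "low_noise"
# boundary_ratio=0.0: no t < 0, so the high-noise transformer is used,
# even when transformer_2 was never loaded (None).
assert select_model(500, 0.0, "high_noise", None) == "high_noise"
```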
Sorry, I didn't understand this. When you don't need the first transformer, how do you offload it?
When boundary_ratio is set to 1.0, self.transformer will be None. In our current offload logic, a None module is skipped (see vllm-omni/vllm_omni/diffusion/offload.py, line 151 at c4220f0), so only self.transformer_2 remains in the dit_modules list. In this case, the memory-saving strategy still works, because the DiT modules (whether 1 or 2 of them) and the encoders are mutually exclusive.
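The skip-None behavior described here can be sketched like this. The helper name and signature are hypothetical; the real logic lives in vllm_omni/diffusion/offload.py.

```python
def offload_modules(dit_modules, offload_fn):
    """Offload each DiT module, silently skipping None entries
    (e.g. self.transformer when boundary_ratio == 1.0)."""
    offloaded = []
    for module in dit_modules:
        if module is None:  # transformer was never loaded: nothing to move
            continue
        offload_fn(module)
        offloaded.append(module)
    return offloaded

# With boundary_ratio == 1.0, dit_modules looks like [None, transformer_2],
# and only the low-noise transformer is actually offloaded.
moved = offload_modules([None, "transformer_2"], offload_fn=lambda m: None)
assert moved == ["transformer_2"]
```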
Yes, the e2e test passes:

```
========================================== short test summary info ===========================================
PASSED tests/e2e/offline_inference/test_t2v_model.py::test_video_diffusion_model[Wan-AI/Wan2.2-T2V-A14B-Diffusers]
================================== 1 passed, 2 warnings in 75.59s (0:01:15) ==================================
```
Signed-off-by: Lin, Fanli <[email protected]>
I wonder whether we will get bad generation results when boundary_ratio is set to 0.0 or 1.0.
Signed-off-by: Lin, Fanli <[email protected]>
Indeed, it doesn't make practical sense to set boundary_ratio to 0, because it produces worse quality. But setting boundary_ratio to 1 actually yields a higher-quality video in my experiments. Many people also report that the high-noise transformer is not useful (e.g. huggingface/diffusers#12019). I have updated the docs to make this clearer for users.
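The implications of the two extreme boundary_ratio values discussed here can be summarized in a small sketch. The function is hypothetical shorthand for the PR's loading decision, not the actual pipeline API: `transformer` covers the high-noise stage (t >= boundary), `transformer_2` the low-noise stage (t < boundary).

```python
def transformers_to_load(boundary_ratio):
    """Which Wan2.2 transformers are needed for a given boundary_ratio
    (illustrative sketch of the conditional-loading rule)."""
    load = []
    if boundary_ratio < 1.0:
        load.append("transformer")    # high-noise stage, t >= boundary
    if boundary_ratio > 0.0:
        load.append("transformer_2")  # low-noise stage, t < boundary
    return load

assert transformers_to_load(0.0) == ["transformer"]                    # high-noise only
assert transformers_to_load(1.0) == ["transformer_2"]                  # low-noise only
assert transformers_to_load(0.875) == ["transformer", "transformer_2"]  # both stages
```

With either extreme value, one ~27 GB transformer is never instantiated, which is where the memory saving comes from.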
Signed-off-by: Samit <[email protected]>
Signed-off-by: Lin, Fanli <[email protected]>
…llm-project#980)
Signed-off-by: Lin, Fanli <[email protected]>
Signed-off-by: Samit <[email protected]>
Co-authored-by: Samit <[email protected]>
Purpose
Wan2.2 uses a two-stage denoising process with two separate transformer models:

- transformer: the high-noise stage (early timesteps)
- transformer_2: the low-noise stage (later timesteps)
Loading both transformers simultaneously causes OOM issues on systems with limited GPU memory, as each transformer can consume ~27 GB of memory.
This PR implements conditional transformer loading based on the boundary_ratio parameter, following the approach of this PR from HF diffusers: huggingface/diffusers#12024. The pipeline now loads only the transformers that will actually be used:

| boundary_ratio | Transformer(s) loaded | Transformer skipped |
|---|---|---|
| 0.0 | transformer (high-noise) | transformer_2 |
| 1.0 | transformer_2 (low-noise) | transformer |
| 0.0 < x < 1.0 | both | None |

Test Plan
Test Result
Memory usage before:
Memory usage after:
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.