Support sleep, wake_up and load_weights for Omni Diffusion #376
ZJY0516 merged 11 commits into vllm-project:main
Conversation
Signed-off-by: knlnguyen1802 <[email protected]>
Force-pushed 87a2ce0 to db68abf
Add unit test
```python
def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]) -> set[str]:
    return self.pipeline.load_weights(weights)

def sleep(self, level: int = 1) -> bool:
```
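For context, the methods in the diff could delegate to the pipeline roughly as in this minimal sketch. The `DummyPipeline` and `OmniDiffusionRunner` classes and their internals are hypothetical stand-ins for illustration, not the actual vLLM-Omni implementation:

```python
from collections.abc import Iterable

class DummyPipeline:
    """Hypothetical stand-in for the real diffusion pipeline, which would
    load weights into its components (text encoder, VAE, etc.)."""

    def __init__(self):
        self.weights = {}

    def load_weights(self, weights):
        loaded = set()
        for name, tensor in weights:
            self.weights[name] = tensor
            loaded.add(name)
        return loaded

class OmniDiffusionRunner:
    """Sketch of a runner that forwards weight loading to its pipeline."""

    def __init__(self, pipeline):
        self.pipeline = pipeline
        self._asleep = False

    def load_weights(self, weights: Iterable[tuple[str, object]]) -> set[str]:
        # Delegates to the pipeline and returns the set of loaded names.
        return self.pipeline.load_weights(weights)

    def sleep(self, level: int = 1) -> bool:
        # A real implementation would offload GPU memory here.
        self._asleep = True
        return True

    def wake_up(self) -> bool:
        # A real implementation would move memory back to the GPU here.
        self._asleep = False
        return True

runner = OmniDiffusionRunner(DummyPipeline())
loaded = runner.load_weights([("unet.conv_in.weight", None)])
print(sorted(loaded))  # ['unet.conv_in.weight']
```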
Do we need to add a function in the engine to call this and expose an interface to the user?
I'd like to discuss one more point. Since vLLM-Omni uses libraries like Transformers and Diffusers to load components (e.g., the text encoder and VAE), does the current sleep method also handle memory allocated by these external libraries? @knlnguyen1802
The answer is no: sleep only handles memory that is allocated within the context of _maybe_get_memory_pool_context.
Could we just offload the model to CPU, like `model.to('cpu')`?
For the model, that already works in this PR because I wrap the model loader in the context of _maybe_get_memory_pool_context.
Being able to call model.to('cpu') directly would make the current PR, with its new _maybe_get_memory_pool_context, seem unnecessarily complex. |
Yes, but CuMemAllocator is a well-defined class that makes it easier to track how many GB of memory are offloaded and moved back to the GPU on wake-up, and it also tracks the time these operations take. It can also enable further optimizations.
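The bookkeeping argument above can be illustrated with a toy allocator-style context manager. The `MemoryPoolTracker` class and its counters are hypothetical and do not reflect vLLM's actual CuMemAllocator API; the sketch only shows the pattern: allocations registered inside the context are tracked, so sleep/wake_up can report how much memory moved and how long it took, while allocations made outside the context (e.g., by external libraries) are invisible to it:

```python
import time
from contextlib import contextmanager

class MemoryPoolTracker:
    """Toy stand-in for an allocator like CuMemAllocator: it only knows
    about allocations registered inside its context, mirroring how sleep
    only handles memory wrapped by _maybe_get_memory_pool_context."""

    def __init__(self):
        self.tracked_bytes = 0
        self.on_gpu = True
        self.last_sleep_seconds = 0.0

    @contextmanager
    def use_memory_pool(self):
        # Anything registered via track() inside this context is managed.
        yield self

    def track(self, num_bytes: int):
        self.tracked_bytes += num_bytes

    def sleep(self) -> int:
        # "Offload" all tracked memory and record how long it took.
        start = time.perf_counter()
        self.on_gpu = False
        self.last_sleep_seconds = time.perf_counter() - start
        return self.tracked_bytes

    def wake_up(self):
        self.on_gpu = True

tracker = MemoryPoolTracker()
with tracker.use_memory_pool():
    tracker.track(2 * 1024**3)  # e.g., a 2 GiB weight tensor loaded in-context

untracked = 512 * 1024**2  # allocated outside the context: invisible to sleep()
offloaded = tracker.sleep()
print(offloaded // 1024**3)  # 2
```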
…vllm-omni into diffusion_support
Head branch was pushed to by a user without write access
Could you please add some docs for sleep mode? @knlnguyen1802
Got it, I'll add them in a new PR.
…ect#376) Signed-off-by: knlnguyen1802 <[email protected]>
Purpose
Fix #316
This adds support for loading and offloading weights for the Diffusion model.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.
BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)