
Support sleep, wake_up and load_weights for Omni Diffusion#376

Merged
ZJY0516 merged 11 commits into vllm-project:main from knlnguyen1802:diffusion_support
Jan 5, 2026

Conversation

@knlnguyen1802
Contributor

Purpose

Fix #316

This adds support for loading and offloading weights for the Omni Diffusion model.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


Signed-off-by: knlnguyen1802 <[email protected]>

Signed-off-by: knlnguyen1802 <[email protected]>
Collaborator

@SamitHuang SamitHuang left a comment

Test plan and results are missing. Can you add unit tests for the new interfaces?

Signed-off-by: knlnguyen1802 <[email protected]>
Signed-off-by: knlnguyen1802 <[email protected]>
Signed-off-by: knlnguyen1802 <[email protected]>
Signed-off-by: knlnguyen1802 <[email protected]>
@knlnguyen1802
Contributor Author

Test plan and results are missing. Can you add unit tests for the new interfaces?

Added unit tests.

Signed-off-by: knlnguyen1802 <[email protected]>
def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]) -> set[str]:
    return self.pipeline.load_weights(weights)

def sleep(self, level: int = 1) -> bool:
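The `sleep`/`wake_up` pair can be sketched as a worker that delegates to an allocator tracking pooled memory. This is a minimal, self-contained sketch; `OmniDiffusionWorker` and `FakeAllocator` are illustrative stand-ins, not the actual vllm-omni classes.

```python
class FakeAllocator:
    """Stand-in for a pooled GPU allocator: records whether the
    memory it tracks is currently offloaded to CPU."""

    def __init__(self) -> None:
        self.offloaded = False

    def sleep(self, level: int) -> None:
        # In vLLM, level 1 offloads weights to CPU and level 2 also
        # discards them; this toy version does not distinguish.
        self.offloaded = True

    def wake_up(self) -> None:
        self.offloaded = False


class OmniDiffusionWorker:
    """Hypothetical worker that exposes sleep/wake_up by delegating
    to the allocator, mirroring the interface in this PR."""

    def __init__(self, allocator: FakeAllocator) -> None:
        self._allocator = allocator
        self.sleeping = False

    def sleep(self, level: int = 1) -> bool:
        self._allocator.sleep(level)
        self.sleeping = True
        return True

    def wake_up(self) -> bool:
        self._allocator.wake_up()
        self.sleeping = False
        return True
```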
Collaborator

Do we need to add a function in the engine to call this and expose an interface to the user?

Contributor Author

Yes, for the new OmniStage interface after #391. It is also addressed by #355.

@ZJY0516
Collaborator

ZJY0516 commented Jan 2, 2026

I'd like to discuss one more point. Since vLLM-Omni uses libraries like Transformers and Diffusers to load components (e.g., text encoder and VAE), does the current sleep method also handle memory allocated by these external libraries? @knlnguyen1802

@knlnguyen1802
Contributor Author

I'd like to discuss one more point. Since vLLM-Omni uses libraries like Transformers and Diffusers to load components (e.g., text encoder and VAE), does the current sleep method also handle memory allocated by these external libraries? @knlnguyen1802

The answer is no, since sleep only handles memory that is allocated within the _maybe_get_memory_pool_context context.

@ZJY0516
Collaborator

ZJY0516 commented Jan 2, 2026

Could we just offload the model to CPU, like model.to('cpu'), since diffusion models don't have CUDA graphs or a KV cache?

@knlnguyen1802
Contributor Author

Could we just offload the model to CPU, like model.to('cpu'), since diffusion models don't have CUDA graphs or a KV cache?

For the model, it already works as in this PR because I wrap the model loader in the _maybe_get_memory_pool_context context.
For other parts, like the cache, it might need to be done that way.
Note: sorry that I misunderstood your question at the beginning.
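The direct-offload alternative discussed here can be sketched as follows; `FakeModule` and `offload_external_components` are hypothetical stand-ins (a real implementation would call `.to('cpu')` on the `torch.nn.Module` components loaded by Transformers/Diffusers, such as the text encoder and VAE).

```python
class FakeModule:
    """Stand-in for a torch.nn.Module loaded by an external library."""

    def __init__(self) -> None:
        self.device = "cuda"

    def to(self, device: str) -> "FakeModule":
        self.device = device
        return self


def offload_external_components(components: list[FakeModule]) -> None:
    # Components allocated outside the tracked memory pool cannot be
    # released by sleep(); moving them explicitly is the fallback.
    for module in components:
        module.to("cpu")
```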

@ZJY0516
Collaborator

ZJY0516 commented Jan 2, 2026

Could we just offload the model to CPU, like model.to('cpu'), since diffusion models don't have CUDA graphs or a KV cache?

For the model, it already works as in this PR because I wrap the model loader in the _maybe_get_memory_pool_context context. For other parts, like the cache, it might need to be done that way. Note: sorry that I misunderstood your question at the beginning.

Being able to call model.to('cpu') directly would make the current PR, with its new _maybe_get_memory_pool_context, seem unnecessarily complex.

@knlnguyen1802
Contributor Author

knlnguyen1802 commented Jan 3, 2026

Being able to call model.to('cpu') directly would make the current PR, with its new _maybe_get_memory_pool_context, seem unnecessarily complex.

Yes, but CuMemAllocator is a well-defined class that makes it easier to track how many GB of memory are offloaded and moved back to the GPU on wake-up, and it also tracks the time taken by these operations. It can also enable some optimizations.
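A toy stand-in illustrating the bookkeeping argument (this is not vLLM's actual CuMemAllocator API; the class and method names below are assumptions for illustration only): a dedicated allocator can report how many bytes were offloaded and how long the sleep operation took, which plain `model.to('cpu')` calls do not give you for free.

```python
import time


class ToyCuMemAllocator:
    """Toy allocator that tracks tagged byte counts so it can report
    how much memory sleep() offloaded and restore it on wake_up()."""

    def __init__(self) -> None:
        self.allocations: dict[str, int] = {}  # tag -> bytes resident on "GPU"
        self.offloaded: dict[str, int] = {}    # tag -> bytes moved to "CPU"

    def allocate(self, tag: str, nbytes: int) -> None:
        self.allocations[tag] = self.allocations.get(tag, 0) + nbytes

    def sleep(self, offload_tags: tuple[str, ...] = ("weights",)) -> tuple[int, float]:
        # Move the tagged allocations off the "GPU" and report the
        # total bytes offloaded plus the elapsed time.
        start = time.perf_counter()
        for tag in offload_tags:
            self.offloaded[tag] = self.allocations.pop(tag, 0)
        return sum(self.offloaded.values()), time.perf_counter() - start

    def wake_up(self) -> None:
        for tag, nbytes in self.offloaded.items():
            self.allocations[tag] = nbytes
        self.offloaded.clear()
```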

@hsliuustc0106 hsliuustc0106 added the ready label (trigger buildkite CI) Jan 5, 2026
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

lgtm

@hsliuustc0106 hsliuustc0106 enabled auto-merge (squash) January 5, 2026 09:00
Signed-off-by: knlnguyen1802 <[email protected]>
auto-merge was automatically disabled January 5, 2026 09:43

Head branch was pushed to by a user without write access

@hsliuustc0106 hsliuustc0106 requested a review from ZJY0516 January 5, 2026 13:14
@ZJY0516 ZJY0516 merged commit b414a4d into vllm-project:main Jan 5, 2026
7 checks passed
@ZJY0516
Collaborator

ZJY0516 commented Jan 5, 2026

Could you please add some doc for sleep mode? @knlnguyen1802

@knlnguyen1802
Contributor Author

Could you please add some doc for sleep mode? @knlnguyen1802

Got it, I'll add it in a new PR.

tzhouam pushed a commit to tzhouam/vllm-omni that referenced this pull request Jan 6, 2026
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026
sniper35 pushed a commit to sniper35/vllm-omni that referenced this pull request Jan 10, 2026
ZJY0516 pushed a commit to LawJarp-A/vllm-omni that referenced this pull request Jan 10, 2026

Labels

ready label to trigger buildkite CI


Development

Successfully merging this pull request may close these issues.

[Feature][RL]: Support Model weight offload, reload and sync model weight & Offload DIT cache

4 participants