
[Core] add clean up method for diffusion engine#219

Merged
ZJY0516 merged 10 commits into vllm-project:main from ZJY0516:random-ci
Dec 8, 2025

Conversation

@ZJY0516
Collaborator

@ZJY0516 ZJY0516 commented Dec 5, 2025

Purpose

  • add clean up method for diffusion engine
  • add a test for qwen-image
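The cleanup described above can be sketched as a minimal engine with an explicit, idempotent `close()` method. This is a hypothetical illustration only: `DiffusionEngine`, its fields, and the shared-memory handling are assumptions, not the actual vllm-omni API.

```python
import atexit
from multiprocessing import shared_memory


class DiffusionEngine:
    """Hypothetical sketch of a diffusion engine with explicit cleanup.

    Names here are illustrative; the real vllm-omni engine differs.
    """

    def __init__(self):
        # Shared-memory segment used to communicate with worker processes.
        self._shm = shared_memory.SharedMemory(create=True, size=1024)
        self._closed = False
        # Ensure cleanup runs even if the caller forgets to call close().
        atexit.register(self.close)

    def close(self):
        """Release worker resources; safe to call more than once."""
        if self._closed:
            return
        self._closed = True
        # A real engine would first signal its workers to shut down here.
        self._shm.close()    # detach this process's mapping
        self._shm.unlink()   # destroy the segment so nothing leaks at exit


engine = DiffusionEngine()
engine.close()
engine.close()  # idempotent: the second call is a no-op
```

Making `close()` idempotent matters because `atexit` will call it again at interpreter shutdown after any explicit call.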

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.



@ZJY0516
Collaborator Author

ZJY0516 commented Dec 5, 2025

cc @gcanlin This PR changes the worker.

@ZJY0516 ZJY0516 changed the title [WIP] add small random ckpt for ci [Core] add clean up method for diffusion engine Dec 5, 2025
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


Shall we move this omni_diffusion file under entrypoints, since omni_llm is also listed there?

@ZJY0516
Collaborator Author

ZJY0516 commented Dec 5, 2025

Shall we move this omni_diffusion file under entrypoints, since omni_llm is also listed there?

Yes

@SamitHuang
Collaborator

Will these semaphore leak warnings be resolved by this PR?

/usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 2 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
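The `resource_tracker` warnings above typically mean a `SharedMemory` segment or semaphore was created but never released before interpreter exit. A minimal, self-contained illustration of the fix (not the actual vllm-omni code):

```python
from multiprocessing import shared_memory

# Creating a segment registers it with Python's resource tracker.
shm = shared_memory.SharedMemory(create=True, size=64)
shm.buf[:5] = b"hello"

# Without these two calls, Python emits a "leaked shared_memory objects"
# UserWarning at shutdown, as in the log above.
shm.close()   # detach this process's mapping
shm.unlink()  # destroy the segment so the tracker forgets it
```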

@ZJY0516
Collaborator Author

ZJY0516 commented Dec 6, 2025

Will these semaphore leak warnings be resolved by this PR?

/usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 2 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Yes, but there is a new warning now; I'll resolve it later:

[rank0]:[W1206 13:49:30.182850024 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
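The `ProcessGroupNCCL` warning above fires when a process exits while the default process group is still alive. One common pattern (a sketch of the general technique; the actual fix in this PR may differ) is to register teardown at init time, e.g. via `atexit`, so that `torch.distributed.destroy_process_group()` always runs. The stub below mimics that shutdown ordering without requiring torch; `FakeProcessGroup` and `init_distributed` are invented names.

```python
import atexit


class FakeProcessGroup:
    """Stand-in for torch.distributed's default group (torch not required)."""

    def __init__(self):
        self.destroyed = False

    def destroy(self):
        # In real code this would call torch.distributed.destroy_process_group().
        self.destroyed = True


def init_distributed():
    pg = FakeProcessGroup()
    # Registering teardown at init time guarantees it runs before interpreter
    # exit, which is what silences the ProcessGroupNCCL warning.
    atexit.register(pg.destroy)
    return pg


pg = init_distributed()
pg.destroy()  # explicit shutdown path; the atexit call is a harmless repeat
```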

@ZJY0516
Collaborator Author

ZJY0516 commented Dec 6, 2025

Will these semaphore leak warnings be resolved by this PR?

/usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 2 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Yes, but there is a new warning now; I'll resolve it later:

[rank0]:[W1206 13:49:30.182850024 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())

Done

Collaborator

@SamitHuang SamitHuang left a comment


LGTM

@gcanlin
Contributor

gcanlin commented Dec 8, 2025

Qwen-Image on NPU can work normally. Thanks!

INFO 12-08 03:16:35 [utils.py:474] No adjustment needed for ACL graph batch sizes: Qwen3ForCausalLM model (layers: 28) with 35 sizes
WARNING 12-08 03:16:35 [platform.py:275] If chunked prefill or prefix caching is enabled, block size must be set to 128.
WARNING 12-08 03:16:35 [__init__.py:755] Current vLLM config is not set.
WARNING 12-08 03:16:35 [platform.py:143] Model config is missing. This may indicate that we are running a test case
WARNING 12-08 03:16:35 [platform.py:275] If chunked prefill or prefix caching is enabled, block size must be set to 128.
WARNING 12-08 03:16:35 [parallel_state.py:1055] Distributed backend nccl is not available; falling back to gloo.
WARNING 12-08 03:16:35 [__init__.py:755] Current vLLM config is not set.
WARNING 12-08 03:16:35 [platform.py:143] Model config is missing. This may indicate that we are running a test case
WARNING 12-08 03:16:35 [platform.py:275] If chunked prefill or prefix caching is enabled, block size must be set to 128.
INFO 12-08 03:16:35 [parallel_state.py:1208] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
WARNING:vllm_omni.diffusion.attention.backends.flash_attn:FlashAttentionBackend is not available. You may install flash-attn by running `uv pip install flash-attn==2.8.1 --no-build-isolation` or install pre-built flash-attn from https://github.com/Dao-AILab/flash-attention/releases
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.38it/s]
/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_device.py:104: UserWarning: Cannot create tensor with interal format while allow_internel_format=False, tensor will be created with base format. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:335.)
  return func(*args, **kwargs)
WARNING 12-08 03:16:41 [__init__.py:755] Current vLLM config is not set.
WARNING 12-08 03:16:41 [platform.py:143] Model config is missing. This may indicate that we are running a test case
WARNING 12-08 03:16:41 [platform.py:275] If chunked prefill or prefix caching is enabled, block size must be set to 128.
INFO:vllm_omni.diffusion.worker.npu.npu_worker:Worker 0: Initialized device, model, and distributed environment.
INFO:vllm_omni.diffusion.worker.npu.npu_worker:Worker 0: Model loaded successfully.
INFO:vllm_omni.diffusion.worker.npu.npu_worker:Worker 0: Scheduler loop started.
INFO:vllm_omni.diffusion.worker.npu.npu_worker:Worker 0 ready to receive requests via shared memory
INFO:vllm_omni.diffusion.scheduler:SyncScheduler initialized result MessageQueue
INFO:vllm_omni.diffusion.omni_diffusion:Prepared 1 requests for generation.
('Warning: torch.save with "_use_new_zipfile_serialization = False" is not recommended for npu tensor, which may bring unexpected errors and hopefully set "_use_new_zipfile_serialization = True"', 'if it is necessary to use this, please convert the npu tensor to cpu tensor for saving')
('Warning: torch.save with "_use_new_zipfile_serialization = False" is not recommended for npu tensor, which may bring unexpected errors and hopefully set "_use_new_zipfile_serialization = True"', 'if it is necessary to use this, please convert the npu tensor to cpu tensor for saving')
Warning: The current version of the file storing weights is old, and it is relanded due to internal bug of torch and compatibility issue. We will deprecate the loading support for this type of file in the future, please use newer torch to re-store the weight file.
INFO:vllm_omni.diffusion.diffusion_engine:Generation completed successfully.
Saved generated image to outputs/coffee.png
INFO:vllm_omni.diffusion.worker.npu.npu_worker:Worker 0: Received shutdown message
INFO:vllm_omni.diffusion.worker.npu.npu_worker:event loop terminated.
INFO:vllm_omni.diffusion.worker.npu.npu_worker:Worker 0: Destroyed process group
INFO:vllm_omni.diffusion.worker.npu.npu_worker:Worker 0: Shutdown complete.

@ZJY0516 ZJY0516 enabled auto-merge (squash) December 8, 2025 03:22
@ZJY0516 ZJY0516 merged commit aa689e6 into vllm-project:main Dec 8, 2025
4 checks passed
@ZJY0516 ZJY0516 deleted the random-ci branch December 8, 2025 07:34
@hsliuustc0106
Collaborator

Why is omni_diffusion.py not moved to entrypoints?

@hsliuustc0106
Collaborator

@ZJY0516

@ZJY0516
Collaborator Author

ZJY0516 commented Dec 11, 2025

Why is omni_diffusion.py not moved to entrypoints?

I'll do it later

LawJarp-A pushed a commit to LawJarp-A/vllm-omni that referenced this pull request Dec 12, 2025
LawJarp-A pushed a commit to LawJarp-A/vllm-omni that referenced this pull request Dec 12, 2025
faaany pushed a commit to faaany/vllm-omni that referenced this pull request Dec 19, 2025
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026


4 participants