[Bug][NPU]: USP fail #598

@gcanlin

Description

Your current environment

The output of python collect_env.py
Your output of `python collect_env.py` here

🐛 Describe the bug

Adding requests:   0%|                                                                                                  | 0/1 [00:13<?, ?it/s]
[Stage-0] ERROR 01-03 17:00:34 [omni_stage.py:636] Received shutdown signal
[Stage-0] INFO 01-03 17:00:34 [gpu_worker.py:265] Worker 0: Received shutdown message
[Stage-0] INFO 01-03 17:00:34 [gpu_worker.py:287] event loop terminated.
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226] Error executing RPC: Tensors must be contiguous
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226] [ERROR] 2026-01-03-17:00:34 (PID:1242551, Device:1, RankID:1) ERR02002 DIST invalid type
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226] Traceback (most recent call last):
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/root/vllm-workspace/vllm-omni/vllm_omni/diffusion/worker/gpu_worker.py", line 221, in execute_rpc
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     result = func(*args, **kwargs)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]              ^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/root/vllm-workspace/vllm-omni/vllm_omni/diffusion/worker/gpu_worker.py", line 120, in generate
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     return self.execute_model(requests, self.od_config)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     return func(*args, **kwargs)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]            ^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/root/vllm-workspace/vllm-omni/vllm_omni/diffusion/worker/gpu_worker.py", line 140, in execute_model
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     output = self.pipeline.forward(req)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]              ^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/root/vllm-workspace/vllm-omni/vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py", line 717, in forward
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     latents = self.diffuse(
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]               ^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/root/vllm-workspace/vllm-omni/vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py", line 556, in diffuse
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     noise_pred = self.transformer(
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]                  ^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/cache_dit/caching/cache_adapters/cache_adapter.py", line 439, in new_forward_with_hf_hook
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     outputs = new_forward(self, *args, **kwargs)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/cache_dit/caching/cache_adapters/cache_adapter.py", line 427, in new_forward
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     outputs = original_forward(*args, **kwargs)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/root/vllm-workspace/vllm-omni/vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py", line 784, in forward
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     encoder_hidden_states, hidden_states = block(
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]                                            ^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/cache_dit/caching/cache_blocks/pattern_base.py", line 250, in forward
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     hidden_states, encoder_hidden_states = self.call_Fn_blocks(
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]                                            ^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/cache_dit/caching/cache_blocks/pattern_base.py", line 426, in call_Fn_blocks
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     hidden_states = block(
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]                     ^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/root/vllm-workspace/vllm-omni/vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py", line 575, in forward
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     attn_output = self.attn(
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]                   ^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/root/vllm-workspace/vllm-omni/vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py", line 427, in forward
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     joint_hidden_states = self.attn(
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]                           ^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/root/vllm-workspace/vllm-omni/vllm_omni/diffusion/attention/layer.py", line 102, in forward
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     out = self.parallel_strategy.post_attention(out, ctx)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/root/vllm-workspace/vllm-omni/vllm_omni/diffusion/attention/parallel/ulysses.py", line 196, in post_attention
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     dist.all_gather(gathered_joint, output_joint, group=ctx.ulysses_pg)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     return func(*args, **kwargs)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]            ^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 3879, in all_gather
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]     work = group.allgather([tensor_list], [tensor], opts)
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226] RuntimeError: Tensors must be contiguous
[Stage-0] ERROR 01-03 17:00:34 [gpu_worker.py:226] [ERROR] 2026-01-03-17:00:34 (PID:1242551, Device:1, RankID:1) ERR02002 DIST invalid type
[Stage-0] INFO 01-03 17:00:34 [gpu_worker.py:265] Worker 1: Received shutdown message
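For context on the root cause: `torch.distributed.all_gather` requires its input tensor to be contiguous, while views produced by `transpose`/`permute`/`narrow` (common in Ulysses-style head-to-sequence reshuffling) are not. A minimal sketch of the property involved — the fix would presumably be to pass `output_joint.contiguous()` at the `all_gather` call in `ulysses.py`, though whether that is the right layer to fix it in `vllm-omni` is an assumption, not a confirmed patch:

```python
import torch

# Distributed collectives such as dist.all_gather reject non-contiguous
# inputs with "Tensors must be contiguous". Views created by transpose,
# permute, etc. share storage with the original tensor and are typically
# non-contiguous.
x = torch.randn(4, 8)
view = x.transpose(0, 1)           # non-contiguous view, no data copied
assert not view.is_contiguous()

fixed = view.contiguous()          # materializes a contiguous copy
assert fixed.is_contiguous()
assert torch.equal(fixed, view)    # same values, different memory layout
```

Calling `.contiguous()` on an already-contiguous tensor is a no-op (it returns `self`), so guarding the collective this way costs nothing on code paths where the tensor is already laid out correctly.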

