
[Bugfix] Removed the NPU-specific code path in _run_local_attention#597

Merged
hsliuustc0106 merged 2 commits into vllm-project:main from gcanlin:ring-fix
Jan 4, 2026

Conversation

@gcanlin
Contributor

@gcanlin gcanlin commented Jan 3, 2026


Purpose

Following #273, this PR fixes #596 and #598.

  • Removed the NPU-specific code path that attempted to call self.attention() directly, which fails because AscendAttentionBackendImpl is not callable. The code now calls self.attention.forward(query, key, value, attn_metadata) on all platforms, delegating the platform-specific logic to the attention backend implementation; AscendAttentionBackendImpl.forward() handles the NPU dispatch automatically.
  • Fixed the second issue: when output_joint is created by slicing attn_output, it becomes a non-contiguous view of the original tensor, while PyTorch's dist.all_gather requires contiguous tensors for the collective operation. The NPU backend enforces this requirement more strictly than the GPU backend, which may explain why the code worked on GPU but failed on NPU.
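A minimal, self-contained sketch of the first bug. The class below is a hypothetical stand-in for AscendAttentionBackendImpl (not the real implementation): it defines forward() but no __call__, so invoking the instance directly raises the TypeError from the linked issue, while the fixed .forward(...) path works.

```python
class AscendAttentionBackendImplSketch:
    """Hypothetical stand-in: exposes forward() but is not callable."""

    def forward(self, query, key, value, attn_metadata):
        # The real backend dispatches to a platform kernel here; we just echo.
        return (query, key, value, attn_metadata)


impl = AscendAttentionBackendImplSketch()

try:
    impl("q", "k", "v", None)  # mirrors the removed self.attention(...) path
except TypeError as exc:
    print(f"calling the instance fails: {exc}")

out = impl.forward("q", "k", "v", None)  # the fixed path, same on all platforms
print("forward() succeeds:", out[0])
```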

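The second bullet can be illustrated with a NumPy analogue (NumPy stands in for PyTorch here so the sketch runs without a torch install; the variable names mirror the PR, and np.ascontiguousarray plays the role of torch's Tensor.contiguous()). Slicing along a non-leading dimension yields a strided view, which is the kind of tensor collectives such as dist.all_gather reject.

```python
import numpy as np

# attn_output stands in for the attention output tensor from the PR.
attn_output = np.arange(24, dtype=np.float32).reshape(4, 6)

# Slicing columns produces a non-contiguous view over the parent buffer.
output_joint = attn_output[:, 2:]
print(output_joint.flags["C_CONTIGUOUS"])  # False: strided view

# Copying into a dense buffer (torch: output_joint.contiguous()) makes it
# safe to hand to a collective operation.
fixed = np.ascontiguousarray(output_joint)
print(fixed.flags["C_CONTIGUOUS"])  # True
```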
Test Plan

python text_to_image.py \
  --model Qwen/Qwen-Image \
  --prompt "a cup of coffee on the table" \
  --seed 42 \
  --cfg_scale 4.0 \
  --num_images_per_prompt 1 \
  --num_inference_steps 50 \
  --cache_backend cache_dit \
  --ulysses_degree 2

Test Result

[Stage-0] INFO 01-03 17:10:12 [diffusion_engine.py:86] Generation completed successfully.
[Stage-0] INFO 01-03 17:10:12 [diffusion_engine.py:109] Post-processing completed in 0.1150 seconds
INFO 01-03 17:10:12 [log_utils.py:549] {'type': 'request_level_metrics',
INFO 01-03 17:10:12 [log_utils.py:549]  'request_id': '0_eb52da8d-61b8-40e8-b7a1-6682171e6411',
INFO 01-03 17:10:12 [log_utils.py:549]  'e2e_time_ms': 26954.47301864624,
INFO 01-03 17:10:12 [log_utils.py:549]  'e2e_tpt': 0.0,
INFO 01-03 17:10:12 [log_utils.py:549]  'e2e_total_tokens': 0,
INFO 01-03 17:10:12 [log_utils.py:549]  'transfers_total_time_ms': 0.0,
INFO 01-03 17:10:12 [log_utils.py:549]  'transfers_total_bytes': 0,
INFO 01-03 17:10:12 [log_utils.py:549]  'stages': {0: {'stage_gen_time_ms': 26936.877489089966,
INFO 01-03 17:10:12 [log_utils.py:549]                 'num_tokens_out': 0,
INFO 01-03 17:10:12 [log_utils.py:549]                 'num_tokens_in': 0}}}
Processed prompts: 100%|███████████████████████████████████| 1/1 [00:26<00:00, 26.95s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 01-03 17:10:12 [omni.py:687] [Summary] {'e2e_requests': 1,
INFO 01-03 17:10:12 [omni.py:687]  'e2e_total_time_ms': 26955.371618270874,
INFO 01-03 17:10:12 [omni.py:687]  'e2e_sum_time_ms': 26954.47301864624,
INFO 01-03 17:10:12 [omni.py:687]  'e2e_total_tokens': 0,
INFO 01-03 17:10:12 [omni.py:687]  'e2e_avg_time_per_request_ms': 26954.47301864624,
INFO 01-03 17:10:12 [omni.py:687]  'e2e_avg_tokens_per_s': 0.0,
INFO 01-03 17:10:12 [omni.py:687]  'wall_time_ms': 26955.371618270874,
INFO 01-03 17:10:12 [omni.py:687]  'final_stage_id': {'0_eb52da8d-61b8-40e8-b7a1-6682171e6411': 0},
INFO 01-03 17:10:12 [omni.py:687]  'stages': [{'stage_id': 0,
INFO 01-03 17:10:12 [omni.py:687]              'requests': 1,
INFO 01-03 17:10:12 [omni.py:687]              'tokens': 0,
INFO 01-03 17:10:12 [omni.py:687]              'total_time_ms': 26954.70142364502,
INFO 01-03 17:10:12 [omni.py:687]              'avg_time_per_request_ms': 26954.70142364502,
INFO 01-03 17:10:12 [omni.py:687]              'avg_tokens_per_s': 0.0}],
INFO 01-03 17:10:12 [omni.py:687]  'transfers': []}
Adding requests:   0%|                                                                                                  | 0/1 [00:26<?, ?it/s]
[Stage-0] ERROR 01-03 17:10:12 [omni_stage.py:636] Received shutdown signal
[Stage-0] INFO 01-03 17:10:12 [gpu_worker.py:265] Worker 0: Received shutdown message
[Stage-0] INFO 01-03 17:10:12 [gpu_worker.py:287] event loop terminated.
[Stage-0] INFO 01-03 17:10:12 [gpu_worker.py:265] Worker 1: Received shutdown message
[Stage-0] INFO 01-03 17:10:12 [gpu_worker.py:287] event loop terminated.
[Stage-0] INFO 01-03 17:10:12 [npu_worker.py:126] Worker 1: Shutdown complete.
[Stage-0] INFO 01-03 17:10:12 [npu_worker.py:126] Worker 0: Shutdown complete.
INFO 01-03 17:10:17 [text_to_image.py:169] Outputs: [OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='image', request_output=[OmniRequestOutput(request_id='0_eb52da8d-61b8-40e8-b7a1-6682171e6411', finished=True, stage_id=None, final_output_type='image', request_output=None, images=[1 PIL Images], prompt='a cup of coffee on the table', latents=None, metrics={})], images=[], prompt=None, latents=None, metrics={})]
Saved generated image to qwen_image_output.png


@gcanlin gcanlin requested a review from hsliuustc0106 as a code owner January 3, 2026 16:43
Signed-off-by: gcanlin <[email protected]>
@gcanlin gcanlin mentioned this pull request Jan 3, 2026
@gcanlin
Contributor Author

gcanlin commented Jan 3, 2026

cc @mxuax @Gaohan123

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

lgtm

@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Jan 4, 2026
@hsliuustc0106 hsliuustc0106 linked an issue Jan 4, 2026 that may be closed by this pull request
@hsliuustc0106 hsliuustc0106 merged commit 1c9e58e into vllm-project:main Jan 4, 2026
6 of 7 checks passed
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026
ZJY0516 pushed a commit to LawJarp-A/vllm-omni that referenced this pull request Jan 10, 2026

Development

Successfully merging this pull request may close these issues.

[Bug][NPU]: USP fail
[Bug]: TypeError: 'AscendAttentionBackendImpl' object is not callable
