
[Bugfix] Removed the NPU-specific code path in _run_local_attention#597

Merged
hsliuustc0106 merged 2 commits into vllm-project:main from gcanlin:ring-fix
Jan 4, 2026

Conversation

@gcanlin
Contributor

@gcanlin gcanlin commented Jan 3, 2026


Purpose

Following #273, this PR fixes #596 and #598.

  • Removed the NPU-specific code path that attempted to call self.attention() directly, which fails because AscendAttentionBackendImpl is not callable. The code now calls self.attention.forward(query, key, value, attn_metadata) on all platforms, delegating the platform-specific logic to the attention backend implementation; AscendAttentionBackendImpl.forward() handles the NPU dispatch automatically.
  • Fixed the second issue: when output_joint is created by slicing attn_output, it becomes a non-contiguous view of the original tensor, while PyTorch's dist.all_gather requires contiguous tensors for the collective operation. The NPU backend enforces this requirement more strictly than the GPU backend, which may explain why the code worked on GPU but failed on NPU.
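A minimal, self-contained sketch of the first bug. The class below is a hypothetical stand-in for AscendAttentionBackendImpl (not the real implementation): it defines forward() but no __call__, so invoking the instance directly raises the TypeError from the linked issue, while the fixed .forward(...) path works.

```python
class AscendAttentionBackendImplSketch:
    """Hypothetical stand-in: exposes forward() but is not callable."""

    def forward(self, query, key, value, attn_metadata):
        # The real backend dispatches to a platform kernel here; we just echo.
        return (query, key, value, attn_metadata)


impl = AscendAttentionBackendImplSketch()

try:
    impl("q", "k", "v", None)  # mirrors the removed self.attention(...) path
except TypeError as exc:
    print(f"calling the instance fails: {exc}")

out = impl.forward("q", "k", "v", None)  # the fixed path, same on all platforms
print("forward() succeeds:", out[0])
```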

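The second bullet can be illustrated with a NumPy analogue (NumPy stands in for PyTorch here so the sketch runs without a torch install; the variable names mirror the PR, and np.ascontiguousarray plays the role of torch's Tensor.contiguous()). Slicing along a non-leading dimension yields a strided view, which is the kind of tensor collectives such as dist.all_gather reject.

```python
import numpy as np

# attn_output stands in for the attention output tensor from the PR.
attn_output = np.arange(24, dtype=np.float32).reshape(4, 6)

# Slicing columns produces a non-contiguous view over the parent buffer.
output_joint = attn_output[:, 2:]
print(output_joint.flags["C_CONTIGUOUS"])  # False: strided view

# Copying into a dense buffer (torch: output_joint.contiguous()) makes it
# safe to hand to a collective operation.
fixed = np.ascontiguousarray(output_joint)
print(fixed.flags["C_CONTIGUOUS"])  # True
```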
Test Plan

python text_to_image.py \
  --model Qwen/Qwen-Image \
  --prompt "a cup of coffee on the table" \
  --seed 42 \
  --cfg_scale 4.0 \
  --num_images_per_prompt 1 \
  --num_inference_steps 50 \
  --cache_backend cache_dit \
  --ulysses_degree 2

Test Result

[Stage-0] INFO 01-03 17:10:12 [diffusion_engine.py:86] Generation completed successfully.
[Stage-0] INFO 01-03 17:10:12 [diffusion_engine.py:109] Post-processing completed in 0.1150 seconds
INFO 01-03 17:10:12 [log_utils.py:549] {'type': 'request_level_metrics',
INFO 01-03 17:10:12 [log_utils.py:549]  'request_id': '0_eb52da8d-61b8-40e8-b7a1-6682171e6411',
INFO 01-03 17:10:12 [log_utils.py:549]  'e2e_time_ms': 26954.47301864624,
INFO 01-03 17:10:12 [log_utils.py:549]  'e2e_tpt': 0.0,
INFO 01-03 17:10:12 [log_utils.py:549]  'e2e_total_tokens': 0,
INFO 01-03 17:10:12 [log_utils.py:549]  'transfers_total_time_ms': 0.0,
INFO 01-03 17:10:12 [log_utils.py:549]  'transfers_total_bytes': 0,
INFO 01-03 17:10:12 [log_utils.py:549]  'stages': {0: {'stage_gen_time_ms': 26936.877489089966,
INFO 01-03 17:10:12 [log_utils.py:549]                 'num_tokens_out': 0,
INFO 01-03 17:10:12 [log_utils.py:549]                 'num_tokens_in': 0}}}
Processed prompts: 100%|███████████████████████████████████| 1/1 [00:26<00:00, 26.95s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 01-03 17:10:12 [omni.py:687] [Summary] {'e2e_requests': 1,
INFO 01-03 17:10:12 [omni.py:687]  'e2e_total_time_ms': 26955.371618270874,
INFO 01-03 17:10:12 [omni.py:687]  'e2e_sum_time_ms': 26954.47301864624,
INFO 01-03 17:10:12 [omni.py:687]  'e2e_total_tokens': 0,
INFO 01-03 17:10:12 [omni.py:687]  'e2e_avg_time_per_request_ms': 26954.47301864624,
INFO 01-03 17:10:12 [omni.py:687]  'e2e_avg_tokens_per_s': 0.0,
INFO 01-03 17:10:12 [omni.py:687]  'wall_time_ms': 26955.371618270874,
INFO 01-03 17:10:12 [omni.py:687]  'final_stage_id': {'0_eb52da8d-61b8-40e8-b7a1-6682171e6411': 0},
INFO 01-03 17:10:12 [omni.py:687]  'stages': [{'stage_id': 0,
INFO 01-03 17:10:12 [omni.py:687]              'requests': 1,
INFO 01-03 17:10:12 [omni.py:687]              'tokens': 0,
INFO 01-03 17:10:12 [omni.py:687]              'total_time_ms': 26954.70142364502,
INFO 01-03 17:10:12 [omni.py:687]              'avg_time_per_request_ms': 26954.70142364502,
INFO 01-03 17:10:12 [omni.py:687]              'avg_tokens_per_s': 0.0}],
INFO 01-03 17:10:12 [omni.py:687]  'transfers': []}
Adding requests:   0%|                                                                                                  | 0/1 [00:26<?, ?it/s]
[Stage-0] ERROR 01-03 17:10:12 [omni_stage.py:636] Received shutdown signal
[Stage-0] INFO 01-03 17:10:12 [gpu_worker.py:265] Worker 0: Received shutdown message
[Stage-0] INFO 01-03 17:10:12 [gpu_worker.py:287] event loop terminated.
[Stage-0] INFO 01-03 17:10:12 [gpu_worker.py:265] Worker 1: Received shutdown message
[Stage-0] INFO 01-03 17:10:12 [gpu_worker.py:287] event loop terminated.
[Stage-0] INFO 01-03 17:10:12 [npu_worker.py:126] Worker 1: Shutdown complete.
[Stage-0] INFO 01-03 17:10:12 [npu_worker.py:126] Worker 0: Shutdown complete.
INFO 01-03 17:10:17 [text_to_image.py:169] Outputs: [OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='image', request_output=[OmniRequestOutput(request_id='0_eb52da8d-61b8-40e8-b7a1-6682171e6411', finished=True, stage_id=None, final_output_type='image', request_output=None, images=[1 PIL Images], prompt='a cup of coffee on the table', latents=None, metrics={})], images=[], prompt=None, latents=None, metrics={})]
Saved generated image to qwen_image_output.png


@gcanlin gcanlin requested a review from hsliuustc0106 as a code owner January 3, 2026 16:43
Signed-off-by: gcanlin <[email protected]>
@gcanlin gcanlin mentioned this pull request Jan 3, 2026
@gcanlin
Contributor Author

gcanlin commented Jan 3, 2026

cc @mxuax @Gaohan123

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

lgtm

@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Jan 4, 2026
@hsliuustc0106 hsliuustc0106 linked an issue Jan 4, 2026 that may be closed by this pull request
@hsliuustc0106 hsliuustc0106 merged commit 1c9e58e into vllm-project:main Jan 4, 2026
6 of 7 checks passed
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026
ZJY0516 pushed a commit to LawJarp-A/vllm-omni that referenced this pull request Jan 10, 2026

Development

Successfully merging this pull request may close these issues.

[Bug][NPU]: USP fail
[Bug]: TypeError: 'AscendAttentionBackendImpl' object is not callable
