
[Bugfix] yield when stage response #287

Closed
Bounty-hunter wants to merge 1 commit into vllm-project:main from Bounty-hunter:async_sleep

Conversation

@Bounty-hunter
Contributor

Bounty-hunter commented Dec 11, 2025


Purpose

As described in #286
, the execution becomes blocked while waiting for the stage response. We need to yield control so that other tasks can proceed.
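The fix pattern suggested by the branch name (`async_sleep`) can be sketched as follows. This is a hypothetical illustration, not the actual vllm-omni code: the function name, queue type, and structure are assumptions. The key line is `await asyncio.sleep(0)`, which yields control back to the event loop so other coroutines (e.g. other requests' stages) can make progress while this one waits.

```python
import asyncio


async def wait_for_stage_response(q: asyncio.Queue):
    """Poll a stage's response queue without blocking the event loop.

    Hypothetical sketch: a busy-wait loop that checks for a response
    in non-blocking mode, yielding on every empty poll.
    """
    while True:
        try:
            # Non-blocking check for a completed stage response.
            return q.get_nowait()
        except asyncio.QueueEmpty:
            # Without this await, the loop spins and starves every
            # other task on the event loop; with it, control returns
            # to the scheduler so other requests can proceed.
            await asyncio.sleep(0)
```

When the response source is itself an `asyncio.Queue`, `await q.get()` is preferable; the polling pattern above applies when the source exposes only a non-blocking, non-awaitable interface.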

Test Plan

Result: (screenshot attached to the PR)

The log is as follows:

[Stage-0] Received batch size=1, request_ids=chatcmpl-23ee5fcf98294c3f88087feae38f9f94
--------------------------------
(APIServer pid=106638) INFO:vllm_omni.entrypoints.openai.serving_chat:dyyyyyy come to generate  1
(APIServer pid=106638) INFO:vllm_omni.entrypoints.openai.serving_chat:dyyyyyy come to generate chatcmpl-f55e61b8fa1a4305aec7d2aa3b81eefc
(APIServer pid=106638) INFO:vllm_omni.entrypoints.async_omni:[Orchestrator] generate() called
(APIServer pid=106638) INFO:vllm_omni.entrypoints.async_omni:[Orchestrator] Seeding request into stage-0
(APIServer pid=106638) INFO:vllm_omni.entrypoints.async_omni:[Orchestrator] Enqueued request chatcmpl-f55e61b8fa1a4305aec7d2aa3b81eefc to stage-0
(APIServer pid=106638) INFO:vllm_omni.entrypoints.async_omni:[Orchestrator] Entering scheduling loop: stages=3
('Warning: torch.save with "_use_new_zipfile_serialization = False" is not recommended for npu tensor, which may bring unexpected errors and hopefully set "_use_new_zipfile_serialization = True"', 'if it is necessary to use this, please convert the npu tensor to cpu tensor for saving')
--------------------------------
[Stage-0] Received batch size=1, request_ids=chatcmpl-f55e61b8fa1a4305aec7d2aa3b81eefc
--------------------------------
(APIServer pid=106638) INFO:vllm_omni.entrypoints.async_omni:[Orchestrator] Stage-0 completed request chatcmpl-23ee5fcf98294c3f88087feae38f9f94;                         forwarding or finalizing
(APIServer pid=106638) INFO:vllm_omni.entrypoints.async_omni:[Orchestrator] Request chatcmpl-23ee5fcf98294c3f88087feae38f9f94 finalized at stage-0
(APIServer pid=106638) ('Warning: torch.save with "_use_new_zipfile_serialization = False" is not recommended for npu tensor, which may bring unexpected errors and hopefully set "_use_new_zipfile_serialization = True"', 'if it is necessary to use this, please convert the npu tensor to cpu tensor for saving')
--------------------------------
[Stage-1] Received batch size=1, request_ids=chatcmpl-23ee5fcf98294c3f88087feae38f9f94
--------------------------------
(EngineCore_DP0 pid=107787) /workspace/d00806799/code/epd/vllm-omni/vllm_omni/worker/npu/npu_model_runner.py:190: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
(EngineCore_DP0 pid=107787)   info_dict[k] = torch.from_numpy(arr)
(APIServer pid=106638) INFO:vllm_omni.entrypoints.async_omni:[Orchestrator] Stage-0 completed request chatcmpl-f55e61b8fa1a4305aec7d2aa3b81eefc;                         forwarding or finalizing
(APIServer pid=106638) INFO:vllm_omni.entrypoints.async_omni:[Orchestrator] Request chatcmpl-f55e61b8fa1a4305aec7d2aa3b81eefc finalized at stage-0
('Warning: torch.save with "_use_new_zipfile_serialization = False" is not recommended for npu tensor, which may bring unexpected errors and hopefully set "_use_new_zipfile_serialization = True"', 'if it is necessary to use this, please convert the npu tensor to cpu tensor for saving')
--------------------------------
[Stage-1] Received batch size=1, request_ids=chatcmpl-f55e61b8fa1a4305aec7d2aa3b81eefc
--------------------------------
(APIServer pid=106638) INFO:vllm_omni.entrypoints.async_omni:[Orchestrator] Stage-1 completed request chatcmpl-23ee5fcf98294c3f88087feae38f9f94;                         forwarding or finalizing
--------------------------------
[Stage-2] Received batch size=1, request_ids=chatcmpl-23ee5fcf98294c3f88087feae38f9f94
--------------------------------
(EngineCore_DP0 pid=108698) INFO:vllm_omni.model_executor.models.qwen2_5_omni.qwen2_5_omni:Currently, we do not use the chunked process, we only use the token2wav.process_chunk for the whole sequence. The stream mode will be implemented in the future.
INFO 12-11 20:43:20 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-11 20:43:20 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-11 20:43:20 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-11 20:43:20 [__init__.py:207] Platform plugin ascend is activated
INFO 12-11 20:43:24 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 12-11 20:43:26 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO:datasets:PyTorch version 2.7.1 available.
(APIServer pid=106638) INFO:vllm_omni.entrypoints.async_omni:[Orchestrator] Stage-1 completed request chatcmpl-f55e61b8fa1a4305aec7d2aa3b81eefc;                         forwarding or finalizing
......('Warning: torch.save with "_use_new_zipfile_serialization = False" is not recommended for npu tensor, which may bring unexpected errors and hopefully set "_use_new_zipfile_serialization = True"', 'if it is necessary to use this, please convert the npu tensor to cpu tensor for saving')
--------------------------------
[Stage-2] Received batch size=1, request_ids=chatcmpl-f55e61b8fa1a4305aec7d2aa3b81eefc
--------------------------------
(APIServer pid=106638) INFO:vllm_omni.entrypoints.async_omni:[Orchestrator] Stage-2 completed request chatcmpl-23ee5fcf98294c3f88087feae38f9f94;                         forwarding or finalizing
(APIServer pid=106638) INFO:vllm_omni.entrypoints.async_omni:[Orchestrator] Request chatcmpl-23ee5fcf98294c3f88087feae38f9f94 finalized at stage-2
(APIServer pid=106638) INFO:vllm_omni.entrypoints.async_omni:[Orchestrator] All requests completed
(APIServer pid=106638) INFO:vllm_omni.entrypoints.async_omni:[Summary] {'e2e_requests': 1, 'e2e_total_time_ms': 261757.46083259583, 'e2e_sum_time_ms': 261757.03525543213, 'e2e_total_tokens': 0, 'e2e_avg_time_per_request_ms': 261757.03525543213, 'e2e_avg_tokens_per_s': 0.0, 'wall_time_ms': 261757.46083259583, 'final_stage_id': 2, 'stages': [{'stage_id': 0, 'requests': 2, 'tokens': 170, 'total_time_ms': 9340.158939361572, 'avg_time_per_request_ms': 4670.079469680786, 'avg_tokens_per_s': 18.2009750694478}, {'stage_id': 1, 'requests': 1, 'tokens': 1005, 'total_time_ms': 48908.53309631348, 'avg_time_per_request_ms': 48908.53309631348, 'avg_tokens_per_s': 20.54856149582112}, {'stage_id': 2, 'requests': 1, 'tokens': 0, 'total_time_ms': 209500.1938343048, 'avg_time_per_request_ms': 209500.1938343048, 'avg_tokens_per_s': 0.0}], 'transfers': [{'from_stage': 0, 'to_stage': 1, 'samples': 2, 'total_bytes': 4109230, 'total_time_ms': 18.233299255371094, 'tx_mbps': 1802.9562033495476, 'rx_samples': 1, 'rx_total_bytes': 1681800, 'rx_total_time_ms': 8.18181037902832, 'rx_mbps': 1644.4282349156395, 'total_samples': 1, 'total_transfer_t

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


@Bounty-hunter Bounty-hunter changed the title [Bugfix]yield when no respone [Bugfix] yield when stage response Dec 11, 2025
@hsliuustc0106
Collaborator

please update the execution time comparison

@Gaohan123
Collaborator

Gaohan123 left a comment

Thanks for the nice catch! Please use `git commit -s --amend` to pass the DCO check.

@fake0fan
Contributor

fake0fan left a comment

LGTM. But this is just the first step: it ensures that different requests can be switched between processes during execution. We will continue to work on parallelism issues and out-of-order results.

@Bounty-hunter
Contributor Author

Thanks for the nice catch! Please use `git commit -s --amend` to pass the DCO check.

Okay, but I will try to address it together with #293, which involves a response mismatch issue among concurrent requests.

@Bounty-hunter
Contributor Author

LGTM. But this is just the first step: it ensures that different requests can be switched between processes during execution. We will continue to work on parallelism issues and out-of-order results.

#293 tracks this problem.

@david6666666
Collaborator

nice catch

@david6666666
Collaborator

@Bounty-hunter fix DCO

@Bounty-hunter
Contributor Author

@Bounty-hunter fix DCO

I will close this PR. While resolving it, a new problem emerged where responses are mismatched across concurrent requests, as described in #293. I address both of them in #301; please help to review.



Development

Successfully merging this pull request may close these issues.

[Bug]: Concurrent requests fail to execute in a pipelined manner in online mode

5 participants