[Feature] Send response with request id #301

Merged
hsliuustc0106 merged 1 commit into vllm-project:main from Bounty-hunter:send_respond_with_id
Dec 16, 2025

Conversation

Contributor @Bounty-hunter commented Dec 12, 2025


Purpose

async_omni now retrieves responses by request id, following the same approach as vLLM:

generate:
initializes a dedicated queue for each request and asynchronously awaits responses from that queue.

output_handler:
runs a loop that receives responses and dispatches each one to the correct request queue according to its request id.

Test Plan

benchmark:

#!/bin/bash
# Launch four generation requests concurrently and measure total wall time.

start=$(date +%s)


python openai_chat_completion_client_for_multimodal_generation.py \
    --query-type text \
    --prompt "Generate a 400-character introduction about Huawei." \
    > out1.log 2>&1 &

python openai_chat_completion_client_for_multimodal_generation.py \
    --query-type text \
    --prompt "Generate a 400-character introduction about Beijing." \
    > out2.log 2>&1 &

python openai_chat_completion_client_for_multimodal_generation.py \
    --query-type text \
    --prompt "Generate a 400-character introduction about wuhan." \
    > out3.log 2>&1 &

python openai_chat_completion_client_for_multimodal_generation.py \
    --query-type text \
    --prompt "Generate a 400-character introduction about shenzhen." \
    > out4.log 2>&1 &

wait


end=$(date +%s)

echo "All tasks completed."
echo "Total elapsed time: $((end - start)) seconds"

Test Result

The responses are correct; each client receives the output for its own prompt:

[root@devserver-bms-163 qwen2_5_omni]# cat out1.log 
Chat completion output from text: Huawei is a Chinese tech giant. It's known for its high - quality mobile phones like the P series and Mate series. They also have great networking equipment. Their products are sold worldwide. Huawei has a large R D team that constantly innovates. They're really important in the global tech industry.If you want to know more about Huawei, like their specific technologies or business strategies, feel free to ask me.
Audio saved to audio_0.wav
[root@devserver-bms-163 qwen2_5_omni]# cat out2.log 
Chat completion output from text: Beijing is an amazing city. It's the capital of China with a rich history. There are ancient palaces like the Forbidden City that show off its long - standing culture. The Great Wall is also nearby, a symbol of strength. It has modern skyscrapers too, showing it's a big business hub. The food there is diverse, from Peking duck to all kinds of local snacks. And the people are friendly and welcoming. It's really a place full of contrasts. So, what do you think? Do you want to know more about any specific part of Beijing?
Audio saved to audio_0.wav
[root@devserver-bms-163 qwen2_5_omni]# cat out3.log 
Chat completion output from text: Wuhan is a city in China. It's got a population of over ten million. It has a long history, being an important cultural and economic center. There are many famous places like the Yellow Crane Tower. The food there is also amazing, with things like hot and dry noodles. It's a place full of energy and vitality.If you want to know more about Wuhan, like its attractions or local culture, feel free to ask me.
Audio saved to audio_0.wav
[root@devserver-bms-163 qwen2_5_omni]# cat out4.log 
Chat completion output from text: Shenzhen is a really cool city in Guangdong Province. It's known for being super modern and full of innovation. There are lots of high - tech companies there. The city has a great business environment too. It attracts people from all over to start their own businesses. And it's also a place with a lot of cultural diversity. You can find different cuisines everywhere you go. It's growing fast, always changing and becoming more amazing.If you want to know more about Shenzhen, like its famous attractions or local life, feel free to ask me.
Audio saved to audio_0.wav

With pipelined execution, the total execution time drops from 789 s to 642 s.


If requests are more balanced across different stages, the benefits will be more significant.



@Bounty-hunter Bounty-hunter force-pushed the send_respond_with_id branch 2 times, most recently from 594de26 to 15b350c Compare December 12, 2025 09:13
@Bounty-hunter Bounty-hunter changed the title [WIP] send response with request id [Feature] Send response with request id Dec 12, 2025
Contributor (Author) @Bounty-hunter commented:

@Gaohan123 @fake0fan @hsliuustc0106
Please help review this PR; it addresses #286 and #293:
(1) Concurrent requests cannot be processed in a pipelined manner.
(2) Responses of concurrent requests may get mixed up.

I would appreciate your feedback.

Collaborator @hsliuustc0106 commented:


This change seems to fix an important bug in the entrypoint. Could you also provide a systematic test plan for Qwen2.5/3-Omni online serving?

Collaborator @david6666666 commented:

@codex review

@chatgpt-codex-connector (bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.


@david6666666 david6666666 linked an issue Dec 15, 2025 that may be closed by this pull request
1 task
@Bounty-hunter Bounty-hunter force-pushed the send_respond_with_id branch 3 times, most recently from 5f979a7 to 344a91e Compare December 16, 2025 12:01
Collaborator @hsliuustc0106 left a comment:

lgtm

@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Dec 16, 2025
@hsliuustc0106 hsliuustc0106 enabled auto-merge (squash) December 16, 2025 15:14
@hsliuustc0106 hsliuustc0106 merged commit bd53347 into vllm-project:main Dec 16, 2025
4 checks passed
faaany pushed a commit to faaany/vllm-omni that referenced this pull request Dec 19, 2025
yenuo26 pushed a commit to yenuo26/vllm-omni that referenced this pull request Dec 29, 2025
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026

Development

Successfully merging this pull request may close these issues.

[Feature]: Async omni client get the stages response according to request id