[Feature] Send response with request id #301
@Gaohan123 @fake0fan @hsliuustc0106 I would appreciate your feedback.
This change seems to help fix an important bug in the entrypoint. Could you help provide a systematic test plan for Qwen2.5/3-Omni online serving?
@codex review
Signed-off-by: dengyunyang <[email protected]>
Signed-off-by: dengyunyang <[email protected]> Signed-off-by: Fanli Lin <[email protected]>
Signed-off-by: dengyunyang <[email protected]> Signed-off-by: wangyu31577 <[email protected]>
Signed-off-by: dengyunyang <[email protected]>
Purpose
async_omni now retrieves each response by its request ID, following the same approach as vLLM:
generate:
Initialize a dedicated queue for the request and asynchronously await responses from that queue.
output_handler:
Start a loop that receives responses and forwards each one to the correct request queue according to its request ID.
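The routing described above can be sketched roughly as follows. This is a minimal illustration with plain `asyncio`, not the code in this PR: the `RequestRouter` class, its `queues` and `results` attributes, and the `(request_id, payload, finished)` tuple shape are all hypothetical names chosen for the example.

```python
import asyncio


class RequestRouter:
    """Hypothetical stand-in for the per-request routing described above."""

    def __init__(self) -> None:
        # request_id -> dedicated queue holding only that request's responses
        self.queues: dict[str, asyncio.Queue] = {}
        # Shared channel on which results from all requests arrive (assumed)
        self.results: asyncio.Queue = asyncio.Queue()

    async def output_handler(self) -> None:
        """Single loop: take each result and route it by request ID."""
        while True:
            request_id, payload, finished = await self.results.get()
            queue = self.queues.get(request_id)
            if queue is not None:  # drop results for unknown or finished requests
                queue.put_nowait((payload, finished))

    async def generate(self, request_id: str):
        """Register a dedicated queue, then yield only this request's responses."""
        queue: asyncio.Queue = asyncio.Queue()
        self.queues[request_id] = queue
        try:
            while True:
                payload, finished = await queue.get()
                yield payload
                if finished:
                    break
        finally:
            # Always unregister so the handler stops routing to this request
            self.queues.pop(request_id, None)


async def demo() -> dict[str, list[str]]:
    router = RequestRouter()
    handler = asyncio.create_task(router.output_handler())

    async def collect(request_id: str) -> list[str]:
        return [chunk async for chunk in router.generate(request_id)]

    tasks = {rid: asyncio.create_task(collect(rid)) for rid in ("a", "b")}
    await asyncio.sleep(0)  # let both requests register their queues

    # Interleaved results, as concurrently running requests might produce them
    for item in [("b", "b1", False), ("a", "a1", False),
                 ("a", "a2", True), ("b", "b2", True)]:
        router.results.put_nowait(item)

    out = {rid: await task for rid, task in tasks.items()}
    handler.cancel()
    return out


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because each caller awaits only its own queue, interleaved results from concurrent requests cannot be delivered to the wrong caller, which is what allows several requests to be in flight across stages at once.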
Test Plan
Benchmark:
Test Result
The response is correct:
With pipelined execution, the execution time drops from 789 s to 642 s, a reduction of about 19%.
If requests are more balanced across different stages, the benefits will be more significant.