[Frontend] Refactor prompt processing #4028
Conversation
…ions API and legacy Completions API
vllm.entrypoints.openai
Seems that #4032 fixed the LoRA bugs, however
Update: I found that it is due to a bug in my refactored parsing, my bad. I have fixed it just now.
I'm updating
I've moved the logging out to a separate class.
I have finished addressing your comments. |
njhill
left a comment
Thanks @DarkLight1337!
Co-authored-by: Roger Wang <[email protected]>
Co-authored-by: Roger Wang <[email protected]> Signed-off-by: Alvant <[email protected]>
Co-authored-by: Roger Wang <[email protected]> Signed-off-by: LeiWang1999 <[email protected]>
This PR refactors various parts of the OpenAI-compatible server:
- The `_validate_prompt_and_tokenize` method has been decomposed so that `prompt` and `prompt_ids` are processed separately.
- The logging of `prompt` and `prompt_ids` has been moved from `vllm.AsyncLLMEngine` to `vllm.entrypoints.logger.RequestLogger`, so that redundant data is no longer passed into the core engine. This also enables logging for the tokenization endpoints. (A rough sketch of this idea is shown after this list.)
- The `request_id` is now prefixed based on the endpoint type:
  - `cmpl-*` (as before)
  - `chat-*`
  - `embd-*`
  - `tokn-*`
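For illustration only, here is a minimal sketch of how a standalone request logger and per-endpoint `request_id` prefixes could fit together. The class layout, method names, and `make_request_id` helper below are assumptions made for this example, not the exact API introduced by this PR.

```python
import uuid
from typing import List, Optional


class RequestLogger:
    """Hypothetical API-layer logger: logs prompt data at the entrypoint so the
    core engine no longer needs to receive redundant data just for logging."""

    def __init__(self, max_log_len: Optional[int] = None) -> None:
        # Optionally truncate long prompts/token lists before logging them.
        self.max_log_len = max_log_len

    def log_inputs(self, request_id: str, prompt: Optional[str],
                   prompt_token_ids: Optional[List[int]]) -> None:
        if self.max_log_len is not None:
            if prompt is not None:
                prompt = prompt[:self.max_log_len]
            if prompt_token_ids is not None:
                prompt_token_ids = prompt_token_ids[:self.max_log_len]
        print(f"Received request {request_id}: prompt={prompt!r}, "
              f"prompt_token_ids={prompt_token_ids}")


def make_request_id(endpoint: str) -> str:
    # Distinct prefix per endpoint type, e.g. cmpl-*, chat-*, embd-*, tokn-*.
    prefixes = {
        "completions": "cmpl",
        "chat": "chat",
        "embeddings": "embd",
        "tokenization": "tokn",
    }
    return f"{prefixes[endpoint]}-{uuid.uuid4().hex}"


if __name__ == "__main__":
    logger = RequestLogger(max_log_len=2048)
    request_id = make_request_id("chat")
    logger.log_inputs(request_id, prompt="Hello!", prompt_token_ids=[15496, 0])
```

The point of this structure is that the entrypoint handles both logging and ID assignment, so the engine only ever sees the already-validated inputs.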