[frontend] Refactor CLI Args for a better modular integration #20206
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a reduced set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Summary of Changes
Hello @kouroshHakha, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request primarily focuses on improving the modularity of CLI argument handling for the OpenAI API server and optimizing the internal management of LoRA adapters. The changes aim to enhance code organization and efficiency, particularly in how command-line options are processed and how LoRA models are tracked and accessed.
Highlights
- CLI Argument Refactoring: I've refactored the command-line argument parsing for the OpenAI API server by introducing a new `FrontendArgs` dataclass. This class now encapsulates all frontend-related arguments, making the `make_arg_parser` function more modular and easier to extend. Arguments are now added dynamically from the dataclass fields (see the sketch after this list).
- LoRA Adapter Management Optimization: I've changed the internal storage of LoRA (Low-Rank Adaptation) adapter requests from a list to a dictionary (specifically `self.lora_requests` in `serving_models.py`). This allows for more efficient lookup, addition, and removal of LoRA adapters by name, improving performance and clarity when managing multiple adapters.
- Debugging and Logging Additions: I've added several temporary debug logging statements, prefixed with `[Kourosh]`, across `api_server.py`, `serving_chat.py`, `serving_engine.py`, and `serving_models.py`. These logs are intended to assist in tracing execution flow and variable states during development and debugging.
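To make the first two points concrete, here is a minimal sketch of both patterns. The names `FrontendArgs`, `make_arg_parser`, and `lora_requests` come from the summary above; the specific fields, the argument-generation loop, and the placeholder value type are illustrative, not the PR's actual code.

```python
import argparse
from dataclasses import MISSING, dataclass, fields


@dataclass
class FrontendArgs:
    """Groups frontend (non-engine) CLI options in one place (illustrative fields)."""
    host: str = "0.0.0.0"
    port: int = 8000
    enable_auto_tool_choice: bool = False


def make_arg_parser(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    # Generate one CLI flag per dataclass field instead of hand-writing
    # parser.add_argument calls; adding a frontend option then only
    # requires adding a field to FrontendArgs.
    for f in fields(FrontendArgs):
        flag = "--" + f.name.replace("_", "-")
        default = None if f.default is MISSING else f.default
        if f.type is bool:
            parser.add_argument(flag, action="store_true", default=default)
        else:
            parser.add_argument(flag, type=f.type, default=default)
    return parser


# LoRA adapters keyed by adapter name: O(1) lookup, add, and remove,
# instead of scanning a list on every request.
lora_requests: dict[str, object] = {}  # values would be the LoRA request objects
```

A downstream integration can also introspect `fields(FrontendArgs)` to discover which options are frontend-only, which is the separation the PR is aiming for.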
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review
This pull request refactors the command-line argument parsing for the OpenAI-compatible server, which is a great step towards better modularity and maintainability. The changes to LoRA request handling also improve performance. My main concerns are the presence of numerous debug log statements and a large block of commented-out code, which should be cleaned up. Additionally, there's a potentially breaking change in how list-based CLI arguments are parsed, which needs to be addressed or documented.
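On the reviewer's note about list-based CLI arguments: when flags are regenerated from dataclass fields, choosing `action="append"` versus `nargs="+"` changes the accepted command-line syntax even though both yield a list. A hypothetical illustration with a made-up flag name (not one of vLLM's actual options):

```python
import argparse

# Repeated-flag style: --example-list-arg a --example-list-arg b
p1 = argparse.ArgumentParser()
p1.add_argument("--example-list-arg", action="append", default=[])
print(p1.parse_args(["--example-list-arg", "a", "--example-list-arg", "b"]).example_list_arg)
# -> ['a', 'b']

# Space-separated style: --example-list-arg a b
p2 = argparse.ArgumentParser()
p2.add_argument("--example-list-arg", nargs="+", default=[])
print(p2.parse_args(["--example-list-arg", "a", "b"]).example_list_arg)
# -> ['a', 'b']

# Scripts written for one style break under the other, which is why a refactor
# that regenerates these flags needs to preserve each argument's original action.
```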
This pull request has merge conflicts that must be resolved before it can be merged.
@gemini-code-assist please re-review?
Code Review
The code changes introduce a new bucket, `FrontendArgs`, that captures all CLI args that are not engine args. The CLI behavior should be equivalent, but this allows higher-level integrations (e.g. Ray Serve LLM) to clearly separate the available `EngineArgs` and `FrontendArgs` and treat them differently.
The failed tests are not related, I think.
Ok, in that case could you merge from main to get the fixes? |
Based on the time you merged, it looks like the TPU V1 test was already failing on main (61e2082).
- Use upstream `RayPrometheusStatLogger` to close spec. decode + lora errors
- Include fix for vllm-project/vllm#20647
- Restore PP=2 to DeepSeek-V2-Lite release test
- Remove copy of `FrontendArgs` upstreamed with vllm-project/vllm#20206

Closes #54952
Includes fix for #54812

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Purpose
The high-level goal is to provide better modularization for CLI args.
This PR introduces a new bucket, `FrontendArgs`, that captures all CLI args that are not engine args.
The CLI behavior should be equivalent, but this allows higher-level integrations (e.g. Ray Serve LLM) to clearly separate the available `EngineArgs` and `FrontendArgs` and treat them differently.
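As a rough sketch of what that separation could enable for an integration layer (the dataclasses, field names, and `split_args` helper below are hypothetical stand-ins, not code from this PR, vLLM, or Ray Serve LLM):

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class EngineArgs:            # stand-in for the engine-side args
    model: str = "facebook/opt-125m"
    tensor_parallel_size: int = 1


@dataclass
class FrontendArgs:          # stand-in for the new frontend args bucket
    host: str = "0.0.0.0"
    port: int = 8000


def split_args(namespace: argparse.Namespace) -> tuple[EngineArgs, FrontendArgs]:
    """Route each parsed CLI value to the dataclass that owns it."""
    ns = vars(namespace)
    engine = EngineArgs(**{f.name: ns[f.name] for f in fields(EngineArgs) if f.name in ns})
    frontend = FrontendArgs(**{f.name: ns[f.name] for f in fields(FrontendArgs) if f.name in ns})
    return engine, frontend


if __name__ == "__main__":
    ns = argparse.Namespace(model="facebook/opt-125m", tensor_parallel_size=2,
                            host="127.0.0.1", port=9000)
    engine_args, frontend_args = split_args(ns)
    print(engine_args, frontend_args)
```

With the arguments grouped this way, an integration only needs to know the two dataclasses rather than the full flat list of flags.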
Test Plan
The CLI args are already tested in `test_cli_args.py`, so it should catch any potential regression.
In terms of visual testing, I ran `vllm serve --help` on both main and this branch and asked AI to summarize the diffs. There is basically no diff to worry about.

Test Result
Here is the summary:
(Optional) Documentation Update