kouroshHakha (Collaborator) commented on Jun 28, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after results comparison, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

The high-level goal is better modularization of the CLI args.

This PR introduces a new bucket, `FrontendArgs`, that captures all non-engine CLI args in a group of their own.
The CLI behavior should be equivalent, but the grouping lets higher-level integrations (e.g. Ray Serve LLM) cleanly separate the available `EngineArgs` from `FrontendArgs` and treat them differently.
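
To illustrate why the split helps, here is a minimal sketch of how a higher-level integration might partition one flat options dict. The field subsets and the `split_args` helper are hypothetical, not vLLM's actual API; only the `EngineArgs`/`FrontendArgs` dataclass shapes mirror the PR:

```python
from dataclasses import dataclass, fields

# Hypothetical stand-ins: in vLLM these would be the engine-arg dataclass
# and the new FrontendArgs introduced by this PR; the fields shown are a
# tiny illustrative subset.
@dataclass
class EngineArgs:
    model: str = "facebook/opt-125m"
    max_model_len: int = 4096

@dataclass
class FrontendArgs:
    host: str = "0.0.0.0"
    port: int = 8000

def split_args(options: dict) -> tuple[EngineArgs, FrontendArgs]:
    """Partition a flat dict of CLI-style options by dataclass field names."""
    engine_keys = {f.name for f in fields(EngineArgs)}
    frontend_keys = {f.name for f in fields(FrontendArgs)}
    unknown = set(options) - engine_keys - frontend_keys
    if unknown:
        raise ValueError(f"Unrecognized options: {sorted(unknown)}")
    engine = EngineArgs(**{k: v for k, v in options.items() if k in engine_keys})
    frontend = FrontendArgs(**{k: v for k, v in options.items() if k in frontend_keys})
    return engine, frontend

# e.g. a Ray Serve LLM-style integration routing one config dict two ways:
engine_args, frontend_args = split_args({"model": "Qwen/Qwen2-0.5B", "port": 9000})
```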

Test Plan

The CLI args are already covered by `test_cli_args.py`, which should catch any potential regression.

For visual testing, I ran `vllm serve --help` on both `main` and this branch and asked an AI to summarize the diffs. There is essentially no diff to worry about.
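
For reproducibility, the capture step can be scripted. This is a hypothetical helper (the script and file names are illustrative; only the `vllm serve --help` command comes from the PR):

```python
import subprocess
import sys

def capture_help(outfile: str) -> None:
    """Capture `vllm serve --help` output so two checkouts can be diffed."""
    result = subprocess.run(
        ["vllm", "serve", "--help"],
        capture_output=True, text=True, check=True,
    )
    with open(outfile, "w") as f:
        f.write(result.stdout)

if __name__ == "__main__":
    # Usage: python capture_help.py help_main.txt
    # Run once per checkout, then `diff help_main.txt help_branch.txt`.
    capture_help(sys.argv[1])
```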

Test Result

Here is the summary:

| Aspect | `main` | This branch | Notes |
| --- | --- | --- | --- |
| Boolean flags with explicit disable form | Mostly single `--flag` | Paired `--flag` / `--no-flag` for most booleans | Allows explicit disabling via `--no-flag` on this branch, not available on `main` (illustrated below). |
| Frontend options grouped under "Frontend" header | Not grouped | Grouped separately | Organizational only, no functional difference. |
| `--chat-template-content-format` choices order | `{auto,string,openai}` | `{auto,openai,string}` | Order difference only, no effect. |
| `--uvicorn-log-level` choices ordering | `debug, info, warning, error, critical, trace` | `critical, debug, error, info, trace, warning` | Ordering difference; same options available. |
| Additional notes/warnings in descriptions | Minimal | Some flags have extra notes/warnings | E.g., `--enable-request-id-headers` notes on performance impact. |
| Deprecated flags | Present, marked deprecated without paired form | Same, with paired forms | No effect on behavior, but this branch is clearer about deprecated flags. |
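
The paired `--flag` / `--no-flag` form in the first row matches what Python's `argparse.BooleanOptionalAction` (3.9+) produces. A standalone illustration, not vLLM's actual code, using a real flag name from the table:

```python
import argparse

parser = argparse.ArgumentParser()
# One declaration yields both --enable-request-id-headers and
# --no-enable-request-id-headers, so users can disable explicitly.
parser.add_argument(
    "--enable-request-id-headers",
    action=argparse.BooleanOptionalAction,
    default=False,
)

print(parser.parse_args([]))                                  # default: False
print(parser.parse_args(["--enable-request-id-headers"]))     # True
print(parser.parse_args(["--no-enable-request-id-headers"]))  # explicit False
```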

(Optional) Documentation Update

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
github-actions bot commented

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the `ready` label to the PR or enable auto-merge.

🚀

@kouroshHakha kouroshHakha changed the title [WIP] Refactor CLI Args for a better modular integration [WIP][NOT READY] Refactor CLI Args for a better modular integration Jun 28, 2025
gemini-code-assist bot left a comment

Summary of Changes

Hello @kouroshHakha, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on improving the modularity of CLI argument handling for the OpenAI API server and optimizing the internal management of LoRA adapters. The changes aim to enhance code organization and efficiency, particularly in how command-line options are processed and how LoRA models are tracked and accessed.

Highlights

  • CLI Argument Refactoring: I've refactored the command-line argument parsing for the OpenAI API server by introducing a new FrontendArgs dataclass. This class now encapsulates all frontend-related arguments, making the make_arg_parser function more modular and easier to extend. Arguments are now added dynamically from the dataclass fields (see the loop sketch after this list).
  • LoRA Adapter Management Optimization: I've changed the internal storage of LoRA (Low-Rank Adaptation) adapter requests from a list to a dictionary (specifically self.lora_requests in serving_models.py). This allows more efficient lookup, addition, and removal of LoRA adapters by name, improving performance and clarity when managing multiple adapters (see the dict sketch after this list).
  • Debugging and Logging Additions: I've added several temporary debug logging statements, prefixed with [Kourosh], across api_server.py, serving_chat.py, serving_engine.py, and serving_models.py. These logs are intended to assist in tracing execution flow and variable states during development and debugging.
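
A minimal sketch of the dynamic, field-driven parser construction described in the first highlight. The field subset is hypothetical, and this `make_arg_parser` only approximates the real function of the same name:

```python
import argparse
from dataclasses import dataclass, fields

@dataclass
class FrontendArgs:
    # Hypothetical subset; the real dataclass covers every frontend flag.
    uvicorn_log_level: str = "info"
    enable_request_id_headers: bool = False

def make_arg_parser(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    # One argparse option per dataclass field, grouped under "Frontend".
    group = parser.add_argument_group("Frontend")
    for f in fields(FrontendArgs):
        flag = "--" + f.name.replace("_", "-")
        if isinstance(f.default, bool):
            # Yields the paired --flag / --no-flag form noted in the help diff.
            group.add_argument(flag, action=argparse.BooleanOptionalAction,
                               default=f.default)
        else:
            group.add_argument(flag, type=type(f.default), default=f.default)
    return parser
```

And a sketch of the list-to-dict change for LoRA bookkeeping. The class shapes are simplifications of what lives in serving_models.py, with `LoRARequest` as a stand-in:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LoRARequest:
    lora_name: str
    lora_path: str

class OpenAIServingModels:
    def __init__(self) -> None:
        # Keyed by adapter name: O(1) lookup/removal instead of list scans.
        self.lora_requests: dict[str, LoRARequest] = {}

    def add_lora(self, request: LoRARequest) -> None:
        self.lora_requests[request.lora_name] = request

    def remove_lora(self, name: str) -> None:
        self.lora_requests.pop(name, None)

    def get_lora(self, name: str) -> Optional[LoRARequest]:
        return self.lora_requests.get(name)
```
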
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@mergify mergify bot added the frontend label Jun 28, 2025

mergify bot commented Jun 28, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @kouroshHakha.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jun 28, 2025
gemini-code-assist bot left a comment

Code Review

This pull request refactors the command-line argument parsing for the OpenAI-compatible server, which is a great step towards better modularity and maintainability. The changes to LoRA request handling also improve performance. My main concerns are the presence of numerous debug log statements and a large block of commented-out code, which should be cleaned up. Additionally, there's a potentially breaking change in how list-based CLI arguments are parsed, which needs to be addressed or documented.

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@mergify mergify bot removed the needs-rebase label Jul 1, 2025
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

mergify bot commented Jul 2, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @kouroshHakha.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 2, 2025
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@mergify mergify bot removed the needs-rebase label Jul 8, 2025
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@kouroshHakha kouroshHakha changed the title [WIP][NOT READY] Refactor CLI Args for a better modular integration [WIP][frontend] Refactor CLI Args for a better modular integration Jul 8, 2025
kouroshHakha (Collaborator, Author) commented

@gemini-code-assist please re-review?

gemini-code-assist bot left a comment

Code Review

The code changes introduce a new bucket, FrontendArgs, that captures all non-engine CLI args in a group of their own. The CLI behavior should be equivalent, but this allows higher-level integrations (e.g. Ray Serve LLM) to clearly separate the available EngineArgs and FrontendArgs and treat them differently.

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@kouroshHakha kouroshHakha added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) Jul 8, 2025
@kouroshHakha kouroshHakha changed the title [WIP][frontend] Refactor CLI Args for a better modular integration [frontend] Refactor CLI Args for a better modular integration Jul 8, 2025
@kouroshHakha kouroshHakha marked this pull request as ready for review July 8, 2025 18:06
@kouroshHakha kouroshHakha requested a review from aarnphm as a code owner July 8, 2025 18:06
@simon-mo simon-mo requested a review from hmellor July 8, 2025 20:47
kouroshHakha (Collaborator, Author) commented

The failed tests are not related, I think.

hmellor (Member) commented Jul 14, 2025

Ok, in that case could you merge from main to get the fixes?

hmellor (Member) commented Jul 15, 2025

Based on the time you merged, it looks like the TPU V1 test was failing on main (61e2082).

@vllm-bot vllm-bot merged commit f148c44 into vllm-project:main on Jul 15, 2025 (64 of 66 checks passed)
aslonnie pushed a commit to ray-project/ray that referenced this pull request Aug 4, 2025
- Use upstream `RayPrometheusStatLogger` to close spec. decode + lora
errors
- Include fix for vllm-project/vllm#20647
  - Restore PP=2 to DeepSeek-V2-Lite release test
  - Remove copy of `FrontendArgs` upstreamed with vllm-project/vllm#20206

Closes #54952

Includes fix for #54812

---------

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
The same ray-project/ray commit was subsequently pushed by elliot-barn (twice), kamil-kaczmarek, mjacar, sampan-s-nayak, jugalshah291, and dstrodtman to their respective branches and forks (Aug 4 through Oct 6, 2025). The vLLM change itself was cherry-picked into forks by x22x22, Pradyun92, npanpaliya, jinzhen-lin, paulpak58, diegocastanibm, and epwalsh (Aug 5 through Aug 27, 2025), each carrying the original sign-off.
Labels: frontend, ready · 3 participants