[frontend] Refactor CLI Args for a better modular integration #20206
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a reduced set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Summary of Changes
Hello @kouroshHakha, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request primarily focuses on improving the modularity of CLI argument handling for the OpenAI API server and optimizing the internal management of LoRA adapters. The changes aim to enhance code organization and efficiency, particularly in how command-line options are processed and how LoRA models are tracked and accessed.
Highlights
- CLI Argument Refactoring: I've refactored the command-line argument parsing for the OpenAI API server by introducing a new `FrontendArgs` dataclass. This class now encapsulates all frontend-related arguments, making the `make_arg_parser` function more modular and easier to extend. Arguments are now added dynamically from the dataclass fields (see the sketch after this list).
- LoRA Adapter Management Optimization: I've changed the internal storage of LoRA (Low-Rank Adaptation) adapter requests from a list to a dictionary (specifically `self.lora_requests` in `serving_models.py`). This allows for more efficient lookup, addition, and removal of LoRA adapters by name, improving performance and clarity when managing multiple adapters.
- Debugging and Logging Additions: I've added several temporary debug logging statements, prefixed with `[Kourosh]`, across `api_server.py`, `serving_chat.py`, `serving_engine.py`, and `serving_models.py`. These logs are intended to assist in tracing execution flow and variable states during development and debugging.
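To make the first two points concrete, here is a minimal sketch of both patterns. The names `FrontendArgs`, `make_arg_parser`, and `lora_requests` come from the summary above; the specific fields, the argument-generation loop, and the placeholder value type are illustrative, not the PR's actual code.

```python
import argparse
from dataclasses import MISSING, dataclass, fields


@dataclass
class FrontendArgs:
    """Groups frontend (non-engine) CLI options in one place (illustrative fields)."""
    host: str = "0.0.0.0"
    port: int = 8000
    enable_auto_tool_choice: bool = False


def make_arg_parser(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    # Generate one CLI flag per dataclass field instead of hand-writing
    # parser.add_argument calls; adding a frontend option then only
    # requires adding a field to FrontendArgs.
    for f in fields(FrontendArgs):
        flag = "--" + f.name.replace("_", "-")
        default = None if f.default is MISSING else f.default
        if f.type is bool:
            parser.add_argument(flag, action="store_true", default=default)
        else:
            parser.add_argument(flag, type=f.type, default=default)
    return parser


# LoRA adapters keyed by adapter name: O(1) lookup, add, and remove,
# instead of scanning a list on every request.
lora_requests: dict[str, object] = {}  # values would be the LoRA request objects
```

A downstream integration can also introspect `fields(FrontendArgs)` to discover which options are frontend-only, which is the separation the PR is aiming for.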
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review
This pull request refactors the command-line argument parsing for the OpenAI-compatible server, which is a great step towards better modularity and maintainability. The changes to LoRA request handling also improve performance. My main concerns are the presence of numerous debug log statements and a large block of commented-out code, which should be cleaned up. Additionally, there's a potentially breaking change in how list-based CLI arguments are parsed, which needs to be addressed or documented.
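On the reviewer's note about list-based CLI arguments: when flags are regenerated from dataclass fields, choosing `action="append"` versus `nargs="+"` changes the accepted command-line syntax even though both yield a list. A hypothetical illustration with a made-up flag name (not one of vLLM's actual options):

```python
import argparse

# Repeated-flag style: --example-list-arg a --example-list-arg b
p1 = argparse.ArgumentParser()
p1.add_argument("--example-list-arg", action="append", default=[])
print(p1.parse_args(["--example-list-arg", "a", "--example-list-arg", "b"]).example_list_arg)
# -> ['a', 'b']

# Space-separated style: --example-list-arg a b
p2 = argparse.ArgumentParser()
p2.add_argument("--example-list-arg", nargs="+", default=[])
print(p2.parse_args(["--example-list-arg", "a", "b"]).example_list_arg)
# -> ['a', 'b']

# Scripts written for one style break under the other, which is why a refactor
# that regenerates these flags needs to preserve each argument's original action.
```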
This pull request has merge conflicts that must be resolved before it can be merged.
@gemini-code-assist please re-review?
Code Review
The code changes introduce a new bucket, `FrontendArgs`, that captures all CLI args that are not engine args. The CLI behavior should be equivalent, but this allows higher-level integrations (e.g. Ray Serve LLM) to clearly separate the available `EngineArgs` and `FrontendArgs` and treat them differently.
The failed tests are not related, I think.
Ok, in that case could you merge from main to get the fixes? |
Based on the time you merged, it looks like the TPU V1 test was already failing on main (61e2082).
- Use upstream `RayPrometheusStatLogger` to close spec. decode + lora errors
- Include fix for vllm-project/vllm#20647
- Restore PP=2 to DeepSeek-V2-Lite release test
- Remove copy of `FrontendArgs` upstreamed with vllm-project/vllm#20206

Closes #54952
Includes fix for #54812

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Purpose
The high-level goal is to provide better modularization for CLI args.
This PR introduces a new bucket, `FrontendArgs`, that captures all CLI args that are not engine args.
The CLI behavior should be equivalent, but this allows higher-level integrations (e.g. Ray Serve LLM) to clearly separate the available `EngineArgs` and `FrontendArgs` and treat them differently.
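As a rough sketch of what that separation could enable for an integration layer (the dataclasses, field names, and `split_args` helper below are hypothetical stand-ins, not code from this PR, vLLM, or Ray Serve LLM):

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class EngineArgs:            # stand-in for the engine-side args
    model: str = "facebook/opt-125m"
    tensor_parallel_size: int = 1


@dataclass
class FrontendArgs:          # stand-in for the new frontend args bucket
    host: str = "0.0.0.0"
    port: int = 8000


def split_args(namespace: argparse.Namespace) -> tuple[EngineArgs, FrontendArgs]:
    """Route each parsed CLI value to the dataclass that owns it."""
    ns = vars(namespace)
    engine = EngineArgs(**{f.name: ns[f.name] for f in fields(EngineArgs) if f.name in ns})
    frontend = FrontendArgs(**{f.name: ns[f.name] for f in fields(FrontendArgs) if f.name in ns})
    return engine, frontend


if __name__ == "__main__":
    ns = argparse.Namespace(model="facebook/opt-125m", tensor_parallel_size=2,
                            host="127.0.0.1", port=9000)
    engine_args, frontend_args = split_args(ns)
    print(engine_args, frontend_args)
```

With the arguments grouped this way, an integration only needs to know the two dataclasses rather than the full flat list of flags.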
Test Plan
The CLI args are already tested in `test_cli_args.py`, so it should catch any potential regression.
In terms of visual testing, I ran `vllm serve --help` on both main and this branch and asked AI to summarize the diffs. There is basically no diff to worry about.

Test Result
Here is the summary:
(Optional) Documentation Update