[Feature] Diffusion LoRA Adapter Support (PEFT compatible) for vLLM alignment #758

Merged: david6666666 merged 65 commits into vllm-project:main from AndyZhou952:peft_lora on Jan 27, 2026
Conversation

@AndyZhou952 (Contributor) commented Jan 13, 2026

This is a joint work by @AndyZhou952 and @dongbo910220.

Design doc here.

Purpose:

Following Issue #281 and PR #657, this PR adds diffusion LoRA Adapter Support (PEFT compatible) for vLLM alignment.

This PR reuses the LoRA RPC logic from #657 (thanks to @dongbo910220's implementation), leverages vLLM's self-defined LoRA layers for LoRA support, and adopts the PEFT format so adapters can be incorporated into the reinforcement-learning training pipeline.

How vLLM adds LoRA support:

Three steps: (1) initialization; (2) per-request adapter management; (3) inference (via vLLM's self-defined LoRA layers; the LoRA calculation happens in forward).

[two figures: vLLM LoRA support workflow]

vLLM-Omni PEFT LoRA integration Design:

[figure: vLLM-Omni PEFT LoRA integration design]

Besides add_lora and remove_lora, we also support pin_lora and list_lora as public APIs to be consistent with the vLLM base behavior.
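As a rough illustration of how such a public adapter API with LRU eviction and pin support could be organized (hypothetical class and method names, not the PR's actual DiffusionLoRAManager):

```python
from collections import OrderedDict


class LoRAAdapterCacheSketch:
    """Sketch of an LRU-managed adapter registry with add/remove/pin/list,
    mirroring the public API names above. Illustrative only."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self._adapters: OrderedDict[int, dict] = OrderedDict()
        self._pinned: set[int] = set()

    def add_lora(self, adapter_id: int, weights: dict) -> bool:
        if adapter_id in self._adapters:
            self._adapters.move_to_end(adapter_id)  # refresh LRU position
            return False
        self._adapters[adapter_id] = weights
        self._evict_if_needed()
        return True

    def remove_lora(self, adapter_id: int) -> bool:
        self._pinned.discard(adapter_id)
        return self._adapters.pop(adapter_id, None) is not None

    def pin_lora(self, adapter_id: int) -> bool:
        if adapter_id not in self._adapters:
            return False
        self._pinned.add(adapter_id)  # pinned adapters are never evicted
        return True

    def list_loras(self) -> list[int]:
        return list(self._adapters)

    def _evict_if_needed(self) -> None:
        # evict least-recently-used adapters first, skipping pinned ones
        for aid in list(self._adapters):
            if len(self._adapters) <= self.capacity:
                break
            if aid not in self._pinned:
                del self._adapters[aid]
```

Pinning matters for the RLHF flow discussed below: a pinned adapter survives eviction until it is explicitly removed and replaced by updated weights.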

Design principles:

  • Preserve vLLM variable and function naming conventions; keep the functionality consistent with vLLM
  • Reuse helper functions from vLLM whenever possible; add minimal new code for diffusion-specific wrappers
  • Extensibility: easy to add support for more layer adapters and for multiple LoRA loading

Design choices:

  • Support the PEFT LoRA format and reuse vLLM LoRA layers (to stay consistent with vLLM behavior). We reuse PEFTHelper from vLLM, which looks for the file adapter_config.json when loading LoRAs.
  • DiffusionLoRAManager does not inherit from LoRAManager, since (1) LoRAManager is LLM-centric, with variables in __init__ that are redundant for diffusion models, and (2) the component-based nature of diffusion models requires separate treatment.
  • LRU cache management is incorporated within DiffusionLoRAManager to keep things compact; there is also no need for a separate WorkerLoRAManager, as vllm-omni's gpu_worker already does that job.
  • In vLLM's BaseLinearLayerWithLoRA, the calculation is done in self.punica_wrapper.add_linear_layer(). punica_wrapper exists to manage multiple LoRAs, but in most diffusion use cases a single LoRA is sufficient. One issue with vLLM's BaseLinearLayerWithLoRA is that it is too closely tied to punica_wrapper, whose current implementation is not well suited to the diffusion use case. As a temporary workaround, we define class DiffusionBaseLinearWithLoRA(BaseLinearLayerWithLoRA) and override apply (where the LoRA calculation happens) to eliminate the dependency on punica_wrapper.
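The idea behind overriding apply can be sketched like this (hypothetical class, numpy instead of torch, same shrink+expand semantics as Punica; not the PR's actual implementation):

```python
import numpy as np


class DiffusionLinearWithLoRASketch:
    """Sketch of a linear layer whose apply() adds a single-LoRA delta via
    direct matmuls, with no punica_wrapper:

        y = x @ W^T + scale * (x @ A^T) @ B^T
    """

    def __init__(self, base_weight):
        self.base_weight = base_weight  # [out_dim, in_dim]
        self.lora_a = None              # [rank, in_dim]
        self.lora_b = None              # [out_dim, rank]
        self.scale = 1.0

    def set_lora(self, lora_a, lora_b, scale=1.0):
        self.lora_a, self.lora_b, self.scale = lora_a, lora_b, scale

    def reset_lora(self):
        # fast path: with no active adapter, apply() skips the LoRA matmuls
        self.lora_a = self.lora_b = None

    def apply(self, x):
        out = x @ self.base_weight.T
        if self.lora_a is not None:
            # "shrink" to rank dims, then "expand" back to the output dims
            out = out + self.scale * (x @ self.lora_a.T) @ self.lora_b.T
        return out
```

Since only one adapter is active per execution, the delta is two small matmuls per layer; no batched multi-LoRA gather/scatter kernel is needed.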

Functions/variables/classes reused from vLLM:

  • LoRARequest request structure
  • get_supported_lora_modules, get_adapter_absolute_path, PEFTHelper.from_local_dir, LoRAModel.from_local_checkpoint, and LoRALayerWeights.optimize (for scaling), used in _load_adapter.
  • LoRAConfig, from_layer, replace_submodule, and BaseLayerWithLoRA.set_mapping in _replace_layers_with_lora, to substitute in vLLM's self-defined LoRA layers.

Current limitations:

  • Currently, only one LoRA adapter can be loaded at a time for one batch

Test Plan:

Start server with SD3.5 and LoRA:

python -m vllm_omni.entrypoints.openai.api_server \
    --model stabilityai/stable-diffusion-3.5-medium \
    --lora-dirs /path/to/lora-test

Send request with LoRA:

curl -X POST http://localhost:8000/v1/images/generations \
    -H "Content-Type: application/json" \
    -d '{"prompt": "A whimsical hand-drawn animation still of a small countryside train station at sunset, warm golden light, lush greenery, soft watercolor textures, highly detailed, sharp focus", "lora": {"name": "rafadan", "local_path": "/path/to/lora.safetensors", "scale": 1.0}}'
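The same request can be built client-side in Python; this is a hypothetical stdlib-only equivalent of the curl command above (the endpoint and payload shape follow this test plan, not a stable documented API):

```python
import json
import urllib.request

# Payload mirrors the curl example from the test plan.
payload = {
    "prompt": ("A whimsical hand-drawn animation still of a small countryside "
               "train station at sunset, warm golden light, lush greenery, "
               "soft watercolor textures, highly detailed, sharp focus"),
    "lora": {
        "name": "rafadan",
        "local_path": "/path/to/lora.safetensors",
        "scale": 1.0,
    },
}


def build_request(url="http://localhost:8000/v1/images/generations"):
    """Build the POST request without sending it."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it (requires a running server):
# with urllib.request.urlopen(build_request()) as resp:
#     print(resp.status)
```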

Test Result:

No LoRA (1024x1024, steps=30, seed=42) vs. LoRA (1024x1024, steps=30, seed=42, lora scale = 1.0):

[two figures: side-by-side generated images, without and with the LoRA applied]

Co-authored-by: dongbo910220



Signed-off-by: AndyZhou952 <[email protected]>
Co-authored-by: dongbo910220 <[email protected]>
@knlnguyen1802 (Contributor) commented:

@AndyZhou952 Thank you for the work. For RLHF in verl, it needs to support loading and removing a static LoRA (after a LoRA weight update, it removes the old LoRA and adds the new LoRA weights), so vllm-omni also needs to support this. I think the current design only supports deactivating a dynamic LoRA.

@AndyZhou952 (Contributor, Author) commented Jan 13, 2026:

> @AndyZhou952 Thank you for the work. I think for RLHF in verl, it need to support load and remove static lora (as after lora weight update it will remove old lora and add new lora weight). So I think vllm-omni also need to support it. I think current design only support deactivate dynamic lora.

Thanks for your interest! For this part, I think we can keep it consistent with the base vLLM design by adding add_lora and remove_lora to the public API.

Also, I think it makes sense to unify the static/dynamic support. Essentially, static support just means loading the LoRA weights at the very start; everything else remains the same as dynamic LoRA support.

Will update the design workflow and code base shortly to reflect the changes.

Update 01/13: design & code updated to reflect the changes above. PTAL.

dongbo910220 and others added 4 commits January 13, 2026

Fix diffusion weight index path for subfolders
@SamitHuang SamitHuang added the RL Related to Reinforcement Learning label Jan 14, 2026
    return self.pipeline.load_weights(weights)

def remove_lora(self, adapter_id: int) -> bool:
    return self.lora_manager.remove_adapter(adapter_id) if self.lora_manager else False

Do we need to guarantee self.lora_manager is not None?

AndyZhou952 (Contributor, Author) replied:

Not needed here; removed the condition checks. Thanks!

max_num_batched_tokens=max_num_batched_tokens,
max_batches=1, # single request
device=self.device,
max_loras=1, # single lora

QQ:

  • Do SD models generally only have one concurrent LoRA?
  • Do we need to develop a new punica wrapper for SD?

AndyZhou952 (Contributor, Author) replied:

  • For diffusion, in most scenarios it suffices to use only one LoRA. May leave that to future work.
  • Since the punica wrapper is mostly used for multiple-LoRA management, I think we can leave it out of this PR. One caveat is that punica_wrapper is too closely tied to BaseLinearLayerWithLoRA (it handles the LoRA calculation, which is the reason we init punica_wrapper in the first place), but punica_wrapper has quite a few LLM-specific components.

Please check this implementation (commit 955a2cf) and see if it makes sense. TL;DR: we still inherit from BaseLinearLayerWithLoRA in DiffusionBaseLayerWithLoRA but rewrite the apply function to eliminate the need for punica_wrapper. The current diffusion linear layer design inherits from both DiffusionBaseLayerWithLoRA and the self-defined layers in vLLM.

For a better design, we could probably decouple punica_wrapper and BaseLinearLayerWithLoRA, but this works as a temporary solution for now. Let me know what you think. Thanks!

@dongbo910220 (Contributor) commented Jan 18, 2026:

SD users do sometimes stack multiple LoRAs. For this initial PR, we intentionally support a single active LoRA per diffusion execution (max_loras=1), though we do allow multiple adapters to be cached on CPU and swapped per request.

Also note that our current diffusion runner effectively executes one request per model execution (no cross-request batching), so max_loras=1 matches the existing execution semantics; when multiple requests are passed in one call, we require the same LoRARequest and lora_scale across the batch to avoid silently applying the wrong adapter.

Multi-LoRA composition (weighted mixing, per-sample different adapters, etc.) would require an explicit API for multiple adapters plus weights and a more complex kernel/memory-management path, so we'd prefer to follow up in a separate PR if/when needed.

Re punica_wrapper: we don’t introduce a diffusion-specific punica wrapper here. We still inherit from vLLM’s BaseLinearLayerWithLoRA for weight/buffer management, but in DiffusionBaseLinearLayerWithLoRA we override apply() to compute the single‑LoRA delta via direct matmuls (same shrink+expand semantics as Punica) and handle packed projections per-slice (e.g. fused QKV), avoiding the LLM-specific dependencies in punica_wrapper.
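The per-slice handling for packed projections described above could be sketched like this (assumed shapes and a hypothetical helper name; illustrative, not the PR's exact code):

```python
import numpy as np


def apply_packed_lora(x, base_out, slices, loras, scale=1.0):
    """Add per-slice LoRA deltas onto the output of a fused projection
    such as to_qkv.

    x:        [tokens, in_dim] input to the fused layer
    base_out: [tokens, sum of slice sizes] output of the fused base layer
    slices:   list of (offset, size) per sub-projection, e.g. q, k, v
    loras:    per-slice (A, B) pair or None; A: [rank, in_dim], B: [size, rank]
    """
    out = base_out.copy()
    for (offset, size), lora in zip(slices, loras):
        if lora is None:
            continue  # inactive slice: skip the LoRA matmuls entirely
        A, B = lora
        # same shrink+expand semantics as Punica, one slice at a time
        out[:, offset:offset + size] += scale * (x @ A.T) @ B.T
    return out
```

With max_loras=1 there is a single (A, B) pair per slice, so this reduces to a handful of small matmuls rather than a batched multi-adapter kernel.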


if static_lora_path is not None:
logger.info("Loading static LoRA from %s with scale %.2f", static_lora_path, static_lora_scale)
static_request = LoRARequest(


QQ: what's static_lora?

@AndyZhou952 (Contributor, Author) commented Jan 14, 2026:

This is a follow-up from @SamitHuang's comment under #657 to support both static/dynamic LoRA. Static here means to load the LoRA during the init stage when providing the path in od_config.

Now that this PR unifies the processing flow of static/dynamic LoRA (all via LoRARequest), we can probably update the variable naming here as well to avoid confusion.

AndyZhou952 (Contributor, Author) added:

Update 01/16: updated the variable naming for clarity.

lora_request: LoRARequest,
) -> tuple[LoRAModel, PEFTHelper]:

supported_lora_modules = set(get_supported_lora_modules(self.pipeline))


Does SD have any special layers that need to support LoRA? Can get_supported_lora_modules in vLLM be used directly?

AndyZhou952 (Contributor, Author) replied:

See _expand_expected_modules_for_merged_projections at L40 (called at L163), which handles additional cases like add_kv_proj and to_qkv.

This PR has only been tested on SD. Might still need to investigate to see if we need to further expand to support other models.

Contributor:

Good question. We do reuse vLLM’s get_supported_lora_modules() as the baseline, but we snapshot it before injecting LoRA wrappers — after replacement the original LinearBase modules live under .base_layer, which would make the helper return base_layer and break adapter matching across reloads.

For diffusion/SD we also need to cover merged/packed projections (e.g. to_qkv, add_kv_proj), so we expand the expected module set via _expand_expected_modules_for_merged_projections() and treat packed projections as multi-slice when replacing/activating. This has been validated on SD; we can extend the expansion map as we add more pipelines.

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

from vllm_omni.diffusion.lora.manager import DiffusionLoRAManager
@zhtmike commented Jan 15, 2026:

Perhaps add LoRARequest in lora/request.py for external package imports.

AndyZhou952 (Contributor, Author) replied:

Done, thanks for the suggestion. This makes sense, since LoRARequest is a user-facing class. One can now use from vllm_omni.lora.request import LoRARequest.


there are other scripts using the LoRARequest from vllm, such as input_prcessor.py, async_omni.py and serving_chat.py, may unify them as well

AndyZhou952 (Contributor, Author) replied:

> there are other scripts using the LoRARequest from vllm, such as input_prcessor.py, async_omni.py and serving_chat.py, may unify them as well

Done, thanks for the observation.

Also, per discussion, added an in-house LoRAConfig within vllm-omni as well.


In config/__init__.py, better to add from vllm_omni.config.lora import LoRAConfig.

AndyZhou952 (Contributor, Author) replied:

> in config/__init__.py, better add from vllm_omni.config.lora import LoRAConfig

done


looks nice :)

Signed-off-by: AndyZhou952 <[email protected]>

Peft lora wrapper

Signed-off-by: Andy Zhou <[email protected]>
@david6666666 david6666666 added this to the v0.14.0 milestone Jan 23, 2026
@david6666666 david6666666 added the high priority high priority issue, needs to be done asap label Jan 23, 2026
@AndyZhou952 AndyZhou952 marked this pull request as ready for review January 23, 2026 07:15
@chatgpt-codex-connector (bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 950e388e96


@AndyZhou952 (Contributor, Author) commented Jan 23, 2026:

PR is ready @SamitHuang, thanks :-)

@ZJY0516 ZJY0516 requested a review from jeejeelee January 25, 2026 09:15
import pytest
import torch

pytest.importorskip("flash_attn", reason="flash_attn is not installed")
Collaborator:

Remove this. I'll enable this test later.

Contributor:

Removed

@@ -0,0 +1,376 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
Collaborator:

Contributor:

Good suggestion. Added

@@ -0,0 +1,152 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator:

Could you add a test for this file?

Contributor:

Added unit tests in tests/diffusion/lora/test_base_linear.py (multi-slice apply, reset fast-path to skip matmuls when inactive, and inactive-slice behavior).

# Known packed projections: accept their separate counterparts.
packed_expansions: dict[str, list[str]] = {
# diffusion: fused QKV
"to_qkv": ["to_q", "to_k", "to_v"],
Collaborator:
The reason will be displayed to describe this comment to others. Learn more.

Why do we need this hard-coding here?

Contributor:

The hard-coded packed→submodule expansion is needed to support diffusion models with fused projections (e.g. to_qkv, w13) while many PEFT/diffusers LoRA checkpoints are saved against the logical sub-projections (e.g. to_q/to_k/to_v, w1/w3).
We pass expected_lora_modules into LoRAModel.from_local_checkpoint to filter loaded weights; without expanding these names, those submodule keys would be dropped at load time and the LoRA would never be applied. The mapping is intentionally small and only takes effect when the packed module exists in the model, so the impact is contained.

Collaborator:

My concern is that this solution won't scale well or remain transparent when we encounter a new packed layer in a future model.

Contributor:

I refactored this to follow vLLM’s packed_modules_mapping pattern: the packed→sublayer mapping now lives with each diffusion transformer implementation (e.g. to_qkv -> [to_q, to_k, to_v], add_kv_proj -> [...], w13 -> [w1, w3]), instead of being hard-coded in the LoRA framework. DiffusionLoRAManager collects packed_modules_mapping from the pipeline modules at init and uses it to:

  1. expand expected_lora_modules so LoRA keys saved against sub-projections are not dropped at load time, and
  2. map per-sublayer LoRA weights onto packed LoRA layers during target-module matching.

This makes new packed layers explicit and transparent: adding support is done next to the model code (similar to how we already maintain stacked_params_mapping in load_weights()), without changing the LoRA core logic.
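The expansion step described in point 1 above can be sketched as a small pure function (hypothetical name and signature; the PR's actual helper lives in DiffusionLoRAManager):

```python
def expand_expected_modules(expected, packed_modules_mapping):
    """If a packed module (e.g. to_qkv) is a LoRA target, also accept its
    logical sub-projections, since PEFT/diffusers checkpoints often save
    LoRA weights against the unfused names (to_q/to_k/to_v).

    expected:               set of module names supported for LoRA
    packed_modules_mapping: packed name -> list of sub-projection names
    """
    expanded = set(expected)
    for packed, subs in packed_modules_mapping.items():
        if packed in expanded:
            # accept the separate counterparts so their LoRA keys are
            # not dropped when filtering the loaded checkpoint
            expanded.update(subs)
    return expanded
```

The mapping only takes effect when the packed module actually exists in the model, so unrelated pipelines are unaffected.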

@SamitHuang (Collaborator) commented:

@AndyZhou952 Final question: can we also add test results on QwenImageLightning? It's a typical timestep-distilled LoRA model and is important in the Q1 roadmap. https://huggingface.co/lightx2v/Qwen-Image-Lightning

@AndyZhou952 (Contributor, Author) replied:

> @AndyZhou952 Final question, can we also add test results on QwenImageLightning? since it's a typical timestep distilled lora model and is important in Q1 roadmap. https://huggingface.co/lightx2v/Qwen-Image-Lightning

Currently, the LoRA support in vLLM-Omni only supports PEFT loading (with adapter_config.json). Loading Qwen-Image-Lightning (a distilled-style LoRA) is not straightforward under the current design: it may require manual inspection of the safetensors on the fly, and the exact implementation requires further investigation.

I think we can make this available in a separate PR if such support is needed (quite a bit of refactoring may be required). For this PR, we keep the behavior consistent with base vLLM.

@SamitHuang (Collaborator) replied:

Okay, let's support it in the next PR.

@SamitHuang SamitHuang added the ready label to trigger buildkite CI label Jan 26, 2026
@david6666666 (Collaborator) commented:

LGTM, thanks for the contribution!

@david6666666 david6666666 merged commit 5037af1 into vllm-project:main Jan 27, 2026
7 checks passed
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

"""
Collaborator:

The intention is to delete only the newly added line, as opposed to removing the entire file.

Contributor (Author):

Please see #972 for the reversion

# -- typically a transformer layer
# used for torch compile optimizations
_repeated_blocks = ["QwenImageTransformerBlock"]
packed_modules_mapping = {
Collaborator:

I don't think it's a good idea to put it here, because we also have something like this in the load_weights function for every model.

Contributor:

I'll open a new PR to do it.

nussejzz pushed a commit to nussejzz/vllm-omni that referenced this pull request Jan 27, 2026
…lignment (vllm-project#758)

Signed-off-by: AndyZhou952 <[email protected]>
Signed-off-by: Andy Zhou <[email protected]>
Signed-off-by: dongbo910220 <[email protected]>
Signed-off-by: Andy Zhou <[email protected]>
Signed-off-by: Samit <[email protected]>
Co-authored-by: dongbo910220 <[email protected]>
Co-authored-by: Samit <[email protected]>
Signed-off-by: jzz <[email protected]>

Labels

high priority (high priority issue, needs to be done asap), ready (label to trigger buildkite CI), RL (Related to Reinforcement Learning)


9 participants