[Core]: KV Cache Transfer Encapsulation#979

Merged
hsliuustc0106 merged 20 commits into vllm-project:main from princepride:kv-cache-transfer-encapsulation
Jan 28, 2026
Conversation

@princepride (Collaborator) commented Jan 27, 2026

Purpose

Related: #944
Refactor the KV cache transfer logic by extracting duplicated code from GPUARModelRunner and GPUDiffusionModelRunner into a unified OmniKVTransferManager class.
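To make the shape of the refactor concrete, here is a minimal sketch of what a unified KV-transfer manager can look like: one object owns the connector and exposes send-with-retry and receive-with-timeout, so both runners delegate instead of duplicating the logic. All names except OmniKVTransferManager's role are illustrative assumptions, not the PR's actual API.

```python
import time
from typing import Any, Optional


class KVTransferManagerSketch:
    """Illustrative sketch of a unified KV-transfer manager (hypothetical
    API, not the actual OmniKVTransferManager): it owns a connector and
    centralizes send-with-retry and receive-with-timeout."""

    def __init__(self, connector, max_retries: int = 3, poll_interval: float = 0.01):
        self.connector = connector  # any object exposing put()/get()
        self.max_retries = max_retries
        self.poll_interval = poll_interval

    def send_kv(self, request_id: str, kv_data: Any) -> bool:
        """Hand the KV blob to the connector, retrying on failure."""
        for _ in range(self.max_retries):
            if self.connector.put(request_id, kv_data):
                return True
        return False

    def receive_kv(self, request_id: str, timeout: float = 1.0) -> Optional[Any]:
        """Poll the connector until the KV blob arrives or the timeout expires."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            data = self.connector.get(request_id)
            if data is not None:
                return data
            time.sleep(self.poll_interval)
        return None


class DictConnector:
    """Minimal in-memory stand-in for a real connector (e.g. SHM)."""

    def __init__(self):
        self.store = {}

    def put(self, request_id, data):
        self.store[request_id] = data
        return True

    def get(self, request_id):
        return self.store.get(request_id)


manager = KVTransferManagerSketch(DictConnector())
manager.send_kv("req-1", {"layer0": [1.0, 2.0]})
print(manager.receive_kv("req-1"))  # → {'layer0': [1.0, 2.0]}
```

The design point is that retry and timeout policy live in one place; swapping the connector implementation does not touch the runners.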

Test Plan

Unit Test

pytest -o "addopts=" tests/distributed/omni_connectors/test_kv_flow.py

End2End Test

python3 examples/offline_inference/bagel/end2end.py --prompts "A cute cat" --modality text2img

# mooncake master (metadata server)
mooncake_master \
  --rpc_port=50051 \
  --enable_http_metadata_server=true \
  --http_metadata_server_host=0.0.0.0 \
  --http_metadata_server_port=8080 \
  --metrics_port=9003

# vllm-omni server
python3 examples/offline_inference/bagel/end2end.py --prompts "A cute cat" --modality text2img --stage-configs-path vllm_omni/model_executor/stage_configs/bagel_multiconnector.yaml

Test Result

========================================================================================= test session starts =========================================================================================
platform linux -- Python 3.13.11, pytest-9.0.2, pluggy-1.6.0
rootdir: /proj-tango-pvc/users/zhipeng.wang/workspace/vllm-omni
configfile: pyproject.toml
plugins: forked-1.6.0, rerunfailures-16.1, shard-0.1.2, timeout-2.4.0, anyio-4.12.1, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 3 items                                                                                                                                                                                     
Running 3 items in this shard

tests/distributed/omni_connectors/test_kv_flow.py ...                                                                                                                                           [100%]

========================================================================================== warnings summary ===========================================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

tests/distributed/omni_connectors/test_kv_flow.py::TestKVFlow::test_manager_extraction
  /root/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/unittest/case.py:707: DeprecationWarning: It is deprecated to return a value that is not None from a test case (<bound method TestKVFlow.test_manager_extraction of <test_kv_flow.TestKVFlow testMethod=test_manager_extraction>>)
    return self.run(*args, **kwds)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================================================================== 3 passed, 3 warnings in 13.25s ====================================================================================

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fe624efe17

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Signed-off-by: princepride <[email protected]>
Signed-off-by: princepride <[email protected]>
Signed-off-by: princepride <[email protected]>
Signed-off-by: princepride <[email protected]>
@princepride princepride force-pushed the kv-cache-transfer-encapsulation branch from 0547aa2 to 6a12095 Compare January 27, 2026 10:54
Signed-off-by: princepride <[email protected]>
Signed-off-by: princepride <[email protected]>
@princepride (Collaborator, Author) commented:

@natureofnature @tzhouam PTAL.

@princepride princepride mentioned this pull request Jan 27, 2026
@ZJY0516 (Collaborator) commented Jan 27, 2026

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c1c0e2dec6

Copilot AI (Contributor) left a comment

Pull request overview

Refactors Omni KV-cache transfer by centralizing duplicated send/receive + connector lifecycle logic into a single OmniKVTransferManager, and updates AR/diffusion runners to delegate to it.

Changes:

  • Added OmniKVTransferManager to encapsulate connector creation, KV extraction, send-with-retry, and receive-with-timeout.
  • Updated GPUARModelRunner and GPUDiffusionModelRunner to use the new manager instead of inlined logic.
  • Adjusted SHM connector API for compatibility and updated KV-flow unit tests to validate the new manager.
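The SHM connector adjustment in the last bullet follows a common backward-compatibility pattern: add the new request_id parameter with a default so existing callers keep working. A hypothetical sketch (the names and signatures below are assumptions, not the PR's actual shm_connector API):

```python
from typing import Any, Optional


class ShmConnectorSketch:
    """Hypothetical connector whose put/get gained an optional request_id,
    so the new manager can address entries per request while older callers
    that pass only a key remain unchanged."""

    def __init__(self):
        self._store: dict = {}

    def _key(self, key: str, request_id: Optional[str]) -> str:
        # Namespace the entry by request when a request_id is supplied.
        return f"{request_id}/{key}" if request_id else key

    def put(self, key: str, value: Any, request_id: Optional[str] = None) -> bool:
        self._store[self._key(key, request_id)] = value
        return True

    def get(self, key: str, request_id: Optional[str] = None) -> Optional[Any]:
        return self._store.get(self._key(key, request_id))


conn = ShmConnectorSketch()
conn.put("kv", b"blob")                           # old-style call still works
conn.put("kv", b"blob-42", request_id="req-42")   # new manager-style call
print(conn.get("kv"))                             # → b'blob'
print(conn.get("kv", request_id="req-42"))        # → b'blob-42'
```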

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Summary per file:

  • vllm_omni/worker/gpu_ar_model_runner.py: Replaces inlined sender-side KV transfer with manager calls and removes duplicated methods.
  • vllm_omni/distributed/omni_connectors/kv_transfer_manager.py: Introduces the unified manager and shared KV transfer data container/config.
  • vllm_omni/distributed/omni_connectors/connectors/shm_connector.py: Extends put/get to accept request_id for compatibility with manager usage.
  • vllm_omni/diffusion/worker/gpu_diffusion_model_runner.py: Replaces receiver-side KV polling/injection with manager calls.
  • tests/distributed/omni_connectors/test_kv_flow.py: Updates tests to validate extraction/send/receive flows via the new manager.


@princepride (Collaborator, Author) commented:

@tzhouam @hsliuustc0106 PTAL😊

Signed-off-by: princepride <[email protected]>
@david6666666 david6666666 linked an issue Jan 28, 2026 that may be closed by this pull request
@princepride (Collaborator, Author) commented:

@congw729 I would also like to add a Bagel e2e pytest in this PR. What do you think?

Signed-off-by: princepride <[email protected]>
Signed-off-by: princepride <[email protected]>
Signed-off-by: princepride <[email protected]>
@hsliuustc0106 hsliuustc0106 added the ready label (label to trigger buildkite CI) Jan 28, 2026
Signed-off-by: princepride <[email protected]>
Signed-off-by: princepride <[email protected]>
Signed-off-by: princepride <[email protected]>
@princepride (Collaborator, Author) commented:

@hsliuustc0106 Ready to merge.

@hsliuustc0106 (Collaborator) left a comment:

lgtm

@hsliuustc0106 hsliuustc0106 merged commit 741f7e2 into vllm-project:main Jan 28, 2026
7 checks passed
@gcanlin gcanlin mentioned this pull request Jan 30, 2026
dongbo910220 pushed a commit to dongbo910220/vllm-omni that referenced this pull request Feb 1, 2026

Labels

ready (label to trigger buildkite CI)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Bagel text_to_image error related to connector messaging

5 participants