Merged
Changes from 127 commits
Commits
129 commits
4730522
[Update] LMcache connector v1 implementation
ApostaC Apr 17, 2025
4162650
[Add] examples for disaggregated prefill
ApostaC Apr 17, 2025
3ccd34c
[add] extra information about evns
ApostaC Apr 18, 2025
161010c
Initial stubs for P/D scheduling changes
tlrmchlsmth Apr 18, 2025
38a2eb8
Merge branch 'main' into local-dev/lmcache-v1-connector-pr
tlrmchlsmth Apr 19, 2025
6c3191f
Merge branch 'local-dev/lmcache-v1-connector-pr' into pd_scheduling_l…
tlrmchlsmth Apr 19, 2025
1f708e9
Updates
tlrmchlsmth Apr 19, 2025
038f2f8
Rs branch (#3)
robertgshaw2-redhat Apr 20, 2025
5c4fc6f
Rs branch (#5)
robertgshaw2-redhat Apr 20, 2025
1800689
Remove Unneeded Arguments (#7)
robertgshaw2-redhat Apr 21, 2025
7a1f25f
Improve disagg-example.sh (#8)
tlrmchlsmth Apr 21, 2025
2385d8e
updated
robertgshaw2-redhat Apr 22, 2025
6eeb47c
updated
robertgshaw2-redhat Apr 22, 2025
266fcee
updated
robertgshaw2-redhat Apr 22, 2025
f7e16f1
updated
robertgshaw2-redhat Apr 22, 2025
f591b8e
added connector
robertgshaw2-redhat Apr 22, 2025
184d0b6
updated
robertgshaw2-redhat Apr 22, 2025
d4a9e5b
updated
robertgshaw2-redhat Apr 22, 2025
4b0d1dc
updated
robertgshaw2-redhat Apr 22, 2025
bfef039
updated
robertgshaw2-redhat Apr 22, 2025
54f4a43
updated
robertgshaw2-redhat Apr 22, 2025
e604b09
updated
robertgshaw2-redhat Apr 22, 2025
2fc00ad
updated
robertgshaw2-redhat Apr 22, 2025
e5967b6
updated
robertgshaw2-redhat Apr 22, 2025
f1bc0f7
updated
robertgshaw2-redhat Apr 22, 2025
1cea2bb
updated
robertgshaw2-redhat Apr 22, 2025
489e4c0
updated
robertgshaw2-redhat Apr 22, 2025
437ac91
updated
robertgshaw2-redhat Apr 22, 2025
ea47af7
updated
robertgshaw2-redhat Apr 22, 2025
554b27d
updated
robertgshaw2-redhat Apr 22, 2025
1aea5ba
updated
robertgshaw2-redhat Apr 22, 2025
e0c112b
updated
robertgshaw2-redhat Apr 22, 2025
c7717c1
update
robertgshaw2-redhat Apr 22, 2025
e0af1db
remove
robertgshaw2-redhat Apr 22, 2025
9533471
updated
robertgshaw2-redhat Apr 22, 2025
2eb068e
updated
robertgshaw2-redhat Apr 22, 2025
0f2b7e3
updated
robertgshaw2-redhat Apr 22, 2025
6127cb8
updated
robertgshaw2-redhat Apr 22, 2025
568249e
updated
robertgshaw2-redhat Apr 23, 2025
ccb44ea
seems to load properly
robertgshaw2-redhat Apr 23, 2025
3785905
updated
robertgshaw2-redhat Apr 23, 2025
8a94b2e
updated
robertgshaw2-redhat Apr 24, 2025
ac19437
updated
robertgshaw2-redhat Apr 24, 2025
6391ec9
updated
robertgshaw2-redhat Apr 24, 2025
7dd764b
updated
robertgshaw2-redhat Apr 24, 2025
97316d9
updated
robertgshaw2-redhat Apr 24, 2025
2771353
Revert "updated"
robertgshaw2-redhat Apr 24, 2025
baed1bf
updated
robertgshaw2-redhat Apr 24, 2025
d0ad6d9
updated
robertgshaw2-redhat Apr 24, 2025
055885e
updated
robertgshaw2-redhat Apr 24, 2025
5ed3806
updated
robertgshaw2-redhat Apr 24, 2025
58266b5
updated
robertgshaw2-redhat Apr 24, 2025
344d9da
stash
robertgshaw2-redhat Apr 24, 2025
2996638
added
robertgshaw2-redhat Apr 24, 2025
bcc88dc
diffs for local dev on macos
Apr 24, 2025
62205ae
updated
Apr 24, 2025
b4609a5
update
Apr 24, 2025
5d78ba6
updaed
Apr 25, 2025
c1f26b9
updated
Apr 25, 2025
9b9ef36
updated
Apr 25, 2025
c60639e
Checkpoint.
tlrmchlsmth Apr 25, 2025
006dda3
Merge branch 'pd_scheduling_nixl' of https://github.com/robertgshaw2-…
tlrmchlsmth Apr 25, 2025
c5e023e
updated
Apr 25, 2025
8b0c93c
Cleanup
tlrmchlsmth Apr 26, 2025
5e45d90
WIP
tlrmchlsmth Apr 26, 2025
20a5491
updated
Apr 27, 2025
cee3c61
updated
Apr 27, 2025
5972571
updated on scheduler side
Apr 27, 2025
1b69d33
updated
Apr 27, 2025
74e105a
Merge remote-tracking branch 'rs/pd_scheduling_rob_dev' into nixl_int…
tlrmchlsmth Apr 27, 2025
8adf1ad
updated
Apr 27, 2025
21ab3d9
updated
Apr 27, 2025
3a27bbc
updated
Apr 27, 2025
f252df9
updated
Apr 27, 2025
8104803
updated
Apr 27, 2025
10bbe21
Hacking away
tlrmchlsmth Apr 27, 2025
a14278c
Merge remote-tracking branch 'rs/pd_scheduling_rob_dev_2' into nixl_i…
tlrmchlsmth Apr 27, 2025
65ea91f
cleanup
Apr 27, 2025
f2550ef
ensure request removed from running list
Apr 27, 2025
985bac3
Runs E2E. Garbage output. Crashes on 2nd request
tlrmchlsmth Apr 27, 2025
bf37a7d
update
tlrmchlsmth Apr 27, 2025
ebe1263
updated
Apr 27, 2025
a008aa3
updated
Apr 27, 2025
195dceb
rename files
Apr 27, 2025
e2cc365
updated
Apr 27, 2025
2324a50
Merge remote-tracking branch 'rs/pd_scheduling_rob_dev_2' into nixl_i…
tlrmchlsmth Apr 27, 2025
b4b64fe
updated
Apr 27, 2025
6686397
updated
Apr 27, 2025
8736043
updated
Apr 27, 2025
dcbf6e5
updated
Apr 27, 2025
7c8e21a
update
Apr 27, 2025
a4855d2
Second request no longer crashes
tlrmchlsmth Apr 27, 2025
0914040
Merge remote-tracking branch 'rs/pd_scheduling_rob_dev_2' into nixl_i…
tlrmchlsmth Apr 27, 2025
c5b3053
Remove gpu_model_runner hacks
tlrmchlsmth Apr 27, 2025
7502819
Clean up Justfile
tlrmchlsmth Apr 28, 2025
7768b96
[Bugfix] Stale finished requests in EMPTY_MODEL_RUNNER_OUTPUT
tlrmchlsmth Apr 28, 2025
a5950b7
update
tlrmchlsmth Apr 28, 2025
610a357
justfile edits
tlrmchlsmth Apr 28, 2025
5b026ab
Update
tlrmchlsmth Apr 28, 2025
f2fadd6
Fixes - lm_eval gsm8k has correctness
tlrmchlsmth Apr 29, 2025
4060f86
"just delete the assert"
tlrmchlsmth Apr 29, 2025
bfe9d19
fixup precommit issues
tlrmchlsmth Apr 29, 2025
ced529a
Fixes
tlrmchlsmth Apr 29, 2025
83f2872
updated (#12)
robertgshaw2-redhat Apr 30, 2025
e853b3c
Add Accuracy Test (#13)
robertgshaw2-redhat Apr 30, 2025
1c45ed1
Preemption Bugfixes (#15)
robertgshaw2-redhat May 1, 2025
a45a694
updated (#16)
robertgshaw2-redhat May 1, 2025
f6d0ac5
Merge branch 'main' into nixl_integration
tlrmchlsmth May 1, 2025
2f9a3f3
Fix Bad Merge | Fix Memory Leak in Upstream (#18)
robertgshaw2-redhat May 2, 2025
113527b
clean up justfile, examples
tlrmchlsmth May 2, 2025
b3733e8
Merge branch 'nixl_integration' of https://github.com/tlrmchlsmth/vll…
tlrmchlsmth May 2, 2025
5f8b280
more cleanup
tlrmchlsmth May 2, 2025
79e7b2a
more cleanup
tlrmchlsmth May 2, 2025
e8ab678
more cleanup
tlrmchlsmth May 2, 2025
969daa9
more cleanup
tlrmchlsmth May 2, 2025
f9a3f3a
More cleanup
tlrmchlsmth May 2, 2025
aec447c
more cleanup
tlrmchlsmth May 2, 2025
ac68a75
more cleanup, precommit fixes
tlrmchlsmth May 2, 2025
e72245b
More cleanup
tlrmchlsmth May 2, 2025
cd2aa72
run_accuracy_test.sh UX
tlrmchlsmth May 2, 2025
10183d5
squash warnings
tlrmchlsmth May 2, 2025
9eb9787
pre-commit
tlrmchlsmth May 2, 2025
aeef78b
update
tlrmchlsmth May 2, 2025
5306d5b
Add get_finished to base kv connector
mgoin May 2, 2025
a16f2be
revert test.txt
tlrmchlsmth May 2, 2025
445b010
cleanup
tlrmchlsmth May 2, 2025
14a4c64
Cleanup
tlrmchlsmth May 2, 2025
fc7d8ad
review comments
tlrmchlsmth May 3, 2025
82c8679
Review comments, merge latest main
tlrmchlsmth May 3, 2025
Empty file.
45 changes: 45 additions & 0 deletions tests/v1/kv_connector/run_accuracy_test.sh
@@ -0,0 +1,45 @@
#!/bin/bash

set -xe

# Model to run.
MODEL_NAME=Qwen/Qwen3-0.6B

# Find the git repository root directory
GIT_ROOT=$(git rev-parse --show-toplevel)

# Kill all background jobs on SIGINT (Ctrl+C), SIGTERM, or script exit.
trap 'kill $(jobs -pr)' SIGINT SIGTERM EXIT

# Waits for vLLM to start.
wait_for_server() {
  local port=$1
  timeout 1200 bash -c "
    until curl -s localhost:${port}/v1/completions > /dev/null; do
      sleep 1
    done" && return 0 || return 1
}

# Prefill instance.
CUDA_VISIBLE_DEVICES=0 NIXL_ROLE="SENDER" vllm serve $MODEL_NAME \
    --port 8100 \
    --enforce-eager \
    --disable-log-requests \
    --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}' &

# Decode instance.
CUDA_VISIBLE_DEVICES=1 NIXL_ROLE="RECVER" vllm serve $MODEL_NAME \
    --port 8200 \
    --enforce-eager \
    --disable-log-requests \
    --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}' &

# Wait until the prefill and decode instances are ready.
wait_for_server 8100
wait_for_server 8200

# Proxy server.
python ${GIT_ROOT}/tests/v1/kv_connector/toy_proxy_server.py --port 8192 &

# Run lm eval.
python -m pytest -s -x ${GIT_ROOT}/tests/v1/kv_connector/test_accuracy.py
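Note on `wait_for_server`: because `curl` is called without `--fail`, it exits 0 for any HTTP response, so the loop effectively waits until the port accepts connections, up to the 1200-second timeout. A minimal Python sketch of the same poll-until-ready pattern follows. The `probe`, `sleep`, and `clock` parameters are hypothetical names introduced here for illustration and are not part of this PR.

```python
import time
from typing import Callable


def wait_for_server(probe: Callable[[], bool],
                    timeout_s: float = 1200.0,
                    interval_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses.

    `probe` is a hypothetical readiness check (for example, an HTTP request
    to the server port); it is injected so the loop can be tested offline.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if probe():
            return True
        sleep(interval_s)
    return False
```

Injecting the probe keeps the timing logic separate from the network call, which makes the timeout path easy to exercise without starting a server.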
28 changes: 28 additions & 0 deletions tests/v1/kv_connector/test_accuracy.py
@@ -0,0 +1,28 @@
# SPDX-License-Identifier: Apache-2.0
import lm_eval

MODEL_NAME = "Qwen/Qwen3-0.6B"
NUM_CONCURRENT = 100
TASK = "gsm8k"
FILTER = "exact_match,strict-match"
RTOL = 0.03
EXPECTED_VALUE = 0.41


def test_accuracy():
    """Run the end-to-end accuracy test."""

    model_args = (f"model={MODEL_NAME},"
                  f"base_url=http://localhost:8192/v1/completions,"
                  f"num_concurrent={NUM_CONCURRENT},tokenized_requests=False")

    results = lm_eval.simple_evaluate(
        model="local-completions",
        model_args=model_args,
        tasks=TASK,
    )

    measured_value = results["results"][TASK][FILTER]
    assert (measured_value - RTOL < EXPECTED_VALUE
            and measured_value + RTOL > EXPECTED_VALUE
            ), f"Expected: {EXPECTED_VALUE} | Measured: {measured_value}"
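Note on the tolerance check: although the constant is named `RTOL`, the assertion applies it as an absolute tolerance, so with `EXPECTED_VALUE = 0.41` and `RTOL = 0.03` the test passes for measured scores strictly between 0.38 and 0.44. The sketch below restates the same comparison as a standalone function; `within_tolerance` is a hypothetical helper name, not something defined in this PR.

```python
def within_tolerance(measured: float, expected: float, tol: float) -> bool:
    """Return True when `measured` lies strictly within `tol` of `expected`.

    Same two-sided comparison as the assertion in test_accuracy.py; it is an
    absolute band around `expected`, not a relative one.
    """
    return measured - tol < expected and measured + tol > expected


# Values taken from the constants in test_accuracy.py.
EXPECTED_VALUE = 0.41
RTOL = 0.03

in_band = within_tolerance(0.43, EXPECTED_VALUE, RTOL)
too_high = within_tolerance(0.45, EXPECTED_VALUE, RTOL)
too_low = within_tolerance(0.37, EXPECTED_VALUE, RTOL)
```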
39 changes: 39 additions & 0 deletions tests/v1/kv_connector/test_nixl_connector.py
@@ -0,0 +1,39 @@
# SPDX-License-Identifier: Apache-2.0

from vllm.distributed.kv_transfer.kv_connector.v1.nixl_connector import (
    NixlConnectorMetadata)

from .utils import create_request, create_scheduler, create_vllm_config


def test_scheduler_worker_interface():

    vllm_config = create_vllm_config()
    scheduler = create_scheduler(vllm_config)

    # 2 Full Blocks and 1 Half Block.
    BLOCK_SIZE = vllm_config.cache_config.block_size
    NUM_EXTERNAL_FULL_BLOCKS = 2
    NUM_TOKENS = int(BLOCK_SIZE * (NUM_EXTERNAL_FULL_BLOCKS + 0.5))

    request = create_request(request_id=1,
                             num_tokens=NUM_TOKENS,
                             do_remote_prefill=True)
    request_id = request.request_id

    scheduler.add_request(request)

    # Remote Prefill, triggers NixlConnectorMetadata.
    scheduler_output = scheduler.schedule()
    kv_connector_metadata = scheduler_output.kv_connector_metadata
    assert kv_connector_metadata is not None
    assert isinstance(kv_connector_metadata, NixlConnectorMetadata)

    assert len(kv_connector_metadata.requests) == 1
    assert request_id in kv_connector_metadata.requests
    req_meta = kv_connector_metadata.requests[request_id]

    for block_id, block in zip(
            req_meta.local_block_ids,
            scheduler.kv_cache_manager.req_to_blocks[request_id]):
        assert block_id == block.block_id
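Note on the "2 Full Blocks and 1 Half Block" sizing: the token count is chosen so that the request spans two completely filled KV-cache blocks plus one half-filled block. The arithmetic can be checked in isolation, as sketched below. The block size of 16 is an assumption made only for this example (the test reads the real value from `vllm_config.cache_config.block_size`), and `blocks_needed` is a hypothetical helper.

```python
import math


def blocks_needed(num_tokens: int, block_size: int) -> int:
    """Number of fixed-size blocks required to hold `num_tokens` tokens."""
    return math.ceil(num_tokens / block_size)


BLOCK_SIZE = 16  # assumed for illustration only
NUM_EXTERNAL_FULL_BLOCKS = 2

# Same expression as in the test: two full blocks plus half a block.
NUM_TOKENS = int(BLOCK_SIZE * (NUM_EXTERNAL_FULL_BLOCKS + 0.5))

total_blocks = blocks_needed(NUM_TOKENS, BLOCK_SIZE)
full_blocks = NUM_TOKENS // BLOCK_SIZE
tokens_in_last_block = NUM_TOKENS - full_blocks * BLOCK_SIZE
```

Under that assumed block size, the request has 40 tokens: 2 full blocks and a third block holding 8 tokens.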
92 changes: 92 additions & 0 deletions tests/v1/kv_connector/test_remote_decode_lifecycle.py
@@ -0,0 +1,92 @@
# SPDX-License-Identifier: Apache-2.0
import copy

from vllm.v1.outputs import EMPTY_MODEL_RUNNER_OUTPUT
from vllm.v1.request import FinishReason, RequestStatus

from .utils import (assert_scheduler_empty, create_model_runner_output,
                    create_request, create_scheduler, create_vllm_config)


def test_basic_lifecycle():
    """Test lifecycle of a Remote Decode request."""

    vllm_config = create_vllm_config()
    scheduler = create_scheduler(vllm_config)

    # 2 Full Blocks and 1 Half Block.
    BLOCK_SIZE = vllm_config.cache_config.block_size
    NUM_EXTERNAL_FULL_BLOCKS = 2
    NUM_TOKENS = int(BLOCK_SIZE * (NUM_EXTERNAL_FULL_BLOCKS + 0.5))

    request = create_request(request_id=1,
                             num_tokens=NUM_TOKENS,
                             do_remote_decode=True)

    scheduler.add_request(request)
    request_id = request.request_id

    # STEP (1): Prefill.
    # (1a): schedule()
    scheduler_output = scheduler.schedule()
    assert len(scheduler.running) == 1
    assert len(scheduler_output.scheduled_new_reqs) == 1

    # (1b): execute_model()
    model_runner_output = create_model_runner_output(reqs=[request])

    # (1c): update_from_output()
    engine_core_outputs = scheduler.update_from_output(scheduler_output,
                                                       model_runner_output)

    # Ensure the request is finished after 1 token.
    assert request.is_finished()
    assert request.status == RequestStatus.FINISHED_REMOTE_DECODE
    output = engine_core_outputs.outputs[0]
    assert output.finish_reason == FinishReason.REMOTE_DECODE
    assert output.kv_transfer_params is not None

    # Request freed in Scheduler and in Persistent Batch ...
    assert request_id in scheduler.finished_req_ids
    assert len(scheduler.running) == 0
    assert len(scheduler.waiting) == 0

    # ... but blocks should not be freed.
    blocks = scheduler.kv_cache_manager.req_to_blocks[request_id]
    for block in blocks:
        assert block.ref_cnt == 1

    # STEP (2): Send Finished to PB.
    # (2a): schedule() - pass finished request to PB.
    scheduler_output = scheduler.schedule()
    assert len(scheduler.running) == 0
    assert len(scheduler_output.finished_req_ids) == 1
    assert request_id in scheduler_output.finished_req_ids
    assert len(scheduler_output.scheduled_new_reqs) == 0
    assert len(scheduler_output.scheduled_cached_reqs) == 0
    assert len(scheduler.finished_req_ids) == 0

    # (2b): execute_model()
    model_runner_output = EMPTY_MODEL_RUNNER_OUTPUT

    # (2c): update_from_output()
    scheduler.update_from_output(scheduler_output, model_runner_output)

    # STEP (3): Finished sending.
    # (3a): schedule() - no finished requests left to pass to PB.
    scheduler_output = scheduler.schedule()
    assert len(scheduler.running) == 0
    assert len(scheduler_output.finished_req_ids) == 0
    assert len(scheduler_output.scheduled_new_reqs) == 0
    assert len(scheduler_output.scheduled_cached_reqs) == 0
    assert len(scheduler.finished_req_ids) == 0

    # (3b): execute_model()
    model_runner_output = copy.deepcopy(EMPTY_MODEL_RUNNER_OUTPUT)
    model_runner_output.finished_sending = [request_id]

    # (3c): update_from_output()
    scheduler.update_from_output(scheduler_output, model_runner_output)

    # Confirm we do not have any memory leaks after req lifecycle.
    assert_scheduler_empty(scheduler)
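Note on the lifecycle this test exercises: a remote-decode request finishes after prefill, its id is reported to the persistent batch exactly once, and its KV blocks stay allocated until the worker reports `finished_sending`. The toy model below is only a sketch of that ordering. `ToyScheduler` and all of its methods are hypothetical and do not mirror the vLLM scheduler API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for the scheduler; not the vLLM API."""
    running: set = field(default_factory=set)
    finished_req_ids: set = field(default_factory=set)
    held_blocks: dict = field(default_factory=dict)  # req id -> block ids

    def add_request(self, req_id, block_ids):
        self.running.add(req_id)
        self.held_blocks[req_id] = list(block_ids)

    def finish_prefill(self, req_id):
        # The request leaves the running list, but its blocks stay
        # allocated so the KV cache can still be sent to the decoder.
        self.running.discard(req_id)
        self.finished_req_ids.add(req_id)

    def schedule(self):
        # Report finished ids once, then clear them (steps 2a and 3a).
        reported, self.finished_req_ids = self.finished_req_ids, set()
        return reported

    def on_finished_sending(self, req_id):
        # Only now are the blocks released (step 3c).
        self.held_blocks.pop(req_id, None)

    def is_empty(self):
        return not (self.running or self.finished_req_ids
                    or self.held_blocks)


sched = ToyScheduler()
sched.add_request("req-1", block_ids=[0, 1, 2])
sched.finish_prefill("req-1")
first = sched.schedule()    # finished id reported to the batch
second = sched.schedule()   # nothing left to report
held_before = dict(sched.held_blocks)
sched.on_finished_sending("req-1")
```

The final emptiness check plays the role that `assert_scheduler_empty` plays in the test: if the blocks were never released, the toy scheduler would not be empty, which is the leak the test guards against.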