Merged

39 commits
dcc1ade
enable hetero blocksize
xuechendi Oct 14, 2025
37e8c4c
Merge remote-tracking branch 'origin/main' into dev/nixl_heter_blocksize
xuechendi Oct 21, 2025
87b1de8
update and cleanup
xuechendi Oct 21, 2025
414215a
naive post process for HND
xuechendi Oct 23, 2025
e6e3d92
Both block_size_ratio < 1 or > 1 works
xuechendi Oct 23, 2025
f0d8b3a
enable BlockAllocator for prefill/decode block_size_ratio > 1
xuechendi Oct 24, 2025
669ea19
Tested both prefill/decode block_size ratio > 1 and < 1 accuracy
xuechendi Oct 24, 2025
8719d69
Merge remote-tracking branch 'origin/main' into dev/nixl_heter_blocksize
xuechendi Oct 24, 2025
ce27358
Enable a second local xfer handler register
xuechendi Oct 30, 2025
826f7ee
Merge remote-tracking branch 'origin/main' into dev/nixl_heter_blocksize
xuechendi Oct 31, 2025
59a3244
remove FIXME
xuechendi Oct 31, 2025
4caca02
remove duplicate func
xuechendi Oct 31, 2025
90c26f4
Limit nP > nD buffer blocks length and print warning when overlapping
xuechendi Oct 31, 2025
d9e1bc9
Merge remote-tracking branch 'origin/main' into dev/nixl_heter_blocksize
xuechendi Oct 31, 2025
0872214
small fix on default buffer setting
xuechendi Oct 31, 2025
8ebde3f
Fix for nP > nD + TP_ratio != 1 scenario
xuechendi Oct 31, 2025
37be2f4
Merge remote-tracking branch 'origin/main' into dev/nixl_heter_blocksize
xuechendi Nov 4, 2025
c9dfb51
clean up and remove unnecessary shift
xuechendi Nov 4, 2025
5eb54e1
remove nP > nD path, will do it in separate PR
xuechendi Nov 4, 2025
7fa82cb
move block_size_ratio to kv_topo
xuechendi Nov 5, 2025
402fadd
use default(list)
xuechendi Nov 5, 2025
9bf9c7f
remove unnecessary changes to _get_block_descs_ids
xuechendi Nov 5, 2025
bcce3a7
Clean up
xuechendi Nov 5, 2025
b99156a
Inverse block_size_ratio
xuechendi Nov 7, 2025
4d74dd2
Remove unnecessary check
xuechendi Nov 7, 2025
aa86c19
Merge remote-tracking branch 'origin/main' into dev/nixl_heter_blocksize
xuechendi Nov 7, 2025
456a8c6
don't do post_process for heter_block_size for mla
xuechendi Nov 7, 2025
d0f9035
make pre-commit happy
xuechendi Nov 7, 2025
fd21da1
Update vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py
xuechendi Nov 12, 2025
160fb6a
Merge remote-tracking branch 'origin/main' into dev/nixl_heter_blocksize
xuechendi Nov 12, 2025
8b4507a
Fix comments
xuechendi Nov 12, 2025
2f176da
Merge remote-tracking branch 'origin/main' into dev/nixl_heter_blocksize
xuechendi Nov 12, 2025
984637d
update script
xuechendi Nov 12, 2025
23808e5
fix script and comments
xuechendi Nov 13, 2025
a6641c2
fix UT
xuechendi Nov 13, 2025
2e33323
Merge remote-tracking branch 'origin/main' into dev/nixl_heter_blocksize
xuechendi Nov 14, 2025
ad1ee8c
Merge branch 'main' into dev/nixl_heter_blocksize
xuechendi Nov 14, 2025
c758c16
Merge branch 'main' into dev/nixl_heter_blocksize
xuechendi Nov 14, 2025
08f4046
Merge branch 'main' into dev/nixl_heter_blocksize
xuechendi Nov 14, 2025
4 changes: 4 additions & 0 deletions tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh
@@ -49,6 +49,8 @@ NUM_DECODE_INSTANCES=${NUM_DECODE_INSTANCES:-1} # Default to 1
PREFILLER_TP_SIZE=${PREFILLER_TP_SIZE:-1}
DECODER_TP_SIZE=${DECODER_TP_SIZE:-1}
GPU_MEMORY_UTILIZATION=${GPU_MEMORY_UTILIZATION:-0.2}
PREFILL_BLOCK_SIZE=${PREFILL_BLOCK_SIZE:-16}
DECODE_BLOCK_SIZE=${DECODE_BLOCK_SIZE:-16}
xuechendi (Contributor, Author) commented on Nov 13, 2025:
I switched to 128 but then switched back to 16.
I tested with 128 and noticed that, even on origin/main, accuracy is incorrect at the moment.
I will see if I can find the root cause in a separate PR.

xuechendi (Contributor, Author) commented on Nov 13, 2025:
I verified that setting block_size=64 works, but somehow block_size=128 on CUDA produces NaN tensors.
@NickLucche, do you want me to set it to 64? Actually, I prefer to use the current default for CUDA, which is 16.


# Find the git repository root directory
GIT_ROOT=$(git rev-parse --show-toplevel)
@@ -136,6 +138,7 @@ run_tests_for_model() {
vllm serve $model_name \
--port $PORT \
--enforce-eager \
--block-size ${PREFILL_BLOCK_SIZE} \
--gpu-memory-utilization $GPU_MEMORY_UTILIZATION \
--tensor-parallel-size $PREFILLER_TP_SIZE \
--kv-transfer-config '$KV_CONFIG'"
@@ -177,6 +180,7 @@ run_tests_for_model() {
vllm serve $model_name \
--port $PORT \
--enforce-eager \
--block-size ${DECODE_BLOCK_SIZE} \
--gpu-memory-utilization $GPU_MEMORY_UTILIZATION \
--kv-transfer-config '$KV_CONFIG'"

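For reference, a minimal usage sketch of the updated accuracy script, assuming it is invoked directly from the repo root and that the two new environment variables are the only overrides needed to exercise heterogeneous prefill/decode block sizes (the values below are illustrative, not recommended defaults):

# Illustrative only: run the NIXL accuracy test with heterogeneous block sizes.
# 64 is used for the prefiller because block_size=128 currently produces NaNs
# on CUDA (see the review thread above); the decoder keeps the default of 16.
PREFILL_BLOCK_SIZE=64 \
DECODE_BLOCK_SIZE=16 \
PREFILLER_TP_SIZE=1 \
DECODER_TP_SIZE=1 \
bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh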
3 changes: 3 additions & 0 deletions tests/v1/kv_connector/unit/test_nixl_connector.py
@@ -407,6 +407,7 @@ def _nixl_handshake(
# `self.kv_cache_layout` is only forced to HND when vllm engine
# is started. We mock HND here.
kv_cache_layout="HND",
block_size=self.block_size,
),
remote_tp_size=remote_tp_size,
)
@@ -652,6 +653,7 @@ def test_handshake_fails_on_kv_cache_layout_mismatch(self, dist_init):
block_lens=worker.block_len_per_layer,
attn_backend_name=worker.backend_name,
kv_cache_layout=mismatched_layout,
block_size=worker.block_size,
)

with pytest.raises(RuntimeError):
@@ -706,6 +708,7 @@ def test_handshake_succeed_on_kv_cache_layout_mismatch_with_experimental(
block_lens=[i * 2 for i in worker.block_len_per_layer],
attn_backend_name=worker.backend_name,
kv_cache_layout="HND",
block_size=worker.block_size,
)

# We don't check layout for homogeneous TP and MLA for now, as the