[NIXL] heterogeneous block_size support #26759
```diff
@@ -49,6 +49,8 @@ NUM_DECODE_INSTANCES=${NUM_DECODE_INSTANCES:-1} # Default to 1
 PREFILLER_TP_SIZE=${PREFILLER_TP_SIZE:-1}
 DECODER_TP_SIZE=${DECODER_TP_SIZE:-1}
 GPU_MEMORY_UTILIZATION=${GPU_MEMORY_UTILIZATION:-0.2}
+PREFILL_BLOCK_SIZE=${PREFILL_BLOCK_SIZE:-16}
+DECODE_BLOCK_SIZE=${DECODE_BLOCK_SIZE:-16}

 # Find the git repository root directory
 GIT_ROOT=$(git rev-parse --show-toplevel)
```

Contributor (author): I switched to 128 but switched back to 16 again.

Contributor (author): I verified that block_size=64 works, but somehow using block_size=128 on CUDA produces NaN tensors.
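Since the new knobs are ordinary environment variables that fall back to 16, a heterogeneous run can simply override them at invocation time. A minimal sketch, assuming the diff belongs to the NIXL integration test runner (the script name below is a placeholder; only PREFILL_BLOCK_SIZE and DECODE_BLOCK_SIZE come from this change):

```bash
# Sketch only: the script path is a placeholder. The two block-size variables
# are the ones added above; the TP-size variables already existed in the script.
PREFILL_BLOCK_SIZE=128 \
DECODE_BLOCK_SIZE=64 \
PREFILLER_TP_SIZE=1 \
DECODER_TP_SIZE=1 \
bash run_accuracy_test.sh
```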
```diff
@@ -136,6 +138,7 @@ run_tests_for_model() {
 vllm serve $model_name \
   --port $PORT \
   --enforce-eager \
+  --block-size ${PREFILL_BLOCK_SIZE} \
   --gpu-memory-utilization $GPU_MEMORY_UTILIZATION \
   --tensor-parallel-size $PREFILLER_TP_SIZE \
   --kv-transfer-config '$KV_CONFIG'"
```
```diff
@@ -177,6 +180,7 @@ run_tests_for_model() {
 vllm serve $model_name \
   --port $PORT \
   --enforce-eager \
+  --block-size ${DECODE_BLOCK_SIZE} \
   --gpu-memory-utilization $GPU_MEMORY_UTILIZATION \
   --kv-transfer-config '$KV_CONFIG'"
```
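Taken together, the two hunks let the prefill server and the decode server run with different block sizes, which is the heterogeneous case this PR targets. A rough sketch of such a pair outside the test script (model name, ports, side-channel ports, and the exact kv-transfer-config string are assumptions here, not taken from the diff):

```bash
# Illustrative only: model, ports, side-channel ports, and connector config are assumed.
KV_CONFIG='{"kv_connector":"NixlConnector","kv_role":"kv_both"}'

# Prefill instance with a larger block size.
CUDA_VISIBLE_DEVICES=0 VLLM_NIXL_SIDE_CHANNEL_PORT=5600 \
  vllm serve Qwen/Qwen3-0.6B \
    --port 8100 \
    --enforce-eager \
    --block-size 128 \
    --kv-transfer-config "$KV_CONFIG" &

# Decode instance with a smaller block size; the connector now handles the mismatch.
CUDA_VISIBLE_DEVICES=1 VLLM_NIXL_SIDE_CHANNEL_PORT=5601 \
  vllm serve Qwen/Qwen3-0.6B \
    --port 8200 \
    --enforce-eager \
    --block-size 16 \
    --kv-transfer-config "$KV_CONFIG" &
```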
Reviewer: nit: can we default to 128? Block size can significantly impact our transfer efficiency; this is just a suggested value.
Contributor (author): The CUDA block_size currently defaults to 16; should I change it to a larger number?
Reviewer: Yep, can we set it to 128 at these lines?
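If the suggestion were adopted, only the two defaults from the first hunk would change; the diff as posted still defaults both to 16. A sketch of the suggested variant:

```bash
# Suggested (not yet applied) variant of the defaults discussed above.
PREFILL_BLOCK_SIZE=${PREFILL_BLOCK_SIZE:-128}
DECODE_BLOCK_SIZE=${DECODE_BLOCK_SIZE:-128}
```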