Changes from all commits

163 commits
4730522
[Update] LMcache connector v1 implementation
ApostaC Apr 17, 2025
4162650
[Add] examples for disaggregated prefill
ApostaC Apr 17, 2025
3ccd34c
[add] extra information about evns
ApostaC Apr 18, 2025
161010c
Initial stubs for P/D scheduling changes
tlrmchlsmth Apr 18, 2025
38a2eb8
Merge branch 'main' into local-dev/lmcache-v1-connector-pr
tlrmchlsmth Apr 19, 2025
6c3191f
Merge branch 'local-dev/lmcache-v1-connector-pr' into pd_scheduling_l…
tlrmchlsmth Apr 19, 2025
1f708e9
Updates
tlrmchlsmth Apr 19, 2025
038f2f8
Rs branch (#3)
robertgshaw2-redhat Apr 20, 2025
5c4fc6f
Rs branch (#5)
robertgshaw2-redhat Apr 20, 2025
1800689
Remove Unneeded Arguments (#7)
robertgshaw2-redhat Apr 21, 2025
7a1f25f
Improve disagg-example.sh (#8)
tlrmchlsmth Apr 21, 2025
2385d8e
updated
robertgshaw2-redhat Apr 22, 2025
6eeb47c
updated
robertgshaw2-redhat Apr 22, 2025
266fcee
updated
robertgshaw2-redhat Apr 22, 2025
f7e16f1
updated
robertgshaw2-redhat Apr 22, 2025
f591b8e
added connector
robertgshaw2-redhat Apr 22, 2025
184d0b6
updated
robertgshaw2-redhat Apr 22, 2025
d4a9e5b
updated
robertgshaw2-redhat Apr 22, 2025
4b0d1dc
updated
robertgshaw2-redhat Apr 22, 2025
bfef039
updated
robertgshaw2-redhat Apr 22, 2025
54f4a43
updated
robertgshaw2-redhat Apr 22, 2025
e604b09
updated
robertgshaw2-redhat Apr 22, 2025
2fc00ad
updated
robertgshaw2-redhat Apr 22, 2025
e5967b6
updated
robertgshaw2-redhat Apr 22, 2025
f1bc0f7
updated
robertgshaw2-redhat Apr 22, 2025
1cea2bb
updated
robertgshaw2-redhat Apr 22, 2025
489e4c0
updated
robertgshaw2-redhat Apr 22, 2025
437ac91
updated
robertgshaw2-redhat Apr 22, 2025
ea47af7
updated
robertgshaw2-redhat Apr 22, 2025
554b27d
updated
robertgshaw2-redhat Apr 22, 2025
1aea5ba
updated
robertgshaw2-redhat Apr 22, 2025
e0c112b
updated
robertgshaw2-redhat Apr 22, 2025
c7717c1
update
robertgshaw2-redhat Apr 22, 2025
e0af1db
remove
robertgshaw2-redhat Apr 22, 2025
9533471
updated
robertgshaw2-redhat Apr 22, 2025
2eb068e
updated
robertgshaw2-redhat Apr 22, 2025
0f2b7e3
updated
robertgshaw2-redhat Apr 22, 2025
6127cb8
updated
robertgshaw2-redhat Apr 22, 2025
568249e
updated
robertgshaw2-redhat Apr 23, 2025
ccb44ea
seems to load properly
robertgshaw2-redhat Apr 23, 2025
3785905
updated
robertgshaw2-redhat Apr 23, 2025
8a94b2e
updated
robertgshaw2-redhat Apr 24, 2025
ac19437
updated
robertgshaw2-redhat Apr 24, 2025
6391ec9
updated
robertgshaw2-redhat Apr 24, 2025
7dd764b
updated
robertgshaw2-redhat Apr 24, 2025
97316d9
updated
robertgshaw2-redhat Apr 24, 2025
2771353
Revert "updated"
robertgshaw2-redhat Apr 24, 2025
baed1bf
updated
robertgshaw2-redhat Apr 24, 2025
d0ad6d9
updated
robertgshaw2-redhat Apr 24, 2025
055885e
updated
robertgshaw2-redhat Apr 24, 2025
5ed3806
updated
robertgshaw2-redhat Apr 24, 2025
58266b5
updated
robertgshaw2-redhat Apr 24, 2025
344d9da
stash
robertgshaw2-redhat Apr 24, 2025
2996638
added
robertgshaw2-redhat Apr 24, 2025
bcc88dc
diffs for local dev on macos
Apr 24, 2025
62205ae
updated
Apr 24, 2025
b4609a5
update
Apr 24, 2025
5d78ba6
updaed
Apr 25, 2025
c1f26b9
updated
Apr 25, 2025
9b9ef36
updated
Apr 25, 2025
c60639e
Checkpoint.
tlrmchlsmth Apr 25, 2025
006dda3
Merge branch 'pd_scheduling_nixl' of https://github.com/robertgshaw2-…
tlrmchlsmth Apr 25, 2025
c5e023e
updated
Apr 25, 2025
8b0c93c
Cleanup
tlrmchlsmth Apr 26, 2025
5e45d90
WIP
tlrmchlsmth Apr 26, 2025
20a5491
updated
Apr 27, 2025
cee3c61
updated
Apr 27, 2025
5972571
updated on scheduler side
Apr 27, 2025
1b69d33
updated
Apr 27, 2025
74e105a
Merge remote-tracking branch 'rs/pd_scheduling_rob_dev' into nixl_int…
tlrmchlsmth Apr 27, 2025
8adf1ad
updated
Apr 27, 2025
21ab3d9
updated
Apr 27, 2025
3a27bbc
updated
Apr 27, 2025
f252df9
updated
Apr 27, 2025
8104803
updated
Apr 27, 2025
10bbe21
Hacking away
tlrmchlsmth Apr 27, 2025
a14278c
Merge remote-tracking branch 'rs/pd_scheduling_rob_dev_2' into nixl_i…
tlrmchlsmth Apr 27, 2025
65ea91f
cleanup
Apr 27, 2025
f2550ef
ensure request removed from running list
Apr 27, 2025
985bac3
Runs E2E. Garbage output. Crashes on 2nd request
tlrmchlsmth Apr 27, 2025
bf37a7d
update
tlrmchlsmth Apr 27, 2025
ebe1263
updated
Apr 27, 2025
a008aa3
updated
Apr 27, 2025
195dceb
rename files
Apr 27, 2025
e2cc365
updated
Apr 27, 2025
2324a50
Merge remote-tracking branch 'rs/pd_scheduling_rob_dev_2' into nixl_i…
tlrmchlsmth Apr 27, 2025
b4b64fe
updated
Apr 27, 2025
6686397
updated
Apr 27, 2025
8736043
updated
Apr 27, 2025
dcbf6e5
updated
Apr 27, 2025
7c8e21a
update
Apr 27, 2025
a4855d2
Second request no longer crashes
tlrmchlsmth Apr 27, 2025
0914040
Merge remote-tracking branch 'rs/pd_scheduling_rob_dev_2' into nixl_i…
tlrmchlsmth Apr 27, 2025
c5b3053
Remove gpu_model_runner hacks
tlrmchlsmth Apr 27, 2025
7502819
Clean up Justfile
tlrmchlsmth Apr 28, 2025
7768b96
[Bugfix] Stale finished requests in EMPTY_MODEL_RUNNER_OUTPUT
tlrmchlsmth Apr 28, 2025
a5950b7
update
tlrmchlsmth Apr 28, 2025
610a357
justfile edits
tlrmchlsmth Apr 28, 2025
5b026ab
Update
tlrmchlsmth Apr 28, 2025
f2fadd6
Fixes - lm_eval gsm8k has correctness
tlrmchlsmth Apr 29, 2025
4060f86
"just delete the assert"
tlrmchlsmth Apr 29, 2025
bfe9d19
fixup precommit issues
tlrmchlsmth Apr 29, 2025
ced529a
Fixes
tlrmchlsmth Apr 29, 2025
83f2872
updated (#12)
robertgshaw2-redhat Apr 30, 2025
e853b3c
Add Accuracy Test (#13)
robertgshaw2-redhat Apr 30, 2025
1c45ed1
Preemption Bugfixes (#15)
robertgshaw2-redhat May 1, 2025
a45a694
updated (#16)
robertgshaw2-redhat May 1, 2025
f6d0ac5
Merge branch 'main' into nixl_integration
tlrmchlsmth May 1, 2025
39c0813
[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE3 (#17504)
zixi-qi May 1, 2025
2f9a3f3
Fix Bad Merge | Fix Memory Leak in Upstream (#18)
robertgshaw2-redhat May 2, 2025
24aebae
[Bugfix] Disable gptq_bitblas for <SM80 to fix GPTQ on V100/T4 (#17541)
mgoin May 2, 2025
afb12e4
[Doc] note that not all unit tests pass on CPU platforms (#17554)
davidxia May 2, 2025
afcb3f8
[Attention] MLA move o_proj q_proj into cuda-graph region (#17484)
LucasWilkinson May 2, 2025
292fc59
[CI] Actually run tests/kv_transfer/test_disagg.py in CI (#17555)
mgoin May 2, 2025
b4003d1
Check if bitblas is installed during support check (#17572)
mgoin May 2, 2025
f89d0e1
[Misc] Continue refactoring model tests (#17573)
DarkLight1337 May 2, 2025
f192ca9
Fix PixtralHF missing spatial_merge_size (#17571)
mgoin May 2, 2025
109e15a
Add `pt_load_map_location` to allow loading to cuda (#16869)
jerryzh168 May 2, 2025
9e2de9b
[Bugifx] Remove TritonPlaceholder from sys.modules (#17317)
Isotr0py May 2, 2025
cc2a77d
[Core] [Bugfix] Add Input Embeddings (#15428)
qthequartermasterman May 2, 2025
c777df7
[BugFix] Fix Memory Leak (#17567)
robertgshaw2-redhat May 2, 2025
d754386
[Misc] Rename assets for testing (#17575)
DarkLight1337 May 2, 2025
b8b0859
add more pytorch related tests for torch nightly (#17422)
yangw-dev May 2, 2025
6d1479c
[doc] add the print result (#17584)
reidliu41 May 2, 2025
785d75a
Automatically tell users that dict args must be valid JSON in CLI (#1…
hmellor May 2, 2025
99404f5
[Security] Fix image hash collision (#17378)
DarkLight1337 May 2, 2025
868c546
Support W8A8 INT8 MoE for compressed-tensors (#16745)
mgoin May 2, 2025
3a500cd
[doc] miss result (#17589)
reidliu41 May 2, 2025
113527b
clean up justfile, examples
tlrmchlsmth May 2, 2025
cb23495
[Misc] Clean up input processing (#17582)
DarkLight1337 May 2, 2025
4c33d67
[Bugfix] fix tmp_out and exp_sums dimensions (#17438)
hliuca May 2, 2025
b3733e8
Merge branch 'nixl_integration' of https://github.com/tlrmchlsmth/vll…
tlrmchlsmth May 2, 2025
5f8b280
more cleanup
tlrmchlsmth May 2, 2025
79e7b2a
more cleanup
tlrmchlsmth May 2, 2025
e8ab678
more cleanup
tlrmchlsmth May 2, 2025
969daa9
more cleanup
tlrmchlsmth May 2, 2025
f9a3f3a
More cleanup
tlrmchlsmth May 2, 2025
aec447c
more cleanup
tlrmchlsmth May 2, 2025
0f87d8f
[BugFix][Attention] Fix sliding window attention in V1 giving incorre…
LucasWilkinson May 2, 2025
3e887d2
permute/unpermute kernel for moe optimization (#14568)
CalebDu May 2, 2025
182f40e
Add NVIDIA TensorRT Model Optimizer in vLLM documentation (#17561)
Edwardf0t1 May 2, 2025
ac68a75
more cleanup, precommit fixes
tlrmchlsmth May 2, 2025
e72245b
More cleanup
tlrmchlsmth May 2, 2025
9352cdb
[Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning (#…
xw285cornell May 2, 2025
cd2aa72
run_accuracy_test.sh UX
tlrmchlsmth May 2, 2025
10183d5
squash warnings
tlrmchlsmth May 2, 2025
9eb9787
pre-commit
tlrmchlsmth May 2, 2025
aeef78b
update
tlrmchlsmth May 2, 2025
5306d5b
Add get_finished to base kv connector
mgoin May 2, 2025
a16f2be
revert test.txt
tlrmchlsmth May 2, 2025
445b010
cleanup
tlrmchlsmth May 2, 2025
14a4c64
Cleanup
tlrmchlsmth May 2, 2025
b90b085
[easy] Print number of needed GPUs in skip message (#17594)
zou3519 May 2, 2025
9b103a1
fix typo in logging (#17605)
ehartford May 3, 2025
3ec97e2
[release] Add command to clean up Docker containers/images in TPU rel…
khluu May 3, 2025
22c6f63
[Neuron][Build] Require setuptools >= 77.0.3 for PEP 639 (#17603)
liangfu May 3, 2025
d47b605
Update test requirements to CUDA 12.8 (#17576)
22quinn May 3, 2025
e3d0a1d
[Quantizaton] [AMD] Add support for running DeepSeek int8 w8a8 MoE on…
rasmith May 3, 2025
87baebe
[Frontend][TPU] Add TPU default max-num-batched-tokens based on devic…
Chenyaaang May 3, 2025
c8386fa
[Build/CI] Upgrade CUTLASS to 3.9.1 (#17602)
tlrmchlsmth May 3, 2025
a928424
[Bugfix][ROCm] Using device_type because on ROCm the API is still tor…
gshtras May 3, 2025
fc7d8ad
review comments
tlrmchlsmth May 3, 2025
82c8679
Review comments, merge latest main
tlrmchlsmth May 3, 2025
.buildkite/release-pipeline.yaml (1 addition, 0 deletions)

@@ -57,6 +57,7 @@ steps:
     agents:
       queue: tpu_queue_postmerge
     commands:
+      - "yes | docker system prune -a"
       - "git fetch --all"
       - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --tag vllm/vllm-tpu:nightly --tag vllm/vllm-tpu:$BUILDKITE_COMMIT --progress plain -f docker/Dockerfile.tpu ."
       - "docker push vllm/vllm-tpu:nightly"
.buildkite/test-pipeline.yaml (4 additions, 0 deletions)

@@ -293,6 +293,7 @@ steps:
   parallelism: 4

 - label: PyTorch Compilation Unit Tests
+  torch_nightly: true
   source_file_dependencies:
   - vllm/
   - tests/compile
@@ -302,6 +303,7 @@ steps:
     - pytest -v -s compile/test_sequence_parallelism.py

 - label: PyTorch Fullgraph Smoke Test # 9min
+  torch_nightly: true
   source_file_dependencies:
   - vllm/
   - tests/compile
@@ -312,6 +314,7 @@ steps:
     - pytest -v -s compile/piecewise/test_toy_llama.py

 - label: PyTorch Fullgraph Test # 18min
+  torch_nightly: true
   source_file_dependencies:
   - vllm/
   - tests/compile
@@ -436,6 +439,7 @@ steps:
 ##### models test #####

 - label: Basic Models Test # 24min
+  torch_nightly: true
   source_file_dependencies:
   - vllm/
   - tests/models
.pre-commit-config.yaml (1 addition, 1 deletion)

@@ -46,7 +46,7 @@ repos:
     rev: 0.6.17
     hooks:
     - id: pip-compile
-      args: [requirements/test.in, -o, requirements/test.txt, --index-strategy, unsafe-best-match]
+      args: [requirements/test.in, -o, requirements/test.txt, --index-strategy, unsafe-best-match, --torch-backend, cu128]
       files: ^requirements/test\.(in|txt)$
 - repo: local
   hooks:
CMakeLists.txt (16 additions, 5 deletions)

@@ -15,7 +15,6 @@ project(vllm_extensions LANGUAGES CXX)

 # CUDA by default, can be overridden by using -DVLLM_TARGET_DEVICE=... (used by setup.py)
 set(VLLM_TARGET_DEVICE "cuda" CACHE STRING "Target device backend for vLLM")
-
 message(STATUS "Build type: ${CMAKE_BUILD_TYPE}")
 message(STATUS "Target device: ${VLLM_TARGET_DEVICE}")

@@ -250,9 +249,8 @@ set(VLLM_EXT_SRC
 if(VLLM_GPU_LANG STREQUAL "CUDA")
   SET(CUTLASS_ENABLE_HEADERS_ONLY ON CACHE BOOL "Enable only the header library")

-  # Set CUTLASS_REVISION manually -- its revision detection doesn't work in this case.
-  # Please keep this in sync with FetchContent_Declare line below.
-  set(CUTLASS_REVISION "v3.9.0" CACHE STRING "CUTLASS revision to use")
+  # Set CUTLASS_REVISION. Used for FetchContent. Also fixes some bogus messages when building.
+  set(CUTLASS_REVISION "v3.9.1" CACHE STRING "CUTLASS revision to use")

   # Use the specified CUTLASS source directory for compilation if VLLM_CUTLASS_SRC_DIR is provided
   if (DEFINED ENV{VLLM_CUTLASS_SRC_DIR})
@@ -270,7 +268,7 @@ if(VLLM_GPU_LANG STREQUAL "CUDA")
       cutlass
       GIT_REPOSITORY https://github.com/nvidia/cutlass.git
       # Please keep this in sync with CUTLASS_REVISION line above.
-      GIT_TAG v3.9.0
+      GIT_TAG ${CUTLASS_REVISION}
       GIT_PROGRESS TRUE

       # Speed up CUTLASS download by retrieving only the specified GIT_TAG instead of the history.
@@ -682,6 +680,17 @@ if(VLLM_GPU_LANG STREQUAL "CUDA")
   endif()
 endif()

+if(VLLM_GPU_LANG STREQUAL "CUDA")
+  set(MOE_PERMUTE_SRC
+      "csrc/moe/permute_unpermute_kernels/moe_permute_unpermute_kernel.cu"
+      "csrc/moe/moe_permute_unpermute_op.cu")
+
+  set_gencode_flags_for_srcs(
+    SRCS "${MOE_PERMUTE_SRC}"
+    CUDA_ARCHS "${MOE_PERMUTE_ARCHS}")
+
+  list(APPEND VLLM_MOE_EXT_SRC "${MOE_PERMUTE_SRC}")
+endif()
 message(STATUS "Enabling moe extension.")
 define_gpu_extension_target(
   _moe_C
@@ -690,6 +699,8 @@ define_gpu_extension_target(
   SOURCES ${VLLM_MOE_EXT_SRC}
   COMPILE_FLAGS ${VLLM_GPU_FLAGS}
   ARCHITECTURES ${VLLM_GPU_ARCHES}
+  INCLUDE_DIRECTORIES ${CUTLASS_INCLUDE_DIR}
+  INCLUDE_DIRECTORIES ${CUTLASS_TOOLS_UTIL_INCLUDE_DIR}
   USE_SABI 3
   WITH_SOABI)
benchmarks/kernels/benchmark_grouped_gemm_cutlass.py (2 additions, 1 deletion)

@@ -90,7 +90,8 @@ def bench_run(results: list[benchmark.Measurement], model: str,

     score = torch.randn((m, num_experts), device="cuda", dtype=dtype)

-    topk_weights, topk_ids = fused_topk(a, score, topk, renormalize=False)
+    topk_weights, topk_ids, token_expert_indices = fused_topk(
+        a, score, topk, renormalize=False)

     def run_triton_moe(a: torch.Tensor, w1: torch.Tensor, w2: torch.Tensor,
                        topk_weights: torch.Tensor, topk_ids: torch.Tensor,
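The hunk above tracks a signature change: fused_topk now returns a third tensor, token_expert_indices, alongside the top-k weights and ids, so every two-value call site must be updated or it raises an unpacking error. A minimal sketch of the adaptation, assuming only the three-tuple return shown in this diff (the tensor shapes and the discarded third value are illustrative, and a CUDA device plus an installed vllm are assumed):

    # Sketch: adapting a caller to fused_topk's three-value return.
    import torch

    from vllm.model_executor.layers.fused_moe import fused_topk

    m, hidden, num_experts, topk = 4, 128, 8, 2
    a = torch.randn((m, hidden), device="cuda", dtype=torch.half)
    score = torch.randn((m, num_experts), device="cuda", dtype=torch.half)

    # Old call site -- now fails with "too many values to unpack":
    # topk_weights, topk_ids = fused_topk(a, score, topk, renormalize=False)

    # Updated call site: unpack (and here discard) token_expert_indices.
    topk_weights, topk_ids, _token_expert_indices = fused_topk(
        a, score, topk, renormalize=False)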
benchmarks/kernels/benchmark_moe.py (19 additions, 4 deletions)

@@ -115,8 +115,8 @@ def run():
         from vllm.model_executor.layers.fused_moe import override_config
         with override_config(config):
             if use_deep_gemm:
-                topk_weights, topk_ids = fused_topk(x, input_gating, topk,
-                                                    False)
+                topk_weights, topk_ids, token_expert_indices = fused_topk(
+                    x, input_gating, topk, False)
                 return fused_experts(
                     x,
                     w1,
@@ -442,8 +442,14 @@ def tune(
                                              hidden_size, search_space,
                                              is_fp16, topk)

-        with torch.cuda.device(self.device_id) if current_platform.is_rocm(
-        ) else nullcontext():
+        need_device_guard = False
+        if current_platform.is_rocm():
+            visible_device = os.environ.get("ROCR_VISIBLE_DEVICES", None)
+            if visible_device != f"{self.device_id}":
+                need_device_guard = True
+
+        with torch.cuda.device(
+                self.device_id) if need_device_guard else nullcontext():
             for config in tqdm(search_space):
                 try:
                     kernel_time = benchmark_config(
@@ -578,6 +584,15 @@ def main(args: argparse.Namespace):

     use_deep_gemm = bool(args.use_deep_gemm)

+    if current_platform.is_rocm() and "HIP_VISIBLE_DEVICES" in os.environ:
+        # Ray will set ROCR_VISIBLE_DEVICES for device visibility
+        logger.warning(
+            "Ray uses ROCR_VISIBLE_DEVICES to control device accessibility. "
+            "Replacing HIP_VISIBLE_DEVICES with ROCR_VISIBLE_DEVICES.")
+        val = os.environ["HIP_VISIBLE_DEVICES"]
+        os.environ["ROCR_VISIBLE_DEVICES"] = val
+        del os.environ["HIP_VISIBLE_DEVICES"]
+
     ray.init()
     num_gpus = int(ray.available_resources()["GPU"])
     workers = [BenchmarkWorker.remote(args.seed) for _ in range(num_gpus)]
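The tune() and main() hunks above implement one idea from two sides: on ROCm, Ray exposes AMD GPUs through ROCR_VISIBLE_DEVICES, so the benchmark remaps any HIP_VISIBLE_DEVICES setting before ray.init() and enters a torch.cuda.device() guard only when the worker's device id is not already the sole visible device. A standalone sketch of the pattern, assuming only the environment-variable convention shown in the diff (the is_rocm() helper below is a stub standing in for vLLM's current_platform.is_rocm()):

    # Sketch of the ROCm device-visibility pattern from the diff above.
    import os
    from contextlib import nullcontext

    import torch


    def is_rocm() -> bool:
        # Stub; vLLM uses current_platform.is_rocm() instead.
        return torch.version.hip is not None


    def remap_hip_to_rocr() -> None:
        # Ray controls AMD GPU visibility via ROCR_VISIBLE_DEVICES, so a
        # HIP_VISIBLE_DEVICES setting must be carried over before ray.init().
        if is_rocm() and "HIP_VISIBLE_DEVICES" in os.environ:
            os.environ["ROCR_VISIBLE_DEVICES"] = os.environ.pop(
                "HIP_VISIBLE_DEVICES")


    def device_guard(device_id: int):
        # Force a device context only when the target GPU is not already
        # the single device visible to this process.
        need_guard = (is_rocm() and
                      os.environ.get("ROCR_VISIBLE_DEVICES") != f"{device_id}")
        return torch.cuda.device(device_id) if need_guard else nullcontext()

A caller would run remap_hip_to_rocr() once before ray.init(), then wrap each worker's tuning loop in "with device_guard(worker_id):", mirroring the structure of the diff.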