Merged
25 commits
32e8ed0  [compile] Enable sequence parallelism matching w/o custom ops enabled (angelayi, Oct 15, 2025)
dc60743  [compile] Fix rmsnorm (angelayi, Oct 17, 2025)
88cdee0  Add e2e tests (angelayi, Oct 21, 2025)
aadc6b2  Fix test_sequence_parallel + FP8 (angelayi, Nov 3, 2025)
b1ff48e  more robust fp8 check (ProExpertProg, Nov 3, 2025)
b46577d  Merge branch 'main' into sp_custom_op (ProExpertProg, Nov 4, 2025)
f03f8c3  disable async tp tests (angelayi, Nov 5, 2025)
f7be9d1  Merge remote-tracking branch 'upstream/main' into sp_custom_op (ProExpertProg, Nov 11, 2025)
06f5b84  fix SP test with +rms_norm for piecewise graph (ProExpertProg, Nov 11, 2025)
225782d  Better handling of custom rms norm, fix unit test, reduce warnings (ProExpertProg, Nov 11, 2025)
a002565  Fix e2e fusion numbers (ProExpertProg, Nov 11, 2025)
df95562  Merge branch 'refs/heads/main' into sp_custom_op (ProExpertProg, Nov 12, 2025)
2a85278  Reorganize tests to fix failures and improve coverage: (ProExpertProg, Nov 12, 2025)
85d1bb9  Merge branch 'main' into sp_custom_op (ProExpertProg, Nov 12, 2025)
b138b94  Merge branch 'main' into sp_custom_op (ProExpertProg, Nov 12, 2025)
84255fd  Fix CI (ProExpertProg, Nov 12, 2025)
2547580  Merge branch 'main' into sp_custom_op (ProExpertProg, Nov 12, 2025)
4c5d335  Fix pre-commit (ProExpertProg, Nov 12, 2025)
20dc462  Merge remote-tracking branch 'upstream/main' into sp_custom_op (ProExpertProg, Nov 12, 2025)
06a3878  FI broken on Blackwell for llama4, broken on Hopper with fp8 kvcache (ProExpertProg, Nov 13, 2025)
0f2761b  Typo in CI (ProExpertProg, Nov 13, 2025)
6f6d0f5  Skip llama-4 in distributed tests (angelayi, Nov 14, 2025)
9bb0755  fix SP count for qwen (ProExpertProg, Nov 15, 2025)
c76f6d6  change comment (ProExpertProg, Nov 15, 2025)
b02f8e6  remove unnecessary todo (ProExpertProg, Nov 15, 2025)
14 changes: 8 additions & 6 deletions .buildkite/test-pipeline.yaml
@@ -475,10 +475,11 @@ steps:
   - vllm/
   - tests/compile
   commands:
+  # fp8 kv scales not supported on sm89, tested on Blackwell instead
   - pytest -v -s compile/test_full_graph.py -k 'not test_fp8_kv_scale_compile'
   # Limit to no custom ops to reduce running time
   # Wrap with quotes to escape yaml and avoid starting -k string with a -
-  - "pytest -v -s compile/test_fusions_e2e.py -k 'TRITON and -quant_fp8'"
+  - "pytest -v -s compile/test_fusions_e2e.py -k 'TRITON and not +quant_fp8 and not Llama-4'"
 
 - label: Cudagraph test
   timeout_in_minutes: 20
@@ -922,7 +923,7 @@ steps:
   - pytest -v -s tests/kernels/moe/test_ocp_mx_moe.py
   - pytest -v -s tests/kernels/moe/test_flashinfer.py
 
-- label: Blackwell Fusion Tests # 30 min
+- label: Blackwell Fusion & Compile Tests # 30 min
   timeout_in_minutes: 40
   working_dir: "/vllm-workspace/"
   gpu: b200
@@ -943,7 +944,9 @@ steps:
   - pytest -v -s tests/compile/test_fusion_all_reduce.py
   # Limit to Inductor partition, no custom ops, and allreduce & attn fusion to reduce running time
   # Wrap with quotes to escape yaml
-  - "pytest -v -s tests/compile/test_fusions_e2e.py::test_tp2_attn_quant_allreduce_rmsnorm -k 'True and Llama-3.1 and -quant_fp8 and -rms_norm'"
+  - "pytest -v -s tests/compile/test_fusions_e2e.py::test_tp2_attn_quant_allreduce_rmsnorm -k 'True and not +quant_fp8 and not +rms_norm'"
+  # test_fp8_kv_scale_compile requires FlashAttention (not supported on default L4/L40)
+  - pytest -v -s tests/compile/test_full_graph.py::test_fp8_kv_scale_compile
 
 - label: Blackwell Fusion E2E Tests # 30 min
   timeout_in_minutes: 40
@@ -966,8 +969,6 @@ steps:
   - nvidia-smi
   # Run all e2e fusion tests
   - pytest -v -s tests/compile/test_fusions_e2e.py
-  # test_fp8_kv_scale_compile requires FlashAttention (not supported on default L4/L40)
-  - pytest -v -s tests/compile/test_full_graph.py::test_fp8_kv_scale_compile
 
 - label: Blackwell GPT-OSS Eval
   timeout_in_minutes: 60
@@ -1263,7 +1264,8 @@ steps:
   - pytest -v -s tests/compile/test_async_tp.py
   - pytest -v -s tests/compile/test_sequence_parallelism.py
   - pytest -v -s tests/compile/test_fusion_all_reduce.py
-  - pytest -v -s tests/compile/test_fusions_e2e.py::test_tp2_attn_quant_allreduce_rmsnorm
+  - "pytest -v -s tests/compile/test_fusions_e2e.py -k 'not Llama-4'"
+  - pytest -v -s tests/distributed/test_sequence_parallel.py
   - pytest -v -s tests/distributed/test_context_parallel.py
   - CUDA_VISIBLE_DEVICES=1,2 VLLM_ALL2ALL_BACKEND=deepep_high_throughput VLLM_USE_DEEP_GEMM=1 VLLM_LOGGING_LEVEL=DEBUG python3 examples/offline_inference/data_parallel.py --model Qwen/Qwen1.5-MoE-A2.7B --tp-size=1 --dp-size=2 --max-model-len 2048
   - pytest -v -s tests/v1/distributed/test_dbo.py
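The `-k` filters above rely on pytest's keyword selection: bare tokens such as `TRITON`, `+quant_fp8`, or `Llama-4` are literal substring matches against each parametrized test id (in vLLM's tests, `+op`/`-op` in an id marks a custom op as enabled/disabled), combined with `and`/`or`/`not`. That is also why the commands are quoted: an expression beginning with `-` would be parsed as a YAML list marker or a CLI flag. The following is a simplified illustrative model of that matching, not pytest's actual parser:

```python
import re


def k_matches(expr: str, test_id: str) -> bool:
    """Simplified model of pytest -k selection: each bare token becomes a
    substring test against the test id, then the resulting and/or/not
    expression is evaluated. (Real pytest uses a dedicated expression
    parser; this sketch ignores parentheses and edge cases.)"""
    def repl(m: re.Match) -> str:
        tok = m.group(0)
        if tok in ("and", "or", "not"):
            return tok  # keep boolean operators as-is
        return str(tok in test_id)  # literal token -> substring check
    return eval(re.sub(r"[\w.+-]+", repl, expr))


# The CI filter selects TRITON variants while skipping fp8-quant and Llama-4:
expr = "TRITON and not +quant_fp8 and not Llama-4"
print(k_matches(expr, "test_e2e[Llama-3.1-8B-TRITON]"))            # selected
print(k_matches(expr, "test_e2e[Llama-3.1-8B-TRITON+quant_fp8]"))  # skipped
```

The test ids here are hypothetical examples of the parametrized id shape; the real ids come from the fixtures in `tests/compile/test_fusions_e2e.py`.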