[compile] Enable sequence parallelism matching w/o custom ops enabled #27126
Conversation
Force-pushed from c1efc65 to ed10d76
ProExpertProg left a comment
Thanks for taking this on! Could you just add me as a co-author on one of the commits?
| """Base helper for RMSNorm and RMSNorm + Quantization functionalization.""" | ||
| def get_first_out_wrapper(fn): | ||
| @functools.wraps(fn) | ||
| def wrapper(*args): |
Does this work? I thought that during tracing, the pattern-matching tracer would treat *args as a single parameter.
Yes! I updated the test to assert on the number of all_reduce/all_gather ops in the graph.
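A rough sketch of that idea (assumed helper and illustrative op names, not the actual vLLM test code): walk the traced FX graph, count matching collective ops, and assert on the totals.

```python
import torch


def count_ops(graph: torch.fx.Graph, target) -> int:
    # Count call_function nodes whose target is the given op overload.
    return sum(
        1
        for node in graph.nodes
        if node.op == "call_function" and node.target == target
    )


# Example usage inside a test (gm is an fx.GraphModule captured by the pass;
# the op names below are illustrative placeholders):
# assert count_ops(gm.graph, torch.ops.vllm.all_reduce.default) == 0
# assert count_ops(gm.graph, torch.ops.vllm.all_gather.default) == num_layers
```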
Force-pushed from ed10d76 to 5d66118
ProExpertProg left a comment
@cascade812 could you take a look at this please?
Sure!

@angelayi I hit the below error if I don't specify …
@angelayi It seems odd to me that enabling AsyncTP results in higher latency for Llama-70B. In our earlier benchmark, we observed about a 10% reduction in average latency for the prefill stage with AsyncTP enabled, for the same model on 4xH200.
ProExpertProg left a comment
We no longer have to skip the FP4 tests!
    # If no fusion, the original ops are checked
    elif RMSNorm.enabled():
        return [
            torch.ops._C.fused_add_rms_norm.default,
May I ask why we can't have fused_add_rms_norm_static_fp8_quant fused in all cases?
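For context, a minimal sketch (illustrative only, not the exact test helper) of the kind of dispatch the hunk above implements: which ops the test expects to find in the graph depends on whether fusion and the custom RMSNorm op are enabled. It assumes vLLM's compiled `_C` extension is loaded.

```python
import torch


def ops_to_check(fusion_enabled: bool, rms_norm_custom_op: bool) -> list:
    if fusion_enabled:
        # Fusion pass rewrites RMSNorm + static FP8 quant into a single op.
        return [torch.ops._C.fused_add_rms_norm_static_fp8_quant.default]
    elif rms_norm_custom_op:
        # No fusion: the plain custom RMSNorm op should appear in the graph.
        return [torch.ops._C.fused_add_rms_norm.default]
    else:
        # Custom op disabled: RMSNorm is left as decomposed native torch ops,
        # so there is no single op to look for.
        return []
```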
ZJY0516 left a comment
I think this would make the logic clearer and more straightforward
Signed-off-by: angelayi <[email protected]>
Signed-off-by: Luka Govedič <[email protected]>
[compile] Enable sequence parallelism matching w/o custom ops enabled (vllm-project#27126)
Signed-off-by: angelayi <[email protected]>
Signed-off-by: Luka Govedič <[email protected]>
Signed-off-by: ProExpertProg <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Bump vLLM version to v0.11.2

What's broken and changed by vLLM:
1. structured_output is broken by vllm-project/vllm#26866
2. get_mrope_input_positions is broken by vllm-project/vllm#28399
3. graph mode is broken by vllm-project/vllm#25110; we'll upgrade torch to 2.8 to fix the problem later
4. embedding is broken by vllm-project/vllm#27583
5. `get_attn_backend_cls` and the attention backend are broken by vllm-project/vllm#28534
6. spec decode is broken by vllm-project/vllm#28771
7. the sp feature is broken by vllm-project/vllm#27126
8. mtp is broken by vllm-project/vllm#27922
9. lora is broken by vllm-project/vllm#21068
10. execute_model is broken by vllm-project/vllm#26866
11. the `VLLM_DISABLE_SHARED_EXPERTS_STREAM` env is broken by vllm-project/vllm#28159
12. kv cache is broken by vllm-project/vllm#27753
13. dp is broken by vllm-project/vllm#25110

What's broken and changed by ourselves:
1. qwen vl is broken by vllm-project/vllm#28455; we'll remove model files in the future to avoid this kind of error
2. Engine core is broken by vllm-project/vllm#23691; we'll remove the patch file in the future
3. Ascend scheduler is broken by vllm-project/vllm#28733; we'll remove the Ascend scheduler later
4. qwen3-next is broken by vllm-project/vllm#28083; we'll remove model files in the future to avoid this kind of error
5. qwen vl is broken by vllm-project/vllm#27764; we'll remove model files in the future

Known issues:
1. ray doesn't work
2. the accuracy of qwen3-next is not correct
3. qwen3-vl is broken
4. prefix cache + ascend scheduler + deepseek v2 lite is broken

- vLLM version: v0.11.2

Signed-off-by: wangxiyuan <[email protected]>
Co-authored-by: MengqingCao <[email protected]>
Co-authored-by: hfadzxy <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: 22dimensions <[email protected]>
Co-authored-by: shen-shanshan <[email protected]>
Purpose
Based on #24604, this PR modifies the sequence-parallelism pass so that it matches the custom-op patterns without requiring the custom ops to be enabled.
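To illustrate the idea (a hedged sketch with assumed shapes, dtypes, and eps, not code from this PR): the pass now has to recognize both the decomposed native form of RMSNorm that torch.compile sees when the custom op is disabled, and the opaque custom-op call it sees when the op is enabled.

```python
import torch


def rms_norm_native(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    # What the graph contains when the +rms_norm custom op is disabled:
    # plain elementwise/reduction torch ops after decomposition.
    x_f32 = x.to(torch.float32)
    var = x_f32.pow(2).mean(dim=-1, keepdim=True)
    return (x_f32 * torch.rsqrt(var + eps)).to(x.dtype) * weight


def rms_norm_custom(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    # What the graph contains when the custom op is enabled: a single call
    # into vLLM's C++ kernel (requires vLLM's _C extension to be loaded).
    out = torch.empty_like(x)
    torch.ops._C.rms_norm(out, x, weight, eps)
    return out
```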
Test Plan
pytest -sv tests/compile/test_sequence_parallelism.py

Performance numbers
I did some benchmarking with the command on an H100 (without FlashInfer), while varying:
- `"pass_config": {"enable_async_tp": true, "enable_sequence_parallelism": true}` vs. `"pass_config": {"enable_async_tp": false, "enable_sequence_parallelism": false}`
- `"custom_ops": ["+quant_fp8", "+rms_norm"]` vs. `"custom_ops": []`