[CI] Refactor test_sequence_parallel.py and add a warmup run for more accurate performance stat #1165

Merged

hsliuustc0106 merged 101 commits into vllm-project:main from mxuax:Non-Intrusive-SP

Feb 3, 2026

Contributor

mxuax commented Feb 3, 2026

Purpose

This PR refactors test_sequence_parallel.py and adds a warmup run so that the reported performance statistics are more accurate, which fixes Issue 6 mentioned in #1143. Redundant tests are also removed to speed up the test run.

Issue 6: Test Validity Anomaly (Ring vs. Ulysses)
Evidence: In 4-GPU Sequence Parallel tests:
Ulysses: ~29.0s execution.
Ring/Hybrid: ~0.3s execution.
The cause is the time spent on NCCL initialization, which was counted in the measured run. Adding a warm-up run before the timed run removes this one-time cost from the measurement.
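A minimal sketch of the warm-up idea, not the code in this PR: `timed_run` and `make_fake_generate` are hypothetical names, and the fake workload uses `time.sleep` to stand in for a one-time initialization cost followed by a cheap steady-state step.

```python
import time
from typing import Callable


def timed_run(generate: Callable[[], object], warmup_runs: int = 1) -> float:
    """Return the elapsed seconds of one measured call after untimed warm-up calls."""
    for _ in range(warmup_runs):
        generate()  # untimed: absorbs one-time setup cost
    start = time.perf_counter()
    generate()
    return time.perf_counter() - start


def make_fake_generate(init_cost: float = 0.2, step_cost: float = 0.01) -> Callable[[], None]:
    """Hypothetical workload: the first call pays init_cost, every call pays step_cost."""
    initialized = False

    def generate() -> None:
        nonlocal initialized
        if not initialized:
            time.sleep(init_cost)  # stands in for one-time initialization
            initialized = True
        time.sleep(step_cost)  # stands in for the steady-state work

    return generate


if __name__ == "__main__":
    cold = timed_run(make_fake_generate(), warmup_runs=0)
    warm = timed_run(make_fake_generate(), warmup_runs=1)
    print(f"without warm-up: {cold * 1000:.0f}ms, with warm-up: {warm * 1000:.0f}ms")
```

When timing real GPU work, CUDA kernels launch asynchronously, so the device should be synchronized (for example with `torch.cuda.synchronize()`) before each clock read.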

Test Plan

pytest tests/e2e/offline_inference/test_sequence_parallel.py -v -s

Test Result

Hardware: H800
Time duration 388.15s (0:06:28)

======================================================================
SUMMARY
======================================================================
Mode            GPUs   Size       Baseline     SP           Speedup    Status
----------------------------------------------------------------------
ulysses-2       2      256x256    95ms         110ms        0.86x      PASS
ring-2          2      256x256    95ms         168ms        0.57x      PASS
hybrid-2x2      4      256x256    N/A          314ms        N/A        PASS
ulysses-4       4      272x272    N/A          9168ms       N/A        PASS
======================================================================

NOTE: The slowdown relative to the baseline occurs because the test uses a very small image size (256x256), so generation is communication-bound rather than compute-bound.
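For reference, the Speedup column appears to be the baseline latency divided by the SP latency; this is an assumption, but it matches both rows that report a baseline. A short check of that arithmetic (the `speedup` helper is hypothetical):

```python
def speedup(baseline_ms: float, sp_ms: float) -> float:
    """Assumed definition: baseline latency divided by sequence-parallel latency."""
    return baseline_ms / sp_ms


# Latencies copied from the summary table above.
rows = [("ulysses-2", 95, 110), ("ring-2", 95, 168)]
for mode, baseline_ms, sp_ms in rows:
    print(f"{mode}: {speedup(baseline_ms, sp_ms):.2f}x")
# prints:
# ulysses-2: 0.86x
# ring-2: 0.57x
```

A value below 1.0x means the sequence-parallel run was slower than the single-GPU baseline.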

Essential Elements of an Effective PR Description Checklist

[ - ] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
[ - ] The test plan, such as providing test command.
[ - ] The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

mxuax and others added 30 commits

January 14, 2026 15:06


          cp plan framework in vllm-omni

e3bf6e8

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          add partial context split hook

e02e841

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          fix licenses

4db34f3

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          add test file and modify z-image for cp_plan

6e3063c

Signed-off-by: mxuax <mxuax@connect.ust.hk>


           modify z-image cp_plan

73882ae

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          enable hybrid ulysses and ring, add apply_context_paralle in registry…

a039bc8

… to support cp_plan

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          modify z-image-transformer, created UnifiedPrepare to put all the pre…

1abd69a

…paration in a block

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          support cp_plan for qwen-image

7f94550

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          modify test

50ccdd1

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          add cp_plan doc

b640762

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          reduction wan

d1ede83

Removed context parallelism plan and related comments.

Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>


          reduction wan from test

a5cd982

Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>


          fix doc warning

505d774

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          Merge branch 'Non-Intrusive-SP' of https://github.com/mxuax/vllm-omni…

d816ec0

…-ring-attn into workingbranch


          Delete Untitled

a52093b

Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>


          refactor context parallel to sequence parallel and add some sp_plan i…

0714a44

…llustration

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          Merge branch 'Non-Intrusive-SP' of https://github.com/mxuax/vllm-omni…

b369786

…-ring-attn into workingbranch


          Add attention mask support to _sp_plan framework for variable sequenc…

722c9f6

…e lengths

- Add sp_attention_mask, sp_padding_size, sp_original_seq_len to ForwardContext
- Add auto_pad option to SequenceParallelInput
- Implement _shard_with_auto_pad in SequenceParallelSplitHook
- Update SequenceParallelGatherHook to remove padding
- Update QwenImage _sp_plan with auto_pad=True
- Update QwenImageCrossAttention to use sp_attention_mask

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          fix wrongly chunck attention mask issue

ddbf89e

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          fix chunck attention mask bug

9e76fc2

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          Merge branch 'main' into Non-Intrusive-SP

b723476


          handle mask in attention metadata

9f760a2

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          Merge branch 'Non-Intrusive-SP' of https://github.com/mxuax/vllm-omni…

d7f4157

…-ring-attn into workingbranch


          remove some declarational comments

d9aeca8

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          remove some declarational comments

81516a6

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          refactor the sp_plan and sp_config file, removed the training related…

9c77428

… code, add some comment

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          modified the parallelism_acceleration.md to give a clearer sp_plan in…

ec37009

…struction

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          add test for sequence_parallel.py

c09171b

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          fix error

470b737

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          fix error

66e85e8

Signed-off-by: mxuax <mxuax@connect.ust.hk>

mxuax and others added 15 commits

January 30, 2026 12:20


          Remove empty line at the beginning of ring_globals.py

4841b90

Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>


          Merge branch 'main' into Non-Intrusive-SP

5752a9a


          Merge branch 'main' into Non-Intrusive-SP

7420b52


          check test ring=2

5fe3e23

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          Merge branch 'vllm-project:main' into Non-Intrusive-SP

7af5e2c


          Merge branch 'Non-Intrusive-SP' of https://github.com/mxuax/vllm-omni…

444f9de

…-ring-attn into attn_backends


          check test ring=2 abnormal performance

b8b770c

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          check test ring=2 abnormal performance

4fb7dc9

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          check test ring=2 abnormal performance

6488faa

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          check test ring=2 abnormal performance

562f1e8

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          check test ring=2 abnormal performance

8d9e501

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          check test ring=2 abnormal performance

b25a4d2

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          refactor test_sequence_parallel

19ea2aa

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          refactor test_sequence_parallel and add a warmup run for more accurat…

bcf936f

…e performance stat

Signed-off-by: mxuax <mxuax@connect.ust.hk>


          a warmup run control for more accurate performance stat

e483ab0

Signed-off-by: mxuax <mxuax@connect.ust.hk>

mxuax requested a review from hsliuustc0106 as a code owner

February 3, 2026 02:52

chatgpt-codex-connector bot reviewed

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e483ab08fb

tests/e2e/offline_inference/test_sequence_parallel.py


          Merge branch 'main' into Non-Intrusive-SP

ec15db6

Contributor Author

mxuax commented Feb 3, 2026

This PR is ready. @congw729 @wtomin @SamitHuang @ZJY0516 @hsliuustc0106

Contributor

congw729 commented Feb 3, 2026

The code looks fine. Please rebase onto the origin/main branch to clean up the commit log.

ZJY0516 reviewed

tests/e2e/offline_inference/test_sequence_parallel.py


          Update SP configurations comments

00f1979

Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>

mxuax requested a review from ZJY0516

February 3, 2026 03:43

tzhouam added the ready label

ZJY0516 approved these changes

tests/e2e/offline_inference/test_sequence_parallel.py

Collaborator

hsliuustc0106 commented Feb 3, 2026

this looks much better for each single function @yenuo26 @congw729

hsliuustc0106 merged commit 9494d69 into vllm-project:main

7 checks passed

Contributor

yenuo26 commented Feb 3, 2026

> this looks much better for each single function @yenuo26 @congw729

Buildkite automatically collapses logs starting with "Running". If we want to achieve this effect, we can add a fixture function in conftest.py to add "Running" information before each test function runs.
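A minimal sketch of such a fixture, assuming it lives in conftest.py; the names `running_banner` and `announce_running` are hypothetical and not part of this PR. It uses pytest's built-in `request` fixture to read the current test's node ID.

```python
# conftest.py (sketch)
import pytest


def running_banner(nodeid: str) -> str:
    """Build the log line printed before a test starts."""
    return f"Running {nodeid}"


@pytest.fixture(autouse=True)
def announce_running(request):
    """Print a line starting with "Running" before every test function."""
    # The empty print() ends the line pytest -v has already started, so that
    # "Running" lands at the beginning of its own line.
    print()
    print(running_banner(request.node.nodeid), flush=True)
    yield
```

The line only reaches the log when output capture is disabled, as in the `-s` flag used in the test plan above.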

futurenitian pushed a commit to futurenitian/vllm-omni that referenced this pull request


          [CI] Refactor test_sequence_parallel.py and add a warmup run for more…

eb8b1a8

… accurate performance stat (vllm-project#1165)

Signed-off-by: mxuax <mxuax@connect.ust.hk>
Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: future fu <3172516720@qq.com>

futurenitian pushed a commit to futurenitian/vllm-omni that referenced this pull request


          [CI] Refactor test_sequence_parallel.py and add a warmup run for more…

89280a8

… accurate performance stat (vllm-project#1165)

Signed-off-by: mxuax <mxuax@connect.ust.hk>
Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: future fu <3172516720@qq.com>

Labels

ready