[Feature] Prefill Context Parallel (PCP) basic support #28718
Conversation
Code Review
This pull request introduces basic support for Prefill Context Parallelism (PCP), in line with the RFC. The changes are applied consistently across the codebase, including updates to the parallel configuration, KV cache management, and the attention backends. Renaming `dcp_kv_cache_interleave_size` to `cp_kv_cache_interleave_size` generalizes the KV cache interleaving logic to cover both decode and prefill context parallelism. Compatibility checks temporarily disable PCP for certain features, such as full CUDA graphs and hybrid attention, indicating a phased rollout of full support. Overall, the integration appears thorough and well considered for this initial phase.
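For intuition, here is a minimal sketch of what interleaved KV cache placement across context-parallel ranks can look like. The function name `owner_cp_rank` and the round-robin chunk layout are illustrative assumptions, not the actual layout implemented by this PR.

```python
# Hypothetical sketch: map token positions to the context-parallel (CP)
# rank that stores their KV cache entries. Tokens are grouped into chunks
# of `interleave_size`, and the chunks are dealt round-robin across the
# CP ranks. Names and layout are illustrative, not taken from the PR.


def owner_cp_rank(token_idx: int, cp_world_size: int, interleave_size: int) -> int:
    """Return the CP rank assumed to own the KV entry for `token_idx`."""
    return (token_idx // interleave_size) % cp_world_size


if __name__ == "__main__":
    # With 2 CP ranks and interleave size 4: tokens 0-3 go to rank 0,
    # tokens 4-7 to rank 1, tokens 8-11 back to rank 0, and so on.
    placement = [owner_cp_rank(t, cp_world_size=2, interleave_size=4) for t in range(16)]
    print(placement)  # [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
```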
Thanks for the PR! Can you share a few combinations of PCP x DCP x TP in your summary?
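For readers unfamiliar with how these dimensions compose, here is a rough sketch of GPU-count accounting for a few PCP x DCP x TP combinations. It assumes PCP is an independent parallel dimension (total GPUs = PCP x TP) and that DCP regroups the existing TP ranks (so DCP must divide TP); both rules are assumptions to be confirmed against the PR summary, not statements of vLLM's actual constraints.

```python
# Hypothetical accounting of GPU counts for PCP x DCP x TP combinations.
# Assumption 1: PCP is an independent dimension, so GPUs = PCP * TP.
# Assumption 2: DCP regroups existing TP ranks, so DCP must divide TP.


def total_gpus(pcp: int, tp: int, dcp: int) -> int:
    assert tp % dcp == 0, "assumed constraint: DCP size divides TP size"
    return pcp * tp


for pcp, tp, dcp in [(1, 8, 8), (2, 4, 4), (2, 8, 4), (4, 2, 2)]:
    print(f"PCP={pcp} TP={tp} DCP={dcp} -> {total_gpus(pcp, tp, dcp)} GPUs")
```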
LucasWilkinson left a comment:
Overall looks pretty good to me; running the CI, left a couple of final nits/comments
LucasWilkinson left a comment:
LGTM
1. Fix vllm-project/vllm#28542. The model-structure modifications involved: Qwen2.5-VL (some patches still remain), Qwen2-VL, Qwen2, the DeepSeek series, and the Qwen-MoE series.
2. Fix vllm-project/vllm#29121. The output token type changed from a NumPy array to `list[list[int]]`.
3. Fix vllm-project/vllm#29262. The `xformers` backend for multimodal models has been deprecated.
4. Fix vllm-project/vllm#29342.
5. Fix vllm-project/vllm#28579.
6. Fix vllm-project/vllm#28718.
7. Fix vllm-project/vllm#28665.
8. Fix vllm-project/vllm#26847. vLLM introduced `optimization-level`; some default configs have changed, and the `--enforce-eager` parameter has been deprecated.
9. Fix vllm-project/vllm#29223. The sampler now returns a tuple.
10. Fix vllm-project/vllm#29471. We will remove the related patch to avoid this kind of error.

vLLM version: v0.11.2
Purpose
This PR, split out from the full PR #26864, adds basic support for the Prefill Context Parallelism (PCP) feature, the prefill-stage counterpart of DCP. For specific implementation details, please refer to the RFC #25749.
TL;DR: PCP enhances long-sequence inference capabilities by partitioning the sequence dimension during the prefill stage.
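As a rough illustration of what "partitioning the sequence dimension" means here, the sketch below splits a prompt's token positions across PCP ranks. The contiguous, balanced split is an assumption made for clarity; real context-parallel prefill often uses other layouts (e.g. interleaved or zigzag splits) to balance causal-attention work.

```python
# Illustrative-only sketch: shard a prompt's token positions across PCP
# ranks. The contiguous balanced split is an assumption, not the PR's
# actual sharding scheme.


def shard_tokens(num_tokens: int, pcp_rank: int, pcp_world_size: int) -> range:
    """Return the contiguous slice of token positions owned by `pcp_rank`."""
    base = num_tokens // pcp_world_size
    rem = num_tokens % pcp_world_size
    # The first `rem` ranks each take one extra token.
    start = pcp_rank * base + min(pcp_rank, rem)
    end = start + base + (1 if pcp_rank < rem else 0)
    return range(start, end)


if __name__ == "__main__":
    # A 10-token prefill on 4 PCP ranks: ranks 0-1 get 3 tokens each,
    # ranks 2-3 get 2 tokens each.
    for rank in range(4):
        print(rank, list(shard_tokens(10, rank, 4)))
```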
The current implementation primarily includes the following changes:
- Modified `block_tables.py` to extend the KV cache storage based on DCP & PCP;
- Added a `pcp_group` for PCP (a minimal sketch of such a group follows below).

CC @LookAround0301 @FENP @LucasWilkinson
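For context on the `pcp_group` item above, here is a minimal sketch of how a dedicated process group can be created with `torch.distributed`. The rank layout, in which consecutive ranks form one PCP group, is an assumption for illustration and not necessarily how this PR arranges ranks.

```python
# Minimal sketch (assumed rank layout): build per-PCP process groups with
# torch.distributed. Assumes dist.init_process_group(...) has already
# been called. Every rank must participate in every new_group() call, so
# all groups are created and each rank keeps only the one it belongs to.
import torch.distributed as dist


def init_pcp_group(world_size: int, pcp_size: int, my_rank: int):
    assert world_size % pcp_size == 0, "PCP size must divide the world size"
    my_group = None
    # Assumed layout: consecutive ranks [start, ..., start + pcp_size - 1]
    # form one PCP group.
    for start in range(0, world_size, pcp_size):
        ranks = list(range(start, start + pcp_size))
        group = dist.new_group(ranks=ranks)
        if my_rank in ranks:
            my_group = group
    return my_group
```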