
Conversation

@pisceskkk (Contributor) commented Nov 14, 2025

Purpose

This PR, split from the full PR #26864, adds basic support for the Prefill Context Parallelism (PCP) feature, the prefill-side counterpart to Decode Context Parallelism (DCP). For implementation details, please refer to RFC #25749.

TL;DR: PCP enhances long-sequence inference capabilities by partitioning the sequence dimension during the prefill stage.
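
A minimal sketch of the core idea (illustrative only, not code from this PR), assuming a contiguous chunked split of the prompt across PCP ranks:

```python
# Hypothetical illustration of PCP-style sequence partitioning.
# Each PCP rank prefills one contiguous chunk of the prompt, so per-rank
# activation and KV-cache memory scales roughly with seq_len / pcp_size.


def shard_prompt(token_ids: list[int], pcp_rank: int, pcp_size: int) -> list[int]:
    """Return the contiguous chunk of the prompt owned by `pcp_rank`."""
    chunk = (len(token_ids) + pcp_size - 1) // pcp_size  # ceiling division
    return token_ids[pcp_rank * chunk : (pcp_rank + 1) * chunk]


prompt = list(range(10))  # 10 token ids
assert shard_prompt(prompt, 0, 4) == [0, 1, 2]
assert shard_prompt(prompt, 3, 4) == [9]
```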

The current implementation primarily includes the following changes:

  • Modified files such as block_tables.py to extend KV cache storage for both DCP and PCP;
  • Added a communication group, pcp_group, for PCP (see the sketch after this list);
  • Added the command-line arguments needed to control PCP parallelism; these are temporarily disabled and will be re-enabled once backend support is complete;
  • Added PCP-related parameters to the attention backend prototype class.
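
As referenced above, here is a minimal sketch of how a dedicated PCP process group could be created with raw torch.distributed; vLLM actually builds pcp_group through its own group-coordinator machinery, so everything below is illustrative only:

```python
# Illustrative only: vLLM constructs pcp_group via its own coordinator
# machinery; this sketch shows the underlying torch.distributed idea of
# grouping consecutive ranks into PCP groups of size `pcp_size`.
import torch.distributed as dist


def init_pcp_group(pcp_size: int) -> dist.ProcessGroup:
    assert dist.is_initialized()
    world_size = dist.get_world_size()
    assert world_size % pcp_size == 0, "world size must be divisible by pcp_size"
    rank = dist.get_rank()
    my_group = None
    for start in range(0, world_size, pcp_size):
        ranks = list(range(start, start + pcp_size))
        # new_group must be called by every rank for every group.
        group = dist.new_group(ranks)
        if rank in ranks:
            my_group = group
    return my_group
```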

CC @LookAround0301 @FENP @LucasWilkinson

@chatgpt-codex-connector

💡 Codex Review

```python
parallel_group.add_argument(
    "--data-parallel-size", "-dp", **parallel_kwargs["data_parallel_size"]
)
parallel_group.add_argument(
    "--prefill-context-parallel-size",
    "-pcp",
    **parallel_kwargs["prefill_context_parallel_size"],
)
parallel_group.add_argument(
    "--data-parallel-size", "-dp", **parallel_kwargs["data_parallel_size"]
)
```

P0: Avoid duplicate --data-parallel-size argument registration

The CLI now calls parallel_group.add_argument("--data-parallel-size", …) twice in a row. argparse rejects duplicate option strings, so EngineArgs.add_cli_args() will raise ArgumentError: conflicting option string(s): --data-parallel-size before any command line can be parsed. This prevents vLLM from starting at all. One of the two registrations should be removed or renamed.
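
The failure is plain argparse behavior and reproduces outside vLLM; a standalone sketch:

```python
import argparse

parser = argparse.ArgumentParser()
group = parser.add_argument_group("Parallel")
group.add_argument("--data-parallel-size", "-dp", type=int, default=1)
# Registering the same option strings again raises at registration time,
# before any command line is parsed, with roughly:
#   argparse.ArgumentError: argument --data-parallel-size/-dp:
#   conflicting option strings: --data-parallel-size, -dp
group.add_argument("--data-parallel-size", "-dp", type=int, default=1)
```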


```python
def update_moe_modules(moe_modules: list[FusedMoE], num_local_experts: int):
    assert all(
        module.moe_config.num_local_experts == num_local_experts
        for module in moe_modules
    ), "All MoE modules must have the same number of experts"
    for module in moe_modules:
        module.moe_config.num_experts = num_local_experts * new_ep_size
        module.global_num_experts = module.moe_config.num_experts
        module.moe_parallel_config = FusedMoEParallelConfig.make(
            tp_size_=get_tp_group().world_size,
            dp_size_=get_dp_group().world_size,
            vllm_parallel_config=parallel_config,
```

P1: Pass new PCP argument to FusedMoEParallelConfig.make

FusedMoEParallelConfig.make now requires a pcp_size_ positional argument, but the call in update_moe_modules still passes only tp_size_ and dp_size_. Any MoE model will hit this code path and raise TypeError: make() missing 1 required positional argument: 'pcp_size_' when the worker adjusts MoE modules. Update the invocation to include the prefill-context size (or provide a default) so MoE models can initialize.
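
A sketch of the fix being requested; `get_pcp_group()` is assumed here to mirror the existing `get_tp_group()`/`get_dp_group()` helpers, and the real name in the codebase may differ:

```python
# Hypothetical fix: thread the PCP world size through, mirroring TP/DP.
module.moe_parallel_config = FusedMoEParallelConfig.make(
    tp_size_=get_tp_group().world_size,
    dp_size_=get_dp_group().world_size,
    pcp_size_=get_pcp_group().world_size,  # assumed helper; see note above
    vllm_parallel_config=parallel_config,
)
```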


@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces basic support for Prefill Context Parallelism (PCP), aligning with the RFC. The changes are consistently applied across the codebase, including updates to parallel configuration, KV cache management, and attention backends. Renaming dcp_kv_cache_interleave_size to cp_kv_cache_interleave_size generalizes the context parallelism KV cache interleaving logic to support both decode and prefill context parallelism. Compatibility checks are in place to temporarily disable PCP for certain features like full CUDA graphs and hybrid attention, indicating a phased rollout of full support. The integration appears thorough and well-considered for the initial support phase.
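
To make the interleaving knob concrete, here is a toy model (not vLLM's implementation) of how an interleave size could map token positions to context-parallel ranks:

```python
# Toy illustration of cp_kv_cache_interleave_size: token positions are
# dealt to CP ranks in round-robin blocks of `interleave_size` tokens.


def owner_rank(token_pos: int, interleave_size: int, cp_size: int) -> int:
    """CP rank whose KV cache stores the token at `token_pos`."""
    return (token_pos // interleave_size) % cp_size


# With interleave_size=2 and cp_size=2, positions map as 0,0,1,1,0,0,1,1,...
assert [owner_rank(p, 2, 2) for p in range(8)] == [0, 0, 1, 1, 0, 0, 1, 1]
```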

@luccafong (Collaborator) commented

Thanks for the PR; can you share a few combinations of PCP x DCP x TP in your summary?

@LucasWilkinson (Collaborator) left a comment

Overall looks pretty good to me; running the CI. Left a couple of final nits/comments.

@LucasWilkinson LucasWilkinson added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 19, 2025
@mergify mergify bot added the gpt-oss Related to GPT-OSS models label Nov 19, 2025
@LucasWilkinson (Collaborator) left a comment
LGTM

@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Nov 19, 2025
@LucasWilkinson LucasWilkinson merged commit 2fd893b into vllm-project:main Nov 19, 2025
63 checks passed
Victor49152 pushed a commit to Victor49152/vllm that referenced this pull request Nov 20, 2025
LuminolT pushed a commit to LuminolT/vllm that referenced this pull request Nov 21, 2025
LookAround0301 pushed a commit to LookAround0301/vllm that referenced this pull request Nov 25, 2025
bigPYJ1151 pushed a commit that referenced this pull request Nov 25, 2025
bringlein pushed a commit to bringlein/vllm that referenced this pull request Nov 26, 2025
LookAround0301 added a commit to LookAround0301/vllm that referenced this pull request Nov 28, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
wangxiyuan added a commit to vllm-project/vllm-ascend that referenced this pull request Dec 2, 2025
1. fix vllm-project/vllm#28542
   The model structure modifications involved are:
     - Qwen2.5-VL (some patches still remain)
     - Qwen2-VL
     - Qwen2
     - DeepSeek series
     - Qwen-moe series
2. fix vllm-project/vllm#29121
   The output token type changed from a NumPy array to `list[list[int]]`.
3. fix vllm-project/vllm#29262
   The `xformers` backend for multimodal has been deprecated.
4. fix vllm-project/vllm#29342
5. fix vllm-project/vllm#28579
6. fix vllm-project/vllm#28718
7. fix vllm-project/vllm#28665
8. fix vllm-project/vllm#26847
   vLLM introduced the `optimization-level` option; some default config has changed, and the `--enforce-eager` param has been deprecated.
9. fix http://github.com/vllm-project/vllm/pull/29223, which returns a tuple for the sampler.
10. fix vllm-project/vllm#29471; we'll remove the related patch to avoid this kind of error.

- vLLM version: v0.11.2

Signed-off-by: wangxiyuan <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: hfadzxy <[email protected]>
Co-authored-by: wangli <[email protected]>
Co-authored-by: hfadzxy <[email protected]>
ChenCangtao pushed a commit to ChenCangtao/vllm-ascend that referenced this pull request Dec 3, 2025

Labels

gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed), v1

Projects

Status: Done


5 participants