Conversation

@lilinsiman (Contributor) commented Nov 28, 2025

What this PR does / why we need it?

Supports overlapping MTP (Multi-Token Prediction) with all default capture shapes of ACLGraph.

Does this PR introduce any user-facing change?

no

How was this patch tested?

Unit tests.

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot left a comment

Code Review

This pull request adds support for MTP (Multi-Token Prediction) with ACLgraph by adjusting the graph capture sizes to be multiples of (num_speculative_tokens + 1). This is a necessary change for the FIA operator in FULL_DECODE_ONLY graph mode. The changes also remove a runtime check that is now obsolete.

The overall approach is correct, but I've found some issues in the implementation of the size adjustment logic in vllm_ascend/utils.py.

  • In update_default_aclgraph_sizes, the logic for num_speculative_tokens == 1 is inconsistent with the stated requirements, and a condition check for num_speculative_tokens > 1 uses an incorrect index. I've suggested a unified and corrected implementation for this block.
  • In update_aclgraph_sizes, there's a similar incorrect condition check.

I've provided detailed comments and suggestions to address these points. Addressing these will make the implementation more robust and correct.
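To make the multiple-of-(num_speculative_tokens + 1) requirement concrete, here is a small illustrative sketch of the sizing arithmetic (toy values, not the PR's actual defaults):

```python
# Worked example of the sizing constraint described above: in
# FULL_DECODE_ONLY mode with speculative decoding, each scheduled request
# contributes up to (num_speculative_tokens + 1) tokens per decode step,
# so the FIA operator needs capture sizes that are multiples of that factor.
# The values below are illustrative, not the PR's actual defaults.

num_speculative_tokens = 2   # hypothetical MTP depth
max_num_seqs = 8             # hypothetical scheduler limit

factor = num_speculative_tokens + 1
target_size = factor * max_num_seqs          # largest token batch to cover

capture_sizes = [factor * n for n in range(1, max_num_seqs + 1)]
assert all(s % factor == 0 for s in capture_sizes)
print(target_size, capture_sizes)   # 24 [3, 6, 9, 12, 15, 18, 21, 24]
```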

Comment on lines 477 to 502

        if num_speculative_tokens > 1:
            if original_sizes[0] < (num_speculative_tokens + 1) * max_num_seqs:
                new_original_sizes = sorted(
                    set(
                        list(range(1, min(10, max_num_seqs + 1), 2)) +
                        list(range(8, max_num_seqs + 1, 4))))
                enlarged_sizes = [(num_speculative_tokens + 1) * sizes
                                  for sizes in new_original_sizes]
                if enlarged_sizes[-1] < target_sizes:
                    enlarged_sizes.append(target_sizes)
                update_cudagraph_capture_sizes(vllm_config, enlarged_sizes)
                logger.info(
                    "Adjusted ACL full graphs: %s → %s for speculative decoding",
                    original_sizes, enlarged_sizes)
            else:
                vllm_config.compilation_config.cudagraph_capture_sizes = original_sizes
        if num_speculative_tokens == 1:
            padding_sizes = original_sizes.copy()
            if padding_sizes[-1] < target_sizes:
                padding_sizes.append(target_sizes)
                update_cudagraph_capture_sizes(vllm_config, padding_sizes)
                logger.info(
                    "Adjusted ACL full graphs: %s → %s for speculative decoding",
                    original_sizes, padding_sizes)
            else:
                vllm_config.compilation_config.cudagraph_capture_sizes = original_sizes

critical

There are a couple of issues in this logic block:

  1. At line 478, original_sizes[0] is used in the condition. This seems incorrect as it checks the smallest capture size. It should probably be original_sizes[-1] to check if the largest capture size is sufficient, similar to the logic at line 491.
  2. The logic for num_speculative_tokens == 1 (lines 489-498) does not ensure that capture sizes are multiples of num_speculative_tokens + 1 (which is 2). This contradicts the comment on lines 456-461 which states this is a requirement for the FIA operator. The logic for num_speculative_tokens > 1 correctly enforces this multiplication.

To address these issues and improve clarity, the logic for all num_speculative_tokens > 0 can be unified. Here is a suggested implementation:

        if num_speculative_tokens > 0:
            # The check should be against the largest original size. Also, the logic is unified
            # for all num_speculative_tokens > 0 to enforce multiples of (num_speculative_tokens + 1).
            if not original_sizes or original_sizes[-1] < target_sizes:
                new_original_sizes = sorted(set(list(range(1, min(10, max_num_seqs + 1), 2)) + list(range(8, max_num_seqs + 1, 4))))
                enlarged_sizes = [(num_speculative_tokens + 1) * size for size in new_original_sizes]
                if not enlarged_sizes or enlarged_sizes[-1] < target_sizes:
                    enlarged_sizes.append(target_sizes)
                
                final_sizes = sorted(list(set(enlarged_sizes)))

                update_cudagraph_capture_sizes(vllm_config, final_sizes)
                logger.info(
                    "Adjusted ACL full graphs: %s → %s for speculative decoding",
                    original_sizes, final_sizes)
            else:
                vllm_config.compilation_config.cudagraph_capture_sizes = original_sizes
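As a sanity check, the size computation in this suggestion can be exercised standalone (the vllm_config / update_cudagraph_capture_sizes plumbing is stubbed out; values are illustrative):

```python
# Standalone version of the size computation from the suggestion above,
# with the vLLM config plumbing removed for illustration.

def compute_capture_sizes(num_speculative_tokens: int,
                          max_num_seqs: int) -> list[int]:
    target_sizes = (num_speculative_tokens + 1) * max_num_seqs
    base_sizes = sorted(
        set(list(range(1, min(10, max_num_seqs + 1), 2)) +
            list(range(8, max_num_seqs + 1, 4))))
    enlarged = [(num_speculative_tokens + 1) * size for size in base_sizes]
    if not enlarged or enlarged[-1] < target_sizes:
        enlarged.append(target_sizes)
    return sorted(set(enlarged))

# num_speculative_tokens=1, max_num_seqs=8: base sizes [1, 3, 5, 7, 8]
# are doubled, and the target 16 is already the largest entry.
print(compute_capture_sizes(1, 8))   # [2, 6, 10, 14, 16]
```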

@yiz-liu (Collaborator) commented Nov 28, 2025

Please check vllm-project/vllm#28315

        target_sizes = (num_speculative_tokens + 1) * max_num_seqs
        original_sizes, vllm_config.compilation_config.cudagraph_capture_sizes = \
            vllm_config.compilation_config.cudagraph_capture_sizes, None
        assert len(original_sizes) > 0

The assertion lacks a clear error message when it fails.
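A minimal sketch of the kind of message the reviewer is asking for (the wording is a suggestion, not the PR's code):

```python
# Toy stand-in for the capture sizes read from the compilation config.
original_sizes = [1, 2, 4, 8]

# An assertion message makes the failure self-explanatory when the config
# unexpectedly carries no default capture sizes.
assert len(original_sizes) > 0, (
    "cudagraph_capture_sizes is empty; expected at least one default "
    "ACLGraph capture size before adjusting for speculative decoding")
```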

        if original_sizes[0] < (num_speculative_tokens + 1) * max_num_seqs:
            new_original_sizes = sorted(
                set(
                    list(range(1, min(10, max_num_seqs + 1), 2)) +

Hardcoded range parameters (1, 10, 2 / 8, 4) lack explanation.
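One way to address this (the constant names are suggestions, not identifiers from the PR):

```python
# Naming the magic numbers documents the intent of the two ranges:
# a fine odd-valued grid for small batches, a coarser step-4 grid above.
SMALL_BATCH_LIMIT = 10   # fine-grained sizes stop below this bound
SMALL_BATCH_STEP = 2     # yields odd sizes 1, 3, 5, 7, 9
LARGE_BATCH_START = 8    # coarser grid begins here
LARGE_BATCH_STEP = 4     # yields 8, 12, 16, ...

max_num_seqs = 16        # illustrative value

new_original_sizes = sorted(
    set(list(range(1, min(SMALL_BATCH_LIMIT, max_num_seqs + 1),
                   SMALL_BATCH_STEP)) +
        list(range(LARGE_BATCH_START, max_num_seqs + 1, LARGE_BATCH_STEP))))
print(new_original_sizes)   # [1, 3, 5, 7, 8, 9, 12, 16]
```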

            new_original_sizes = sorted(
                set(
                    list(range(1, min(10, max_num_seqs + 1), 2)) +
                    list(range(8, max_num_seqs + 1, 4))))

The case where max_num_seqs < 8 was not considered.
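To illustrate the point: when max_num_seqs < 8, range(8, max_num_seqs + 1, 4) is empty, so only the small odd sizes survive and max_num_seqs itself may never be captured (standalone sketch of the same expression):

```python
def default_base_sizes(max_num_seqs: int) -> list[int]:
    # Same expression as in the snippet above, extracted for illustration.
    return sorted(
        set(list(range(1, min(10, max_num_seqs + 1), 2)) +
            list(range(8, max_num_seqs + 1, 4))))

print(default_base_sizes(4))    # [1, 3] -- size 4 itself is never captured
print(default_base_sizes(7))    # [1, 3, 5, 7]
print(default_base_sizes(16))   # [1, 3, 5, 7, 8, 9, 12, 16]
```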

                update_cudagraph_capture_sizes(vllm_config, padding_sizes)
                logger.info(
                    "Adjusted ACL full graphs: %s → %s for speculative decoding",
                    original_sizes, padding_sizes)

The before/after comparison in the log message is not intuitive enough; it should describe what the sizes represent before and after the adjustment.
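A sketch of a more descriptive message (the wording is a suggestion, not the PR's actual log line):

```python
import logging

logger = logging.getLogger("vllm_ascend.utils")

# Illustrative values: the defaults are per-request batch sizes, the
# adjusted sizes are token counts scaled by (num_speculative_tokens + 1).
num_speculative_tokens = 1
original_sizes = [1, 3, 5, 7, 8]
padding_sizes = [2, 6, 10, 14, 16]

logger.info(
    "Speculative decoding (num_speculative_tokens=%d): ACLGraph capture "
    "sizes adjusted from defaults %s (request batch sizes) to %s (token "
    "counts, multiples of %d)",
    num_speculative_tokens, original_sizes, padding_sizes,
    num_speculative_tokens + 1)
```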
