Merged
Changes from 1 commit
4 changes: 4 additions & 0 deletions vllm/config/vllm.py
@@ -346,6 +346,10 @@ def __post_init__(self):
        or self.model_config.is_encoder_decoder
    ):
        self.compilation_config.cudagraph_mode = CUDAGraphMode.PIECEWISE

    # decode context parallel do not support full cudagraphs now.
    if self.parallel_config.decode_context_parallel_size > 1:
        self.compilation_config.cudagraph_mode = CUDAGraphMode.PIECEWISE
Comment on lines 351 to 357
Contributor


critical

This implementation unconditionally sets cudagraph_mode to PIECEWISE if decode context parallelism (DCP) is enabled. This is too aggressive as it will override a user's explicit choice to disable CUDA graphs (e.g., cudagraph_mode=NONE), which might be done for debugging purposes.

A better approach is to only downgrade the mode to PIECEWISE if a FULL CUDA graph mode was requested, as those are the ones incompatible with DCP. This change also adds a warning to inform the user about the automatic adjustment.

                if self.parallel_config.decode_context_parallel_size > 1 and \
                    self.compilation_config.cudagraph_mode.has_full_cudagraphs():
                    logger.warning(
                        "Decode context parallel (DCP) is enabled, which is "
                        "incompatible with full CUDA graphs. Downgrading "
                        "cudagraph_mode from %s to PIECEWISE.",
                        self.compilation_config.cudagraph_mode.name)
                    self.compilation_config.cudagraph_mode = CUDAGraphMode.PIECEWISE

Contributor Author


This code only executes when cudagraph_mode has not been explicitly set by the user, so an explicit setting is never overridden.
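To illustrate the point, here is a minimal, self-contained sketch of that control flow. It is not the vLLM implementation: the `CUDAGraphMode` enum below is a stand-in with only three members, `resolve_cudagraph_mode` and `can_use_cudagraphs` are hypothetical names, and using `None` for "not set by the user" and `FULL` as the starting default are assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class CUDAGraphMode(Enum):
    # Stand-in enum for illustration; not vLLM's real definition.
    NONE = 0
    PIECEWISE = 1
    FULL = 2


def resolve_cudagraph_mode(
    user_mode: Optional[CUDAGraphMode],
    decode_context_parallel_size: int,
    can_use_cudagraphs: bool = True,
) -> CUDAGraphMode:
    """Hypothetical helper: pick a mode only when the user left it unset."""
    # An explicit user choice (including NONE) is returned untouched.
    if user_mode is not None:
        return user_mode

    # Assumed fallback, mirroring the `else` branch in the diff.
    if not can_use_cudagraphs:
        return CUDAGraphMode.NONE

    # Assumed default for the sketch: start from a full-graph mode.
    mode = CUDAGraphMode.FULL

    # DCP does not support full CUDA graphs, so fall back to PIECEWISE.
    if decode_context_parallel_size > 1:
        mode = CUDAGraphMode.PIECEWISE
    return mode
```

Under these assumptions, `resolve_cudagraph_mode(CUDAGraphMode.NONE, 2)` keeps `NONE`, while `resolve_cudagraph_mode(None, 2)` yields `PIECEWISE`, which is the behaviour described in the reply above.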

else:
    self.compilation_config.cudagraph_mode = CUDAGraphMode.NONE
