Change to enable full iteration CUDA graph for LLMs #14077

Merged
erhoo82 merged 18 commits into NVIDIA-NeMo:main from vasunvidia:full_cuda_graph2
Aug 16, 2025
@vasunvidia vasunvidia commented Jul 1, 2025

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

Enables full-iteration CUDA graphs for NeMo models.

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

@vasunvidia vasunvidia force-pushed the full_cuda_graph2 branch 2 times, most recently from d12fabd to d7364e5 on July 12, 2025 15:31
@gautham-kollu gautham-kollu requested review from erhoo82 and removed request for erhoo82 July 16, 2025 16:27
@vasunvidia vasunvidia force-pushed the full_cuda_graph2 branch 3 times, most recently from 0c1a110 to b11fd52 on July 17, 2025 21:18
@github-actions github-actions bot added the NLP label Jul 29, 2025
@vasunvidia vasunvidia force-pushed the full_cuda_graph2 branch 2 times, most recently from 04e60ed to 71f2c6b on July 29, 2025 03:14
@katec846 katec846 left a comment

I’m not entirely sure what the final solution for this PR should look like, but for cases where the pipeline-parallel size is greater than 1, we currently have two options:

  1. Remove the sync() call in p2p_communication.py, though we would need to justify that removing it is safe.
  2. Add assertions to ensure that config.batch_p2p_sync == False and config.variable_seq_lengths == False.

Also, for cases where VP > 1, we need to disable optim.overlap_param_gather on H100 for it to run.
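
For illustration only, the second option and the VP > 1 note could be expressed as a pre-flight check like the sketch below. This is not the code in this PR: `GraphRunConfig`, `full_iteration_cuda_graph`, and `check_full_iteration_graph_config` are hypothetical names, the real settings live on separate model and optimizer configs, and the check applies the `overlap_param_gather` restriction on all hardware rather than only on H100.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GraphRunConfig:
    """Hypothetical flat stand-in for the real model and optimizer configs."""

    pipeline_model_parallel_size: int = 1
    virtual_pipeline_model_parallel_size: int = 1
    batch_p2p_sync: bool = True
    variable_seq_lengths: bool = False
    overlap_param_gather: bool = True
    # Hypothetical flag meaning "capture the whole training iteration".
    full_iteration_cuda_graph: bool = False


def check_full_iteration_graph_config(cfg: GraphRunConfig) -> List[str]:
    """Return a list of problems; an empty list means the config passes."""
    problems: List[str] = []
    if not cfg.full_iteration_cuda_graph:
        # Nothing to validate when full-iteration capture is off.
        return problems
    if cfg.pipeline_model_parallel_size > 1:
        # Option 2 from the comment above: both flags must be False.
        if cfg.batch_p2p_sync:
            problems.append("batch_p2p_sync must be False when PP > 1")
        if cfg.variable_seq_lengths:
            problems.append("variable_seq_lengths must be False when PP > 1")
    if cfg.virtual_pipeline_model_parallel_size > 1 and cfg.overlap_param_gather:
        # Assumption: applied everywhere, although the comment mentions H100 only.
        problems.append("overlap_param_gather must be disabled when VP > 1")
    return problems


if __name__ == "__main__":
    cfg = GraphRunConfig(
        pipeline_model_parallel_size=2,
        virtual_pipeline_model_parallel_size=2,
        full_iteration_cuda_graph=True,
    )
    for problem in check_full_iteration_graph_config(cfg):
        print(problem)
```

Returning a list instead of asserting on the first failure reports every conflicting setting in one pass; a real implementation might prefer plain assertions, as suggested in the comment.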

@katec846 katec846 force-pushed the full_cuda_graph2 branch 4 times, most recently from 19ba6a2 to 027c7c7 on August 14, 2025 17:10
erhoo82 previously approved these changes Aug 14, 2025
vasunvidia and others added 12 commits August 15, 2025 07:28
Initial commit

Signed-off-by: Vasudevan Rengasamy <[email protected]>

Rename FullCGWrapper to FullCudaGraphWrapper

Signed-off-by: Vasudevan Rengasamy <[email protected]>

Change full_cuda_graph to enable_whole_network_cuda_graph

Signed-off-by: Vasudevan Rengasamy <[email protected]>

Change Full Iteration CUDA graph argument name

Change to check for partial CUDA graph condition with enable_cuda_graph flag

Apply isort and black reformatting

Signed-off-by: vasunvidia <[email protected]>
Signed-off-by: Vasudevan Rengasamy <[email protected]>

Use enable_cuda_graph+cuda_graph_scope for full iteration CG

Signed-off-by: Vasudevan Rengasamy <[email protected]>

Fix bug for VPP>1

Signed-off-by: Vasudevan Rengasamy <[email protected]>

Add comments.

Signed-off-by: Vasudevan Rengasamy <[email protected]>

Add comments.

Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: vasunvidia <[email protected]>
Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Kate Cheng <[email protected]>
Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Tutorial and doc update

Signed-off-by: Nune <[email protected]>

* Clear outputs from notebook

Signed-off-by: Nune <[email protected]>

* Update docs/source/asr/models.rst

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: nune-tadevosyan <[email protected]>

---------

Signed-off-by: Nune <[email protected]>
Signed-off-by: nune-tadevosyan <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Kate Cheng <[email protected]>
Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Kate Cheng <[email protected]>
Signed-off-by: Vasudevan Rengasamy <[email protected]>
This reverts commit 00751d5.

Signed-off-by: Vasudevan Rengasamy <[email protected]>
This reverts commit af83c2d.

Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Vasudevan Rengasamy <[email protected]>
8 participants