fix: avoid AccumulateGrad stream mismatch by using default stream for DDP init #21579
Open
s-zx wants to merge 1 commit into Lightning-AI:master from
Conversation
… DDP init

DDP was previously wrapped in torch.cuda.stream(torch.cuda.Stream()), which caused the AccumulateGrad nodes to be created on a non-default stream. This triggered PyTorch's stream mismatch warning when running backward, especially with Fabric and gradient accumulation (no_backward_sync). The fix removes the custom stream context so DDP initialization runs on the default stream, matching subsequent forward/backward passes, as recommended by PyTorch's warning message.

Fixes Lightning-AI#21567

Signed-off-by: s-zx <[email protected]>
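For background, a minimal sketch of the CUDA stream semantics at play (illustrative only, not Lightning code; assumes a CUDA device is available). The relevance here is that autograd state for parameters is tied to the stream that is current when the module is set up, which is why constructing DDP inside a side-stream context caused the mismatch:

```python
import torch

side = torch.cuda.Stream()
with torch.cuda.stream(side):
    # Work launched inside this context runs on the side stream,
    # not the default stream.
    assert torch.cuda.current_stream() == side
# Outside the context, we are back on the default stream.
assert torch.cuda.current_stream() == torch.cuda.default_stream()
```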
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #21579   +/-  ##
=======================================
- Coverage      87%      87%    -0%
=======================================
  Files         270      270
  Lines       24078    24073     -5
=======================================
- Hits        20863    20855     -8
- Misses       3215     3218     +3
Summary
Fixes the AccumulateGrad stream mismatch warning when training with Fabric + DDP, especially with gradient accumulation (no_backward_sync). The warning occurred because DDP was initialized inside torch.cuda.stream(torch.cuda.Stream()), creating AccumulateGrad nodes on a non-default stream that did not match subsequent forward/backward passes.

Root Cause
DDP setup used a custom CUDA stream context (introduced in PR #17334) when wrapping the model, which caused the AccumulateGrad nodes to be created on a non-default stream. When backward later ran on the default stream (or vice versa), PyTorch emitted its AccumulateGrad stream mismatch warning.
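A hedged repro sketch (not taken from the PR itself): a minimal Fabric + DDP loop with gradient accumulation of the kind that surfaced the warning. Fabric, fabric.setup, fabric.no_backward_sync, and fabric.backward are real Fabric APIs; the model and shapes are arbitrary.

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cuda", devices=1, strategy="ddp")
fabric.launch()

model = torch.nn.Linear(32, 32)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# setup() is where the strategy wraps the model in DistributedDataParallel,
# previously inside a side-stream context.
model, optimizer = fabric.setup(model, optimizer)

x = torch.randn(8, 32, device=fabric.device)
# Skip the gradient all-reduce to accumulate locally; the backward call here
# is the kind that triggered the AccumulateGrad stream mismatch warning.
with fabric.no_backward_sync(model):
    loss = model(x).sum()
    fabric.backward(loss)
```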
Fix
Remove the custom stream context so DDP initialization runs on the default stream, matching subsequent forward/backward passes, as recommended by PyTorch. A sketch of the change follows the file list below.
- lightning_fabric/strategies/ddp.py: remove the torch.cuda.stream(torch.cuda.Stream()) context
- lightning/pytorch/strategies/ddp.py: same change

Fixes #21567
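For concreteness, a sketch of the change under stated assumptions: the method and helper names (_setup_model, _determine_ddp_device_ids, self._ddp_kwargs) approximate the strategy's internals, and the exact bodies in the Lightning source may differ.

```python
from contextlib import nullcontext

import torch
from torch.nn.parallel import DistributedDataParallel

# Before: DDP was constructed under a freshly created side stream, so the
# parameters' AccumulateGrad nodes were tied to that stream.
def _setup_model_before(self, model):
    device_ids = self._determine_ddp_device_ids()
    ctx = torch.cuda.stream(torch.cuda.Stream()) if device_ids is not None else nullcontext()
    with ctx:
        return DistributedDataParallel(module=model, device_ids=device_ids, **self._ddp_kwargs)

# After: DDP is constructed on the current (default) stream, the same stream
# that later runs forward and backward.
def _setup_model_after(self, model):
    device_ids = self._determine_ddp_device_ids()
    return DistributedDataParallel(module=model, device_ids=device_ids, **self._ddp_kwargs)
```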
📚 Documentation preview 📚: https://pytorch-lightning--21579.org.readthedocs.build/en/21579/