NCCL reduce_scatter, all_gather #2727
Conversation
python/src/distributed.cpp
| )pbdoc"); | ||
|
|
||
| m.def( | ||
| "reduce_scatter", |
I'm wondering about the name. We use all_sum (instead of all_reduce) to indicate it's a sum. Maybe we should use sum_scatter here to be more consistent? Wdyt?
Yes, I would agree. I think it will be more consistent.
I did the same as for AllReduce by adding a reduction op; let me know if you think it is not needed and a single sum_scatter is enough.
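For reference, a minimal pure-Python sketch of the sum-reduce-scatter semantics being discussed (no MLX or NCCL involved; the rank count and buffer sizes are made up for illustration): each rank contributes a full buffer, the buffers are summed elementwise, and rank i receives only the i-th shard of the result.

```python
# Simulate a sum reduce_scatter across ranks, NCCL-style.

def reduce_scatter_sum(buffers):
    """buffers: one flat list per rank, all the same length,
    divisible by the number of ranks. Returns one shard per rank."""
    n_ranks = len(buffers)
    length = len(buffers[0])
    assert length % n_ranks == 0
    shard = length // n_ranks
    # Elementwise sum across all ranks' buffers.
    summed = [sum(vals) for vals in zip(*buffers)]
    # Rank r keeps only its own contiguous shard of the sum.
    return [summed[r * shard:(r + 1) * shard] for r in range(n_ranks)]

# 2 ranks, 4 elements each -> each rank ends with 2 elements of the sum.
shards = reduce_scatter_sum([[1, 2, 3, 4], [10, 20, 30, 40]])
print(shards)  # [[11, 22], [33, 44]]
```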
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
…astya236/mlx into nccl-reduce-scatter-all-gather
awni left a comment
Looks great! Will merge when tests clear!
I fixed a typo, sorry about that. It should pass now. Thanks for reviewing!
The tests failed since your last push. It looks like it's trying to initialize NCCL on Mac. Can you see the failures?
Finally everything is fixed :)
Proposed changes
This adds all_gather and reduce_scatter to the NCCL backend. The all_reduce test is kept for ring/MPI, but the following tests are now shared across all backends: test_all_reduce, test_average_gradients, test_donation, test_shard_linear, test_all_gather. In test_shard_linear, since we don't have quantized matmuls on CUDA yet, the quantized variant runs only when CUDA is not available.
Test:
mlx.launch -n 8 mlx/python/tests/nccl_test_distributed.py
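To contrast with reduce_scatter above, here is a pure-Python sketch of all_gather semantics (again, no MLX or NCCL; shapes are illustrative): every rank contributes its local shard, and every rank receives the full concatenation of all shards.

```python
# Simulate all_gather: each rank holds one shard; after the
# collective, every rank holds the concatenation of all shards.

def all_gather(shards):
    """shards: one flat list per rank. Returns, for every rank,
    the same concatenated result."""
    gathered = [x for shard in shards for x in shard]
    return [list(gathered) for _ in shards]

# 2 ranks with 2 elements each -> both ranks end with all 4 elements.
results = all_gather([[1, 2], [3, 4]])
print(results)  # [[1, 2, 3, 4], [1, 2, 3, 4]]
```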