Commit 9c3cef4
gloo: support ibverbs in cmake (pytorch#153425)
This updates the gloo submodule in PyTorch to a version that includes the new ibverbs backend, so that the backend can be used from PyTorch.
Test plan:
```
sudo dnf install rdma-core-devel
USE_GLOO_IBVERBS=ON python setup.py develop
torchrun --nproc_per_node 2 ~/scripts/gloo_ibverbs_test.py
```
```py
"""
run with:
torchrun --nproc_per_node 2 ~/scripts/gloo_ibverbs_test.py
"""
import os

# Select gloo's ibverbs transport before the process group is initialized.
os.environ["GLOO_DEVICE_TRANSPORT"] = "IBVERBS"

import torch
import torch.distributed as dist

dist.init_process_group("gloo")
rank = dist.get_rank()

# Mix devices: rank 0 uses CPU tensors, every other rank uses CUDA tensors.
if rank == 0:
    device = "cpu"
else:
    device = "cuda"
print(device)

# all_reduce: with 2 ranks contributing 1 and 2, every element sums to 3.
t = torch.full((10, 100), fill_value=(rank+1), device=device)
target = torch.full((10, 100), fill_value=3, device=device)
dist.all_reduce(t)
torch.testing.assert_close(t, target)

# send/recv: rank 1 receives rank 0's tensor, so both ranks end up holding ones.
t = torch.full((10, 100), fill_value=(rank+1), device=device)
if rank == 0:
    dist.send(t, dst=1)
else:
    dist.recv(t, src=0)
torch.testing.assert_close(t, torch.full_like(t, 1))
```
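The test script above sets `GLOO_DEVICE_TRANSPORT` unconditionally. As a rough sketch only, the snippet below shows one way a launcher script might opt into ibverbs only when an RDMA device appears to be present. The helper name `pick_gloo_transport` and its parameters are hypothetical and not part of PyTorch or gloo, and it assumes the usual Linux sysfs layout in which RDMA devices are listed under `/sys/class/infiniband`.

```python
import os


def pick_gloo_transport(sysfs_dir="/sys/class/infiniband", env=None):
    """Hypothetical helper: request IBVERBS only if an RDMA device is listed.

    Assumes RDMA devices show up as entries under `sysfs_dir` (the usual
    Linux layout). An already-set GLOO_DEVICE_TRANSPORT is left untouched.
    Returns the resulting value, or None if the variable stays unset.
    """
    env = os.environ if env is None else env
    try:
        has_rdma = len(os.listdir(sysfs_dir)) > 0
    except OSError:
        # Directory missing or unreadable: treat as "no RDMA device".
        has_rdma = False
    if has_rdma:
        # setdefault keeps any transport the user already chose.
        env.setdefault("GLOO_DEVICE_TRANSPORT", "IBVERBS")
    return env.get("GLOO_DEVICE_TRANSPORT")
```

Such a helper would need to run before `dist.init_process_group("gloo")`, as the environment variable assignment does in the test script. Passing `env` and `sysfs_dir` explicitly is only there to make the logic easy to exercise without real hardware.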
Pull Request resolved: pytorch#153425
Approved by: https://github.com/fduwjj
1 parent dde7058 commit 9c3cef4
5 files changed (+13 -2 lines), in:
- cmake
- third_party
- torch/csrc/distributed/c10d
Files changed in the updated gloo submodule:
- .github/workflows/build-windows.yml (+2 -1)
- README.md (+24 -2)
- gloo/barrier_all_to_all.h (+16 -26)
- gloo/barrier_all_to_one.h (+21 -44)
- gloo/broadcast_one_to_all.h (+14 -49)
- gloo/common/logging.h (+10 -3)
- gloo/common/utils.cc (+1)
- gloo/test/base_test.cc (+19)
- gloo/test/base_test.h (+6)
- gloo/test/send_recv_test.cc (+1 -3)
- gloo/test/tcp_test.cc (+4)
- gloo/transport/ibverbs/CMakeLists.txt (+2)
- gloo/transport/ibverbs/buffer.cc (+26 -18)
- gloo/transport/ibverbs/buffer.h (+11 -3)
- gloo/transport/ibverbs/context.cc (+15 -4)
- gloo/transport/ibverbs/context.h (+6)
- gloo/transport/ibverbs/device.cc (+36 -9)
- gloo/transport/ibverbs/device.h (+1)
- gloo/transport/ibverbs/pair.cc (+212 -69)
- gloo/transport/ibverbs/pair.h (+29 -6)
- gloo/transport/ibverbs/unbound_buffer.cc (+260)
- gloo/transport/ibverbs/unbound_buffer.h (+107)
- gloo/transport/tcp/device.cc (+1 -1)
- media/gloo_100k_dark.svg (+35)
- media/gloo_100k_light.svg (+35)