- [ ] AllReduce SelectedRows
  - [ ] without CSC
  - [ ] with CSC
- [ ] Optimizing Network Performance for Distributed DNN Training on GPU Clusters
  - [x] Get the system architecture and performance.
  - [x] Analyze the operator time and communication time.
  - [ ] Mixed precision.
    - [ ] On BERT.
    - [ ] On ResNet-50 with the ImageNet dataset.
  - [x] Dynamic (static) LA (lazy all-reduce) overlap.
  - [x] Fuse all-reduce tensors and analyze the performance.
  - [x] Implement the hierarchical all-reduce.
- [ ] CSC communication
  - [ ] ResNet
  - [ ] BERT
- [ ] Pserver sync from step to var
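The hierarchical all-reduce item above presumably refers to the standard two-level pattern: reduce within each node onto a leader, all-reduce across the leaders, then broadcast back inside each node. A minimal sketch of that pattern, with plain Python lists standing in for real GPU collectives (all names here are hypothetical, not from this repo):

```python
from typing import List

def hierarchical_allreduce(grads: List[List[float]], gpus_per_node: int) -> List[List[float]]:
    """grads[i] is the gradient vector on GPU i; returns the summed vector on every GPU."""
    assert len(grads) % gpus_per_node == 0
    nodes = [grads[i:i + gpus_per_node] for i in range(0, len(grads), gpus_per_node)]
    # Stage 1: intra-node reduce onto each node's leader (GPU 0 of the node).
    leaders = [[sum(vals) for vals in zip(*node)] for node in nodes]
    # Stage 2: inter-node all-reduce among the leaders only.
    global_sum = [sum(vals) for vals in zip(*leaders)]
    # Stage 3: intra-node broadcast of the global result back to every GPU.
    return [list(global_sum) for _ in grads]

# Two nodes with two GPUs each; after the call every GPU holds the global sum.
out = hierarchical_allreduce([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
                             gpus_per_node=2)
```

The point of the hierarchy is that only one rank per node participates in the expensive inter-node step, which reduces cross-node traffic relative to a flat all-reduce over all GPUs.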