
Optimizing Network Performance for Distributed DNN Training on GPU Clusters #16061

@gongweibao

Description

  • AllReduce selectedrows
    • without csc
    • with csc
  • Optimizing Network Performance for Distributed DNN Training on GPU Clusters
    • Get the system arch and performance.
    • Analyze operator time and communication time.
    • Mixed precision.
      • On BERT.
      • On ResNet-50 on the ImageNet dataset.
    • Dynamic (static) lazy allreduce (LA) overlap
      • Fuse allreduce tensors and analyze the performance.
      • Implement the Hierarchical All-reduce.
    • CSC communication
      • ResNet
      • BERT
  • Pserver sync from step to var
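
The hierarchical all-reduce item above can be illustrated with a minimal sketch (plain Python, no communication library; the function name and node grouping are assumptions, not the actual implementation): gradients are first reduced within each node to a local leader, the leaders all-reduce across nodes, and the result is broadcast back inside each node. This cuts inter-node traffic from one message per GPU to one per node.

```python
# Hypothetical two-level (hierarchical) all-reduce, simulated in plain Python.
# Each list element stands for one GPU's local gradient value.

def hierarchical_allreduce(grads, gpus_per_node):
    """Return the global sum on every GPU, using a node-leader hierarchy."""
    nodes = [grads[i:i + gpus_per_node]
             for i in range(0, len(grads), gpus_per_node)]
    # Stage 1: intra-node reduce to each node's leader (e.g. GPU 0 of the node).
    leader_sums = [sum(node) for node in nodes]
    # Stage 2: inter-node all-reduce among the leaders only.
    global_sum = sum(leader_sums)
    # Stage 3: intra-node broadcast of the global result.
    return [global_sum] * len(grads)

# 2 nodes x 2 GPUs: every GPU ends up with the same global sum.
print(hierarchical_allreduce([1.0, 2.0, 3.0, 4.0], gpus_per_node=2))
```

In a real cluster the three stages would map onto intra-node and inter-node NCCL communicators rather than Python sums.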
