split different comm method for mnist distributed training #18715

Merged
guru4elephant merged 4 commits into PaddlePaddle:develop from guru4elephant:fix_timeout
Jul 22, 2019
Conversation

@guru4elephant
Member

The MNIST distributed training test covers many communication methods, so it times out frequently. This pull request splits the unit tests by communication method.
test=develop
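
A minimal sketch of what the split could look like, assuming each communication method gets its own test class in its own file so that it can be run, and time out, independently. `DistMnistBase`, `run_training`, `TestDistMnistRingAllreduce`, `run_one`, and the `comm_method` strings are hypothetical placeholders, not Paddle's actual test API; only the two file names and `TestDistMnistNCCL2MultiNCCLComm` come from this pull request.

```python
import unittest


class DistMnistBase(unittest.TestCase):
    """Hypothetical shared base; stands in for the real distributed test driver."""

    comm_method = None  # each subclass names the single method it covers

    def run_training(self):
        # Placeholder for launching trainers and comparing losses.
        return "ran mnist with " + self.comm_method

    def test_dist_train(self):
        if self.comm_method is None:
            self.skipTest("base class covers no communication method")
        self.assertIn(self.comm_method, self.run_training())


# Assumed to live in test_dist_mnist_ring_allreduce.py (class name is hypothetical)
class TestDistMnistRingAllreduce(DistMnistBase):
    comm_method = "ring_allreduce"


# Assumed to live in test_dist_mnist_multi_comm.py
class TestDistMnistNCCL2MultiNCCLComm(DistMnistBase):
    comm_method = "multi_nccl_comm"


def run_one(case):
    """Run one test class in isolation, as a per-file test target would."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With this layout, a slow communication method only affects the time budget of its own file rather than one combined test.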

@guru4elephant guru4elephant requested a review from sandyhouse July 21, 2019 14:49
test=develop
@sandyhouse sandyhouse left a comment

The class TestDistMnistNCCL2MultiNCCLComm in test_dist_mnist_multi_comm.py is duplicated in test_dist_mnist_ring_allreduce.py.

@guru4elephant
Member Author

The class TestDistMnistNCCL2MultiNCCLComm in test_dist_mnist_multi_comm.py is duplicated in test_dist_mnist_ring_allreduce.py.

fixed

@sandyhouse sandyhouse left a comment

LGTM

@guru4elephant guru4elephant merged commit ebf9797 into PaddlePaddle:develop Jul 22, 2019