disable the ut test_dist_mnist_hallreduce temporarily #28129

Merged
sandyhouse merged 1 commit into PaddlePaddle:develop from sandyhouse:disable_test_dist_mnist_hallreduce
Oct 20, 2020
@sandyhouse sandyhouse commented Oct 20, 2020

PR types

Others

PR changes

Others

Describe

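Temporarily disable the unit test test_dist_mnist_hallreduce.
The CI machines have only 2 GPUs, but this unit test creates 4 NCCL ranks (4 processes), so multiple ranks end up on a single GPU. Newer NCCL versions do not support this: Using the same CUDA device multiple times as different ranks of the same NCCL communicator is not supported and may lead to hangs.

PR that upgrades the CI Docker image: #27589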
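
To illustrate the mismatch described above, here is a minimal sketch in plain Python. It is not code from this PR or from Paddle: the function names are hypothetical, and the round-robin rank-to-device mapping is an assumption made only for illustration. Whatever the actual mapping is, 4 ranks on 2 GPUs must put at least two ranks on one device.

```python
from collections import Counter


def ranks_per_device(num_ranks: int, num_gpus: int) -> Counter:
    """Count how many ranks land on each GPU index.

    Assumes a round-robin mapping (rank % num_gpus); this mapping is an
    illustrative assumption, not taken from the test itself.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return Counter(rank % num_gpus for rank in range(num_ranks))


def must_share_device(num_ranks: int, num_gpus: int) -> bool:
    """True when some GPU necessarily hosts more than one rank.

    By the pigeonhole principle this holds for any mapping whenever
    there are more ranks than GPUs.
    """
    return num_ranks > num_gpus


# The situation from the description: 4 ranks on a 2-GPU CI machine.
usage = ranks_per_device(4, 2)
shared = must_share_device(4, 2)
```

A check of this kind could in principle guard the test instead of disabling it outright, but this PR simply disables the test for now.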

@paddle-bot-old
Thanks for your contribution!
Please wait for the CI result first. See Paddle CI Manual for details.

Contributor

@gongweibao gongweibao left a comment

LGTM

@sandyhouse sandyhouse merged commit cd37244 into PaddlePaddle:develop Oct 20, 2020
@sandyhouse sandyhouse deleted the disable_test_dist_mnist_hallreduce branch October 20, 2020 14:45