Skip to content

Conversation

@jeng1220
Copy link
Collaborator

@jeng1220 jeng1220 commented Mar 7, 2024

PR Category

Others

PR Types

Bug fixes

Description

Not all CI/CD machines have 8 GPUs, so this PR adds skipif if number of GPUs is not enough

@paddle-bot
Copy link

paddle-bot bot commented Mar 7, 2024

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Mar 7, 2024
@jeng1220 jeng1220 added the NVIDIA label Mar 7, 2024
@onecatcn onecatcn requested a review from tianshuo78520a March 7, 2024 02:53
Comment on lines +30 to +31
if num_of_devices > paddle.device.cuda.device_count():
self.skipTest("number of GPUs is not enough")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think need this if-condition here because all the test-case inherited this base class only use 2-cards. And CI machines which run these test-cases only has two devices either.

Copy link
Collaborator Author

@jeng1220 jeng1220 Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LiYuRio ,

Thanks for the response.

CI machines which run these test-cases only has two devices either.

Inside Baidu, it is true but for other developers, it might not be true. This line can give other developers a simple, more meaningful reason to show why unit tests cannot run, instead of a long bug message. Does it make sense?

Copy link
Contributor

@LiYuRio LiYuRio Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, but if we add skipTest here, is it necessary to add skipTest to all distributed test case? Not only in communication test case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LiYuRio ,
Yes, it will be best. I just found another test (test/legacy_test/op_test.py) has the same issue. This is all I have found for now.

@onecatcn onecatcn requested a review from LiYuRio March 11, 2024 04:36
@paddle-ci-bot
Copy link

paddle-ci-bot bot commented Mar 15, 2024

Sorry to inform you that 7a964fe's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

JZ-LIANG
JZ-LIANG previously approved these changes Mar 18, 2024
Copy link
Contributor

@JZ-LIANG JZ-LIANG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jeng1220 jeng1220 force-pushed the bugfix_multigpu_unittest branch 2 times, most recently from 8036dcb to b2d0ac2 Compare April 25, 2024 03:35
@paddle-ci-bot
Copy link

paddle-ci-bot bot commented May 3, 2024

Sorry to inform you that b2d0ac2's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@jeng1220 jeng1220 force-pushed the bugfix_multigpu_unittest branch from b2d0ac2 to 2378025 Compare May 9, 2024 02:10
@LiYuRio LiYuRio merged commit 8c5b4fa into PaddlePaddle:develop May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor External developers NVIDIA

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants