Merged
5 changes: 5 additions & 0 deletions test/collective/test_communication_api_base.py
@@ -22,9 +22,14 @@
import tempfile
import unittest

import paddle


class CommunicationTestDistBase(unittest.TestCase):
    def setUp(self, save_log_dir=None, num_of_devices=2, timeout=120, nnode=1):
        if num_of_devices > paddle.device.cuda.device_count():
            self.skipTest("number of GPUs is not enough")
Comment on lines +30 to +31
Contributor

I don't think we need this if-condition here, because all the test cases that inherit from this base class use only 2 cards, and the CI machines that run these test cases have only two devices as well.

Collaborator Author

@jeng1220 jeng1220 Mar 8, 2024

@LiYuRio ,

Thanks for the response.

> the CI machines that run these test cases have only two devices as well.

That is true inside Baidu, but it might not be true for other developers. This check gives them a short, meaningful reason why the unit tests cannot run, instead of a long error message. Does that make sense?

Contributor

@LiYuRio LiYuRio Mar 13, 2024

OK, but if we add skipTest here, do we need to add skipTest to all distributed test cases, not only the communication test cases?

Collaborator Author

@LiYuRio ,
Yes, that would be best. I just found that another test (test/legacy_test/op_test.py) has the same issue. That is all I have found for now.
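
One way the "add skipTest to all distributed test cases" idea could be shared is a small mixin, sketched below. This is only an illustration and not part of the PR: `DeviceCountMixin`, `require_devices`, `device_counter`, and `ExampleDistTest` are hypothetical names. The sketch assumes a Paddle test would set `device_counter` to `paddle.device.cuda.device_count` (the call used in the diff); the default here returns 0 so the example runs without a GPU.

```python
import unittest


class DeviceCountMixin:
    """Hypothetical mixin: skip a test when too few devices are visible."""

    # Zero-argument callable returning the number of available devices.
    # Assumption: a Paddle test would use
    #   device_counter = staticmethod(paddle.device.cuda.device_count)
    device_counter = staticmethod(lambda: 0)

    def require_devices(self, needed):
        available = self.device_counter()
        if needed > available:
            # skipTest raises unittest.SkipTest, so the test is reported as
            # skipped with this reason instead of failing later on.
            self.skipTest(
                f"number of GPUs is not enough: need {needed}, found {available}"
            )


class ExampleDistTest(DeviceCountMixin, unittest.TestCase):
    def setUp(self):
        self.require_devices(2)

    def test_something(self):
        self.assertTrue(True)
```

With the default counter, every test in `ExampleDistTest` is reported as skipped with the reason string; a subclass that overrides `device_counter` to report two or more devices runs normally.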


        self._python_interp = sys.executable
        self._save_log_dir = save_log_dir
        self._log_dir = tempfile.TemporaryDirectory()
7 changes: 7 additions & 0 deletions test/legacy_test/op_test.py
@@ -3199,6 +3199,13 @@ def check_grad_with_place(
    python_api_info=python_api_info,
)
runtime_envs = get_subprocess_runtime_envs(place)

num_devices = len(
    runtime_envs["CUDA_VISIBLE_DEVICES"].split(",")
)
if num_devices > paddle.device.cuda.device_count():
    self.skipTest("number of GPUs is not enough")

start_command = get_subprocess_command(
    runtime_envs["CUDA_VISIBLE_DEVICES"],
    generated_grad_test_path,
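
The device count in the op_test.py hunk comes from splitting the `CUDA_VISIBLE_DEVICES` string on commas. A minimal standalone sketch of that counting step follows; `count_requested_devices` is a hypothetical helper, not a function in Paddle. It deliberately differs from the plain `split(",")` in the diff in one respect: empty entries are dropped, so an empty string counts as zero devices rather than one.

```python
def count_requested_devices(visible_devices):
    """Count the device IDs in a CUDA_VISIBLE_DEVICES-style string."""
    # "".split(",") returns [""], which has length 1, so empty entries are
    # filtered out before counting.
    return len([d for d in visible_devices.split(",") if d.strip()])


print(count_requested_devices("0,1"))  # two device IDs
print(count_requested_devices(""))     # no device IDs
```

Whether the stricter handling of empty entries matters depends on what `get_subprocess_runtime_envs` can return, which this page does not show.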