
Commit 2bcc33e

wenyiz2021 authored and sreejithsreekumaran committed
Update test_chassis_reboot.py (sonic-net#15422)
Description of PR

Summary:
Fixes # (issue)

Fix a failure in dut.command("reboot"): on some T2 platforms, the reboot command itself can run longer than the 60-second Ansible timeout defined in ansible.cfg. When that happens, the test errors out with "Host unreachable", even though the real cause is the Ansible command timeout.

Approach

What is the motivation for this PR?
Avoid test case failures that depend on how long the reboot command takes to run.

How did you do it?
Run the reboot command in a worker thread, so that even if the command times out in that thread, the main test thread is unaffected.

How did you verify/test it?
Before:

03/11/2024 12:04:25 test_chassis_reboot.chassis_cold_reboot L0027 INFO | Run cold reboot on <MultiAsicSonicHost 8800-lc4>
03/11/2024 12:04:25 base._run L0071 DEBUG | /var/src/sonic-mgmt_8800-1_66b4a53de4614bccc2e74f8c/tests/common/devices/multi_asic.py::_run_on_asics#135: [8800-lc4] AnsibleModule::command, args=["reboot"], kwargs={}
03/11/2024 12:05:24 transport._log L1873 DEBUG | EOF in transport thread
03/11/2024 12:05:24 __init__.pytest_runtest_call L0040 ERROR | Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/_pytest/python.py", line 1788, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/usr/local/lib/python3.8/dist-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 139, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
  File "/usr/local/lib/python3.8/dist-packages/_pytest/python.py", line 194, in pytest_pyfunc_call
    result = testfunction(**testargs)
  File "/var/src/sonic-mgmt_t2-8800-1_66b4a53de4614bccc2e74f8c/tests/platform_tests/test_chassis_reboot.py", line 70, in test_parallel_reboot
    chassis_cold_reboot(dut, localhost)
  File "/var/src/sonic-mgmt_t2-8800-1_66b4a53de4614bccc2e74f8c/tests/platform_tests/test_chassis_reboot.py", line 28, in chassis_cold_reboot
    dut.command("reboot")
  File "/var/src/sonic-mgmt-t2-8800-1_66b4a53de4614bccc2e74f8c/tests/common/devices/multi_asic.py", line 135, in _run_on_asics
    return getattr(self.sonichost, self.multi_asic_attr)(*module_args, **complex_args)
  File "/var/src/sonic-mgmt-t2-8800-1_66b4a53de4614bccc2e74f8c/tests/common/devices/base.py", line 105, in _run
    res = self.module(*module_args, **complex_args)[self.hostname]
  File "/usr/local/lib/python3.8/dist-packages/pytest_ansible/module_dispatcher/v213.py", line 232, in _run
    raise AnsibleConnectionFailure(
pytest_ansible.errors.AnsibleConnectionFailure: Host unreachable in the inventory

After:

------------------------------ live log sessionfinish ------------------------------
05:47:44 __init__.pytest_terminal_summary L0067 INFO | Can not get Allure report URL. Please check logs
============================ 1 passed, 1 warning in 2017.89s (0:33:37) ============================
DEBUG:tests.conftest:[log_custom_msg] item: <Function test_parallel_reboot>
DEBUG:tests.conftest:append custom_msg: {'dut_check_result': {'core_dump_check_pass': True, 'config_db_check_pass': False}}

Any platform specific information?
The issue is seen on Cisco T2 platforms, which take longer to reboot, but the change is a general enhancement.

Co-authored by: [email protected]
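Why threading avoids the error can be shown with a minimal, standard-library-only sketch. `slow_reboot()` is a hypothetical stand-in for `chassis_cold_reboot()`, and the `ConnectionError` it raises stands in for `AnsibleConnectionFailure`; neither is the real sonic-mgmt code. An exception raised inside a callable passed to `ThreadPoolExecutor.submit()` is stored on the returned `Future` and is only re-raised if the caller asks for `Future.result()`, so the main thread keeps running.

```python
import concurrent.futures
import time


def slow_reboot(hostname):
    """Hypothetical stand-in for chassis_cold_reboot(): the command outlives
    the caller's timeout, modeled as a ConnectionError (standing in for
    AnsibleConnectionFailure)."""
    time.sleep(0.1)
    raise ConnectionError("Host unreachable: {}".format(hostname))


def run_threaded(hostnames):
    """Submit one reboot per host; the main thread is not interrupted."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        futures = {executor.submit(slow_reboot, h): h for h in hostnames}
        # Reached immediately, even though every worker is about to fail.
        main_thread_progress = "main thread still running"
    # Leaving the with-block waits for the workers. Their failures are stored
    # on the futures, not raised, until .exception() or .result() is called.
    errors = {h: f.exception() for f, h in futures.items()}
    return main_thread_progress, errors


status, errors = run_threaded(["lc1", "lc2"])
print(status)
for host, exc in sorted(errors.items()):
    print(host, type(exc).__name__, exc)
```

In the commit, the future returned by `executor.submit()` is not kept, so a timed-out reboot command is simply discarded; the test then judges the reboot by its outcome (BGP sessions and new core dumps) rather than by the command's return.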
1 parent cb53b84 commit 2bcc33e

1 file changed

Lines changed: 7 additions & 7 deletions


tests/platform_tests/test_chassis_reboot.py

@@ -5,6 +5,7 @@
 import random
 import logging
 import time
+import concurrent.futures
 from tests.common.helpers.assertions import pytest_assert
 from tests.common.utilities import wait_until
 from tests.common.reboot import wait_for_startup,\
@@ -59,15 +60,16 @@ def test_parallel_reboot(duthosts, localhost, conn_graph_facts, xcvr_skip_list):

     core_dumps = {}
     # Perform reboot on multiple LCs within 30sec
-    for dut in duthosts:
-        if dut.is_supervisor_node():
-            continue
+    executor = concurrent.futures.ThreadPoolExecutor(max_workers=8)
+    for dut in duthosts.frontend_nodes:

         # collect core dump before reboot
         core_dumps[dut.hostname] = get_core_dump(dut)

         # Perform cold reboot on all linecards, with an internal within 30sec to mimic a parallel reboot scenario
-        chassis_cold_reboot(dut, localhost)
+        # Change this to threaded reboot, to avoid ansible command timeout in 60sec, we have seen some T2 platform
+        # reboot exceed 60 sec, and causes test to error out
+        executor.submit(chassis_cold_reboot, dut, localhost)

         # Wait for 0 ~ 30sec
         rand_interval = random.randint(0, 30)
@@ -88,9 +90,7 @@ def test_parallel_reboot(duthosts, localhost, conn_graph_facts, xcvr_skip_list):
                     "Not all BGP sessions are established on DUT")

     # Check if new core dumps are generated
-    for dut in duthosts:
-        if dut.is_supervisor_node():
-            continue
+    for dut in duthosts.frontend_nodes:
         post_core_dump = get_core_dump(dut)
         new_core_dumps = (set(post_core_dump) - set(core_dumps[dut.hostname]))

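The restructured reboot loop can be sketched in isolation as follows. `FakeDut`, `frontend_nodes()` and `fake_reboot()` are hypothetical stand-ins, not sonic-mgmt APIs; `frontend_nodes()` assumes that `duthosts.frontend_nodes` yields exactly the non-supervisor DUTs, which is what the removed `is_supervisor_node()` check selected. The stagger is scaled down from the test's `random.randint(0, 30)` seconds so the sketch runs quickly, and it assumes `rand_interval` is used as a sleep between submissions, which the hunk does not show.

```python
import concurrent.futures
import random
import time
from dataclasses import dataclass


@dataclass
class FakeDut:
    """Hypothetical stand-in for a sonic-mgmt DUT host object."""
    hostname: str
    supervisor: bool = False

    def is_supervisor_node(self):
        return self.supervisor


def frontend_nodes(duts):
    """Assumed equivalent of duthosts.frontend_nodes: every non-supervisor DUT."""
    return [dut for dut in duts if not dut.is_supervisor_node()]


def fake_reboot(dut):
    """Hypothetical reboot that blocks, like a slow cold reboot on some T2 linecards."""
    time.sleep(0.1)


def staggered_parallel_reboot(duts, reboot, max_interval=0.05):
    """Submit reboot(dut) for each linecard with a random pause between
    submissions; return {hostname: exception or None} per reboot."""
    futures = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        for dut in frontend_nodes(duts):
            futures[dut.hostname] = executor.submit(reboot, dut)
            # Scaled-down stagger (the test uses 0 to 30 whole seconds).
            time.sleep(random.uniform(0, max_interval))
    # exception() is None for a reboot call that returned normally.
    return {host: fut.exception() for host, fut in futures.items()}


chassis = [FakeDut("sup0", supervisor=True), FakeDut("lc1"), FakeDut("lc2")]
print(staggered_parallel_reboot(chassis, fake_reboot))
```

Unlike the commit, the sketch uses the executor as a context manager, which waits for every submitted reboot call to return before the function exits, and it keeps the futures so a timed-out reboot shows up as a stored exception instead of disappearing. Whether that extra wait is desirable depends on how long the reboot command blocks before Ansible gives up.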