
[action] [PR:15422] test_chassis_reboot.py apply threaded reboot to avoid ansible command timeout #15441

Merged
mssonicbld merged 1 commit into sonic-net:202405 from mssonicbld:cherry/202405/15422
Nov 7, 2024

@mssonicbld
Collaborator

Description of PR

Summary:
Fixes # (issue)
Fix an issue with running dut.command("reboot"): on some T2 platforms, the command itself can run longer than the Ansible timeout defined in ansible.cfg (60 seconds).
In that case the test errors out with "Host unreachable", although the actual cause is the Ansible command timeout.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

Avoid test case failures that depend on how long the reboot command takes to run.

How did you do it?

Run the reboot command in a separate thread, so that even if the command times out in that thread, the main test thread is unaffected.
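The PR description does not include the diff, so the following is only a minimal sketch of the threading idea under stated assumptions, not the code that was merged. `run_reboot_in_thread` and `reboot_fn` are hypothetical names; in the test, `reboot_fn` would be something like `lambda: dut.command("reboot")`. The worker catches whatever the command raises (in the failing run above, `AnsibleConnectionFailure` after roughly 60 seconds) and records it, so the exception never propagates into the main test thread. The built-in `ConnectionError` is used only as a stand-in for the Ansible exception.

```python
import logging
import threading

logger = logging.getLogger(__name__)


def run_reboot_in_thread(reboot_fn, join_timeout=None):
    """Hypothetical helper: run reboot_fn in a background thread.

    Any exception raised by reboot_fn (for example an Ansible connection
    failure caused by the command timeout) is logged and appended to the
    returned list instead of propagating to the caller.
    """
    errors = []

    def worker():
        try:
            reboot_fn()
        except Exception as exc:  # e.g. AnsibleConnectionFailure in sonic-mgmt
            logger.warning("reboot command raised %r; continuing", exc)
            errors.append(exc)

    # daemon=True: a command that never returns will not keep the
    # Python process alive at interpreter exit.
    thread = threading.Thread(target=worker, name="reboot", daemon=True)
    thread.start()
    # Wait at most join_timeout seconds (None means wait until it finishes);
    # the caller continues afterwards whether or not the command returned.
    thread.join(timeout=join_timeout)
    return thread, errors


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    def fake_reboot():
        # Stand-in for dut.command("reboot") hitting the Ansible timeout.
        raise ConnectionError("Host unreachable in the inventory")

    thread, errors = run_reboot_in_thread(fake_reboot, join_timeout=5)
    print("main thread still running; worker alive:", thread.is_alive())
    print("errors recorded:", errors)
```

In this sketch the failure of the reboot command itself is only logged; checking that the DUT actually rebooted and came back is left to the main thread by other means.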

How did you verify/test it?

Before:

03/11/2024 12:04:25 test_chassis_reboot.chassis_cold_reboot  L0027 INFO   | Run cold reboot on <MultiAsicSonicHost 8800-lc4>
03/11/2024 12:04:25 base._run                                L0071 DEBUG  | /var/src/sonic-mgmt_8800-1_66b4a53de4614bccc2e74f8c/tests/common/devices/multi_asic.py::_run_on_asics#135: [8800-lc4] AnsibleModule::command, args=["reboot"], kwargs={}
03/11/2024 12:05:24 transport._log                           L1873 DEBUG  | EOF in transport thread
03/11/2024 12:05:24 __init__.pytest_runtest_call             L0040 ERROR  | Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/_pytest/python.py", line 1788, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/usr/local/lib/python3.8/dist-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 139, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
  File "/usr/local/lib/python3.8/dist-packages/_pytest/python.py", line 194, in pytest_pyfunc_call
    result = testfunction(**testargs)
  File "/var/src/sonic-mgmt_t2-8800-1_66b4a53de4614bccc2e74f8c/tests/platform_tests/test_chassis_reboot.py", line 70, in test_parallel_reboot
    chassis_cold_reboot(dut, localhost)
  File "/var/src/sonic-mgmt_t2-8800-1_66b4a53de4614bccc2e74f8c/tests/platform_tests/test_chassis_reboot.py", line 28, in chassis_cold_reboot
    dut.command("reboot")
  File "/var/src/sonic-mgmt-t2-8800-1_66b4a53de4614bccc2e74f8c/tests/common/devices/multi_asic.py", line 135, in _run_on_asics
    return getattr(self.sonichost, self.multi_asic_attr)(*module_args, **complex_args)
  File "/var/src/sonic-mgmt-t2-8800-1_66b4a53de4614bccc2e74f8c/tests/common/devices/base.py", line 105, in _run
    res = self.module(*module_args, **complex_args)[self.hostname]
  File "/usr/local/lib/python3.8/dist-packages/pytest_ansible/module_dispatcher/v213.py", line 232, in _run
    raise AnsibleConnectionFailure(
pytest_ansible.errors.AnsibleConnectionFailure: Host unreachable in the inventory

After:

----------------------------- live log sessionfinish -----------------------------
05:47:44 __init__.pytest_terminal_summary         L0067 INFO   | Can not get Allure report URL. Please check logs
============================ 1 passed, 1 warning in 2017.89s (0:33:37) ============================
DEBUG:tests.conftest:[log_custom_msg] item: <Function test_parallel_reboot>
DEBUG:tests.conftest:append custom_msg: {'dut_check_result': {'core_dump_check_pass': True, 'config_db_check_pass': False}}

Any platform specific information?

The issue is seen on Cisco T2 platforms, which take longer to reboot, but the change is a general enhancement.

Supported testbed topology if it's a new test case?

Documentation

Co-authored by: jianquanye@microsoft.com
@mssonicbld mssonicbld requested a review from prgeor as a code owner November 7, 2024 10:51
@mssonicbld
Collaborator Author

/azp run

@mssonicbld
Collaborator Author

Original PR: #15422

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld mssonicbld merged commit 0d51ab9 into sonic-net:202405 Nov 7, 2024