
[action] [PR:20539] Exit pytest with error code 16 if ptfhost is unreachable #20601

Merged
mssonicbld merged 1 commit into sonic-net:202505 from mssonicbld:cherry/202505/20539 on Sep 10, 2025

Conversation

@mssonicbld
Collaborator

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
  • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505

Approach

What is the motivation for this PR?

On dualtor testbeds, the run_icmp_responder_session fixture runs very early in setup. If the PTF host is unreachable, the script is unaware of that and still calls ptfhost.copy to copy a file from the local machine to the ptfhost.
With this PR, the script captures this exception and exits pytest early. There is no need to run any more cases on an unhealthy testbed; doing so wastes time and uploads many noisy failed test results.
In ElasticTest, if the ptfhost is unreachable on one testbed, cases fail on that testbed and another testbed is picked to run them, generating many flaky results. It is better to exit pytest early so that the unhealthy testbed is kicked out and no further flaky results are generated.

A similar PR was filed before: #10243

Test log before:

____________ ERROR at setup of test_ecn_during_encap_on_standby[6] _____________

duthosts = [<MultiAsicSonicHost str3-8101c1-05>, <MultiAsicSonicHost str3-8101c1-06>]
duthost = <MultiAsicSonicHost str3-8101c1-05>
ptfhost = <tests.common.devices.ptf.PTFHost object at 0x7f94d040e8b0>
tbinfo = {'auto_recover': 'True', 'comment': 'yawenni', 'conf-name': 'vms66-dual-t0-8101c1-03', 'duts': ['str3-8101c1-05', 'str3-8101c1-06'], ...}

 @pytest.fixture(scope="session", autouse=True)
 def run_icmp_responder_session(duthosts, duthost, ptfhost, tbinfo):
 """Run icmp_responder on ptfhost session-wise on dualtor testbeds with active-active ports."""
 # No vlan is available on non-t0 testbed, so skip this fixture
 if "dualtor-mixed" not in tbinfo["topo"]["name"] and "dualtor-aa" not in tbinfo["topo"]["name"]:
 logger.info("Skip running icmp_responder at session level, "
 "it is only for dualtor testbed with active-active mux ports.")
 yield
 return
 
 global icmp_responder_session_started
 
 update_linkmgrd_probe_interval(duthosts, tbinfo, PROBER_INTERVAL_MS)
 duthosts.shell("config save -y")
 
 duthost = duthosts[0]
 logger.debug("Copy icmp_responder.py to ptfhost '{0}'".format(ptfhost.hostname))
> ptfhost.copy(src=os.path.join(SCRIPTS_SRC_DIR, ICMP_RESPONDER_PY), dest=OPT_DIR)

duthost = <MultiAsicSonicHost str3-8101c1-05>
duthosts = [<MultiAsicSonicHost str3-8101c1-05>, <MultiAsicSonicHost str3-8101c1-06>]
ptfhost = <tests.common.devices.ptf.PTFHost object at 0x7f94d040e8b0>
tbinfo = {'auto_recover': 'True', 'comment': 'yawenni', 'conf-name': 'vms66-dual-t0-8101c1-03', 'duts': ['str3-8101c1-05', 'str3-8101c1-06'], ...}

common/fixtures/ptfhost_utils.py:322: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
common/devices/base.py:105: in _run
 res = self.module(*module_args, **complex_args)[self.hostname]
 complex_args = {'dest': '/opt', 'src': 'scripts/icmp_responder.py'}
 filename = '/var/src/sonic-mgmt_vms66-dual-t0-8101c1-03/tests/common/fixtures/ptfhost_utils.py'
 function_name = 'run_icmp_responder_session'
 index = 0
 line_number = 322
 lines = [' ptfhost.copy(src=os.path.join(SCRIPTS_SRC_DIR, ICMP_RESPONDER_PY), dest=OPT_DIR)\n']
 module_args = []
 module_async = False
 module_ignore_errors = False
 previous_frame = <frame at 0x11df64e0, file '/var/src/sonic-mgmt_vms66-dual-t0-8101c1-03/tests/common/fixtures/ptfhost_utils.py', line 322, code run_icmp_responder_session>
 self = <tests.common.devices.ptf.PTFHost object at 0x7f94d040e8b0>
 verbose = True
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pytest_ansible.module_dispatcher.v213.ModuleDispatcherV213 object at 0x7f94cb842ee0>
module_args = ()
complex_args = {'dest': '/opt', 'src': 'scripts/icmp_responder.py'}
hosts = [vms66-7], extra_hosts = [], no_hosts = False
args = ['pytest-ansible', 'vms66-7', '--connection=smart', '--become', '--become-method=sudo', '--become-user=root', ...]
verbosity = None, verbosity_syntax = '-vvvvv', argument = 'module-path'
arg_value = ['/var/src/sonic-mgmt_vms66-dual-t0-8101c1-03/ansible/library']
callback = <pytest_ansible.module_dispatcher.v213.ResultAccumulator object at 0x7f94cb997850>

 def _run(self, *module_args, **complex_args):
 """Execute an ansible adhoc command returning the result in a AdhocResult object."""
 # Assemble module argument string
 if module_args:
 complex_args.update({"_raw_params": " ".join(module_args)})
 
 # Assert hosts matching the provided pattern exist
 hosts = self.options["inventory_manager"].list_hosts()
 if "extra_inventory_manager" in self.options:
 extra_hosts = self.options["extra_inventory_manager"].list_hosts()
 else:
 extra_hosts = []
 no_hosts = False
 if len(hosts + extra_hosts) == 0:
 no_hosts = True
 warnings.warn("provided hosts list is empty, only localhost is available")
 
 self.options["inventory_manager"].subset(self.options.get("subset"))
 hosts = self.options["inventory_manager"].list_hosts(
 self.options["host_pattern"],
 )
 if "extra_inventory_manager" in self.options:
 self.options["extra_inventory_manager"].subset(self.options.get("subset"))
 extra_hosts = self.options["extra_inventory_manager"].list_hosts()
 else:
 extra_hosts = []
 if len(hosts + extra_hosts) == 0 and not no_hosts:
 raise ansible.errors.AnsibleError(
 "Specified hosts and/or --limit does not match any hosts.",
 )
 
 # Pass along cli options
 args = ["pytest-ansible"]
 verbosity = None
 for verbosity_syntax in ("-v", "-vv", "-vvv", "-vvvv", "-vvvvv"):
 if verbosity_syntax in sys.argv:
 verbosity = verbosity_syntax
 break
 if verbosity is not None:
 args.append(verbosity_syntax)
 args.extend([self.options["host_pattern"]])
 for argument in (
 "connection",
 "user",
 "become",
 "become_method",
 "become_user",
 "module_path",
 ):
 arg_value = self.options.get(argument)
 argument = argument.replace("_", "-")
 
 if arg_value in (None, False):
 continue
 
 if arg_value is True:
 args.append(f"--{argument}")
 else:
 args.append(f"--{argument}={arg_value}")
 
 # Use Ansible's own adhoc cli to parse the fake command line we created and then save it
 # into Ansible's global context
 adhoc = AdHocCLI(args)
 adhoc.parse()
 
 # And now we'll never speak of this again
 del adhoc
 
 # Initialize callbacks to capture module JSON responses
 callback = ResultAccumulator()
 
 kwargs = {
 "inventory": self.options["inventory_manager"],
 "variable_manager": self.options["variable_manager"],
 "loader": self.options["loader"],
 "stdout_callback": callback,
 "passwords": {"conn_pass": None, "become_pass": None},
 }
 
 kwargs_extra = {}
 # If we have an extra inventory, do the same that we did for the inventory
 if "extra_inventory_manager" in self.options:
 callback_extra = ResultAccumulator()
 
 kwargs_extra = {
 "inventory": self.options["extra_inventory_manager"],
 "variable_manager": self.options["extra_variable_manager"],
 "loader": self.options["extra_loader"],
 "stdout_callback": callback_extra,
 "passwords": {"conn_pass": None, "become_pass": None},
 }
 
 # create a pseudo-play to execute the specified module via a single task
 play_ds = {
 "name": "pytest-ansible",
 "hosts": self.options["host_pattern"],
 "become": self.options.get("become"),
 "become_user": self.options.get("become_user"),
 "gather_facts": "no",
 "tasks": [
 {
 "action": {
 "module": self.options["module_name"],
 "args": complex_args,
 },
 },
 ],
 }
 
 play = Play().load(
 play_ds,
 variable_manager=self.options["variable_manager"],
 loader=self.options["loader"],
 )
 play_extra = None
 if "extra_inventory_manager" in self.options:
 play_extra = Play().load(
 play_ds,
 variable_manager=self.options["extra_variable_manager"],
 loader=self.options["extra_loader"],
 )
 
 if HAS_CUSTOM_LOADER_SUPPORT:
 # Load the collection finder, unsupported, may change in future
 init_plugin_loader(COLLECTIONS_PATHS)
 
 # now create a task queue manager to execute the play
 tqm = None
 try:
 tqm = TaskQueueManager(**kwargs)
 tqm.run(play)
 finally:
 if tqm:
 tqm.cleanup()
 
 if "extra_inventory_manager" in self.options:
 tqm_extra = None
 try:
 tqm_extra = TaskQueueManager(**kwargs_extra)
 tqm_extra.run(play_extra)
 finally:
 if tqm_extra:
 tqm_extra.cleanup()
 
 # Raise exception if host(s) unreachable
 # FIXME - if multiple hosts were involved, should an exception be raised?
 if callback.unreachable:
> raise AnsibleConnectionFailure(
 "Host unreachable in the inventory",
 dark=callback.unreachable,
 contacted=callback.contacted,
 )
E pytest_ansible.errors.AnsibleConnectionFailure: Host unreachable in the inventory

arg_value = ['/var/src/sonic-mgmt_vms66-dual-t0-8101c1-03/ansible/library']
args = ['pytest-ansible', 'vms66-7', '--connection=smart', '--become', '--become-method=sudo', '--become-user=root', ...]
argument = 'module-path'
callback = <pytest_ansible.module_dispatcher.v213.ResultAccumulator object at 0x7f94cb997850>
complex_args = {'dest': '/opt', 'src': 'scripts/icmp_responder.py'}
extra_hosts = []
hosts = [vms66-7]
kwargs = {'inventory': <ansible.inventory.manager.InventoryManager object at 0x7f94d040ef70>, 'loader': <ansible.parsing.datalo...ss': None}, 'stdout_callback': <pytest_ansible.module_dispatcher.v213.ResultAccumulator object at 0x7f94cb997850>, ...}
kwargs_extra = {}
module_args = ()
no_hosts = False
play = pytest-ansible
play_ds = {'become': True, 'become_user': 'root', 'gather_facts': 'no', 'hosts': 'vms66-7', ...}
play_extra = None
self = <pytest_ansible.module_dispatcher.v213.ModuleDispatcherV213 object at 0x7f94cb842ee0>
tqm = <ansible.executor.task_queue_manager.TaskQueueManager object at 0x7f94d44868e0>
verbosity = None
verbosity_syntax = '-vvvvv'

Test log after:

 if callback.unreachable:
> raise AnsibleConnectionFailure(
 "Host unreachable in the inventory",
 dark=callback.unreachable,
 contacted=callback.contacted,
 )
E pytest_ansible.errors.AnsibleConnectionFailure: Host unreachable in the inventory

/usr/local/lib/python3.8/dist-packages/pytest_ansible/module_dispatcher/v213.py:232: AnsibleConnectionFailure

During handling of the above exception, another exception occurred:

duthosts = [<MultiAsicSonicHost str2-8101c1-01>, <MultiAsicSonicHost str2-8101c1-02>], duthost = <MultiAsicSonicHost str2-8101c1-01>, ptfhost = <tests.common.devices.ptf.PTFHost object at 0x7fc316a756a0>
tbinfo = {'auto_recover': 'True', 'comment': 'yawenni', 'conf-name': 'vms18-dual-t0-8101c1-01', 'duts': ['str2-8101c1-01', 'str2-8101c1-02'], ...}
request = <SubRequest 'run_icmp_responder_session' for <Function test_lldp[str2-8101c1-01-None]>>

 @pytest.fixture(scope="session", autouse=True)
 def run_icmp_responder_session(duthosts, duthost, ptfhost, tbinfo, request):
 """Run icmp_responder on ptfhost session-wise on dualtor testbeds with active-active ports."""
 # No vlan is available on non-t0 testbed, so skip this fixture
 if "dualtor-mixed" not in tbinfo["topo"]["name"] and "dualtor-aa" not in tbinfo["topo"]["name"]:
 logger.info("Skip running icmp_responder at session level, "
 "it is only for dualtor testbed with active-active mux ports.")
 yield
 return
 
 global icmp_responder_session_started
 
 update_linkmgrd_probe_interval(duthosts, tbinfo, PROBER_INTERVAL_MS)
 duthosts.shell("config save -y")
 
 duthost = duthosts[0]
 logger.debug("Copy icmp_responder.py to ptfhost '{0}'".format(ptfhost.hostname))
 try:
 ptfhost.copy(src=os.path.join(SCRIPTS_SRC_DIR, ICMP_RESPONDER_PY), dest=OPT_DIR)
 except AnsibleConnectionFailure as e:
 logger.error("Failed to copy files to ptfhost.")
 request.config.cache.set("ptfhost_unreachable", True)
> pt_assert(False, "!!! ptfhost unreachable !!! Exception: {}".format(repr(e)))
E Failed: !!! ptfhost unreachable !!! Exception: Host unreachable in the inventory

common/fixtures/ptfhost_utils.py:334: Failed

How did you do it?

Capture the exception in run_icmp_responder_session: when the PTF host becomes unreachable, this is the first fixture to fail. Set session.exitstatus to 16 so that run_test.sh is aware of this failure and exits the pipeline early.
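The flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the fixture caches a `ptfhost_unreachable` flag (as seen in the "Test log after" excerpt), and a `pytest_sessionfinish`-style hook could translate that flag into exit code 16 for `run_test.sh` to act on. The helper `resolve_session_exitstatus` and the constant name are hypothetical.

```python
# Hypothetical exit-code constant; the PR uses the literal value 16.
PTFHOST_UNREACHABLE_EXIT_CODE = 16


def resolve_session_exitstatus(cache, default_status):
    """Return the exit status pytest should report for this session.

    `cache` stands in for pytest's config cache (anything with a dict-like
    .get); if the fixture recorded that the ptfhost was unreachable, force
    the dedicated exit code so the calling shell script can bail out early.
    """
    if cache.get("ptfhost_unreachable", False):
        return PTFHOST_UNREACHABLE_EXIT_CODE
    return default_status


# A pytest hook could then apply it at the end of the session, e.g.:
#
# def pytest_sessionfinish(session, exitstatus):
#     session.exitstatus = resolve_session_exitstatus(
#         session.config.cache, exitstatus)
```

On the shell side, run_test.sh would only need to compare `$?` against 16 after invoking pytest and stop the pipeline on a match.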

How did you verify/test it?

Used run_test.sh to test the behavior when the PTF host is unreachable.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

…0539)


Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
@mssonicbld
Collaborator Author

Original PR: #20539

@mssonicbld
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld mssonicbld merged commit 79ed14a into sonic-net:202505 Sep 10, 2025
17 checks passed