Exit pytest with error code 16 if ptfhost is unreachable#20539

Merged
wangxin merged 3 commits into sonic-net:master from ZhaohuiS:ZhaohuiS/ptf_unreachable
Sep 9, 2025

Conversation

@ZhaohuiS
Contributor

@ZhaohuiS ZhaohuiS commented Sep 5, 2025

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505

Approach

What is the motivation for this PR?

On a dualtor testbed, the session-scoped fixture run_icmp_responder_session runs very early in setup. If the PTF is unreachable, the script does not know about it and still calls ptfhost.copy to copy a file from the local machine to ptfhost.
With this PR, the script captures the resulting exception and makes pytest exit early. There is no need to run more cases on an unhealthy testbed: doing so wastes time and uploads many noisy failed test results.
In ElasticTest, if ptfhost is unreachable on one testbed, the case fails on that testbed and another testbed is picked to run it, which generates many flaky results. It is better to exit pytest early so that the testbed is kicked out and no further flaky results are generated.

A similar PR was filed before: #10243

Test log before:

____________ ERROR at setup of test_ecn_during_encap_on_standby[6] _____________

duthosts = [<MultiAsicSonicHost str3-8101c1-05>, <MultiAsicSonicHost str3-8101c1-06>]
duthost = <MultiAsicSonicHost str3-8101c1-05>
ptfhost = <tests.common.devices.ptf.PTFHost object at 0x7f94d040e8b0>
tbinfo = {'auto_recover': 'True', 'comment': 'yawenni', 'conf-name': 'vms66-dual-t0-8101c1-03', 'duts': ['str3-8101c1-05', 'str3-8101c1-06'], ...}

    @pytest.fixture(scope="session", autouse=True)
    def run_icmp_responder_session(duthosts, duthost, ptfhost, tbinfo):
        """Run icmp_responder on ptfhost session-wise on dualtor testbeds with active-active ports."""
        # No vlan is available on non-t0 testbed, so skip this fixture
        if "dualtor-mixed" not in tbinfo["topo"]["name"] and "dualtor-aa" not in tbinfo["topo"]["name"]:
            logger.info("Skip running icmp_responder at session level, "
                        "it is only for dualtor testbed with active-active mux ports.")
            yield
            return
    
        global icmp_responder_session_started
    
        update_linkmgrd_probe_interval(duthosts, tbinfo, PROBER_INTERVAL_MS)
        duthosts.shell("config save -y")
    
        duthost = duthosts[0]
        logger.debug("Copy icmp_responder.py to ptfhost '{0}'".format(ptfhost.hostname))
>       ptfhost.copy(src=os.path.join(SCRIPTS_SRC_DIR, ICMP_RESPONDER_PY), dest=OPT_DIR)

duthost    = <MultiAsicSonicHost str3-8101c1-05>
duthosts   = [<MultiAsicSonicHost str3-8101c1-05>, <MultiAsicSonicHost str3-8101c1-06>]
ptfhost    = <tests.common.devices.ptf.PTFHost object at 0x7f94d040e8b0>
tbinfo     = {'auto_recover': 'True', 'comment': 'yawenni', 'conf-name': 'vms66-dual-t0-8101c1-03', 'duts': ['str3-8101c1-05', 'str3-8101c1-06'], ...}

common/fixtures/ptfhost_utils.py:322: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
common/devices/base.py:105: in _run
    res = self.module(*module_args, **complex_args)[self.hostname]
        complex_args = {'dest': '/opt', 'src': 'scripts/icmp_responder.py'}
        filename   = '/var/src/sonic-mgmt_vms66-dual-t0-8101c1-03/tests/common/fixtures/ptfhost_utils.py'
        function_name = 'run_icmp_responder_session'
        index      = 0
        line_number = 322
        lines      = ['    ptfhost.copy(src=os.path.join(SCRIPTS_SRC_DIR, ICMP_RESPONDER_PY), dest=OPT_DIR)\n']
        module_args = []
        module_async = False
        module_ignore_errors = False
        previous_frame = <frame at 0x11df64e0, file '/var/src/sonic-mgmt_vms66-dual-t0-8101c1-03/tests/common/fixtures/ptfhost_utils.py', line 322, code run_icmp_responder_session>
        self       = <tests.common.devices.ptf.PTFHost object at 0x7f94d040e8b0>
        verbose    = True
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pytest_ansible.module_dispatcher.v213.ModuleDispatcherV213 object at 0x7f94cb842ee0>
module_args = ()
complex_args = {'dest': '/opt', 'src': 'scripts/icmp_responder.py'}
hosts = [vms66-7], extra_hosts = [], no_hosts = False
args = ['pytest-ansible', 'vms66-7', '--connection=smart', '--become', '--become-method=sudo', '--become-user=root', ...]
verbosity = None, verbosity_syntax = '-vvvvv', argument = 'module-path'
arg_value = ['/var/src/sonic-mgmt_vms66-dual-t0-8101c1-03/ansible/library']
callback = <pytest_ansible.module_dispatcher.v213.ResultAccumulator object at 0x7f94cb997850>

    def _run(self, *module_args, **complex_args):
        """Execute an ansible adhoc command returning the result in a AdhocResult object."""
        # Assemble module argument string
        if module_args:
            complex_args.update({"_raw_params": " ".join(module_args)})
    
        # Assert hosts matching the provided pattern exist
        hosts = self.options["inventory_manager"].list_hosts()
        if "extra_inventory_manager" in self.options:
            extra_hosts = self.options["extra_inventory_manager"].list_hosts()
        else:
            extra_hosts = []
        no_hosts = False
        if len(hosts + extra_hosts) == 0:
            no_hosts = True
            warnings.warn("provided hosts list is empty, only localhost is available")
    
        self.options["inventory_manager"].subset(self.options.get("subset"))
        hosts = self.options["inventory_manager"].list_hosts(
            self.options["host_pattern"],
        )
        if "extra_inventory_manager" in self.options:
            self.options["extra_inventory_manager"].subset(self.options.get("subset"))
            extra_hosts = self.options["extra_inventory_manager"].list_hosts()
        else:
            extra_hosts = []
        if len(hosts + extra_hosts) == 0 and not no_hosts:
            raise ansible.errors.AnsibleError(
                "Specified hosts and/or --limit does not match any hosts.",
            )
    
        # Pass along cli options
        args = ["pytest-ansible"]
        verbosity = None
DEBUG:tests.conftest:[log_custom_msg] item: <Function test_ecn_during_encap_on_standby[6]>
INFO:root:Can not get Allure report URL. Please check logs
        for verbosity_syntax in ("-v", "-vv", "-vvv", "-vvvv", "-vvvvv"):
            if verbosity_syntax in sys.argv:
                verbosity = verbosity_syntax
                break
        if verbosity is not None:
            args.append(verbosity_syntax)
        args.extend([self.options["host_pattern"]])
        for argument in (
            "connection",
            "user",
            "become",
            "become_method",
            "become_user",
            "module_path",
        ):
            arg_value = self.options.get(argument)
            argument = argument.replace("_", "-")
    
            if arg_value in (None, False):
                continue
    
            if arg_value is True:
                args.append(f"--{argument}")
            else:
                args.append(f"--{argument}={arg_value}")
    
        # Use Ansible's own adhoc cli to parse the fake command line we created and then save it
        # into Ansible's global context
        adhoc = AdHocCLI(args)
        adhoc.parse()
    
        # And now we'll never speak of this again
        del adhoc
    
        # Initialize callbacks to capture module JSON responses
        callback = ResultAccumulator()
    
        kwargs = {
            "inventory": self.options["inventory_manager"],
            "variable_manager": self.options["variable_manager"],
            "loader": self.options["loader"],
            "stdout_callback": callback,
            "passwords": {"conn_pass": None, "become_pass": None},
        }
    
        kwargs_extra = {}
        # If we have an extra inventory, do the same that we did for the inventory
        if "extra_inventory_manager" in self.options:
            callback_extra = ResultAccumulator()
    
            kwargs_extra = {
                "inventory": self.options["extra_inventory_manager"],
                "variable_manager": self.options["extra_variable_manager"],
                "loader": self.options["extra_loader"],
                "stdout_callback": callback_extra,
                "passwords": {"conn_pass": None, "become_pass": None},
            }
    
        # create a pseudo-play to execute the specified module via a single task
        play_ds = {
            "name": "pytest-ansible",
            "hosts": self.options["host_pattern"],
            "become": self.options.get("become"),
            "become_user": self.options.get("become_user"),
            "gather_facts": "no",
            "tasks": [
                {
                    "action": {
                        "module": self.options["module_name"],
                        "args": complex_args,
                    },
                },
            ],
        }
    
        play = Play().load(
            play_ds,
            variable_manager=self.options["variable_manager"],
            loader=self.options["loader"],
        )
        play_extra = None
        if "extra_inventory_manager" in self.options:
            play_extra = Play().load(
                play_ds,
                variable_manager=self.options["extra_variable_manager"],
                loader=self.options["extra_loader"],
            )
    
        if HAS_CUSTOM_LOADER_SUPPORT:
            # Load the collection finder, unsupported, may change in future
            init_plugin_loader(COLLECTIONS_PATHS)
    
        # now create a task queue manager to execute the play
        tqm = None
        try:
            tqm = TaskQueueManager(**kwargs)
            tqm.run(play)
        finally:
            if tqm:
                tqm.cleanup()
    
        if "extra_inventory_manager" in self.options:
            tqm_extra = None
            try:
                tqm_extra = TaskQueueManager(**kwargs_extra)
                tqm_extra.run(play_extra)
            finally:
                if tqm_extra:
                    tqm_extra.cleanup()
    
        # Raise exception if host(s) unreachable
        # FIXME - if multiple hosts were involved, should an exception be raised?
        if callback.unreachable:
>           raise AnsibleConnectionFailure(
                "Host unreachable in the inventory",
                dark=callback.unreachable,
                contacted=callback.contacted,
            )
E           pytest_ansible.errors.AnsibleConnectionFailure: Host unreachable in the inventory

arg_value  = ['/var/src/sonic-mgmt_vms66-dual-t0-8101c1-03/ansible/library']
args       = ['pytest-ansible', 'vms66-7', '--connection=smart', '--become', '--become-method=sudo', '--become-user=root', ...]
argument   = 'module-path'
callback   = <pytest_ansible.module_dispatcher.v213.ResultAccumulator object at 0x7f94cb997850>
complex_args = {'dest': '/opt', 'src': 'scripts/icmp_responder.py'}
extra_hosts = []
hosts      = [vms66-7]
kwargs     = {'inventory': <ansible.inventory.manager.InventoryManager object at 0x7f94d040ef70>, 'loader': <ansible.parsing.datalo...ss': None}, 'stdout_callback': <pytest_ansible.module_dispatcher.v213.ResultAccumulator object at 0x7f94cb997850>, ...}
kwargs_extra = {}
module_args = ()
no_hosts   = False
play       = pytest-ansible
play_ds    = {'become': True, 'become_user': 'root', 'gather_facts': 'no', 'hosts': 'vms66-7', ...}
play_extra = None
self       = <pytest_ansible.module_dispatcher.v213.ModuleDispatcherV213 object at 0x7f94cb842ee0>
tqm        = <ansible.executor.task_queue_manager.TaskQueueManager object at 0x7f94d44868e0>
verbosity  = None
verbosity_syntax = '-vvvvv'

Test log after:

        if callback.unreachable:
>           raise AnsibleConnectionFailure(
                "Host unreachable in the inventory",
                dark=callback.unreachable,
                contacted=callback.contacted,
            )
E           pytest_ansible.errors.AnsibleConnectionFailure: Host unreachable in the inventory

/usr/local/lib/python3.8/dist-packages/pytest_ansible/module_dispatcher/v213.py:232: AnsibleConnectionFailure

During handling of the above exception, another exception occurred:

duthosts = [<MultiAsicSonicHost str2-8101c1-01>, <MultiAsicSonicHost str2-8101c1-02>], duthost = <MultiAsicSonicHost str2-8101c1-01>, ptfhost = <tests.common.devices.ptf.PTFHost object at 0x7fc316a756a0>
tbinfo = {'auto_recover': 'True', 'comment': 'yawenni', 'conf-name': 'vms18-dual-t0-8101c1-01', 'duts': ['str2-8101c1-01', 'str2-8101c1-02'], ...}
request = <SubRequest 'run_icmp_responder_session' for <Function test_lldp[str2-8101c1-01-None]>>

    @pytest.fixture(scope="session", autouse=True)
    def run_icmp_responder_session(duthosts, duthost, ptfhost, tbinfo, request):
        """Run icmp_responder on ptfhost session-wise on dualtor testbeds with active-active ports."""
        # No vlan is available on non-t0 testbed, so skip this fixture
        if "dualtor-mixed" not in tbinfo["topo"]["name"] and "dualtor-aa" not in tbinfo["topo"]["name"]:
            logger.info("Skip running icmp_responder at session level, "
                        "it is only for dualtor testbed with active-active mux ports.")
            yield
            return
    
        global icmp_responder_session_started
    
        update_linkmgrd_probe_interval(duthosts, tbinfo, PROBER_INTERVAL_MS)
        duthosts.shell("config save -y")
    
        duthost = duthosts[0]
        logger.debug("Copy icmp_responder.py to ptfhost '{0}'".format(ptfhost.hostname))
        try:
            ptfhost.copy(src=os.path.join(SCRIPTS_SRC_DIR, ICMP_RESPONDER_PY), dest=OPT_DIR)
        except AnsibleConnectionFailure as e:
            logger.error("Failed to copy files to ptfhost.")
            request.config.cache.set("ptfhost_unreachable", True)
>           pt_assert(False, "!!! ptfhost unreachable !!! Exception: {}".format(repr(e)))
E           Failed: !!! ptfhost unreachable !!! Exception: Host unreachable in the inventory

common/fixtures/ptfhost_utils.py:334: Failed

How did you do it?

Capture the exception in run_icmp_responder_session. When the PTF becomes unreachable, this is the first fixture to fail. Set session.exitstatus to 16 so that run_test.sh is aware of the failure and exits the pipeline early.
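
The exit-status part of this description can be sketched as a `conftest.py` hook. This is a minimal sketch, not the code merged in this PR: only the cache key `ptfhost_unreachable` and the exit code 16 come from the text and log above, while the constant name and the decision to reset the flag are assumptions. It relies on pytest's `pytest_sessionfinish` hook and the `get`/`set` methods of `config.cache`. How the remaining tests are skipped is not covered here.

```python
# Sketch of a conftest.py hook. Only the cache key and the value 16 come from
# the PR text; the constant name and the flag reset are assumptions.
PTFHOST_UNREACHABLE_EXIT_CODE = 16  # hypothetical constant name


def pytest_sessionfinish(session, exitstatus):
    """Override the exit status if a fixture flagged the PTF host as unreachable."""
    # config.cache is missing when the cacheprovider plugin is disabled.
    cache = getattr(session.config, "cache", None)
    if cache is not None and cache.get("ptfhost_unreachable", False):
        # Assumption: reset the flag so a later run on a healthy testbed
        # does not inherit a stale value from the on-disk cache.
        cache.set("ptfhost_unreachable", False)
        session.exitstatus = PTFHOST_UNREACHABLE_EXIT_CODE
```

Because the cache is persisted on disk between runs, resetting the flag in the hook is one way to keep a stale value from affecting the next session. Other designs, such as clearing it at session start, would work too.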

How did you verify/test it?

Used run_test.sh to run tests while the PTF was unreachable.
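
The run_test.sh changes are not shown in this conversation. As a rough illustration of how a wrapper could react to the dedicated exit code, here is a Python stand-in (the real script is shell). The function name and the returned labels are hypothetical, and the "pytest" process is faked with a one-line interpreter call that exits with 16.

```python
import subprocess
import sys

PTFHOST_UNREACHABLE_EXIT_CODE = 16  # exit code named in this PR


def run_and_classify(cmd):
    """Run a test command and classify its exit status (hypothetical helper)."""
    returncode = subprocess.run(cmd, check=False).returncode
    if returncode == PTFHOST_UNREACHABLE_EXIT_CODE:
        # The caller would stop scheduling further cases on this testbed.
        return "ptfhost_unreachable"
    return "passed" if returncode == 0 else "failed"


if __name__ == "__main__":
    # Stand-in for a pytest invocation that ends with exit status 16.
    fake_pytest = [sys.executable, "-c", "import sys; sys.exit(16)"]
    print(run_and_classify(fake_pytest))
```

Keeping 16 distinct from pytest's ordinary failure code lets the wrapper tell "tests failed" apart from "testbed is unhealthy" without parsing logs.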

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
@ZhaohuiS ZhaohuiS requested review from a team and wangxin as code owners September 5, 2025 10:08
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ZhaohuiS ZhaohuiS requested a review from lolyu September 5, 2025 10:09
Collaborator

@lolyu lolyu left a comment

LGTM, thanks Zhaohui

        except BaseException as e:
            logger.error("Failed to copy files to ptfhost.")
            request.config.cache.set("ptfhost_unreachable", True)
            pt_assert(False, "!!! ptfhost unreachable !!! Exception: {}".format(repr(e)))
Collaborator

How do you know the Exception is definitely PTF unreachable?

Contributor Author

@wangxin Most of the time it is an unreachable PTF that causes the file copy to fail, but you are right. I changed the wording to "exception".
Please review it again, thanks.

Contributor Author

@wangxin Thank you for your suggestion.
It turns out pytest_ansible.errors.AnsibleConnectionFailure works, but ansible.errors.AnsibleConnectionFailure doesn't work.

Correct:
from pytest_ansible.errors import AnsibleConnectionFailure

Wrong:
from ansible.errors import AnsibleConnectionFailure
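
A small sketch of why the import path matters: two exception classes that share a name but live in different packages are distinct types, so an `except` clause naming one does not catch the other unless they are related by inheritance. The classes below are hypothetical stand-ins, not the real `ansible` or `pytest_ansible` classes. The traceback in the PR description shows the raised type is `pytest_ansible.errors.AnsibleConnectionFailure`, and the comment above implies it is not a subclass of the `ansible.errors` one.

```python
class AnsibleSideFailure(Exception):
    """Hypothetical stand-in for ansible.errors.AnsibleConnectionFailure."""


class PytestAnsibleSideFailure(Exception):
    """Hypothetical stand-in for pytest_ansible.errors.AnsibleConnectionFailure."""


def copy_to_unreachable_host():
    # Mirrors the traceback: the dispatcher raises the pytest_ansible class.
    raise PytestAnsibleSideFailure("Host unreachable in the inventory")


def caught_by(exc_type):
    """Return True if an `except exc_type` clause catches the failure."""
    try:
        copy_to_unreachable_host()
    except exc_type:
        return True
    except Exception:
        return False


print(caught_by(PytestAnsibleSideFailure))  # True
print(caught_by(AnsibleSideFailure))        # False
```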

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

-    ptfhost.copy(src=os.path.join(SCRIPTS_SRC_DIR, ICMP_RESPONDER_PY), dest=OPT_DIR)
+    try:
+        ptfhost.copy(src=os.path.join(SCRIPTS_SRC_DIR, ICMP_RESPONDER_PY), dest=OPT_DIR)
+    except BaseException as e:
Collaborator

Only the AnsibleConnectionFailure exception means that the PTF is unreachable. It is better to catch AnsibleConnectionFailure specifically here and set "ptfhost_exception" to True. Other exceptions could indicate different issues and should not be treated as PTF unreachable.
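
The suggestion above can be sketched as follows. This is an illustration under assumptions, not the merged code: `ConnectionFailure` is a hypothetical stand-in for `pytest_ansible.errors.AnsibleConnectionFailure`, a plain dict stands in for the pytest cache, and the key uses the final name `ptfhost_unreachable` seen in the "Test log after" section. Catching `BaseException` would also swallow `KeyboardInterrupt` and `SystemExit`, which is another reason to narrow the clause.

```python
class ConnectionFailure(Exception):
    """Hypothetical stand-in for pytest_ansible.errors.AnsibleConnectionFailure."""


def copy_and_flag(copy, flags):
    """Call copy(); mark the PTF unreachable only for a connection failure."""
    try:
        copy()
    except ConnectionFailure:
        flags["ptfhost_unreachable"] = True
        raise
    # Any other exception propagates untouched, leaving the flag unset.


def unreachable():
    raise ConnectionFailure("Host unreachable in the inventory")


def bad_path():
    raise FileNotFoundError("scripts/icmp_responder.py")


for action in (unreachable, bad_path):
    flags = {}
    try:
        copy_and_flag(action, flags)
    except Exception as exc:
        print(type(exc).__name__, flags)
```

Only the first action sets the flag. The `FileNotFoundError` case surfaces as an ordinary failure with an empty `flags` dict.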

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@wangxin wangxin merged commit e81625f into sonic-net:master Sep 9, 2025
20 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Sep 10, 2025
@mssonicbld
Collaborator

Cherry-pick PR to 202505: #20601

mssonicbld pushed a commit that referenced this pull request Sep 10, 2025
xixuej pushed a commit to xixuej/sonic-mgmt that referenced this pull request Sep 17, 2025
vidyac86 pushed a commit to vidyac86/sonic-mgmt that referenced this pull request Oct 23, 2025
opcoder0 pushed a commit to opcoder0/sonic-mgmt that referenced this pull request Dec 8, 2025
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 16, 2025
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
AharonMalkin pushed a commit to AharonMalkin/sonic-mgmt that referenced this pull request Dec 16, 2025
Signed-off-by: Aharon Malkin <amalkin@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 21, 2025
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Jan 13, 2026
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Jan 26, 2026
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
lakshmi-nexthop pushed a commit to lakshmi-nexthop/sonic-mgmt that referenced this pull request Jan 28, 2026
Signed-off-by: Lakshmi Yarramaneni <lakshmi@nexthop.ai>
ytzur1 pushed a commit to ytzur1/sonic-mgmt that referenced this pull request Feb 2, 2026
Signed-off-by: Yael Tzur <ytzur@nvidia.com>