
test_qos_sai: prevent cascading failures after fixture error#23283

Draft
darius-nexthop wants to merge 2 commits into sonic-net:master from nexthop-ai:skip-qos-tests-setup

Conversation

@darius-nexthop
Contributor

Description of PR

Summary:
Prevent cascading ERROR statuses in tests/qos/test_qos_sai.py when the testParameter fixture setup fails, by skipping remaining tests in the affected parameter set.

Fixes #23282

Type of change

  • Bug fix
  • Testbed and Framework (new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

A single infrastructure/fixture failure in testParameter currently causes all subsequent tests in the same parameter set to report as ERROR, which obscures the true root cause and significantly inflates the reported failure count for QoS SAI runs.

How did you do it?

Added pytest hooks (pytest_runtest_makereport and pytest_runtest_setup) that detect fixture setup failures per parameter set and track them in a small global map; once a parameter set is known-bad, subsequent tests in that set are short-circuited and reported as SKIPPED instead of re-running the broken setup and cascading more ERRORs.

How did you verify/test it?

Ran tests/qos/test_qos_sai.py on a setup where testParameter reliably fails during fixture setup. Confirmed that only the first testParameter for each parameter set reports ERROR, while all remaining tests in those parameter sets report SKIPPED, reducing the output from ~130+ cascading ERRORs to a small, fixed set of ERRORs plus clean SKIPPED statuses.

Any platform specific information?

N/A

Supported testbed topology if it's a new test case?

Documentation

When testParameter fixture setup fails, all subsequent tests in that
parameter set are currently reported as ERROR, which obscures the root
cause and tanks pass rate.

Add pytest hooks to detect fixture setup failure in test_qos_sai
parameter sets and skip remaining tests in that parameter set as
SKIPPED.

This preserves clear ERRORs for the root cause while avoiding cascading
failures and reducing the blast radius in the test report.

Signed-off-by: Darius Grassi <[email protected]>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

Signed-off-by: Darius Grassi <[email protected]>
@mssonicbld
Collaborator

/azp run

Collaborator

@StormLiangMS left a comment


@darius-nexthop ⚠️ Good idea, implementation concern

Template: ✅ OK
DCO: ✅ signed
CI: ✅ all green

Code Review:

[Important] Module-level _fixture_failures dict persists across test runs

conftest.py:178:

_fixture_failures = {}

This module-level dict is never cleared between pytest sessions. If conftest.py is imported once and pytest runs multiple times (e.g., in a long-lived worker process, or with pytest-xdist), stale entries from a previous run will cause tests to be skipped in subsequent runs even if the fixture is now healthy.

Suggestion: Clear the dict at session start using a pytest_sessionstart hook:

def pytest_sessionstart(session):
    _fixture_failures.clear()

[Important] Parameter set extraction is fragile

params = test_name.split('[')[1].rstrip(']')
param_set = params.split('-')[0]

This assumes the first '-'-delimited token is the fixture parameter set. But test names can look like testQosSaiDscpEcn[single_asic-ptf_mode] or testParameter[multi-asic], where multi-asic itself contains a hyphen: "multi-asic".split('-')[0] yields "multi", losing the actual parameter set.

Suggestion: Use a more robust extraction — either match known parameter set names explicitly, or use the full parameterization string as the key.
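A minimal sketch of both suggested alternatives, contrasted with the fragile version; the KNOWN_PARAM_SETS contents are assumed names for illustration, not the test suite's actual parameter sets:

```python
# Assumed parameter-set names for illustration only.
KNOWN_PARAM_SETS = {"single_asic", "multi-asic"}


def fragile_key(test_name):
    # The PR's approach: breaks on hyphenated sets ("multi-asic" -> "multi").
    params = test_name.split("[", 1)[1].rstrip("]")
    return params.split("-")[0]


def robust_key(test_name):
    # Match known parameter set names explicitly, falling back to the
    # full parametrization string as the key.
    params = test_name.split("[", 1)[1].rstrip("]")
    for known in KNOWN_PARAM_SETS:
        if params == known or params.startswith(known + "-"):
            return known
    return params


fragile_key("testParameter[multi-asic]")  # -> "multi" (wrong)
robust_key("testParameter[multi-asic]")   # -> "multi-asic"
```

Using the full parametrization string as the fallback key trades some skip granularity for correctness: at worst a broken set is detected once per distinct id, but no two different sets ever collide.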

[Minor] testParameter exclusion is hardcoded

if 'testParameter' in item.name:
    return

If the test is renamed or another "seed" test is added, this breaks. Consider using a pytest marker like @pytest.mark.fixture_seed instead.
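A marker-based version might look like this sketch; the `fixture_seed` marker name and the `is_seed_test` helper are assumptions for illustration:

```python
import pytest


# In the test module: tag the "seed" test with a marker instead of
# relying on its name. "fixture_seed" is an assumed marker name.
@pytest.mark.fixture_seed
def testParameter():
    pass


# In conftest.py: check the marker rather than 'testParameter' in item.name.
def is_seed_test(item):
    return item.get_closest_marker("fixture_seed") is not None
```

The marker would also need to be registered in pytest.ini or via pytest_configure to avoid unknown-mark warnings.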

Overall: The concept is sound — cascading fixture failures waste test time. The implementation needs the session-start cleanup and more robust parameter parsing.

Collaborator

@StormLiangMS left a comment


@darius-nexthop — The module-level _fixture_failures = {} dict is never cleared between pytest sessions. In long-lived workers or xdist, stale entries will incorrectly skip tests in subsequent runs.

Please add a pytest_sessionstart hook to clear it:

def pytest_sessionstart(session):
    _fixture_failures.clear()

Also, the parameter set extraction (split("-")[0]) breaks on names like multi-asic. Please use a more robust parsing approach.



Development

Successfully merging this pull request may close these issues.

Bug: test_qos_sai: cascading failures after fixture error
