[mmu probing] pr03.probe: Add probing executors and executor registry by XuChen-MSFT · Pull Request #22541 · sonic-net/sonic-mgmt

XuChen-MSFT · 2026-02-23T13:33:49Z

Description of PR

Summary:

Implement platform-specific executors and factory pattern for executor management:

Executor Registry:

ExecutorRegistry: Factory for environment-specific executor instantiation Supports 'physical' and 'sim' environments with lazy loading

Physical Executors:

PfcXoffProbingExecutor: PFC Xoff detection on physical hardware via SAI
IngressDropProbingExecutor: Ingress drop detection on physical hardware

Simulation Executors:

SimPfcXoffProbingExecutor: Mock PFC Xoff detection for unit/integration tests
SimIngressDropProbingExecutor: Mock ingress drop detection for testing

All executors implement the ProbingExecutorProtocol interface, enabling algorithm-executor separation and testability through simulation.

Fixes # (issue)

Type of change

Back port request

Approach

What is the motivation for this PR?

qos refactoring

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

relevant PRs:
[mmu probing] pr01.docs: Add MMU threshold probing framework design
[mmu probing] pr02.probe: Add core probing algorithms with essential data structures
[mmu probing] pr03.probe: Add probing executors and executor registry
[mmu probing] pr04.probe: Add observer pattern for metrics tracking
[mmu probing] pr05.probe: Add stream manager and buffer occupancy controller
[mmu probing] pr06.probe: Add base framework and all probing implementations
[mmu probing] pr07.test: Add comprehensive unit tests for probe framework
[mmu probing] pr08.test: Add integration tests for end-to-end probing workflows
[mmu probing] pr09.test: Add production probe test and infrastructure updates

mssonicbld · 2026-02-23T13:33:57Z

/azp run

azure-pipelines · 2026-02-23T13:34:13Z

Azure Pipelines successfully started running 1 pipeline(s).

yxieca

Deep review done. Executor registry and executors look correct; no issues found.

mssonicbld · 2026-02-24T15:46:13Z

/azp run

azure-pipelines · 2026-02-24T15:46:29Z

Azure Pipelines successfully started running 1 pipeline(s).

yxieca

Re-reviewed updates; no new issues found.

wsycqyz

lğtm

StormLiangMS

Review — PR #22541 (MMU Probing: Executors)

🔴 Bug

Missing self.observer None guard in ingress_drop_probing_executor.py
One verbose trace block has:

if self.verbose:
    self.observer.trace(...)

But other verbose blocks correctly use self.verbose and self.observer. If observer is None, this crashes. Should be if self.verbose and self.observer:.

⚠️ Design

Class-level mutable state in executor_registry.py — _registry: Dict = {} and _loaded_modules: set = set() are shared across tests. clear_registry() must be called reliably; if a test fails before cleanup, subsequent tests see stale state.
_ensure_loaded uses bare module name — importlib.import_module("pfc_xoff_probing_executor") relies on module being on sys.path. If probe directory isn't on path, this silently fails. Consider explicit package prefix.
Inconsistent result patterns between PfcXoff and IngressDrop executors — PfcXoff uses explicit all_true/all_false checks. IngressDrop uses (True, results[0]) then checks len(set(results)) > 1. Both should use the same pattern.

Minor

Sim executor's noisy executor applies noise to value not threshold — semantically, hardware noise shifts the threshold, not packet count
Sim intermittent executor raises exceptions instead of returning (False, False) like physical executors
Docstring parameter name mismatch: create() docstring says env but parameter is executor_env

mssonicbld · 2026-03-18T08:41:00Z

/azp run

azure-pipelines · 2026-03-18T08:41:16Z

Azure Pipelines successfully started running 1 pipeline(s).

XuChen-MSFT · 2026-03-18T08:41:48Z

Added BadSpot sim executors for anti-oscillation testing (dcc1c40):

SimPfcXoffProbingExecutorBadSpot: registered as (pfc_xoff, sim, bad_spot)
SimIngressDropProbingExecutorBadSpot: registered as (ingress_drop, sim, bad_spot)

These executors deterministically fail verification at specific packet count values, used to reproduce and validate the range algorithm anti-oscillation backtrack fix in PR #22540 (036f27c).

Related:

UT coverage: PR [mmu probing] pr07.test: Add comprehensive unit tests for probe framework #22545 (d4fff81) — 8 BadSpot executor tests + 7 backtrack scenario tests
IT coverage: PR [mmu probing] pr08.test: Add integration tests for end-to-end probing workflows #22546 (4b59778) — 2 oscillation integration tests (PFC + Ingress Drop)

mssonicbld · 2026-03-21T23:51:40Z

/azp run

2 new tests (order 8817-8818) covering the observer None guard fix: - test_check_verbose_observer_none_guard_result_path: verifies L281 guard - test_check_verbose_observer_none_guard_exception_path: verifies L290 guard Both tests bypass the L162 assert by nullifying observer mid-execution via counter read side_effect, then verify no AttributeError crash. Related: PR sonic-net#22541 fix (observer None guard) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

azure-pipelines · 2026-03-21T23:51:53Z

Azure Pipelines successfully started running 1 pipeline(s).

6 UTs for IngressDropProbingExecutor (order 8819-8824): single detected/not-detected, multi all/none/inconsistent-TF/FT 6 UTs for PfcXoffProbingExecutor (order 8719-8724): same 6 input combinations Covers all result analysis code paths for the consistency pattern unification in PR sonic-net#22541. All 6 scenarios produce identical outcomes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mssonicbld · 2026-03-22T13:15:45Z

/azp run

azure-pipelines · 2026-03-22T13:15:58Z

Azure Pipelines successfully started running 1 pipeline(s).

XuChen-MSFT · 2026-03-22T13:16:02Z

@StormLiangMS Re: Inconsistent result patterns between PfcXoff and IngressDrop

Unified both executors to the concise set-dedup pattern (dd592b4):

return_result = (True, results[0])
# Multiple attempts: check consistency (set dedup detects mixed True/False)
if len(results) > 1 and len(set(results)) > 1:
    return_result = (False, False)

PfcXoff previously used an 8-line explicit all_true/all_false/all_equal pattern. Both produce identical outcomes for all 6 input combinations (verified by 12 new UTs in PR #22545 e2fca74).

XuChen-MSFT · 2026-03-22T13:36:56Z

@StormLiangMS Re: Sim noise applies to value not threshold

Mathematically equivalent — (value + noise) >= threshold produces the same result as value >= (threshold - noise). The noise offset is on opposite sides of the comparison but the detection outcome is identical for all inputs. No change needed.

2 bug reproduction tests (failure_rate=1.0): - test_pfc_xoff_intermittent_returns_tuple_not_exception (order 12) - test_ingress_drop_intermittent_returns_tuple_not_exception (order 14) Verify check() returns (False, False) on failure, not raise Exception. 2 success path tests (failure_rate=0.0): - test_pfc_xoff_intermittent_success_path (order 13) - test_ingress_drop_intermittent_success_path (order 15) Verify normal detection works when no failure occurs. Related: PR sonic-net#22541 fix (intermittent raise → return) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mssonicbld · 2026-03-22T13:52:10Z

/azp run

azure-pipelines · 2026-03-22T13:52:22Z

Azure Pipelines successfully started running 1 pipeline(s).

XuChen-MSFT · 2026-03-22T13:52:28Z

@StormLiangMS Re: Sim intermittent raises exceptions instead of returning (False, False)

Fixed (2210ae6): raise Exception(error_msg) → return (False, False) in both sim intermittent executors.

Physical executors catch hardware errors and return (False, False). Algorithm callers expect a tuple and check if not success: without try/except — raising crashes the algorithm instead of triggering graceful backtrack/retry.

UT coverage: PR #22545 (74b3efb) — 4 new tests (failure_rate=1.0 verifies return contract, failure_rate=0.0 verifies success path).

mssonicbld · 2026-03-22T14:00:27Z

/azp run

XuChen-MSFT · 2026-03-22T14:00:33Z

@StormLiangMS Re: Docstring parameter name mismatch

Fixed (4838e0a): create() docstring now uses executor_env matching the actual parameter name (was env).

azure-pipelines · 2026-03-22T14:00:41Z

Azure Pipelines successfully started running 1 pipeline(s).

Implement platform-specific executors and factory pattern for executor management: Executor Registry: - ExecutorRegistry: Factory for environment-specific executor instantiation Supports 'physical' and 'sim' environments with lazy loading Physical Executors: - PfcXoffProbingExecutor: PFC Xoff detection on physical hardware via SAI - IngressDropProbingExecutor: Ingress drop detection on physical hardware Simulation Executors: - SimPfcXoffProbingExecutor: Mock PFC Xoff detection for unit/integration tests - SimIngressDropProbingExecutor: Mock ingress drop detection for testing All executors implement the ProbingExecutorProtocol interface, enabling algorithm-executor separation and testability through simulation. Signed-off-by: Xu Chen <xuchen3@microsoft.com>

Signed-off-by: Xu Chen <xuchen3@microsoft.com>

New scenario executors that deterministically fail verification at specific packet count values. Used to reproduce and test the range algorithm anti-oscillation backtrack behavior. - SimPfcXoffProbingExecutorBadSpot: registered as (pfc_xoff, sim, bad_spot) - SimIngressDropProbingExecutorBadSpot: registered as (ingress_drop, sim, bad_spot) Both track bad_hit_count for test assertions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>

@StormLiangMS

…xception paths Lines 281 and 290 used 'if self.verbose:' without 'and self.observer', causing AttributeError when observer is None. All other 7 verbose blocks in the same file correctly use 'if self.verbose and self.observer:'. Addresses @StormLiangMS review: missing self.observer None guard. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>

@StormLiangMS

…p executors PfcXoff used explicit all_true/all_false/all_equal pattern (8 lines). IngressDrop used concise set-dedup pattern (3 lines). Both produce identical outcomes for all input combinations. Unified to the concise pattern with descriptive comment: return_result = (True, results[0]) # Multiple attempts: check consistency (set dedup detects mixed True/False) if len(results) > 1 and len(set(results)) > 1: return_result = (False, False) UT coverage: PR sonic-net#22545 (e2fca74) — 12 new tests covering all 6 input combinations for both executors. Addresses @StormLiangMS review: inconsistent result patterns. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>

@StormLiangMS

…ing Exception Physical executors catch hardware errors and return (False, False). Algorithm callers expect a tuple and check 'if not success:' without try/except. Raising Exception crashes the algorithm instead of triggering graceful retry/backtrack. Changed: raise Exception(error_msg) → return (False, False) Files: sim_pfc_xoff_probing_executor.py, sim_ingress_drop_probing_executor.py UT coverage: PR sonic-net#22545 (74b3efb) — 4 new tests verifying the return contract. Addresses @StormLiangMS review: sim intermittent raises exceptions instead of returning (False, False) like physical executors. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>

@StormLiangMS

Addresses @StormLiangMS review: docstring parameter name mismatch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>

mssonicbld · 2026-03-24T04:59:42Z

/azp run

2 new tests (order 8817-8818) covering the observer None guard fix: - test_check_verbose_observer_none_guard_result_path: verifies L281 guard - test_check_verbose_observer_none_guard_exception_path: verifies L290 guard Both tests bypass the L162 assert by nullifying observer mid-execution via counter read side_effect, then verify no AttributeError crash. Related: PR sonic-net#22541 fix (observer None guard) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>

6 UTs for IngressDropProbingExecutor (order 8819-8824): single detected/not-detected, multi all/none/inconsistent-TF/FT 6 UTs for PfcXoffProbingExecutor (order 8719-8724): same 6 input combinations Covers all result analysis code paths for the consistency pattern unification in PR sonic-net#22541. All 6 scenarios produce identical outcomes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>

2 bug reproduction tests (failure_rate=1.0): - test_pfc_xoff_intermittent_returns_tuple_not_exception (order 12) - test_ingress_drop_intermittent_returns_tuple_not_exception (order 14) Verify check() returns (False, False) on failure, not raise Exception. 2 success path tests (failure_rate=0.0): - test_pfc_xoff_intermittent_success_path (order 13) - test_ingress_drop_intermittent_success_path (order 15) Verify normal detection works when no failure occurs. Related: PR sonic-net#22541 fix (intermittent raise → return) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>

azure-pipelines · 2026-03-24T04:59:56Z

Azure Pipelines successfully started running 1 pipeline(s).

StormLiangMS

⚠️ Approve with 2 findings

[Critical] importlib.import_module without package context will fail
executor_registry.py:146

importlib.import_module(module_path)

Uses bare module names like "pfc_xoff_probing_executor" without specifying the package. This works in unit tests because conftest.py does sys.path.insert(0, probe_dir), but may fail in production PTF runs if the probe directory is not on sys.path.

Suggested fix:

importlib.import_module(f".{module_path}", package=__package__)

or construct absolute imports:

importlib.import_module(f"tests.saitests.probe.{module_path}")

[Low] Incomplete docstring in SimPfcXoffProbingExecutor.check()
sim_pfc_xoff_probing_executor.py:1115-1129

Missing method description line, "Args:" header, and src_port parameter documentation. The docstring jumps straight to dst_port.

@StormLiangMS

Added method description, Args header, and src_port parameter documentation. Docstring was jumping straight to dst_port without method description. Addresses @StormLiangMS review (2026-03-25): incomplete docstring. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>

mssonicbld · 2026-03-26T08:14:35Z

/azp run

XuChen-MSFT · 2026-03-26T08:14:45Z

@StormLiangMS Re: [Critical] importlib.import_module without package context

This was addressed in a previous reply — bare module import is intentional for PTF compatibility:

PTF runner adds test directories to sys.path before execution. Using relative imports (f".{module_path}") or absolute paths (f"tests.saitests.probe.{module_path}") would break in PTF containers where the directory structure differs from the repo layout.
ImportError is caught and re-raised with the module name (L142-144), so failures are not silent.
This follows the same pattern used by all existing SAI test files.

Re: [Low] Incomplete docstring

Fixed (f4b831b): Added method description, Args: header, and src_port parameter to SimPfcXoffProbingExecutor.check() docstring.

azure-pipelines · 2026-03-26T08:14:49Z

Azure Pipelines successfully started running 1 pipeline(s).

XuChen-MSFT requested review from StormLiangMS, bingwang-ms, kperumalbfn, wsycqyz and yxieca February 23, 2026 13:34

yxieca previously approved these changes Feb 23, 2026

View reviewed changes

XuChen-MSFT dismissed yxieca’s stale review via bb7a77b February 24, 2026 15:46

yxieca previously approved these changes Feb 24, 2026

View reviewed changes

wsycqyz previously approved these changes Feb 25, 2026

View reviewed changes

StormLiangMS reviewed Mar 17, 2026

View reviewed changes

XuChen-MSFT dismissed stale reviews from wsycqyz and yxieca via dcc1c40 March 18, 2026 08:40

XuChen-MSFT and others added 7 commits March 24, 2026 12:59

fix pre-commit errors

37c1aee

Signed-off-by: Xu Chen <xuchen3@microsoft.com>

fix: docstring param name env → executor_env in create()

08f5c9c

Addresses @StormLiangMS review: docstring parameter name mismatch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>

XuChen-MSFT force-pushed the xuchen3/mmu_probe/pr03-executors branch from 4838e0a to 08f5c9c Compare March 24, 2026 04:59

StormLiangMS reviewed Mar 25, 2026

View reviewed changes

Conversation

XuChen-MSFT commented Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description of PR

Type of change

Back port request

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Uh oh!

mssonicbld commented Feb 23, 2026

Uh oh!

azure-pipelines bot commented Feb 23, 2026

Uh oh!

yxieca left a comment

Choose a reason for hiding this comment

Uh oh!

mssonicbld commented Feb 24, 2026

Uh oh!

azure-pipelines bot commented Feb 24, 2026

Uh oh!

yxieca left a comment

Choose a reason for hiding this comment

Uh oh!

wsycqyz left a comment

Choose a reason for hiding this comment

Uh oh!

StormLiangMS left a comment

Choose a reason for hiding this comment

Review — PR #22541 (MMU Probing: Executors)

🔴 Bug

⚠️ Design

Minor

Uh oh!

mssonicbld commented Mar 18, 2026

Uh oh!

azure-pipelines bot commented Mar 18, 2026

Uh oh!

XuChen-MSFT commented Mar 18, 2026

Uh oh!

mssonicbld commented Mar 21, 2026

Uh oh!

azure-pipelines bot commented Mar 21, 2026

Uh oh!

mssonicbld commented Mar 22, 2026

Uh oh!

azure-pipelines bot commented Mar 22, 2026

Uh oh!

XuChen-MSFT commented Mar 22, 2026

Uh oh!

XuChen-MSFT commented Mar 22, 2026

Uh oh!

mssonicbld commented Mar 22, 2026

Uh oh!

azure-pipelines bot commented Mar 22, 2026

Uh oh!

XuChen-MSFT commented Mar 22, 2026

Uh oh!

mssonicbld commented Mar 22, 2026

Uh oh!

XuChen-MSFT commented Mar 22, 2026

Uh oh!

azure-pipelines bot commented Mar 22, 2026

Uh oh!

mssonicbld commented Mar 24, 2026

Uh oh!

azure-pipelines bot commented Mar 24, 2026

Uh oh!

StormLiangMS left a comment

Choose a reason for hiding this comment

Uh oh!

mssonicbld commented Mar 26, 2026

Uh oh!

XuChen-MSFT commented Mar 26, 2026

Uh oh!

azure-pipelines bot commented Mar 26, 2026

XuChen-MSFT commented Feb 23, 2026 •

edited

Loading