[mmu probing] pr03.probe: Add probing executors and executor registry#22541
[mmu probing] pr03.probe: Add probing executors and executor registry#22541XuChen-MSFT wants to merge 8 commits intosonic-net:masterfrom
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
yxieca
left a comment
There was a problem hiding this comment.
Deep review done. Executor registry and executors look correct; no issues found.
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
yxieca
left a comment
There was a problem hiding this comment.
Re-reviewed updates; no new issues found.
StormLiangMS
left a comment
There was a problem hiding this comment.
Review — PR #22541 (MMU Probing: Executors)
🔴 Bug
Missing self.observer None guard in ingress_drop_probing_executor.py
One verbose trace block has:
if self.verbose:
self.observer.trace(...)But other verbose blocks correctly use self.verbose and self.observer. If observer is None, this crashes. Should be if self.verbose and self.observer:.
⚠️ Design
-
Class-level mutable state in
executor_registry.py—_registry: Dict = {}and_loaded_modules: set = set()are shared across tests.clear_registry()must be called reliably; if a test fails before cleanup, subsequent tests see stale state. -
_ensure_loadeduses bare module name —importlib.import_module("pfc_xoff_probing_executor")relies on module being onsys.path. If probe directory isn't on path, this silently fails. Consider explicit package prefix. -
Inconsistent result patterns between PfcXoff and IngressDrop executors — PfcXoff uses explicit
all_true/all_falsechecks. IngressDrop uses(True, results[0])then checkslen(set(results)) > 1. Both should use the same pattern.
Minor
- Sim executor's noisy executor applies noise to
valuenotthreshold— semantically, hardware noise shifts the threshold, not packet count - Sim intermittent executor raises exceptions instead of returning
(False, False)like physical executors - Docstring parameter name mismatch:
create()docstring saysenvbut parameter isexecutor_env
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Added BadSpot sim executors for anti-oscillation testing (
These executors deterministically fail verification at specific packet count values, used to reproduce and validate the range algorithm anti-oscillation backtrack fix in PR #22540 ( Related:
|
|
/azp run |
2 new tests (order 8817-8818) covering the observer None guard fix: - test_check_verbose_observer_none_guard_result_path: verifies L281 guard - test_check_verbose_observer_none_guard_exception_path: verifies L290 guard Both tests bypass the L162 assert by nullifying observer mid-execution via counter read side_effect, then verify no AttributeError crash. Related: PR sonic-net#22541 fix (observer None guard) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Azure Pipelines successfully started running 1 pipeline(s). |
6 UTs for IngressDropProbingExecutor (order 8819-8824): single detected/not-detected, multi all/none/inconsistent-TF/FT 6 UTs for PfcXoffProbingExecutor (order 8719-8724): same 6 input combinations Covers all result analysis code paths for the consistency pattern unification in PR sonic-net#22541. All 6 scenarios produce identical outcomes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@StormLiangMS Re: Inconsistent result patterns between PfcXoff and IngressDrop Unified both executors to the concise set-dedup pattern ( return_result = (True, results[0])
# Multiple attempts: check consistency (set dedup detects mixed True/False)
if len(results) > 1 and len(set(results)) > 1:
return_result = (False, False)PfcXoff previously used an 8-line explicit |
|
@StormLiangMS Re: Sim noise applies to value not threshold Mathematically equivalent — |
2 bug reproduction tests (failure_rate=1.0): - test_pfc_xoff_intermittent_returns_tuple_not_exception (order 12) - test_ingress_drop_intermittent_returns_tuple_not_exception (order 14) Verify check() returns (False, False) on failure, not raise Exception. 2 success path tests (failure_rate=0.0): - test_pfc_xoff_intermittent_success_path (order 13) - test_ingress_drop_intermittent_success_path (order 15) Verify normal detection works when no failure occurs. Related: PR sonic-net#22541 fix (intermittent raise → return) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@StormLiangMS Re: Sim intermittent raises exceptions instead of returning (False, False) Fixed ( Physical executors catch hardware errors and return UT coverage: PR #22545 ( |
|
/azp run |
|
@StormLiangMS Re: Docstring parameter name mismatch Fixed ( |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Implement platform-specific executors and factory pattern for executor management: Executor Registry: - ExecutorRegistry: Factory for environment-specific executor instantiation Supports 'physical' and 'sim' environments with lazy loading Physical Executors: - PfcXoffProbingExecutor: PFC Xoff detection on physical hardware via SAI - IngressDropProbingExecutor: Ingress drop detection on physical hardware Simulation Executors: - SimPfcXoffProbingExecutor: Mock PFC Xoff detection for unit/integration tests - SimIngressDropProbingExecutor: Mock ingress drop detection for testing All executors implement the ProbingExecutorProtocol interface, enabling algorithm-executor separation and testability through simulation. Signed-off-by: Xu Chen <xuchen3@microsoft.com>
Signed-off-by: Xu Chen <xuchen3@microsoft.com>
New scenario executors that deterministically fail verification at specific packet count values. Used to reproduce and test the range algorithm anti-oscillation backtrack behavior. - SimPfcXoffProbingExecutorBadSpot: registered as (pfc_xoff, sim, bad_spot) - SimIngressDropProbingExecutorBadSpot: registered as (ingress_drop, sim, bad_spot) Both track bad_hit_count for test assertions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>
…xception paths Lines 281 and 290 used 'if self.verbose:' without 'and self.observer', causing AttributeError when observer is None. All other 7 verbose blocks in the same file correctly use 'if self.verbose and self.observer:'. Addresses @StormLiangMS review: missing self.observer None guard. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>
…p executors
PfcXoff used explicit all_true/all_false/all_equal pattern (8 lines).
IngressDrop used concise set-dedup pattern (3 lines). Both produce
identical outcomes for all input combinations.
Unified to the concise pattern with descriptive comment:
return_result = (True, results[0])
# Multiple attempts: check consistency (set dedup detects mixed True/False)
if len(results) > 1 and len(set(results)) > 1:
return_result = (False, False)
UT coverage: PR sonic-net#22545 (e2fca74) — 12 new tests covering all 6 input
combinations for both executors.
Addresses @StormLiangMS review: inconsistent result patterns.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Xu Chen <xuchen3@microsoft.com>
…ing Exception Physical executors catch hardware errors and return (False, False). Algorithm callers expect a tuple and check 'if not success:' without try/except. Raising Exception crashes the algorithm instead of triggering graceful retry/backtrack. Changed: raise Exception(error_msg) → return (False, False) Files: sim_pfc_xoff_probing_executor.py, sim_ingress_drop_probing_executor.py UT coverage: PR sonic-net#22545 (74b3efb) — 4 new tests verifying the return contract. Addresses @StormLiangMS review: sim intermittent raises exceptions instead of returning (False, False) like physical executors. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>
Addresses @StormLiangMS review: docstring parameter name mismatch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>
4838e0a to
08f5c9c
Compare
|
/azp run |
2 new tests (order 8817-8818) covering the observer None guard fix: - test_check_verbose_observer_none_guard_result_path: verifies L281 guard - test_check_verbose_observer_none_guard_exception_path: verifies L290 guard Both tests bypass the L162 assert by nullifying observer mid-execution via counter read side_effect, then verify no AttributeError crash. Related: PR sonic-net#22541 fix (observer None guard) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>
6 UTs for IngressDropProbingExecutor (order 8819-8824): single detected/not-detected, multi all/none/inconsistent-TF/FT 6 UTs for PfcXoffProbingExecutor (order 8719-8724): same 6 input combinations Covers all result analysis code paths for the consistency pattern unification in PR sonic-net#22541. All 6 scenarios produce identical outcomes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>
2 bug reproduction tests (failure_rate=1.0): - test_pfc_xoff_intermittent_returns_tuple_not_exception (order 12) - test_ingress_drop_intermittent_returns_tuple_not_exception (order 14) Verify check() returns (False, False) on failure, not raise Exception. 2 success path tests (failure_rate=0.0): - test_pfc_xoff_intermittent_success_path (order 13) - test_ingress_drop_intermittent_success_path (order 15) Verify normal detection works when no failure occurs. Related: PR sonic-net#22541 fix (intermittent raise → return) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>
|
Azure Pipelines successfully started running 1 pipeline(s). |
StormLiangMS
left a comment
There was a problem hiding this comment.
[Critical] importlib.import_module without package context will fail
executor_registry.py:146
importlib.import_module(module_path)Uses bare module names like "pfc_xoff_probing_executor" without specifying the package. This works in unit tests because conftest.py does sys.path.insert(0, probe_dir), but may fail in production PTF runs if the probe directory is not on sys.path.
Suggested fix:
importlib.import_module(f".{module_path}", package=__package__)or construct absolute imports:
importlib.import_module(f"tests.saitests.probe.{module_path}")[Low] Incomplete docstring in SimPfcXoffProbingExecutor.check()
sim_pfc_xoff_probing_executor.py:1115-1129
Missing method description line, "Args:" header, and src_port parameter documentation. The docstring jumps straight to dst_port.
Added method description, Args header, and src_port parameter documentation. Docstring was jumping straight to dst_port without method description. Addresses @StormLiangMS review (2026-03-25): incomplete docstring. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xu Chen <xuchen3@microsoft.com>
|
/azp run |
|
@StormLiangMS Re: [Critical] importlib.import_module without package context This was addressed in a previous reply — bare module import is intentional for PTF compatibility:
Re: [Low] Incomplete docstring Fixed ( |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Description of PR
Summary:
Implement platform-specific executors and factory pattern for executor management:
Executor Registry:
Physical Executors:
Simulation Executors:
All executors implement the ProbingExecutorProtocol interface, enabling algorithm-executor separation and testability through simulation.
Fixes # (issue)
Type of change
Back port request
Approach
What is the motivation for this PR?
qos refactoring
How did you do it?
How did you verify/test it?
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation
relevant PRs:
[mmu probing] pr01.docs: Add MMU threshold probing framework design
[mmu probing] pr02.probe: Add core probing algorithms with essential data structures
[mmu probing] pr03.probe: Add probing executors and executor registry
[mmu probing] pr04.probe: Add observer pattern for metrics tracking
[mmu probing] pr05.probe: Add stream manager and buffer occupancy controller
[mmu probing] pr06.probe: Add base framework and all probing implementations
[mmu probing] pr07.test: Add comprehensive unit tests for probe framework
[mmu probing] pr08.test: Add integration tests for end-to-end probing workflows
[mmu probing] pr09.test: Add production probe test and infrastructure updates