[mmu probing] pr09.test: Add production probe test and infrastructure updates by XuChen-MSFT · Pull Request #22547 · sonic-net/sonic-mgmt

XuChen-MSFT · 2026-02-23T14:05:37Z

Description of PR

Summary:

Enable probe framework for production testbed usage with infrastructure updates and physical hardware test cases.

Infrastructure Updates:

tests/conftest.py:
- Add --enable_qos_ptf_pdb option for PTF debugging with pdb breakpoint
- Add --ingress_drop_probing option to switch between PFC/Drop probing modes
tests/ptf_runner.py:
- Add 'probe' subdirectory support alongside 'py3'
- Add test_subdir parameter for flexible PTF test location
- Enable probe tests to run via PTF runner infrastructure
tests/qos/qos_sai_base.py (QosSaiBase refactoring):
- Move replaceNonExistentPortId() from TestQosSai to base class
- Move updateTestPortIdIp() from TestQosSai to base class
- Add bufferConfig to dut_qos_maps fixture for all devices
- Enable probe tests to access buffer configuration
- Shared utility methods for port ID/IP management
tests/qos/test_qos_sai.py:
- Remove replaceNonExistentPortId() (moved to base)
- Remove updateTestPortIdIp() (moved to base)
- Reduce code duplication

Production Test Cases:

tests/qos/test_qos_probe.py (NEW - 544 lines):
- TestQosProbe class for physical testbed probing
- test_pfc_xoff_probing: PFC Xoff threshold detection on hardware
- test_ingress_drop_probing: Ingress drop threshold detection
- test_headroom_pool_probing: Headroom pool size probing
- Integrates with existing QoS test infrastructure
- Uses physical executors for real hardware validation
- Validates probe framework on production testbeds

This PR completes the probe framework integration, enabling threshold probing tests to run on physical SONiC testbeds alongside existing QoS tests.

Fixes # (issue)

Type of change

Back port request

Approach

What is the motivation for this PR?

qos refactoring

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

relevant PRs:
[mmu probing] pr01.docs: Add MMU threshold probing framework design
[mmu probing] pr02.probe: Add core probing algorithms with essential data structures
[mmu probing] pr03.probe: Add probing executors and executor registry
[mmu probing] pr04.probe: Add observer pattern for metrics tracking
[mmu probing] pr05.probe: Add stream manager and buffer occupancy controller
[mmu probing] pr06.probe: Add base framework and all probing implementations
[mmu probing] pr07.test: Add comprehensive unit tests for probe framework
[mmu probing] pr08.test: Add integration tests for end-to-end probing workflows
[mmu probing] pr09.test: Add production probe test and infrastructure updates

mssonicbld · 2026-02-23T14:05:45Z

/azp run

azure-pipelines · 2026-02-23T14:06:00Z

Azure Pipelines successfully started running 1 pipeline(s).

yxieca · 2026-02-23T21:34:00Z

Found a couple blocking issues:

Typo in key:
" breakout" has a leading space. Likely should be "breakout".
Type error in updateTestPortIdIp():
The call passes set(portIds); the helper mutates/indexes the list. Use a list instead (e.g., list(portIds), or keep it as a list).

These likely explain the static analysis failure. Please fix and re-run checks.

mssonicbld · 2026-02-24T14:45:40Z

/azp run

azure-pipelines · 2026-02-24T14:45:57Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2026-02-24T15:16:11Z

/azp run

azure-pipelines · 2026-02-24T15:16:27Z

Azure Pipelines successfully started running 1 pipeline(s).

XuChen-MSFT · 2026-02-25T03:03:51Z

Regarding the PR test failure:
As the test log below shows, these 9 PRs have been successfully validated on physical hardware platforms.
After merging, platform-specific integration testing will begin across various ASICs/SKUs/platforms, including the KVM test. To prevent KVM test failures in CI during this validation phase, conditional marks will be added temporarily. Progress will be tracked via GitHub issues, and platforms will be enabled incrementally as validation completes.

$ ./run_tests.sh -c qos/test_qos_probe.py::TestQosProbe::testQosPfcXoffProbe -t t0,any -n testbed-bjw2-can-t0-7260-2 -i ../ansible/bjw2,../ansible/veos -r -u -m individual -l info -k debug  -e "--skip_sanity --disable_loganalyzer --py_saithrift_url=${saithrift_bjw_brcm_202511}"
... omitted ...
--------------------------------------------- generated xml file: /var/src/sonic-mgmt-int/tests/logs/qos/test_qos_probe.py::TestQosProbe::testQosPfcXoffProbe.xml ---------------------------------------------
------------------------------------------------------------------------------------------- live log sessionfinish --------------------------------------------------------------------------------------------
25/02/2026 02:30:19 __init__.pytest_terminal_summary         L0067 INFO   | Can not get Allure report URL. Please check logs
=========================================================================================== short test summary info ===========================================================================================
SKIPPED [2] qos/test_qos_probe.py:81: Additional DSCPs are not supported on non-dual ToR ports
SKIPPED [4] qos/test_qos_probe.py:51: single_dut_multi_asic is not supported on T0 topologies
SKIPPED [12] qos/test_qos_probe.py:51: multi-dut is not supported on T0 topologies
=========================================================================== 2 passed, 18 skipped, 3 warnings in 1801.66s (0:30:01) ============================================================================
DEBUG:tests.conftest:[log_custom_msg] item: <Function testQosPfcXoffProbe[multi_dut_shortlink_to_longlink-xoff_4]>
INFO:root:Can not get Allure report URL. Please check logs
xuchen3@xuchen3-env-bj5:/var/src/sonic-mgmt-int/tests$ git log --oneline -n 10
ea5a3f102f (HEAD -> xuchen3/internal/mmu-probing.2-25.r2, origin/xuchen3/internal/mmu-probing.2-25.r2) sonic-mgmt__pr-22547__commit-4__-mmu-probing--pr09.test--Add-production-probe-test-and-infrastructure-updates.diff
84c350241f sonic-mgmt__pr-22546__commit-2__-mmu-probing--pr08.test--Add-integration-tests-for-end-to-end-probing-workflows.diff
7dd190f742 sonic-mgmt__pr-22545__commit-2__-mmu-probing--pr07.test--Add-comprehensive-unit-tests-for-probe-framework.diff
5cd33c2aa8 sonic-mgmt__pr-22544__commit-2__-mmu-probing--pr06.probe--Add-base-framework-and-all-probing-implementations.diff
83cf44f8b7 sonic-mgmt__pr-22543__commit-2__-mmu-probing--pr05.probe--Add-stream-manager-and-buffer-occupancy-controller.diff
ba639a8d7c sonic-mgmt__pr-22542__commit-2__-mmu-probing--pr04.probe--Add-observer-pattern-for-metrics-tracking.diff
bea8e30963 sonic-mgmt__pr-22541__commit-2__-mmu-probing--pr03.probe--Add-probing-executors-and-executor-registry.diff
9828290935 sonic-mgmt__pr-22540__commit-2__-mmu-probing--pr02.probe--Add-core-probing-algorithms-with-essential-data-structures.diff
3954ce3d97 sonic-mgmt__pr-22539__commit-1__-mmu-probing--pr01.docs--Add-MMU-threshold-probing-framework-design.diff
0b98d36716 (origin/internal, origin/HEAD) Merged PR 19235: Set dpu-pattern arg for smartswitch nightly runs
xuchen3@xuchen3-env-bj5:/var/src/sonic-mgmt-int/tests$

mssonicbld · 2026-02-25T05:30:50Z

/azp run

azure-pipelines · 2026-02-25T05:31:05Z

Azure Pipelines successfully started running 1 pipeline(s).

XuChen-MSFT · 2026-02-25T05:35:46Z

Added conditional mark for qos/test_qos_probe.py to skip MMU threshold probing tests on platforms pending validation. Created 9 GitHub issues to track joint debugging progress for each platform/ASIC type:

VS platform (asic_type: vs) - Issue Enhancement: Enable MMU threshold probing test on VS platform #22599
Cisco-8000 GB (Cisco-8102-C64, Cisco-8101-O8C48, Cisco-8102-28FH-DPU-O) - Issue Enhancement: Enable MMU threshold probing test on Cisco-8000 GB platforms #22601
Cisco-8000 GR/GR2 (Cisco-8101-O32, Cisco-8101-O8V48) - Issue Enhancement: Enable MMU threshold probing test on Cisco-8000 GR/GR2 platforms #22602
Broadcom TD3 (Arista-7050CX3-32C-C32, Arista-7050CX3-32S-C32) - Issue Enhancement: Enable MMU threshold probing test on Broadcom Trident 3 platforms #22603
Broadcom TH (Arista-7060CX-32S-C32, Arista-7060CX-32S-Q32) - Issue Enhancement: Enable MMU threshold probing test on Broadcom Tomahawk platforms #22604
Broadcom TH2 (Arista-7260CX3-C64, Arista-7260CX3-D108C8, Arista-7260CX3-D108C10) - Issue Enhancement: Enable MMU threshold probing test on Broadcom Tomahawk 2 platforms #22605
Broadcom TH5 (Arista-7060X6-16PE-384C-B-O128S2, Arista-7060X6-64PE-B-O128) - Issue Enhancement: Enable MMU threshold probing test on Broadcom Tomahawk 5 platforms #22606
Mellanox SPC1 (Mellanox-SN2700) - Issue Enhancement: Enable MMU threshold probing test on Mellanox Spectrum 1 platforms #22607
Mellanox SPC3 (Mellanox-SN4600C-C64) - Issue Enhancement: Enable MMU threshold probing test on Mellanox Spectrum 3 platforms #22608

Each issue tracks the validation work to ensure MMU threshold probing tests run successfully without corner cases. Once validation completes for a platform, the corresponding conditional skip will be removed.

Conditional mark location: tests/common/plugins/conditional_mark/tests_mark_conditions.yaml line 3930

XuChen-MSFT · 2026-02-27T14:20:43Z

@yxieca Thanks for the review.
The static analysis failures were primarily due to flake8 formatting issues (e.g., line length, unused imports) and test configuration, all of which have been resolved.
Regarding the tests, I've added a conditional mark to temporarily skip the newly added Python code on the VS platform pending further integration. I've opened tracking issues to manage the debugging and validation.

StormLiangMS

Review — PR #22547 (MMU Probing: Production Tests + Infrastructure)

This is the top of the 9-PR MMU probing stack. I reviewed the full series (#22539–#22547). Overall the framework is well-architected — binary search threshold probing with pluggable algorithms/executors/observers is a solid design. However, I found 2 ship-blockers in this PR and several issues across the stack.
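As a rough illustration of the binary-search idea mentioned above, here is a minimal sketch. It is not the framework's code: `probe_threshold`, the `triggers` callable standing in for an executor, and the simulated threshold of 1337 packets are all assumptions made for illustration.

```python
from typing import Callable


def probe_threshold(triggers: Callable[[int], bool], low: int, high: int) -> int:
    """Return the smallest packet count in [low, high] for which triggers() is True.

    Assumes triggers() is monotonic: False below the threshold and True at or
    above it.
    """
    if not triggers(high):
        raise ValueError("threshold not reached within the search range")
    while low < high:
        mid = (low + high) // 2
        if triggers(mid):
            high = mid       # threshold is at mid or below
        else:
            low = mid + 1    # threshold is strictly above mid
    return low


# Simulated executor (hypothetical): the device reacts at 1337 packets.
def fake_executor(packet_count: int) -> bool:
    return packet_count >= 1337


found = probe_threshold(fake_executor, low=1, high=1 << 16)
print(found)
```

Passing the executor in as a callable mirrors the pluggable design described in the review: the search logic stays the same whether the callable drives real hardware or a simulation.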

🔴 Ship-Blockers

1. Typo in config key — " breakout" with leading space
In test_qos_probe.py (~line 449):

qosConfig = dutQosConfig["param"][portSpeedCableLength][" breakout"]

Note the leading space in " breakout". This will KeyError at runtime for any breakout SKU running testQosIngressDropProbe. Compare with the correct ["breakout"] (no space) used in testQosPfcXoffProbe.

2. set() passed where list is required
In qos_sai_base.py (~line 174):

pytest_assert(self.replaceNonExistentPortId(testPortIds, set(portIds)), ...)

replaceNonExistentPortId does portIds[idx] = freePorts.pop(0) — index assignment on a set raises TypeError. Sets don't support indexing.
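Both blockers can be reproduced in isolation. The sketch below is an assumption-laden stand-in: `replace_non_existent_port_id` is a hypothetical helper that only borrows the index-assignment pattern quoted above, and the `config` dictionary is invented for the demonstration.

```python
def replace_non_existent_port_id(available_ports, port_ids):
    """Replace entries of port_ids that are not in available_ports, in place.

    Hypothetical stand-in for replaceNonExistentPortId(); only the
    portIds[idx] = freePorts.pop(0) pattern is taken from the review.
    """
    free_ports = [p for p in available_ports if p not in port_ids]
    for idx, port in enumerate(port_ids):
        if port not in available_ports:
            if not free_ports:
                return False
            port_ids[idx] = free_ports.pop(0)  # needs a mutable sequence
    return True


# Blocker 2: a list works, a set fails only when a replacement is needed.
ports = [1, 99]
replace_non_existent_port_id([1, 2, 3], ports)
try:
    replace_non_existent_port_id([1, 2, 3], {1, 99})
    set_error = None
except TypeError as exc:
    set_error = exc

# Blocker 1: a key with a leading space is a different key.
config = {"breakout": {"pkts": 10}}  # invented config for illustration
try:
    config[" breakout"]
    key_error = None
except KeyError as exc:
    key_error = exc
```

Note that the set case raises only on the replacement path, so a testbed where every port is valid would never hit it.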

⚠️ Design Issues

find_cell_size() duplicated in 3 test methods — identical recursive search helper in testQosPfcXoffProbe, testQosIngressDropProbe, and testQosHeadroomPoolProbe. Should be a class method or standalone utility.
src_port_vlans may be unbound — in testQosHeadroomPoolProbe, it's set only inside the if platform_asic == "broadcom-dnx" block but appears to be referenced later unconditionally for DNX platforms.
in_py3 variable name misleading in ptf_runner.py — now True for both py3 and probe directories. Name should reflect the broader meaning.

❓ Question

tests_mark_conditions.yaml — some skip conditions include GitHub issue URLs in the condition string (e.g., "asic_type in ['vs'] and https://github.com/..."). Does the conditional_mark plugin actually parse these? The URL portion would be a NameError in Python eval().

Cross-Stack Issues (from #22540, #22541, #22544)

See individual PR reviews for details, but the most important ones:

#22540: Infinite loop in lower-bound algorithm when current reaches 1 (max(1//2, 1) == 1 forever)
#22540: Point algorithm continues sending incremental traffic after a failure (corrupted buffer state)
#22541: Missing self.observer None guard in ingress_drop_probing_executor.py verbose trace block
#22544: continue on PG failure skips buffer cleanup — next PG probes with corrupted buffer state
#22544: Massive code duplication in _create_algorithms() between PfcXoff and IngressDrop classes (~120 identical lines each)
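The lower-bound loop issue from #22540 is easiest to see in a reduced form. The algorithm itself is not shown in this conversation, so the function below is an assumed shape (halve the current value until the probe stops triggering), with an explicit exit once the value can no longer shrink.

```python
def find_lower_bound(triggers, start):
    """Halve `current` until triggers(current) is False, then return it.

    Returns None when even the minimum value of 1 still triggers.
    Hypothetical sketch, not the code from #22540.
    """
    current = start
    while triggers(current):
        if current == 1:
            # Without this exit, max(1 // 2, 1) keeps current at 1 forever.
            return None
        current = max(current // 2, 1)
    return current


lower = find_lower_bound(lambda n: n >= 100, 1000)
stuck = find_lower_bound(lambda n: True, 8)
```

Returning a sentinel is only one option; raising an exception or capping the iteration count would terminate the loop equally well.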

mssonicbld · 2026-03-17T07:18:19Z

/azp run

azure-pipelines · 2026-03-17T07:18:25Z

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

XuChen-MSFT · 2026-03-17T07:20:51Z

@StormLiangMS Thanks for the thorough review across the entire 9-PR stack. Addressed all items for this PR below.

✅ Fixed (4 commits pushed)

1. " breakout" typo (Ship-Blocker) — Fixed. Removed the leading space in the config key at line 206 of test_qos_probe.py. (cbdfaab)

2. set(portIds) type error (Ship-Blocker) — Changed to list(portIds) in qos_sai_base.py. Note: this is a pre-existing bug inherited from test_qos_sai.py (introduced in PR #8149 by @vmittal-msft). The set() was intended for deduplication but set doesn't support item assignment which replaceNonExistentPortId() requires. It rarely triggered in practice because the index-assignment path only executes when a port is invalid and needs replacement — uncommon on most testbeds. (d74eeb7)

3. find_cell_size() duplication — Extracted the 3 identical nested definitions into a single @staticmethod on TestQosProbe. (9a97b3c)

4. in_py3 variable name — Renamed to in_subdir in ptf_runner.py with updated docstring. The boolean originally only indicated py3/; now it also covers the newly added probe/ subdirectory for the MMU threshold probing framework. (6ce1d4c)

ℹ️ Acknowledged — Deferring to Validation Phase

5. src_port_vlans potentially unbound — Good catch. Both the assignment (L330-359) and usage (L544-546) are guarded by the same platform_asic == "broadcom-dnx" condition, so it won't cause a NameError at runtime. That said, the logic here is intentionally kept consistent with the original testQosSaiHeadroomPoolSize in test_qos_sai.py — the code was ported to preserve the existing behavior.

I'd prefer not to refactor this path preemptively at this stage for two reasons:

The current hardware validation (Broadcom TD3/TH2, Cisco Q201L, Mellanox SPC1/SPC3) has not surfaced any issue with this pattern.
There are 9 platform-specific tracking issues (#22599 through #22608) for joint debugging across all ASIC types. As each SKU goes through validation, any edge case in this area will be caught and fixed with a concrete reproduction, which leads to more accurate fixes than speculative changes.

Will revisit once the cross-platform validation rounds are complete.

ℹ️ By Design — No Change Needed

6. URL in tests_mark_conditions.yaml — This is by design. The conditional_mark plugin's update_issue_status() function (in __init__.py) extracts URLs from condition strings via regex, queries GitHub issue status (open/closed), and replaces them with True/False before eval(). So "asic_type in ['vs'] and https://...#22599" becomes "asic_type in ['vs'] and True" when the issue is open. This is a well-established pattern used in 50+ entries across the YAML file.
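The substitution step described above can be sketched as follows. This is not the plugin's code: `resolve_issue_urls`, the regular expression, the stubbed `open_issues` set and the full issue URL are assumptions for illustration, and the real plugin queries GitHub rather than a local set.

```python
import re

# Assumed URL shape; the plugin's actual pattern may differ.
ISSUE_URL = re.compile(r"https?://github\.com/\S+/issues/\d+")


def resolve_issue_urls(condition, is_open):
    """Replace each issue URL in `condition` with 'True' (open) or 'False' (closed)."""
    return ISSUE_URL.sub(lambda m: str(bool(is_open(m.group(0)))), condition)


# Stubbed issue status lookup (hypothetical).
open_issues = {"https://github.com/sonic-net/sonic-mgmt/issues/22599"}

condition = "asic_type in ['vs'] and https://github.com/sonic-net/sonic-mgmt/issues/22599"
resolved = resolve_issue_urls(condition, lambda url: url in open_issues)
print(resolved)

# After substitution the string is a plain Python expression.
should_skip = eval(resolved, {"__builtins__": {}}, {"asic_type": "vs"})
```

Once the issue is closed, the same condition resolves to False and the skip stops applying without any change to the YAML entry.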

Cross-Stack Issues (#22540, #22541, #22544)

Will address in the respective PRs separately — thanks for flagging them.

Fix KeyError in testQosIngressDropProbe for breakout SKUs. Line 206 had [" breakout"] (with leading space) instead of ["breakout"]. Co-authored-by: Copilot <[email protected]> Signed-off-by: Xu Chen <[email protected]>

@staticmethod

Deduplicate find_cell_size() which was identically defined 3 times as nested functions inside testQosPfcXoffProbe, testQosIngressDropProbe, and testQosHeadroomPoolProbe. Now a single @staticmethod on TestQosProbe. Co-authored-by: Copilot <[email protected]> Signed-off-by: Xu Chen <[email protected]>

set() does not support index assignment (portIds[idx] = ...) which replaceNonExistentPortId() uses internally. This is a pre-existing bug from PR sonic-net#8149 (test_qos_sai.py L345), moved here during refactoring. The set() was likely intended for deduplication but breaks item assignment. In practice it rarely triggered because most testbeds have all valid ports, so the index assignment path was never reached. Co-authored-by: Copilot <[email protected]> Signed-off-by: Xu Chen <[email protected]>

The boolean originally indicated whether a test file was in the py3/ subdirectory. With the addition of the probe/ subdirectory for the MMU threshold probing framework, the variable now indicates whether the test is in any subdirectory (py3 or probe). Rename for clarity. Co-authored-by: Copilot <[email protected]> Signed-off-by: Xu Chen <[email protected]>

mssonicbld · 2026-03-23T04:38:46Z

/azp run

azure-pipelines · 2026-03-23T04:38:59Z

Azure Pipelines successfully started running 1 pipeline(s).

tests/qos/test_qos_probe.py

StormLiangMS

⚠️ Approve with 4 findings

[High] Broadcom-specific bcmcmd runs on all platforms
test_qos_probe.py:632-647

testQosHeadroomPoolProbe runs bcmcmd "knetctrl netif show" without first checking sonic_asic_type == "broadcom". The ASIC type check at line 653 only guards TD2/TD3 filtering logic, not the initial bcmcmd invocations. This will crash on Mellanox/Cisco with "command not found".

Suggested fix: Wrap all bcmcmd-related code in if dutTestParams["basicParams"]["sonic_asic_type"] == "broadcom":, and provide a skip or alternative path for non-Broadcom platforms.

[Medium] max() on potentially empty dict
test_qos_probe.py:707

max(xpe_to_testports.keys(), ...)

If xpe_to_testports is empty (bcmcmd fails, unexpected output, all ports filtered), max() raises ValueError.

Suggested fix:

if not xpe_to_testports:
    pytest.skip("No available test ports found for probing")

[Low] set → list fix in replaceNonExistentPortId
qos_sai_base.py:189

Changed from set(portIds) to list(portIds). This is actually a bug fix — the method uses portIds[idx] = ... which requires list indexing. Just verify no other callers still pass sets.

[Low] Unused --ingress_drop_probing CLI option
conftest.py:46

The option is defined with parser.addoption("--ingress_drop_probing", ...) but never consumed via getoption. Both test methods run as separate parametrized tests regardless. Consider implementing the gating logic or removing the unused option.

@StormLiangMS

bcmcmd commands (knetctrl netif show, show pmap) are Broadcom-specific and crash on Mellanox/Cisco with 'command not found'. Wrapped entire XPE port mapping logic in 'if sonic_asic_type == broadcom:' guard. Non-Broadcom platforms use all available test ports directly. Addresses @StormLiangMS review (2026-03-25): bcmcmd runs on all platforms. Co-authored-by: Copilot <[email protected]> Signed-off-by: Xu Chen <[email protected]>

@StormLiangMS

max() on empty dict raises ValueError. Skip test gracefully when no available test ports are found after XPE mapping. Addresses @StormLiangMS review (2026-03-25): max() on potentially empty dict. Co-authored-by: Copilot <[email protected]> Signed-off-by: Xu Chen <[email protected]>

After bcmcmd guard, variables like bcmport_to_sonicport, xpe_to_bcmports are only defined inside the broadcom branch. Logger.info must only reference variables available in both branches. Co-authored-by: Copilot <[email protected]> Signed-off-by: Xu Chen <[email protected]>

mssonicbld · 2026-03-26T13:53:54Z

/azp run

XuChen-MSFT · 2026-03-26T13:54:08Z

@StormLiangMS Re: 4 findings

[High] bcmcmd on non-Broadcom — Fixed (d001e4f): Wrapped entire XPE port mapping logic in if sonic_asic_type == "broadcom":. Non-Broadcom platforms use src_testPortIds directly.

[Medium] max() on empty dict — Fixed (3d8746f): Added if not xpe_to_testports: pytest.skip(...) guard before max().

[Low] set→list in replaceNonExistentPortId — Already fixed in earlier commit (d74eeb7 / c2ff13f). No other callers pass sets — verified by searching all replaceNonExistentPortId call sites.

[Low] Unused --ingress_drop_probing CLI option — This is a pre-reserved parameter for future gating logic (switching between PFC/IngressDrop probing modes via CLI). Currently both tests run as separate methods. Will implement the gating or remove when the probing mode selection is finalized.
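The two guards from the [High] and [Medium] items can be pictured together in one helper. Everything here is a hypothetical sketch: `select_probe_ports`, the `map_ports_by_xpe` callable standing in for the bcmcmd-based XPE mapping, and the "XPE with the most ports" selection rule are assumptions, since the real key passed to max() is elided in the review.

```python
def select_probe_ports(asic_type, src_port_ids, map_ports_by_xpe):
    """Pick test ports, running the Broadcom-only mapping only on Broadcom.

    map_ports_by_xpe returns {xpe_id: [port_id, ...]} (assumed shape).
    Returns None when no ports are available so the caller can skip the test.
    """
    if asic_type != "broadcom":
        # Non-Broadcom platforms use the source test ports directly.
        return list(src_port_ids) or None
    xpe_to_testports = map_ports_by_xpe(src_port_ids)
    if not xpe_to_testports:
        return None  # max() on an empty dict would raise ValueError
    best_xpe = max(xpe_to_testports, key=lambda xpe: len(xpe_to_testports[xpe]))
    return xpe_to_testports[best_xpe]


def must_not_run(_ports):
    raise RuntimeError("Broadcom-only mapping called on another platform")


non_brcm = select_probe_ports("mellanox", [1, 2], must_not_run)
empty = select_probe_ports("broadcom", [1, 2, 3], lambda ports: {})
picked = select_probe_ports("broadcom", [1, 2, 3], lambda ports: {0: [1], 1: [2, 3]})
```

In a pytest test, the caller would turn a None result into `pytest.skip(...)`, as the review suggested.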

azure-pipelines · 2026-03-26T13:54:10Z

Azure Pipelines successfully started running 1 pipeline(s).

XuChen-MSFT requested review from StormLiangMS, bingwang-ms, kperumalbfn, wsycqyz and yxieca February 23, 2026 14:06

XuChen-MSFT requested review from wangxin and xwjiang-ms February 25, 2026 05:36

StormLiangMS reviewed Mar 17, 2026

XuChen-MSFT mentioned this pull request Mar 23, 2026

[mmu probing] Refactor: extract _create_algorithms() and _run_algorithms() to ProbingBase #23190

Open

XuChen-MSFT and others added 9 commits March 23, 2026 12:37

fix pre-commit errors

3c1fda8

Signed-off-by: Xu Chen <[email protected]>

fix pre-commit errors

d365ea6

Signed-off-by: Xu Chen <[email protected]>

fix pre-commit errors

0bfc4af

Signed-off-by: Xu Chen <[email protected]>

Add conditional mark for test_qos_probe.py pending platform validation

df2f8b2

Signed-off-by: Xu Chen <[email protected]>

fix: remove leading space typo in breakout config key

453cc95

Fix KeyError in testQosIngressDropProbe for breakout SKUs. Line 206 had [" breakout"] (with leading space) instead of ["breakout"]. Co-authored-by: Copilot <[email protected]> Signed-off-by: Xu Chen <[email protected]>

XuChen-MSFT force-pushed the xuchen3/mmu_probe/pr09-production branch from 6ce1d4c to 98db457 Compare March 23, 2026 04:38

github-advanced-security bot found potential problems Mar 23, 2026

StormLiangMS reviewed Mar 25, 2026

XuChen-MSFT and others added 3 commits March 26, 2026 21:52
