[mmu probing] pr08.test: Add integration tests for end-to-end probing workflows #22546

Merged
yxieca merged 8 commits into sonic-net:master from XuChen-MSFT:xuchen3/mmu_probe/pr08-integration-tests
Mar 24, 2026

Conversation

@XuChen-MSFT
Contributor

@XuChen-MSFT XuChen-MSFT commented Feb 23, 2026

Description of PR

Summary:

Implement comprehensive integration tests for complete probing workflows using simulation executors for reproducible end-to-end testing.

Test Infrastructure:

  • __init__.py: Integration test module initialization
  • conftest.py: Shared pytest fixtures for integration testing
  • pytest.ini: Pytest configuration for integration test suite
  • probe_test_helper.py: Helper utilities and test orchestration
    • Simulation environment setup
    • PTF mock integration
    • Test scenario builders
    • Assertion helpers for threshold validation

Integration Test Suites:

  1. test_pfc_xoff_probing.py (883 lines):

    • End-to-end PFC Xoff threshold detection workflows
    • Tests all three algorithm phases (UpperBound → LowerBound → ThresholdRange)
    • Validates observer metrics collection
    • Tests buffer state management
    • Multi-port probing scenarios
  2. test_ingress_drop_probing.py (575 lines):

    • End-to-end ingress drop threshold detection workflows
    • Tests algorithm sequence (UpperBound → LowerBound → ThresholdPoint)
    • Validates drop detection accuracy
    • Tests traffic pattern variations
  3. test_headroom_pool_probing.py (632 lines):

    • End-to-end headroom pool size probing workflows (N→1 pattern)
    • Multi-priority-group iteration testing
    • Tests PG-level threshold detection
    • Validates pool size calculation

All integration tests use simulation executors to ensure deterministic, reproducible results without requiring physical hardware, enabling CI/CD pipeline integration.
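
To illustrate why a simulation executor makes these tests deterministic, here is a minimal sketch. The class `SimThresholdExecutor` and its `check` method are hypothetical names chosen for illustration, not the framework's actual API. The sketch assumes a probe "reaches" the threshold whenever the packet count is at or above a fixed simulated value.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SimThresholdExecutor:
    """Hypothetical stand-in for a hardware executor (not the real API)."""

    threshold: int  # simulated threshold, in packets
    history: List[Tuple[int, bool]] = field(default_factory=list)

    def check(self, packets: int) -> bool:
        """Return True if sending `packets` would reach the threshold."""
        reached = packets >= self.threshold
        self.history.append((packets, reached))  # observer-style record
        return reached


def test_sim_executor_is_deterministic():
    # Two independent executors with the same threshold must agree exactly.
    first = SimThresholdExecutor(threshold=1200)
    second = SimThresholdExecutor(threshold=1200)
    probes = [200000, 1562, 781, 1200, 1199]
    assert [first.check(p) for p in probes] == [second.check(p) for p in probes]
    assert first.history == second.history
```

Because the simulated response depends only on the input, the same probe sequence gives the same answers on every run, which is what makes the suite usable in CI without hardware.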

Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

QoS refactoring.

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

relevant PRs:
[mmu probing] pr01.docs: Add MMU threshold probing framework design
[mmu probing] pr02.probe: Add core probing algorithms with essential data structures
[mmu probing] pr03.probe: Add probing executors and executor registry
[mmu probing] pr04.probe: Add observer pattern for metrics tracking
[mmu probing] pr05.probe: Add stream manager and buffer occupancy controller
[mmu probing] pr06.probe: Add base framework and all probing implementations
[mmu probing] pr07.test: Add comprehensive unit tests for probe framework
[mmu probing] pr08.test: Add integration tests for end-to-end probing workflows
[mmu probing] pr09.test: Add production probe test and infrastructure updates

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@yxieca
Collaborator

yxieca commented Feb 23, 2026

Blocking issues:

  1. Typo in key: qosConfig = dutQosConfig['param'][portSpeedCableLength][' breakout'] has a leading space. Likely should be ['breakout'].

  2. Type error in updateTestPortIdIp(): replaceNonExistentPortId(testPortIds, set(portIds)) passes a set; the helper mutates/indexes the list. Use a list instead (e.g., list(portIds) or keep as list).

These likely explain the Pre_test Static Analysis failure. Please fix and re-run checks.
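
Both issues can be reproduced in isolation. The sketch below is an assumption-laden illustration: `replace_non_existent_port_id` is a hypothetical reimplementation written only to show the failure mode (the real helper's body is not shown in this thread), and the dictionary contents are made up.

```python
from typing import List, Sequence


def replace_non_existent_port_id(test_port_ids: List[int],
                                 available_port_ids: Sequence[int]) -> List[int]:
    """Hypothetical helper: replace unknown port IDs in place with unused ones."""
    next_free = 0
    for i, port_id in enumerate(test_port_ids):
        if port_id not in available_port_ids:
            # Indexing is what breaks when the caller passes a set.
            while available_port_ids[next_free] in test_port_ids:
                next_free += 1
            test_port_ids[i] = available_port_ids[next_free]
    return test_port_ids


port_ids = [1, 2, 3, 4]
print(replace_non_existent_port_id([1, 99, 3], list(port_ids)))  # list input works

try:
    replace_non_existent_port_id([1, 99, 3], set(port_ids))
except TypeError as exc:
    print(f"set input fails: {exc}")

# A leading space makes a different dictionary key (illustrative data only).
qos_params = {"breakout": {"note": "illustrative value"}}
print("breakout" in qos_params, " breakout" in qos_params)
```

Membership tests (`in`) work on both lists and sets, so a bug like this only surfaces on the code path that indexes or mutates by position.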

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@XuChen-MSFT
Contributor Author

Below is a sample test log from running these integration tests for the probing workflows.
(They can run in any environment that has pytest installed.)

$  cd /mnt/c/ws/repo/sonic-mgmt-int/sonic-mgmt-int/tests/saitests/mock/it && python3 -m pytest . -v
============================================================================================= test session starts ==============================================================================================
platform linux -- Python 3.8.10, pytest-8.3.5, pluggy-1.5.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /mnt/c/ws/repo/sonic-mgmt-int/sonic-mgmt-int/tests/saitests/mock/it
configfile: pytest.ini
plugins: cov-5.0.0, order-1.3.0
collected 62 items

test_headroom_pool_probing.py::TestHeadroomPoolProbing::test_headroom_pool_2_pgs_normal Warning: Too many PGs (2) for src ports (1)
Warning: Too many DSCPs (2) for src ports (1)
Platform-specific: packet_length=64, cell_occupancy=1
Probing uses: packet_length=64, cell_occupancy=1
Traffic setup completed: 2 flows (1 src ports × 2 PGs -> 1 dst)
================================================================================
[headroom_pool] Starting Headroom Pool Size probing
  Traffic pattern: N src -> 1 dst
  pool_size=200000
  precision_target_ratio=0.005
  enable_precise_detection=True
  executor_env=sim
================================================================================
Flow configs: 2 flows

============================================================
PG #1/2: src=24, dst=28, pg=3
============================================================

[PFC XOFF] Probing threshold...

Upper Bound Probing

| Iter     | Lower     | Candidate | Upper     | Step  | PfcXoff      | Time(s)  | Total(s)  |
|----------|-----------|-----------|-----------|-------|--------------|----------|-----------|
| 1.1.1    | NA        | NA        | 200000    | init  | reached      | 0.00     | 0.00      |
  PFC Upper bound = 200000

Lower Bound Probing

| Iter     | Lower     | Candidate | Upper     | Step  | PfcXoff      | Time(s)  | Total(s)  |
|----------|-----------|-----------|-----------|-------|--------------|----------|-----------|
| 1.2.1    | 100000    | NA        | 200000    | init  | reached      | 0.00     | 0.00      |
| 1.2.2    | 50000     | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 1.2.3    | 25000     | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 1.2.4    | 12500     | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 1.2.5    | 6250      | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 1.2.6    | 3125      | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 1.2.7    | 1562      | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 1.2.8    | 781       | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 1.2.9    | 390       | NA        | 200000    | /2    | unreached    | 0.00     | 0.00      |
  PFC Lower bound = 390

Threshold Range Probing

| Iter     | Lower     | Candidate | Upper     | Step  | PfcXoff      | Time(s)  | Total(s)  |
|----------|-----------|-----------|-----------|-------|--------------|----------|-----------|
| 1.3.1    | 390       | 100195    | 200000    | init  | reached      | 0.00     | 0.00      |
| 1.3.2    | 390       | 50292     | 100195    | <-U   | reached      | 0.00     | 0.00      |
| 1.3.3    | 390       | 25341     | 50292     | <-U   | reached      | 0.00     | 0.00      |
| 1.3.4    | 390       | 12865     | 25341     | <-U   | reached      | 0.00     | 0.00      |
| 1.3.5    | 390       | 6627      | 12865     | <-U   | reached      | 0.00     | 0.00      |
| 1.3.6    | 390       | 3508      | 6627      | <-U   | reached      | 0.00     | 0.00      |
| 1.3.7    | 390       | 1949      | 3508      | <-U   | reached      | 0.00     | 0.00      |
| 1.3.8    | 390       | 1169      | 1949      | <-U   | reached      | 0.00     | 0.00      |
| 1.3.9    | 390       | 779       | 1169      | <-U   | reached      | 0.00     | 0.00      |
| 1.3.10   | 390       | 584       | 779       | <-U   | reached      | 0.00     | 0.00      |
| 1.3.11   | 390       | 487       | 584       | <-U   | unreached    | 0.00     | 0.00      |
| 1.3.12   | 488       | 536       | 584       | L->   | skipped      | 0.00     | 0.00      |
  PFC Range = [488, 584]

Threshold Point Probing

| Iter     | Lower     | Candidate | Upper     | Step  | PfcXoff      | Time(s)  | Total(s)  |
|----------|-----------|-----------|-----------|-------|--------------|----------|-----------|
| 1.4.1    | 489       | 489       | 584       | init  | unreached    | 0.00     | 0.00      |
| 1.4.2    | 491       | 491       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.3    | 493       | 493       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.4    | 495       | 495       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.5    | 497       | 497       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.6    | 499       | 499       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.7    | 501       | 501       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.8    | 503       | 503       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.9    | 505       | 505       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.10   | 507       | 507       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.11   | 509       | 509       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.12   | 511       | 511       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.13   | 513       | 513       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.14   | 515       | 515       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.15   | 517       | 517       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.16   | 519       | 519       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.17   | 521       | 521       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.18   | 523       | 523       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.19   | 525       | 525       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.20   | 527       | 527       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.21   | 529       | 529       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.22   | 531       | 531       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.23   | 533       | 533       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.24   | 535       | 535       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.25   | 537       | 537       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.26   | 539       | 539       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.27   | 541       | 541       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.28   | 543       | 543       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.29   | 545       | 545       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.30   | 547       | 547       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.31   | 549       | 549       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.32   | 551       | 551       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.33   | 553       | 553       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.34   | 555       | 555       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.35   | 557       | 557       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.36   | 559       | 559       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.37   | 561       | 561       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.38   | 563       | 563       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.39   | 565       | 565       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.40   | 567       | 567       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.41   | 569       | 569       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.42   | 571       | 571       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.43   | 573       | 573       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.44   | 575       | 575       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.45   | 577       | 577       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.46   | 579       | 579       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.47   | 581       | 581       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.4.48   | 583       | 583       | 584       | +2    | unreached    | 0.00     | 0.00      |

[Ingress Drop] Probing threshold...

Upper Bound Probing

| Iter     | Lower     | Candidate | Upper     | Step  | IngressDrop  | Time(s)  | Total(s)  |
|----------|-----------|-----------|-----------|-------|--------------|----------|-----------|
| 1.5.1    | NA        | NA        | 200000    | init  | reached      | 0.00     | 0.00      |
  Drop Upper bound = 200000

Lower Bound Probing

| Iter     | Lower     | Candidate | Upper     | Step  | IngressDrop  | Time(s)  | Total(s)  |
|----------|-----------|-----------|-----------|-------|--------------|----------|-----------|
| 1.6.1    | 487       | NA        | 200000    | init  | unreached    | 0.00     | 0.00      |
  Drop Lower bound = 487

Threshold Range Probing

| Iter     | Lower     | Candidate | Upper     | Step  | IngressDrop  | Time(s)  | Total(s)  |
|----------|-----------|-----------|-----------|-------|--------------|----------|-----------|
| 1.7.1    | 487       | 100243    | 200000    | init  | reached      | 0.00     | 0.00      |
| 1.7.2    | 487       | 50365     | 100243    | <-U   | reached      | 0.00     | 0.00      |
| 1.7.3    | 487       | 25426     | 50365     | <-U   | reached      | 0.00     | 0.00      |
| 1.7.4    | 487       | 12956     | 25426     | <-U   | reached      | 0.00     | 0.00      |
| 1.7.5    | 487       | 6721      | 12956     | <-U   | reached      | 0.00     | 0.00      |
| 1.7.6    | 487       | 3604      | 6721      | <-U   | reached      | 0.00     | 0.00      |
| 1.7.7    | 487       | 2045      | 3604      | <-U   | reached      | 0.00     | 0.00      |
| 1.7.8    | 487       | 1266      | 2045      | <-U   | reached      | 0.00     | 0.00      |
| 1.7.9    | 487       | 876       | 1266      | <-U   | reached      | 0.00     | 0.00      |
| 1.7.10   | 487       | 681       | 876       | <-U   | reached      | 0.00     | 0.00      |
| 1.7.11   | 487       | 584       | 681       | <-U   | reached      | 0.00     | 0.00      |
| 1.7.12   | 487       | 535       | 584       | <-U   | skipped      | 0.00     | 0.00      |
  Drop Range = [487, 584]

Threshold Point Probing

| Iter     | Lower     | Candidate | Upper     | Step  | IngressDrop  | Time(s)  | Total(s)  |
|----------|-----------|-----------|-----------|-------|--------------|----------|-----------|
| 1.8.1    | 488       | 488       | 584       | init  | unreached    | 0.00     | 0.00      |
| 1.8.2    | 490       | 490       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.3    | 492       | 492       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.4    | 494       | 494       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.5    | 496       | 496       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.6    | 498       | 498       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.7    | 500       | 500       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.8    | 502       | 502       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.9    | 504       | 504       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.10   | 506       | 506       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.11   | 508       | 508       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.12   | 510       | 510       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.13   | 512       | 512       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.14   | 514       | 514       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.15   | 516       | 516       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.16   | 518       | 518       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.17   | 520       | 520       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.18   | 522       | 522       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.19   | 524       | 524       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.20   | 526       | 526       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.21   | 528       | 528       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.22   | 530       | 530       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.23   | 532       | 532       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.24   | 534       | 534       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.25   | 536       | 536       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.26   | 538       | 538       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.27   | 540       | 540       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.28   | 542       | 542       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.29   | 544       | 544       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.30   | 546       | 546       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.31   | 548       | 548       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.32   | 550       | 550       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.33   | 552       | 552       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.34   | 554       | 554       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.35   | 556       | 556       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.36   | 558       | 558       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.37   | 560       | 560       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.38   | 562       | 562       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.39   | 564       | 564       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.40   | 566       | 566       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.41   | 568       | 568       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.42   | 570       | 570       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.43   | 572       | 572       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.44   | 574       | 574       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.45   | 576       | 576       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.46   | 578       | 578       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.47   | 580       | 580       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.48   | 582       | 582       | 584       | +2    | unreached    | 0.00     | 0.00      |
| 1.8.49   | 584       | 584       | 584       | +2    | unreached    | 0.00     | 0.00      |
  Headroom = 487 - 488 = -1
  Using Port counter mode: persist with margin (485 = 487 - 2 step_size)

[Result] PG #1 Headroom = -1 cells
         Total accumulated = -1 cells

[Pool Exhausted] Headroom = -1 cells (<= 2)
         Terminating probing

Total probing time: 0.00 minutes (0.0 seconds)

============================================================
FINAL RESULTS
============================================================
PGs probed: 1
Status: SUCCESS - Pool exhaustion detected
Total Headroom Pool Size: 0 cells
Detected pg_min: 488 cells
Headroom Pool probing result: point [0, 0] cells
[PASS] 2 PG: Probe executed successfully, observer output displayed
       PFC XOFF, Ingress Drop, and all algorithms ran correctly
       (Pool exhaustion not required for IT test validation)
PASSED
test_headroom_pool_probing.py::TestHeadroomPoolProbing::test_headroom_pool_4_pgs_normal Warning: Too many PGs (4) for src ports (1)
Warning: Too many DSCPs (4) for src ports (1)
Platform-specific: packet_length=64, cell_occupancy=1
Probing uses: packet_length=64, cell_occupancy=1
Traffic setup completed: 4 flows (1 src ports × 4 PGs -> 1 dst)
================================================================================
[headroom_pool] Starting Headroom Pool Size probing
  Traffic pattern: N src -> 1 dst
  pool_size=200000
  precision_target_ratio=0.005
  enable_precise_detection=True
  executor_env=sim
================================================================================
Flow configs: 4 flows

============================================================
PG #1/4: src=24, dst=28, pg=3
============================================================

[PFC XOFF] Probing threshold...

Upper Bound Probing

| Iter     | Lower     | Candidate | Upper     | Step  | PfcXoff      | Time(s)  | Total(s)  |
|----------|-----------|-----------|-----------|-------|--------------|----------|-----------|
| 1.1.1    | NA        | NA        | 200000    | init  | reached      | 0.00     | 0.00      |
  PFC Upper bound = 200000

Lower Bound Probing

| Iter     | Lower     | Candidate | Upper     | Step  | PfcXoff      | Time(s)  | Total(s)  |
|----------|-----------|-----------|-----------|-------|--------------|----------|-----------|
| 1.2.1    | 100000    | NA        | 200000    | init  | reached      | 0.00     | 0.00      |
| 1.2.2    | 50000     | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 1.2.3    | 25000     | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 1.2.4    | 12500     | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 1.2.5    | 6250      | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 1.2.6    | 3125      | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 1.2.7    | 1562      | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 1.2.8    | 781       | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 1.2.9    | 390       | NA        | 200000    | /2    | unreached    | 0.00     | 0.00      |
  PFC Lower bound = 390
  
  ... omitted ...
  
  [ERROR] Lower bound detection failed
PFC XOFF probing result: failed
[PASS] Always PFC: Edge case handled, probing failed as expected
PASSED
test_pfc_xoff_probing.py::TestPfcXoffProbing::test_pfc_xoff_inconsistent_results Platform-specific: packet_length=64, cell_occupancy=1
Probing uses: packet_length=64, cell_occupancy=1
================================================================================
[pfc_xoff] Starting threshold probing
  src_port=24, dst_port=28
  pool_size=200000
  precision_target_ratio=0.05
  enable_precise_detection=False
  executor_env=sim
================================================================================

Upper Bound Probing

| Iter     | Lower     | Candidate | Upper     | Step  | PfcXoff      | Time(s)  | Total(s)  |
|----------|-----------|-----------|-----------|-------|--------------|----------|-----------|
| 1.1      | NA        | NA        | 200000    | init  | reached      | 0.00     | 0.00      |

Lower Bound Probing

| Iter     | Lower     | Candidate | Upper     | Step  | PfcXoff      | Time(s)  | Total(s)  |
|----------|-----------|-----------|-----------|-------|--------------|----------|-----------|
| 2.1      | 100000    | NA        | 200000    | init  | reached      | 0.00     | 0.00      |
[ERROR] Lower bound detection failed
PFC XOFF probing result: failed
[PASS] Inconsistent results: Extreme inconsistency handled, probing failed as expected
PASSED
test_pfc_xoff_probing.py::TestPfcXoffProbing::test_pfc_xoff_multi_verification_default_5_attempts Platform-specific: packet_length=64, cell_occupancy=1
Probing uses: packet_length=64, cell_occupancy=1
================================================================================
[pfc_xoff] Starting threshold probing
  src_port=24, dst_port=28
  pool_size=200000
  precision_target_ratio=0.05
  enable_precise_detection=False
  executor_env=sim
================================================================================

Upper Bound Probing

| Iter     | Lower     | Candidate | Upper     | Step  | PfcXoff      | Time(s)  | Total(s)  |
|----------|-----------|-----------|-----------|-------|--------------|----------|-----------|
| 1.1      | NA        | NA        | 200000    | init  | reached      | 0.00     | 0.00      |

Lower Bound Probing

| Iter     | Lower     | Candidate | Upper     | Step  | PfcXoff      | Time(s)  | Total(s)  |
|----------|-----------|-----------|-----------|-------|--------------|----------|-----------|
| 2.1      | 100000    | NA        | 200000    | init  | reached      | 0.00     | 0.00      |
| 2.2      | 50000     | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 2.3      | 25000     | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 2.4      | 12500     | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 2.5      | 6250      | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 2.6      | 3125      | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 2.7      | 1562      | NA        | 200000    | /2    | reached      | 0.00     | 0.00      |
| 2.8      | 781       | NA        | 200000    | /2    | unreached    | 0.00     | 0.00      |

Threshold Range Probing

| Iter     | Lower     | Candidate | Upper     | Step  | PfcXoff      | Time(s)  | Total(s)  |
|----------|-----------|-----------|-----------|-------|--------------|----------|-----------|
| 3.1      | 781       | 100390    | 200000    | init  | reached      | 0.00     | 0.00      |
| 3.2      | 781       | 50585     | 100390    | <-U   | reached      | 0.00     | 0.00      |
| 3.3      | 781       | 25683     | 50585     | <-U   | reached      | 0.00     | 0.00      |
| 3.4      | 781       | 13232     | 25683     | <-U   | reached      | 0.00     | 0.00      |
| 3.5      | 781       | 7006      | 13232     | <-U   | reached      | 0.00     | 0.00      |
| 3.6      | 781       | 3893      | 7006      | <-U   | reached      | 0.00     | 0.00      |
| 3.7      | 781       | 2337      | 3893      | <-U   | reached      | 0.00     | 0.00      |
| 3.8      | 781       | 1559      | 2337      | <-U   | reached      | 0.00     | 0.00      |
| 3.9      | 781       | 1170      | 1559      | <-U   | unreached    | 0.00     | 0.00      |
| 3.10     | 1171      | 1365      | 1559      | L->   | reached      | 0.00     | 0.00      |
| 3.11     | 1171      | 1268      | 1365      | <-U   | reached      | 0.00     | 0.00      |
| 3.12     | 1171      | 1219      | 1268      | <-U   | reached      | 0.00     | 0.00      |
| 3.13     | 1171      | 1195      | 1219      | <-U   | skipped      | 0.00     | 0.00      |
PFC XOFF probing result: range [1171, 1219] pkt
[PASS] Multi-verification default behavior validated:
      threshold=1200, result=[1171, 1219]
      range=48 cells
      -> Default max_attempts=5 mechanism working correctly
PASSED

============================================================================================== 62 passed in 2.02s ==============================================================================================
xuchen3@xuchen3-devbox:/mnt/c/ws/repo/sonic-mgmt-int/sonic-mgmt-int/tests/saitests/mock/it
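
The Lower Bound and Threshold Range tables in the log above follow a halving search and then a binary search. The sketch below is a reconstruction from the logged numbers only, not the framework's code: the function names are hypothetical, the simulated threshold of 1200 is taken from the last test's summary line, and the stopping rule (stop once the range width is at most `precision_ratio` times the next candidate) is an assumption. The runs with `enable_precise_detection=True` appear to stop under a different condition that this sketch does not model.

```python
from typing import Callable, Optional, Tuple


def probe_lower_bound(reached: Callable[[int], bool], upper: int) -> Optional[int]:
    """Halve from upper // 2 until a value no longer reaches the threshold."""
    candidate = upper // 2
    while candidate >= 1:
        if not reached(candidate):
            return candidate
        candidate //= 2
    return None  # every probed value reached the threshold: no lower bound


def probe_range(reached: Callable[[int], bool], lower: int, upper: int,
                precision_ratio: float) -> Tuple[int, int]:
    """Binary-search [lower, upper] until it is narrow enough."""
    while True:
        candidate = (lower + upper) // 2
        # Assumed stopping rule, inferred from one logged run.
        if upper - lower <= precision_ratio * candidate:
            return lower, upper
        if reached(candidate):
            upper = candidate        # the "<-U" step in the log
        else:
            lower = candidate + 1    # the "L->" step in the log


def sim_reached(packets: int) -> bool:
    """Simulated device: threshold of 1200, as in the last logged test."""
    return packets >= 1200


lower_bound = probe_lower_bound(sim_reached, 200000)
print(lower_bound)
print(probe_range(sim_reached, lower_bound, 200000, 0.05))
```

Under these assumptions the sketch visits the same lower bound (781) and the same final range ([1171, 1219]) as the `test_pfc_xoff_multi_verification_default_5_attempts` run. A simulated device that always reports "reached" yields `None`, which corresponds to the "Lower bound detection failed" case.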

@XuChen-MSFT
Contributor Author

Blocking issues:

  1. Typo in key: qosConfig = dutQosConfig['param'][portSpeedCableLength][' breakout'] has a leading space. Likely should be ['breakout'].
  2. Type error in updateTestPortIdIp(): replaceNonExistentPortId(testPortIds, set(portIds)) passes a set; the helper mutates/indexes the list. Use a list instead (e.g., list(portIds) or keep as list).

These likely explain the Pre_test Static Analysis failure. Please fix and re-run checks.

@yxieca Thanks for the review.
The static analysis failures have been resolved.

@yxieca
Collaborator

yxieca commented Feb 27, 2026

Deep review done; overall looks good. Two minor nits:

  1. now uses ast.literal_eval. If this param is already a list (not a string), this will break. Consider guarding for type.
  2. probe_test_helper does broad sys.modules patching; ensure it stays isolated to tests under tests/saitests/mock/it (pytest.ini helps).

Also DCO is failing — please add sign-off and update commits.
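
For the first nit, a type guard around `ast.literal_eval` might look like the following. This is a minimal sketch: `parse_list_param` is a hypothetical helper name, and the parameter it would be applied to is not identified in this thread.

```python
import ast
from typing import Any, List


def parse_list_param(value: Any) -> List[Any]:
    """Accept either a real list/tuple or its string form such as "[3, 4]"."""
    if isinstance(value, (list, tuple)):
        return list(value)  # already parsed: do not call literal_eval on it
    if isinstance(value, str):
        parsed = ast.literal_eval(value)
        if isinstance(parsed, (list, tuple)):
            return list(parsed)
        return [parsed]  # a single literal such as "5"
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


print(parse_list_param("[3, 4]"))
print(parse_list_param([3, 4]))
```

`ast.literal_eval` accepts only strings and AST nodes, so passing an existing list raises `ValueError`; checking the type first keeps both call styles working.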

kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
…ic-net#22546)

#### Why I did it

In the case of ASIC detection failures on Broadcom (or if the ASIC couldn't be detected in time), the `/dev/shm` partition in the syncd container will be only 64MB, which might cause issues if syncd/Broadcom SAI library needs more space than that.

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it

Since using a larger `/dev/shm` on its own doesn't cause any issues, bump up the default to 512MB. This should be enough for most platforms.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@XuChen-MSFT
Contributor Author

Added 3 integration tests + Python 3.12 compatibility fix (f55692d):

New IT tests:

  • test_pfc_xoff_threshold_at_one: boundary value 1 (validates lower-bound break fix)
  • test_pfc_xoff_threshold_at_two: boundary value 2 (binary search minimum)
  • test_pfc_xoff_point_probing_with_intermittent_failures: end-to-end drain recovery
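
To illustrate why thresholds of 1 and 2 deserve dedicated boundary tests, here is a minimal sketch of a three-phase search against a simulated trigger. Everything in it is an assumption for illustration: the function `probe_threshold`, the doubling/halving strategy, and the `triggered` callback are hypothetical and are not taken from the probing code in this PR series.

```python
from typing import Callable


def probe_threshold(triggered: Callable[[int], bool], start: int = 64, limit: int = 1 << 20) -> int:
    """Return the smallest packet count n >= 1 for which triggered(n) is True.

    Hypothetical model: triggered(n) is monotonic, i.e. False below the
    threshold and True at or above it.
    """
    # Phase 1 (upper bound): double until the condition triggers.
    upper = max(1, start)
    while not triggered(upper):
        upper *= 2
        if upper > limit:
            raise RuntimeError("no upper bound found below limit")

    # Phase 2 (lower bound): halve until the condition no longer triggers.
    # The loop must stop once lower reaches 0; this is the exit that a
    # threshold of 1 exercises.
    lower = upper // 2
    while lower >= 1 and triggered(lower):
        upper = lower
        lower //= 2

    # Phase 3 (binary search): lower never triggers, upper always does.
    while upper - lower > 1:
        mid = (lower + upper) // 2
        if triggered(mid):
            upper = mid
        else:
            lower = mid
    return upper
```

With a threshold of 1 the lower bound collapses to 0 and the binary search has nothing left to do; with a threshold of 2 the search interval is already one packet wide when Phase 3 starts. Both are cases where an off-by-one in a loop condition would show up.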

Python 3.12 fix in probe_test_helper.py:

  • Added __path__ = [] to scapy mock (Python 3.12+ import system requires this to recognize MagicMock as a package)
  • Registered scapy.layers and scapy.layers.inet6 submodule mocks
  • Backward compatible with Python 3.8
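
A minimal sketch of the mocking pattern described above, assuming only the standard library. The helper name `build_scapy_mocks` is hypothetical, and `patch.dict` is used here just to keep the demo self-cleaning; the actual `probe_test_helper.py` may register the mocks differently.

```python
import sys
from unittest.mock import MagicMock, patch


def build_scapy_mocks() -> dict:
    """Build sys.modules entries that make MagicMock objects stand in for scapy."""
    scapy = MagicMock(name="scapy")
    layers = MagicMock(name="scapy.layers")
    inet6 = MagicMock(name="scapy.layers.inet6")

    # A module counts as a package only if it has __path__. MagicMock does
    # not create dunder attributes on demand, so set it explicitly.
    scapy.__path__ = []
    layers.__path__ = []

    # Link parents to children so attribute access and imports agree.
    scapy.layers = layers
    layers.inet6 = inet6

    return {"scapy": scapy, "scapy.layers": layers, "scapy.layers.inet6": inet6}


mocks = build_scapy_mocks()
with patch.dict(sys.modules, mocks):
    # Resolved from the mocks above, not from a real scapy install.
    from scapy.layers.inet6 import IPv6

resolved_from_mock = IPv6 is mocks["scapy.layers.inet6"].IPv6
```

Registering each dotted submodule name separately means the import statement finds every level in `sys.modules` and never has to search the filesystem.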

IT total: 62 → 65. See PR #22540 for the corresponding source code fixes.

@XuChen-MSFT
Contributor Author

Added 3 more IT tests for ingress drop probing (43dd35e):

  • test_ingress_drop_threshold_at_one: boundary value 1 (lower-bound break)
  • test_ingress_drop_threshold_at_two: boundary value 2 (binary search min)
  • test_ingress_drop_point_probing_with_intermittent_failures: drain recovery

IT total: 65 → 68. Same patterns as PFC XOFF boundary tests.

@XuChen-MSFT
Contributor Author

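Added 2 integration tests for anti-oscillation validation (4b59778):

  • test_pfc_xoff_range_oscillation_high_failure_rate: PFC XOFF, bad_spot scenario
  • test_ingress_drop_range_oscillation_bad_spot: ingress drop, bad_spot scenario

IT total: 68 -> 70. Validates algorithm fix in PR #22540 (036f27c).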

XuChen-MSFT added a commit to XuChen-MSFT/sonic-mgmt that referenced this pull request Mar 18, 2026
When candidate_threshold is small (e.g. 10), precision target
candidate * 0.05 = 0.5 < 1. With bad_spot at the threshold value,
range_size stays at 1 but 1 <= 0.5 is never satisfied, burning all
50 max_iterations. Use max(1, ...) to ensure precision check can
terminate when range narrows to 1 packet granularity.

Validated by UT (PR sonic-net#22545) and IT (PR sonic-net#22546) — both FAIL without
this fix (50 iterations), PASS with fix (~18 iterations).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Xu Chen <xuchen3@microsoft.com>

@XuChen-MSFT
Contributor Author

Added 2 ITs for precision check with small threshold + bad_spot (eeabfda):

  • test_pfc_xoff_small_threshold_precision: threshold=10, bad_spot=[10], captures Phase 3 iterations
  • test_ingress_drop_small_threshold_precision: same pattern

Without fix: Phase 3 burns 50 iterations (max_iterations). With fix: ~18 iterations (exits via precision_reached).
IT total: 70 -> 72. Validates fix in PR #22540 (12bbc07).
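
The arithmetic behind this fix can be sketched as follows. The function name `precision_reached` and its parameters are hypothetical; only the 0.05 factor and the `max(1, ...)` clamp come from the referenced commit message.

```python
def precision_reached(lower: int, upper: int, candidate: int,
                      ratio: float = 0.05, clamp: bool = True) -> bool:
    """Return True when the search range is narrow enough to stop.

    lower/upper bound the remaining range in packets; candidate is the
    current threshold estimate. clamp=False reproduces the old behaviour.
    """
    range_size = upper - lower
    target = candidate * ratio  # e.g. 10 * 0.05 = 0.5
    if clamp:
        # A range cannot shrink below one packet, so never ask for less.
        target = max(1, target)
    return range_size <= target


# threshold=10: the range is stuck at 1 packet.
without_fix = precision_reached(9, 10, candidate=10, clamp=False)  # 1 <= 0.5
with_fix = precision_reached(9, 10, candidate=10, clamp=True)      # 1 <= 1
```

Without the clamp the check can never pass for small candidates, so the loop only ends at `max_iterations`. For large candidates (where `candidate * 0.05 >= 1`) the clamp changes nothing.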

XuChen-MSFT added a commit to XuChen-MSFT/sonic-mgmt that referenced this pull request Mar 23, 2026
Refactored multi-PG probe loop from 6 scattered 'continue' statements
to while-True single-pass block with unified cleanup:
- break + fail_reason on any phase failure
- pg_success flag tracks completion
- Single drain_buffer([dst_port_id]) call in cleanup block

This ensures buffer state is always drained before moving to the next PG,
preventing corrupted buffer from affecting subsequent PG probing.

UT coverage: PR sonic-net#22545 (3d75029) — 7 new tests
IT coverage: PR sonic-net#22546 (14a29c2) — 2 new tests

Addresses @StormLiangMS review: continue on PG failure skips buffer cleanup.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@XuChen-MSFT
Contributor Author

Added 2 ITs for multi-PG buffer isolation in headroom pool (14a29c2):

  • test_headroom_pool_buffer_cleanup_on_pg_failure: 2 PGs, verify probe handles PG failure without crash
  • test_headroom_pool_multi_pg_isolation: 3 PGs, verify independent results

Validates fix in PR #22544 (7c6b4fa).

IT headroom total: 15 → 17.
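
For readers who have not seen the fix in PR #22544, the control flow these tests exercise could look roughly like the sketch below. It is an assumption-laden illustration: `probe_all_pgs` and the three phase callbacks are hypothetical names, and only `drain_buffer([dst_port_id])`, `pg_success`, `fail_reason`, and the while-True single-pass shape come from the commit message.

```python
from typing import Callable, Dict, List, Optional

Phase = Callable[[int], bool]  # takes a PG index, returns True on success


def probe_all_pgs(pgs: List[int], upper_bound: Phase, lower_bound: Phase,
                  threshold_point: Phase,
                  drain_buffer: Callable[[List[int]], None],
                  dst_port_id: int) -> Dict[int, Optional[str]]:
    """Probe every PG and return {pg: None on success, else fail_reason}."""
    results: Dict[int, Optional[str]] = {}
    for pg in pgs:
        pg_success = False
        fail_reason: Optional[str] = None
        while True:  # single-pass block: every path leaves via break
            if not upper_bound(pg):
                fail_reason = "upper bound failed"
                break
            if not lower_bound(pg):
                fail_reason = "lower bound failed"
                break
            if not threshold_point(pg):
                fail_reason = "threshold point failed"
                break
            pg_success = True
            break
        # Unified cleanup: runs exactly once per PG, on success or failure.
        drain_buffer([dst_port_id])
        results[pg] = None if pg_success else fail_reason
    return results
```

Because failures use `break` rather than `continue`, control always falls through to the drain call before the next PG starts, which is the property the two new tests check.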

@XuChen-MSFT
Contributor Author

@yxieca Re: ast.literal_eval type guard

This has been addressed — ast.literal_eval is no longer used in the current code. The parameter handling was updated in earlier commits.

@XuChen-MSFT
Contributor Author

@yxieca Re: sys.modules patching isolation

Currently isolated by 3 mechanisms:

  1. pytest.ini in mock/it/ has testpaths = . — only collects IT tests in this directory
  2. setup_test_environment() is explicitly called at the top of each IT file, not auto-triggered
  3. IT and UT have separate pytest.ini files — never run in the same pytest session

This will be further validated during lightning pipeline integration, where the actual test execution flow (PTF runner → SAI tests) will confirm that sys.modules patching in IT does not affect physical test execution.
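
As a complement to the three mechanisms above, the patching itself can also be scoped so that it is undone automatically. The sketch below is not what this PR does (the PR calls `setup_test_environment()` at the top of each IT file); `scoped_module_mocks` and the module name in the usage are hypothetical, and only `unittest.mock.patch.dict` from the standard library is relied on.

```python
import sys
from contextlib import contextmanager
from unittest.mock import MagicMock, patch


@contextmanager
def scoped_module_mocks(*names: str):
    """Temporarily register MagicMock modules under the given names.

    patch.dict restores sys.modules to its previous contents on exit,
    including removing keys that were added inside the block.
    """
    mocks = {name: MagicMock(name=name) for name in names}
    with patch.dict(sys.modules, mocks):
        yield mocks
```

Usage would look like `with scoped_module_mocks("some_hw_only_module") as mocks: ...`; once the block exits, later imports in the same process see the original `sys.modules` again.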

@yxieca
Collaborator

yxieca commented Mar 23, 2026

@XuChen-MSFT can you address the pre check failure and DCO?

XuChen-MSFT and others added 8 commits March 24, 2026 12:59
Implement comprehensive integration tests for complete probing workflows
using simulation executors for reproducible end-to-end testing.

Test Infrastructure:
- __init__.py: Integration test module initialization
- conftest.py: Shared pytest fixtures for integration testing
- pytest.ini: Pytest configuration for integration test suite
- probe_test_helper.py: Helper utilities and test orchestration
  - Simulation environment setup
  - PTF mock integration
  - Test scenario builders
  - Assertion helpers for threshold validation

Integration Test Suites:

1. test_pfc_xoff_probing.py (883 lines):
   - End-to-end PFC Xoff threshold detection workflows
   - Tests all three algorithm phases (UpperBound → LowerBound → ThresholdRange)
   - Validates observer metrics collection
   - Tests buffer state management
   - Multi-port probing scenarios

2. test_ingress_drop_probing.py (575 lines):
   - End-to-end ingress drop threshold detection workflows
   - Tests algorithm sequence (UpperBound → LowerBound → ThresholdPoint)
   - Validates drop detection accuracy
   - Tests traffic pattern variations

3. test_headroom_pool_probing.py (632 lines):
   - End-to-end headroom pool size probing workflows (N→1 pattern)
   - Multi-priority-group iteration testing
   - Tests PG-level threshold detection
   - Validates pool size calculation

All integration tests use simulation executors to ensure deterministic,
reproducible results without requiring physical hardware, enabling
CI/CD pipeline integration.

Signed-off-by: Xu Chen <xuchen3@microsoft.com>

New IT test cases (3):
- test_pfc_xoff_threshold_at_one: boundary value 1 (lower-bound break)
- test_pfc_xoff_threshold_at_two: boundary value 2 (binary search min)
- test_pfc_xoff_point_probing_with_intermittent_failures: drain recovery

Python 3.12 compatibility fix in probe_test_helper.py:
- Add __path__ attribute to scapy mock (required by Python 3.12+ import
  system to recognize MagicMock as a package)
- Register scapy.layers and scapy.layers.inet6 submodule mocks
- Backward compatible with Python 3.8

IT total: 62 -> 65

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Xu Chen <xuchen3@microsoft.com>

New IT test cases (3):
- test_ingress_drop_threshold_at_one: boundary value 1
- test_ingress_drop_threshold_at_two: boundary value 2
- test_ingress_drop_point_probing_with_intermittent_failures: drain recovery

IT total: 65 -> 68

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Xu Chen <xuchen3@microsoft.com>

New IT tests (+2):
- PFC XOFF: test_pfc_xoff_range_oscillation_high_failure_rate
- Ingress Drop: test_ingress_drop_range_oscillation_bad_spot

Both use bad_spot scenario to verify Phase 3 anti-oscillation:
capture observer markdown output, parse candidate column, assert
no candidate is tested more than 3 times.

IT total: 68 -> 70
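
The check described above (parse the candidate column, assert no candidate is tested more than 3 times) could be sketched like this. The table layout, the column name `candidate`, and the helper `candidate_counts` are assumptions for illustration; the real observer output format is not shown in this PR thread.

```python
from collections import Counter
from typing import List


def candidate_counts(markdown: str, column: str = "candidate") -> Counter:
    """Count how often each integer value appears in one markdown table column."""
    rows: List[List[str]] = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore text outside the table
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return Counter()

    header = [cell.lower() for cell in rows[0]]
    index = header.index(column)

    counts: Counter = Counter()
    for cells in rows[1:]:
        # Skip the |---|---| separator row, empty cells, and short rows.
        if index >= len(cells) or set(cells[index]) <= set("-: "):
            continue
        counts[int(cells[index])] += 1
    return counts


# Hypothetical observer output used only for this demo.
sample = """
| iter | candidate | result |
|------|-----------|--------|
| 1    | 10        | fail   |
| 2    | 11        | ok     |
| 3    | 10        | fail   |
"""
counts = candidate_counts(sample)
no_oscillation = max(counts.values()) <= 3
```

An oscillating search would revisit the same candidate many times, so a per-candidate cap is a cheap way to detect it from the log alone.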

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Xu Chen <xuchen3@microsoft.com>

New IT cases (+2):
- test_pfc_xoff_small_threshold_precision: threshold=10, bad_spot=[10]
- test_ingress_drop_small_threshold_precision: same pattern
Both capture Phase 3 iteration count — without fix: 50 (max_iterations),
with fix: ~18 (exits via precision_reached).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Xu Chen <xuchen3@microsoft.com>

- test_headroom_pool_buffer_cleanup_on_pg_failure: 2 PGs, verify probe
  completes without crash when PG fails
- test_headroom_pool_multi_pg_isolation: 3 PGs, verify all PGs produce
  independent results

Related: PR sonic-net#22544 fix (while-True unified cleanup)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Xu Chen <xuchen3@microsoft.com>

- E741: rename ambiguous variable 'l' to 'line' in list comprehensions
  (test_ingress_drop_probing.py, test_pfc_xoff_probing.py)
- F541: remove unnecessary f-string prefix from string without placeholders
  (test_pfc_xoff_probing.py)

Signed-off-by: Xu Chen <xuchen3@microsoft.com>
@XuChen-MSFT XuChen-MSFT force-pushed the xuchen3/mmu_probe/pr08-integration-tests branch from 2467d69 to 1919215 Compare March 24, 2026 04:59

Collaborator

@yxieca yxieca left a comment

LGTM. AI agent on behalf of Ying.

@yxieca yxieca merged commit 6b7ad39 into sonic-net:master Mar 24, 2026
15 checks passed
ravaliyel pushed a commit to ravaliyel/sonic-mgmt that referenced this pull request Mar 27, 2026
… workflows (sonic-net#22546)

What is the motivation for this PR
qos refactoring

How did you do it
Implement comprehensive integration tests for complete probing workflows using simulation executors for reproducible end-to-end testing.

How did you verify/test it
Not specified in PR.

Signed-off-by: Xu Chen <xuchen3@microsoft.com>