Skip to content

Dualtor AA enablement for QOS SAI#17765

Merged
kevinskwang merged 3 commits intosonic-net:masterfrom
rbpittman:qos_sai_dualtor_aa
Apr 2, 2025
Merged

Dualtor AA enablement for QOS SAI#17765
kevinskwang merged 3 commits intosonic-net:masterfrom
rbpittman:qos_sai_dualtor_aa

Conversation

@rbpittman
Copy link
Contributor

Description of PR

Summary:

  • Update test mark conditions to include dualtor-aa for every duplicate instance of the QOS SAI topo list (all lists are identical). Possible de-duplication here: Tests mark constants #17707
  • Only toggle simulator ports if active-standby.
  • Fix permissions on created empty write_standby file.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405
  • 202411

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Ran on dualtor AA cisco-8000, verified tests execute without major errors. Some tests fail due to some normal instability and known issues.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@wsycqyz wsycqyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wsycqyz
Copy link
Contributor

wsycqyz commented Apr 2, 2025

PR test failed. Re-try.

Copy link
Contributor

@kevinskwang kevinskwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kevinskwang kevinskwang merged commit 9641f98 into sonic-net:master Apr 2, 2025
19 checks passed
- https://github.com/sonic-net/sonic-mgmt/issues/12906
- "topo_type in ['m0', 'mx', 'm1', 'm2', 'm3']"
- "topo_name not in ['t0', 't0-64', 't0-116', 't0-118', 't0-35', 't0-56', 't0-standalone-32', 't0-standalone-64', 't0-standalone-128', 't0-standalone-256', 'dualtor-56', 'dualtor-120', 'dualtor', 't0-80', 't0-backend', 't1-lag', 't1-28-lag', 't1-64-lag', 't1-56-lag', 't1-backend', 't2', 't2_2lc_36p-masic', 't2_2lc_min_ports-masic'] and asic_type not in ['mellanox']"
- "topo_name not in ['t0', 't0-64', 't0-116', 't0-118', 't0-35', 't0-56', 't0-standalone-32', 't0-standalone-64', 't0-standalone-128', 't0-standalone-256', 'dualtor-56', 'dualtor-120', 'dualtor', 'dualtor-aa', 't0-80', 't0-backend', 't1-lag', 't1-28-lag', 't1-64-lag', 't1-56-lag', 't1-backend', 't2', 't2_2lc_36p-masic', 't2_2lc_min_ports-masic'] and asic_type not in ['mellanox']"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing dualtor-aa-56, dualtor-aa-120, dualtor-aa-64, dualtor-aa-64-breakout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some discussion led us to coordinate on dualtor-aa, and that is the topo I've tested on. If you can confirm you'd like the rest of the dualtor-aa series I can file a PR.
This duplicated list should also be fixed, my proposed fix is #17707 . If there's an alternative for QOS SAI it should be filed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplication is now resulting in duplicated merge conflict issues. This needs to be resolved.

is_dualtor_active_standby = is_dualtor and active_standby_ports
""" Stop mux container for dual ToR Active-Standby """
if is_dualtor:
if is_dualtor_active_standby:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the dualtor is active-active, the practice is to set the active-active tors pair into active-standby so all upstream packets will only be forwarded to the active side; otherwise, the upstream packets will be ECMPed.
Please refer to #16549.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does achieve putting the upper tor into standby state, though it'd be preferrable to use something much more simple as in 16549 and eliminate a lot of the current AS code.
The fixtures in 16549 default to function scope, and QOS SAI needs a class scope. So the change is a bit non-trivial, requiring some more thorough testing.

@rbpittman rbpittman deleted the qos_sai_dualtor_aa branch April 2, 2025 13:35
@wsycqyz
Copy link
Contributor

wsycqyz commented Apr 3, 2025

@kevinskwang should we cherry-pick to 202411?

rbpittman added a commit to rbpittman/sonic-mgmt that referenced this pull request Apr 9, 2025
* Apply diff for dualtor AA QOS SAI with list change duplicated for all qos sai topo lists.

* Add back a newline.
@rbpittman
Copy link
Contributor Author

@wsycqyz
Double commit has a merge conflict, filed manual:
#17911

bpar9 pushed a commit that referenced this pull request Jun 13, 2025
* Apply diff for dualtor AA QOS SAI with list change duplicated for all qos sai topo lists.

* Add back a newline.
sdszhang pushed a commit to sdszhang/sonic-mgmt that referenced this pull request Jun 30, 2025
Code sync sonic-net/sonic-mgmt:202411 => 202412

```
*   86163d9 (HEAD -> code-sync-202412, origin/code-sync-202412) r12f 250620:0224 - Merge remote-tracking branch 'base/202411' into code-sync-202412
|\
| * 6c7825f (base/202411) Justin Wong 250611:2109 - Skip test_bgp_multipath_relax.py on t1-isolated (sonic-net#18558)
| * 829aa40 Chun'ang Li 250619:1618 - [CI][202411]update pr test template reference (sonic-net#19059)
| * 6f2a1bd prabhataravind 250617:1800 - Fix port mapping for new 4280 hwskus (sonic-net#18802) (sonic-net#19030)
| * 66c7667 ganglv 250617:1623 - [202411] Update timeout for gnmi subscribe API (sonic-net#19045)
| * 8f0a97b Justin Wong 250611:2146 - Extend bgp command check to allow multipath routes and "best" routes (sonic-net#18910)
| * 12b7793 StormLiangMS 250611:1110 - [XFAIL: test_crm_available.py] set xfail of test_crm_available on dualtor platform (sonic-net#18874)
| * ab5fd70 rbpittman 250615:2148 - Shutdown BGP instead of stopping some processes. (sonic-net#18267)
| * 3f35fa2 Harish Kalyanaraman 250321:0332 - Increase GCUTIMEOUT value for nokia-armhf platforms (sonic-net#17420)
| * 48bf447 xwjiang-ms 250613:1238 - [202411] Support Ubuntu 24 server in KVM (sonic-net#18929)
| * 194b907 rbpittman 250612:2057 - Dualtor AA enablement for QOS SAI (sonic-net#17765) (sonic-net#17911)
```
opcoder0 pushed a commit to opcoder0/sonic-mgmt that referenced this pull request Dec 8, 2025
* Apply diff for dualtor AA QOS SAI with list change duplicated for all qos sai topo lists.

* Add back a newline.

Signed-off-by: opcoder0 <[email protected]>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 21, 2025
* Apply diff for dualtor AA QOS SAI with list change duplicated for all qos sai topo lists.

* Add back a newline.

Signed-off-by: Guy Shemesh <[email protected]>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Jan 26, 2026
* Apply diff for dualtor AA QOS SAI with list change duplicated for all qos sai topo lists.

* Add back a newline.

Signed-off-by: Guy Shemesh <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants