
Code sync 202411 -> 202412#12

Merged
r12f merged 41 commits into Azure:202412 from r12f:code-sync-202412
Jan 18, 2025

Conversation

r12f (Contributor) commented Jan 17, 2025

Code sync 202411 -> 202412

cyw233 and others added 30 commits January 7, 2025 13:48
Description of PR
Add parallel modes file to 202411

Summary:
Fixes # (issue) Microsoft ADO 29843837

Co-authored-by: [email protected]
During recent nightly runs, we observed that the Cisco 8000 supervisor had an average memory usage of 59.7% (calculated from the values 60.3, 59.9, 58.9, 59.2, 59.8, and 60.2). Since the memory threshold is set at 60%, this resulted in two failures. To ensure the stability of the tests, we propose increasing the memory threshold for the Cisco 8000 supervisor to 65%.
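The reported average and failure count can be reproduced from the sampled values (a minimal sanity-check sketch, not part of the PR itself):

```python
# Sanity-check the reported Cisco 8000 supervisor memory numbers.
samples = [60.3, 59.9, 58.9, 59.2, 59.8, 60.2]  # observed nightly values (%)

average = sum(samples) / len(samples)
print(round(average, 1))  # 59.7, matching the reported average

# With the old 60% threshold, the samples above 60% fail:
failures = [s for s in samples if s > 60.0]
print(len(failures))      # 2, matching the two observed failures

# The proposed 65% threshold would pass every sample:
print(all(s <= 65.0 for s in samples))  # True
```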
Description of PR
Support parallel run for more tests on Cisco 8800 chassis.

Summary:
Fixes # (issue) Microsoft ADO 29754370

Approach
What is the motivation for this PR?
We wanted to support parallel run for more tests on Cisco 8800 chassis to reduce the Nightly running time.

How did you do it?
How did you verify/test it?
I ran the test modules added in this PR in parallel and can confirm they all passed.

Any platform specific information?
Cisco 8800 chassis

Supported testbed topology if it's a new test case?
T2

Co-authored-by: [email protected]
…IPv6 neighbor addresses on KVM testbeds. (#16371) (#16388)

Temporarily skipping test_arp_update_for_failed_standby_neighbor for IPv6 neighbor addresses on KVM testbeds.

Signed-off-by: Mahdi Ramezani <[email protected]>
Co-authored-by: mramezani95 <[email protected]>
… updated. (#16396) (#16397)

Signed-off-by: Mahdi Ramezani <[email protected]>
Co-authored-by: mramezani95 <[email protected]>
Signed-off-by: Mahdi Ramezani <[email protected]>
Co-authored-by: mramezani95 <[email protected]>
1. Use sonic-ubuntu-1c instead of sonic-common.
2. Fix docker run command to reuse agent.
Description of PR
Summary:
Fixes # (issue)
https://migsonic.atlassian.net/browse/MIGSMSFT-855

Approach
What is the motivation for this PR?
Fix failure of case qos/test_qos_sai.py::TestQosSai::testQosSaiPgSharedWatermark[multi_dut_shortlink_to_longlink-wm_pg_shared_lossless]

How did you do it?
Updated pkts_num_trig_pfc of wm_pg_shared_lossless in the yaml file.
pkts_num_trig_pfc in this case means the maximum number of buffers in the queue, which is 14812.

Source queue counters for Ethernet0 tc 3:
   SQ buffer counter 14812
   SQ congestion state Xoff
   SQ headroom counter in buffers 1156 (trigger ingress drop)
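The parameter change can be sketched roughly as the following yaml fragment (illustrative only: the actual file path, surrounding keys, and topology-specific sections of the qos parameter yaml are not shown in this excerpt):

```yaml
# Illustrative fragment (key names assumed from the description above):
# set pkts_num_trig_pfc for wm_pg_shared_lossless to the maximum number
# of buffers in the queue, per the observed SQ buffer counter.
wm_pg_shared_lossless:
  pkts_num_trig_pfc: 14812   # SQ buffer counter observed on Ethernet0 tc 3
```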
How did you verify/test it?
Verified it in T2 testbed.

------------ generated xml file: /tmp/qos/test_qos_sai.py::TestQosSai::testQosSaiPgSharedWatermark_2025-01-08-21-54-09.xml ------------
INFO:root:Can not get Allure report URL. Please check logs
------------------------------------------------------- live log sessionfinish --------------------------------------------------------
22:19:57 __init__.pytest_terminal_summary         L0067 INFO   | Can not get Allure report URL. Please check logs
======================================================= short test summary info =======================================================
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiPgSharedWatermark[multi_dut_shortlink_to_longlink-wm_pg_shared_lossless]
SKIPPED [1] qos/test_qos_sai.py:1619: The lossy test is not valid for multiAsic configuration.
======================================== 1 passed, 1 skipped, 1 warning in 1546.54s (0:25:46) =========================================
sonic@sonic-ucs-m6-09:/data/tests$ 


Signed-off-by: Zhixin Zhu <[email protected]>
Description of PR
Summary:
In snappi tests, we have a number of tests with UDP streams. We need their UDP ports to be assigned sequentially, not randomly or in any other order. This PR addresses some of the mistakes in the UDP port count logic.

Approach
What is the motivation for this PR?
The tests fail on cisco-8000 due to backplane bandwidth limitations. The backplane bandwidth limit is avoided if the number of streams in the traffic is large enough to cause load balancing across the backplane, so that many backplane ports are used instead of just one. This PR addresses some of the script locations where the UDP port numbers are not correctly calculated.
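The idea of spreading streams across backplane links via distinct, sequential UDP ports can be sketched as follows (a hypothetical helper, not the PR's actual code; `BASE_PORT` and the stream count are illustrative):

```python
BASE_PORT = 5000  # illustrative starting UDP port, not from the PR

def udp_ports_for_streams(num_streams, base=BASE_PORT):
    """Assign sequential UDP source ports, one per stream.

    Sequential (rather than random) ports give deterministic, distinct
    5-tuples, so hashing/load balancing spreads the streams across many
    backplane ports instead of landing them all on one link.
    """
    return [base + i for i in range(num_streams)]

ports = udp_ports_for_streams(8)
print(ports)  # [5000, 5001, 5002, 5003, 5004, 5005, 5006, 5007]
```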

How did you do it?
How did you verify/test it?
Ran it on my testbed with the -e --count=2 option to repeat the tests 2 times:

=========================================================================================================== PASSES ===========================================================================================================
___________________________________________________________________________________ test_ecn_marking_port_toggle[multidut_port_info0-1-2] ____________________________________________________________________________________
___________________________________________________________________________________ test_ecn_marking_port_toggle[multidut_port_info0-2-2] ____________________________________________________________________________________
___________________________________________________________________________________ test_ecn_marking_port_toggle[multidut_port_info1-1-2] ____________________________________________________________________________________
___________________________________________________________________________________ test_ecn_marking_port_toggle[multidut_port_info1-2-2] ____________________________________________________________________________________
___________________________________________________________________________________ test_ecn_marking_port_toggle[multidut_port_info2-1-2] ____________________________________________________________________________________
___________________________________________________________________________________ test_ecn_marking_port_toggle[multidut_port_info2-2-2] ____________________________________________________________________________________
___________________________________________________________________________________ test_ecn_marking_port_toggle[multidut_port_info3-1-2] ____________________________________________________________________________________
___________________________________________________________________________________ test_ecn_marking_port_toggle[multidut_port_info3-2-2] ____________________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent0-1-2] _________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent0-2-2] _________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent1-1-2] _________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent1-2-2] _________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent2-1-2] _________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent2-2-2] _________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent3-1-2] _________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent3-2-2] _________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent4-1-2] _________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent4-2-2] _________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent5-1-2] _________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent5-2-2] _________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent6-1-2] _________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent6-2-2] _________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent7-1-2] _________________________________________________________________________
_________________________________________________________________________ test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent7-2-2] _________________________________________________________________________
---------------------------------------------------------------- generated xml file: /run_logs/ixia/ecn_repeat/2025-01-08-22-40-28/tr_2025-01-08-22-40-28.xml ----------------------------------------------------------------
INFO:root:Can not get Allure report URL. Please check logs
================================================================================================== short test summary info ===================================================================================================
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_port_toggle[multidut_port_info0-1-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_port_toggle[multidut_port_info0-2-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_port_toggle[multidut_port_info1-1-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_port_toggle[multidut_port_info1-2-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_port_toggle[multidut_port_info2-1-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_port_toggle[multidut_port_info2-2-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_port_toggle[multidut_port_info3-1-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_port_toggle[multidut_port_info3-2-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent0-1-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent0-2-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent1-1-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent1-2-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent2-1-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent2-2-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent3-1-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent3-2-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent4-1-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent4-2-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent5-1-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent5-2-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent6-1-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent6-2-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent7-1-2]
PASSED snappi_tests/multidut/ecn/test_multidut_ecn_marking_with_snappi.py::test_ecn_marking_lossless_prio[multidut_port_info1-test_flow_percent7-2-2]
SKIPPED [16] common/helpers/assertions.py:16: ECN tests are not supported on Cisco switches yet.
SKIPPED [48] common/helpers/assertions.py:16: Invalid combination of duthosts or ASICs in snappi_ports
================================================================================= 24 passed, 64 skipped, 28 warnings in 12774.71s (3:32:54) ==================================================================================
sonic@snappi-sonic-mgmt-vanilla-202405-t2:/data/tests$ 

Co-authored-by: [email protected]
Description of PR
Optimize bgp/test_reliable_tsa.py with multithreading to reduce the running time.

Summary:
Fixes # (issue)

Approach
What is the motivation for this PR?
The bgp/test_reliable_tsa.py test takes a very long time to finish on T2 chassis (5.5h ~ 6h), so we wanted to optimize it using multithreading to reduce the running time. After the optimization, the running time is reduced to ~3.5h.

How did you do it?
How did you verify/test it?
I ran the updated code and can confirm it's working as expected. Elastictest link with flaky test case re-run link
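The general pattern of running independent, I/O-bound verification steps concurrently can be sketched with the standard library (illustrative only; the actual test uses sonic-mgmt's own helpers, and `check_bgp_on_dut` here is a hypothetical stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

def check_bgp_on_dut(dut_name):
    # Hypothetical stand-in for a per-DUT verification step that mostly
    # waits on SSH/CLI I/O, which is why threads reduce wall-clock time.
    return f"{dut_name}: ok"

duts = ["lc1", "lc2", "sup"]

# Run the per-DUT checks in parallel instead of one after another.
with ThreadPoolExecutor(max_workers=len(duts)) as pool:
    results = list(pool.map(check_bgp_on_dut, duts))

print(results)  # ['lc1: ok', 'lc2: ok', 'sup: ok']
```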

Co-authored-by: [email protected]
Summary:
Due to known issue sonic-net/sonic-buildimage#21319
PR #16301 doesn't ignore all of the pmon iSmart binary-format ERR logs. Update the regex to ignore all of the ERR logs below:

/var/log/syslog.3.gz:2025 Jan  9 20:42:05.687373 7215-5 ERR pmon#SsdUtil[55]: [Errno 8] Exec format error: 'iSmart'
/var/log/syslog.3.gz:2025 Jan  9 20:42:07.744397 7215-5 ERR pmon#StorageCommon[55]: [Errno 8] Exec format error: 'iSmart'
/var/log/syslog.5.gz:2025 Jan  9 19:37:33.855554 7215-5 ERR pmon#SsdUtil[49]: [Errno 8] Exec format error: 'iSmart'
/var/log/syslog.5.gz:2025 Jan  9 19:37:35.261567 7215-5 ERR pmon#StorageCommon[49]: [Errno 8] Exec format error: 'iSmart'
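A regex broad enough to cover both daemons can be sketched as follows (illustrative only; the exact pattern used in the PR is not shown in this excerpt):

```python
import re

# Match the iSmart exec-format errors from either SsdUtil or StorageCommon.
ISMART_ERR_RE = re.compile(
    r"ERR pmon#(?:SsdUtil|StorageCommon)\[\d+\]: "
    r"\[Errno 8\] Exec format error: 'iSmart'"
)

logs = [
    "2025 Jan  9 20:42:05.687373 7215-5 ERR pmon#SsdUtil[55]: "
    "[Errno 8] Exec format error: 'iSmart'",
    "2025 Jan  9 20:42:07.744397 7215-5 ERR pmon#StorageCommon[55]: "
    "[Errno 8] Exec format error: 'iSmart'",
]

print(all(ISMART_ERR_RE.search(line) for line in logs))  # True
```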

What is the motivation for this PR?
PR #16301 doesn't ignore all of the pmon iSmart binary-format ERR logs shown above, so the regex needs to be updated to cover all of them.

How did you do it?
Update the regex.

How did you verify/test it?
Verified by running testcase: syslog/test_syslog.py::test_syslog
…(#16216)

What is the motivation for this PR?
Both a global unicast IPv6 address and a link-local IPv6 address are configured on the VLAN; there is no need to flush all of them to trigger the fail-to-bind issue.

How did you do it?
Only delete the global unicast address (GUA) on the VLAN.

How did you verify/test it?
Run test
- What is the motivation for this PR?
The original way of setting BGP down is too slow, especially on CPUs that are not that powerful, and the BGP routes would only be handled after a delay of several minutes, which affects the QoS buffer test results.

- How did you do it?
Use bgp shutdown/start to control bgp

- How did you verify/test it?
Run it in internal regression
What is the motivation for this PR?
On Nokia-7215 platform, we observed below flaky failure:

Timeout: No Response from fe80::xxxx%eth0.
The root cause is that the Nokia-7215 has low performance and the snmpget command is issued before snmpagent has fully started.

How did you do it?
To resolve this issue, I added a wait_until to ensure snmpagent is already listening on the link-local IP address.
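The wait_until idea is to poll a condition until it holds or a timeout expires. sonic-mgmt has its own wait_until helper; this is a simplified stand-alone sketch, and `snmpagent_is_listening` is a hypothetical stand-in for the real check:

```python
import time

def wait_until(timeout, interval, condition, *args):
    """Poll `condition(*args)` every `interval` seconds for up to
    `timeout` seconds; return True as soon as it holds, else False."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if condition(*args):
            return True
        time.sleep(interval)
    return False

# Hypothetical check: snmpagent listening on the link-local address.
attempts = []
def snmpagent_is_listening():
    attempts.append(1)
    return len(attempts) >= 3  # pretend it succeeds on the third poll

print(wait_until(5, 0.1, snmpagent_is_listening))  # True
```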

How did you verify/test it?
Verified on Nokia-7215 Mx testbed.
…. (#15718)

Description of PR
Summary:
Fixes the flakiness of the DWRR testcase. The PR adds a new fixture that slows down the scheduler without changing the underlying algorithm. This allows the DWRR test to pass consistently.

Co-authored-by: [email protected]
What is the motivation for this PR?
Support 202405 and 202411 image.

How did you do it?
Replaced the soft link with a hard copy and modified it to suit the 202405 requirements.

How did you verify/test it?
Tested it locally.
…e missing (#16357)

What is the motivation for this PR?
Under stress testing, exabgp in the PTF container can sometimes end up in an incorrect state, hence restart exabgp before re-announcing routes in the sanity check.

How did you do it?
Restart exabgp before re-announcing routes.
Add a try/except to handle failures to re-announce.
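The restart-then-retry flow described above can be sketched as follows (a stand-alone illustration with stubbed helpers; `restart_exabgp` and `announce_routes` are hypothetical stand-ins for the PTF-side operations):

```python
def reannounce_with_recovery(announce_routes, restart_exabgp, retries=2):
    """Restart exabgp first, then try to re-announce routes, retrying
    (with another restart) if the announcement fails."""
    last_err = None
    for _ in range(retries):
        restart_exabgp()  # get exabgp out of any bad state first
        try:
            announce_routes()
            return True
        except RuntimeError as err:
            last_err = err  # remember the failure and retry
    raise RuntimeError(f"re-announce failed after {retries} tries: {last_err}")

# Stubbed demo: announcement fails once, then succeeds after a restart.
state = {"restarts": 0, "calls": 0}
def restart():
    state["restarts"] += 1
def announce():
    state["calls"] += 1
    if state["calls"] == 1:
        raise RuntimeError("exabgp not ready")

print(reannounce_with_recovery(announce, restart))  # True
print(state["restarts"])  # 2
```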
How did you verify/test it?
Run test with sanity check
Description of PR
Optimize the qos/test_qos_sai.py test to reduce the running time.

Summary:
Fixes # (issue) Microsoft ADO 30056122

Type of change
 Bug fix
 Testbed and Framework(new/improvement)
 New Test case
 Skipped for non-supported platforms
 Add ownership here (Microsoft required only)
 Test case improvement

Approach
What is the motivation for this PR?
The running time of the qos/test_qos_sai.py test is too long (~9h) on T2 chassis, so we wanted to reduce it. With this implementation, the running time is reduced to ~7.5h.

How did you do it?
How did you verify/test it?
I ran the updated code and can confirm it's working well on T2 chassis: Elastictest link. The 2 DWRR failures are expected, which will be fixed after #16199

I also ran a T1 regression test to confirm: Elastictest link

Co-authored-by: [email protected]
* [test_genric_hash.py]: cisco platform checks and some check_balance fix

* line spaces and artifact fixes

* indent fix

* check_balance fixes revert commit
Description of PR
Summary: Removing unused fixtures: get_multidut_tgen_peer_port_set and
get_multidut_snappi_ports from snappi_fixtures.py
Fixes # (issue)
#16015

Type of change
 Bug fix
 Testbed and Framework(new/improvement)
 Test case(new/improvement)
Back port request
 202012
 202205
 202305
 202311
 202405
Approach
What is the motivation for this PR?
How did you do it?
deleted the code

Co-authored-by: [email protected]
Description of PR
Disable parallel run for tacacs/test_ro_disk.py test.

Summary:
Fixes # (issue) Microsoft ADO 30848681

Approach
What is the motivation for this PR?
The tacacs/test_ro_disk.py test is using a special reboot process on the DUT and if we run this test with parallel run enabled, the LCs might not be ready after Supervisor's reboot. Therefore, we need to disable parallel run for this test.

Co-authored-by: [email protected]
…cy] handle exception during grep log (#16491)

What is the motivation for this PR?
https://github.com/sonic-net/sonic-mgmt/pull/16446/files
PR merge conflict

How did you do it?
manually merge
…5) (#16492)

What is the motivation for this PR?
The LogAnalyzer timestamp is 2025-01-07 13:26:55.978427 while the real detection timestamp in the syslog is Jan 7 13:27:18.132683.
We need to delay longer before grepping the syslog.

2025 Jan  7 13:26:55.996909 bjw2-can-7260-2 DEBUG extract_log combine_logs from file /var/log/syslog create time 2025-01-07 13:26:55.978427, size 4236007

syslog:23447:2025 Jan  7 13:27:18.132683 bjw2-can-7260-2 NOTICE swss#orchagent: :- report_pfc_storm: PFC Watchdog detected PFC storm on port Ethernet24, queue index 3, queue id 0x1500000000014d and port id 0x1000000000001

How did you do it?
Increased the delay time and added some debug logging to the script.

How did you verify/test it?
Used Elastictest to run the case on 7060 and 2700; both passed.
https://elastictest.org/scheduler/testplan/67808a9a572c09d3aa482375
https://elastictest.org/scheduler/testplan/67808b40745d47c6b42e53f4
https://elastictest.org/scheduler/testplan/67808bd3c66ca5ae0571bd57
Description of PR
67989d1312b1778681d6575b12b66aa42fdf05a7

Please review the commit-ID given above.

The original PR #13655 was raised to add the new testcases. However, to manage the changes efficiently, it was decided to split the original into three PRs to ease the review process.

This PR tracks the infrastructure-related changes required for the execution of the testcases.

Note - PR #13848 needs to be merged in first before this PR is merged.

Summary:
Fixes # (issue)
#13655
#13215

Type of change
 Bug fix
 Testbed and Framework(new/improvement)
 Test case(new/improvement)
Back port request
 202012
 202205
 202305
 202311
 202405
Approach
What is the motivation for this PR?
This PR tracks only the infrastructure related changes needed for addition of the new testcases.

How did you do it?
Important changes are listed below:

Change directory - tests/common/snappi_tests/

An additional member variable, 'base_flow_config_list', is added as a list to the class 'SnappiTestParams' in the snappi_test_params.py file to accommodate multiple base-flow-configs.

Existing functions - generate_test_flows, generate_background_flows, generate_pause_flows are modified to check if the base_flow_config_list exists. If it does, then base_flow_config is assigned snappi_extra_params.base_flow_config_list[flow_index]. Else existing code is used.
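The fallback logic described above can be sketched roughly as follows (illustrative only; the real SnappiTestParams class has many more fields, and the function name here is hypothetical):

```python
class SnappiTestParams:
    # Simplified stand-in for the real class in snappi_test_params.py.
    def __init__(self, base_flow_config=None, base_flow_config_list=None):
        self.base_flow_config = base_flow_config
        # New list-valued member holding multiple base-flow-configs.
        self.base_flow_config_list = base_flow_config_list or []

def select_base_flow_config(params, flow_index):
    """Prefer the new list if populated; otherwise fall back to the
    single base_flow_config used by the existing code path."""
    if params.base_flow_config_list:
        return params.base_flow_config_list[flow_index]
    return params.base_flow_config

multi = SnappiTestParams(base_flow_config_list=["cfgA", "cfgB"])
single = SnappiTestParams(base_flow_config="cfg0")
print(select_base_flow_config(multi, 1))   # cfgB
print(select_base_flow_config(single, 0))  # cfg0
```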

Existing function - 'verify_egress_queue_frame_count' is modified to check if base_flow_config_list exists. If yes, base_flow_config_list[0] is assigned to dut_port_config, else existing code is used.

The testcases calls 'run_traffic_and_collect_stats' function in traffic_generation file to run and gather IXIA+DUT statistics. Statistics are summarized in test_stats dictionary in return.

A function has been created to access the IXIA rest_py framework. This in turn can be used to integrate MACSEC-related changes in the future. Currently, rest_py is used to generate the IMIX custom profile if the flag is set in the test_def dictionary (defined and passed by the test).

The test case is executed according to the test_duration and test_interval defined in its test_def.
At every test_interval, the statistics from the IXIA and the DUT are pulled into a dictionary keyed by date-timestamp.

Important parameters from the IXIA, such as Tx and Rx throughput, number of packets, and latency, are captured at each interval.

From the DUT side, the Rx and Tx packets, lost packets (a combination of failures, drops, and errors), PFC counts, and queue counts are captured. Additional functions such as get_pfc_count and get_interface_stats are defined in the common/snappi_tests helper files to assist with this. Support for the above is added as part of a different pull request.

At the end of the test, a CSV is created as raw data for the test-case execution, and a summary of the test case is generated as a text file with the same name. run_sys_test also returns a dictionary, test_stats, with all the important parameters to be used for verification of the test.
How did you verify/test it?
Test was executed on the local clone.

Any platform specific information?
These testcases are specifically meant for Broadcom-DNX multi-ASIC based platforms.

Co-authored-by: [email protected]
… (#16460)

What is the motivation for this PR?
In the GCU dhcp_relay test, 2 VLANs are added with 4 DHCP servers each. Previously all 8 DHCP servers were added separately via the CLI, which restarted the dhcp_relay container 8 times and could leave the dhcp_relay container not running during the GCU test on some low-performance devices.

How did you do it?
Use sonic-db-cli to add the DHCP servers, then manually restart the dhcp_relay container once.

How did you verify/test it?
Run tests
… configuration of the interface. (#16360)

* Configure the fec_mode that interface is already configured for

* Fix a backslash
Description of PR
Summary:
Fixes the flakiness of pfc_gen in the pfcwd scripts for cisco-8000. We use a new debug CLI script to force the DUT to wait longer in case PFC packets from the fanout are missed due to the pfc_gen script. So even if pfc_gen or the fanout misses a couple of PFC frames to the DUT, the DUT would still not send out data packets.

Approach
What is the motivation for this PR?
Flakiness of pfc-gen. Particularly with 400G links.

How did you do it?
We have added a new dshell based script that will force the DUT to wait before transmitting data in case of a miss in pfc pause frames.

How did you verify/test it?
Ran on our duts, with 100G and 400G.

Any platform specific information?
The new fix is specific to cisco-8000.

Co-authored-by: [email protected]
…e SNMP request timeout to 20 (#16290)

Description of PR
Summary:
Fixes #30112399

Approach
What is the motivation for this PR?
Fix test_snmp_cpu failures on Cisco chassis.

How did you do it?
Incorporate timeout setting in all SNMP commands
Increase chassis SNMP request timeout to 20s
How did you verify/test it?
Ran on Cisco chassis and it passes stably.

Co-authored-by: [email protected]
yejianquan and others added 11 commits January 15, 2025 17:30
Description of PR
Cisco chassis used to be unstable months ago, so the wait times were updated to 600s for sshd ready and 900s for service ready.
Now that the image has become stable, we can reduce the times to:
sshd ready: 420s
service ready: 600s

Will keep monitoring and see whether we can reduce the wait time to align with t0/t1 in the future.

Approach
What is the motivation for this PR?

How did you do it?
How did you verify/test it?
In the nightly test, the configuration works well.

Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation

Co-authored-by: [email protected]
Description of PR
This pull-request has changes specifically for the following commit-IDs:
a82b489
180af4d
3da40bc

This PR specifically handles the testcases pertaining to the new PFC-ECN testplan added.

Summary:
Fixes # (issue)
#13655
#13215

Approach
What is the motivation for this PR?
Three test scripts have been added to specifically test: non-congestion scenarios (line-rate tests), congestion testcases via over-subscription, and PFCWD (drop and forward mode).

How did you do it?
The test case has a dictionary called test_def which defines the various test-case parameters needed to run it, for example packet size (the default is IMIX but it can be changed to 1024B), test duration, stats capture, and file logging at the end of the test.

Similarly, there is test_check, which passes the information the test case uses for verification. Lossless and lossy priorities are selected from the available list.

The most important change comes in the form of the port_map definition. port_map is a list whose first two parameters define the egress port count and egress speed, and whose last two parameters define the ingress port count and ingress speed. For example, [1, 100, 2, 100] defines a single egress port of 100Gbps and 2 ingress ports of 100Gbps.
This definition is important because multi-speed ingress and egress ports need to be supported. For example, [1, 100, 1, 400] defines a single ingress and a single egress of 400Gbps and 100Gbps respectively.
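The port_map convention above can be sketched as a small parsing helper (illustrative only, not the PR's code; the field order follows the description: egress first, then ingress):

```python
def parse_port_map(port_map):
    """Unpack [egress_count, egress_speed_gbps, ingress_count,
    ingress_speed_gbps] into a readable dict."""
    egress_count, egress_speed, ingress_count, ingress_speed = port_map
    return {
        "egress": {"count": egress_count, "speed_gbps": egress_speed},
        "ingress": {"count": ingress_count, "speed_gbps": ingress_speed},
    }

# [1, 100, 2, 100]: one 100Gbps egress port, two 100Gbps ingress ports.
print(parse_port_map([1, 100, 2, 100]))
# [1, 100, 1, 400]: one 100Gbps egress port, one 400Gbps ingress port.
print(parse_port_map([1, 100, 1, 400])["ingress"])
```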

A new function is provided to capture snappi_ports. It picks the line-card choice from variables.py and chooses the ports as defined in port_map. The port_map is used to filter the available ports for the required port speed.

At the end of the test, a CSV is created as raw data for the test-case execution, and a summary of the test case is generated as a text file with the same name. Additional checks are present in the multi_dut helper file, depending upon the type of the test. The test passes the verification parameters in test_check in dictionary format.

There is an important change in the variables.py file. The line_card_choice is sent as a dictionary from variables.py, which is then parameterized in the test. Depending upon the line_card_choice, the tests are run for that specific line-card choice and set of ports.

Testcases:
a. tests/snappi_tests/pfc/test_pfc_no_congestion_throughput.py:
-- This script contains testcases that test line-rate speeds with a single ingress and egress. Traffic combinations around lossless and lossy priorities are used. The expectation is that no PFCs will be generated, line rate will be achieved, and no drops will be seen on either the DUT or the TGEN.
b. tests/snappi_tests/pfc/test_pfc_port_congestion.py:
-- This script contains testcases that test behavior with 2 ingress ports and 1 egress port on the DUT, with traffic combinations around lossless and lossy priorities.
c. tests/snappi_tests/pfcwd/test_pfcwd_actions.py:
-- Testcases cover the PFCWD actions: DROP and FORWARD mode. DROP and FORWARD mode are also tested for two ingresses and a single egress with pause frames on the egress.

How did you verify/test it?
Test case was executed on local clone.

Results of the verification:

Test cases executed for 100Gbps interfaces.
Two combinations - single-line-card-multi-asic and multiple-dut
Non-congestion:
19:06:48 test_sys_non_congestion.test_multiple_pr L0095 INFO   | Running test for testbed subtype: single-dut-multi-asic
19:15:21 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_Ingress_Egress_diff_dist_100Gbps_single-dut-multi-asic_1024B-2024-10-08-19-15.csv
PASSED                                                                                                                                                                                                                                        [ 16%]
snappi_tests/multidut/systest/test_sys_non_congestion.py::test_multiple_prio_diff_dist[multidut_port_info1-port_map0] 
19:15:26 test_sys_non_congestion.test_multiple_pr L0095 INFO   | Running test for testbed subtype: single-dut-single-asic
19:23:37 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_Ingress_Egress_diff_dist_100Gbps_single-dut-single-asic_1024B-2024-10-08-19-23.csv
PASSED                                                                                                                                                                                                                                        [ 33%]
snappi_tests/multidut/systest/test_sys_non_congestion.py::test_multiple_prio_uni_dist[multidut_port_info0-port_map0] 
19:23:42 test_sys_non_congestion.test_multiple_pr L0235 INFO   | Running test for testbed subtype: single-dut-multi-asic
19:31:57 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_Ingress_Egress_uni_dist_100Gbps_single-dut-multi-asic_1024B-2024-10-08-19-31.csv
PASSED                                                                                                                                                                                                                                        [ 50%]
snappi_tests/multidut/systest/test_sys_non_congestion.py::test_multiple_prio_uni_dist[multidut_port_info1-port_map0] 
19:32:02 test_sys_non_congestion.test_multiple_pr L0235 INFO   | Running test for testbed subtype: single-dut-single-asic
19:40:12 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_Ingress_Egress_uni_dist_100Gbps_single-dut-single-asic_1024B-2024-10-08-19-40.csv
PASSED                                                                                                                                                                                                                                        [ 66%]
snappi_tests/multidut/systest/test_sys_non_congestion.py::test_single_lossless_prio[multidut_port_info0-port_map0] 
19:40:18 test_sys_non_congestion.test_single_loss L0375 INFO   | Running test for testbed subtype: single-dut-multi-asic
19:48:26 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_Ingress_Egress_1Prio_linerate_100Gbps_single-dut-multi-asic_1024B-2024-10-08-19-48.csv
PASSED                                                                                                                                                                                                                                        [ 83%]
snappi_tests/multidut/systest/test_sys_non_congestion.py::test_single_lossless_prio[multidut_port_info1-port_map0] 
19:48:31 test_sys_non_congestion.test_single_loss L0375 INFO   | Running test for testbed subtype: single-dut-single-asic
19:56:38 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_Ingress_Egress_1Prio_linerate_100Gbps_single-dut-single-asic_1024B-2024-10-08-19-56.csv
PASSED                                                                                                                                                                                                                                        [100%]
 
Over-subscription:

20:13:40 test_sys_over_subscription.test_multiple L0093 INFO   | Running test for testbed subtype: single-dut-multi-asic
20:23:07 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Two_Ingress_Single_Egress_diff_dist_100Gbps_single-dut-multi-asic_1024B-2024-10-08-20-23.csv
PASSED                                                                                                                                                                                                                                        [ 12%]
snappi_tests/multidut/systest/test_sys_over_subscription.py::test_multiple_prio_diff_dist[multidut_port_info1-port_map0] 
20:23:16 test_sys_over_subscription.test_multiple L0093 INFO   | Running test for testbed subtype: single-dut-single-asic
20:32:20 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Two_Ingress_Single_Egress_diff_dist_100Gbps_single-dut-single-asic_1024B-2024-10-08-20-32.csv
PASSED                                                                                                                                                                                                                                        [ 25%]
snappi_tests/multidut/systest/test_sys_over_subscription.py::test_multiple_prio_uni_dist[multidut_port_info0-port_map0] 
20:32:29 test_sys_over_subscription.test_multiple L0227 INFO   | Running test for testbed subtype: single-dut-multi-asic
20:41:39 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Two_Ingress_Single_Egress_uni_dist_full100Gbps_single-dut-multi-asic_1024B-2024-10-08-20-41.csv
PASSED                                                                                                                                                                                                                                        [ 37%]
snappi_tests/multidut/systest/test_sys_over_subscription.py::test_multiple_prio_uni_dist[multidut_port_info1-port_map0] 
20:41:48 test_sys_over_subscription.test_multiple L0227 INFO   | Running test for testbed subtype: single-dut-single-asic
20:50:53 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Two_Ingress_Single_Egress_uni_dist_full100Gbps_single-dut-single-asic_1024B-2024-10-08-20-50.csv
PASSED                                                                                                                                                                                                                                        [ 50%]
snappi_tests/multidut/systest/test_sys_over_subscription.py::test_multiple_prio_uni_dist_full[multidut_port_info0-port_map0] 
20:51:02 test_sys_over_subscription.test_multiple L0364 INFO   | Running test for testbed subtype: single-dut-multi-asic
21:00:11 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Two_Ingress_Single_Egress_uni_dist_full100Gbps_single-dut-multi-asic_1024B-2024-10-08-21-00.csv
PASSED                                                                                                                                                                                                                                        [ 62%]
snappi_tests/multidut/systest/test_sys_over_subscription.py::test_multiple_prio_uni_dist_full[multidut_port_info1-port_map0] 
21:00:20 test_sys_over_subscription.test_multiple L0364 INFO   | Running test for testbed subtype: single-dut-single-asic
21:09:25 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Two_Ingress_Single_Egress_uni_dist_full100Gbps_single-dut-single-asic_1024B-2024-10-08-21-09.csv
PASSED                                                                                                                                                                                                                                        [ 75%]
snappi_tests/multidut/systest/test_sys_over_subscription.py::test_multiple_prio_non_cngtn[multidut_port_info0-port_map0] 
21:09:34 test_sys_over_subscription.test_multiple L0502 INFO   | Running test for testbed subtype: single-dut-multi-asic
21:18:38 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Two_Ingress_Single_Egress_non_cngstn_100Gbps_single-dut-multi-asic_1024B-2024-10-08-21-18.csv
PASSED                                                                                                                                                                                                                                        [ 87%]
snappi_tests/multidut/systest/test_sys_over_subscription.py::test_multiple_prio_non_cngtn[multidut_port_info1-port_map0] 
21:18:47 test_sys_over_subscription.test_multiple L0502 INFO   | Running test for testbed subtype: single-dut-single-asic
21:27:45 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Two_Ingress_Single_Egress_non_cngstn_100Gbps_single-dut-single-asic_1024B-2024-10-08-21-27.csv
PASSED                                                                                                                                                                                                                                        [100%]
PFCWD:

01:08:43 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/One_Ingress_Egress_pfcwd_drop_90_10_dist100Gbps_single-dut-multi-asic_1024B-2024-10-09-01-08.csv
PASSED                                                                                                                                                                                                                                        [ 10%]
01:19:33 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/One_Ingress_Egress_pfcwd_drop_90_10_dist100Gbps_single-dut-single-asic_1024B-2024-10-09-01-19.csv
PASSED                                                                                                                                                                                                                                        [ 20%]
01:30:32 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/One_Ingress_Egress_pfcwd_frwd_90_10_dist100Gbps_single-dut-multi-asic_1024B-2024-10-09-01-30.csv
PASSED                                                                                                                                                                                                                                        [ 30%]
01:41:25 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/One_Ingress_Egress_pfcwd_frwd_90_10_dist100Gbps_single-dut-single-asic_1024B-2024-10-09-01-41.csv
PASSED                                                                                                                                                                                                                                        [ 40%]
01:53:08 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Two_Ingress_Single_Egress_pfcwd_drop_40_9_dist100Gbps_single-dut-multi-asic_1024B-2024-10-09-01-53.csv
PASSED                                                                                                                                                                                                                                        [ 50%]
02:04:49 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Two_Ingress_Single_Egress_pfcwd_drop_40_9_dist100Gbps_single-dut-single-asic_1024B-2024-10-09-02-04.csv
PASSED                                                                                                                                                                                                                                        [ 60%]
02:16:26 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Two_Ingress_Single_Egress_pfcwd_frwd_40_9_dist100Gbps_single-dut-multi-asic_1024B-2024-10-09-02-16.csv
PASSED                                                                                                                                                                                                                                        [ 70%]
02:27:53 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Two_Ingress_Single_Egress_pfcwd_frwd_40_9_dist100Gbps_single-dut-single-asic_1024B-2024-10-09-02-27.csv
PASSED                                                                                                                                                                                                                                        [ 80%]
02:38:45 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_Ingress_Single_Egress_pause_cngstn_100Gbps_single-dut-multi-asic_1024B-2024-10-09-02-38.csv
PASSED                                                                                                                                                                                                                                        [ 90%]
02:49:22 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_Ingress_Single_Egress_pause_cngstn_100Gbps_single-dut-single-asic_1024B-2024-10-09-02-49.csv
PASSED                                                                                                                                                                                                                                        [100%]
Any platform specific information?
The testcases are specifically meant for Broadcom DNX multi-ASIC platform DUTs.

co-authorized by: [email protected]
…clear_counters function (#16272)

Description of PR
The clear_counter function in tests/common/snappi_tests/common_helper.py used dut.shell to clear the SONiC counters.

However, dut.shell inherently uses sudo, so the SONiC counters were not getting cleared when running in non-sudo mode.

Replacing dut.shell with dut.command ensures that sonic-clear counters works in both sudo and non-sudo modes.

Summary:
Fixes #16270

Approach
What is the motivation for this PR?
Replacing dut.shell with dut.command to clear the counters in both SUDO and non-SUDO mode.

How did you do it?
Replaced dut.shell with dut.command.
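The change can be sketched as below. This is an illustrative, self-contained sketch only: FakeDut stands in for the Ansible DUT host object, and the exact command list is assumed; the real helper lives in tests/common/snappi_tests/common_helper.py.

```python
# Illustrative sketch of the fix: dut.command() executes the command directly,
# so "sonic-clear ..." works in both sudo and non-sudo mode, whereas dut.shell()
# wrapped everything in a sudo shell. FakeDut is a stand-in for the real host.
class FakeDut:
    def __init__(self):
        self.executed = []

    def command(self, cmd):
        # the real object runs cmd on the DUT over Ansible and returns the result
        self.executed.append(cmd)
        return {"rc": 0, "stdout": ""}


def clear_counters(dut):
    """Clear SONiC counters using dut.command instead of dut.shell."""
    for cmd in ("sonic-clear counters",
                "sonic-clear queuecounters",
                "sonic-clear pfccounters"):
        dut.command(cmd)


dut = FakeDut()
clear_counters(dut)
```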

How did you verify/test it?
Local clone.

AzDevOps@68684a43ec9e:/data/tests$ date;python3 -m pytest --inventory ../ansible/ixia-sonic --host-pattern ixre-egl-board71,ixre-egl-board72 --testbed ixre-chassis17-t2 --testbed_file ../ansible/testbed.csv --log-cli-level info --log-file-level info --kube_master unset --showlocals -ra --show-capture stdout --junit-xml=/tmp/test.xml --skip_sanity --log-file=/tmp/test.log  --topology multidut-tgen --cache-clear --disable_loganalyzer snappi_tests/pfc/test_pfc_no_congestion_throughput.py -k test_multiple_prio_diff_dist
snappi_tests/pfc/test_pfc_no_congestion_throughput.py::test_multiple_prio_diff_dist[multidut_port_info0-port_map0] 
19:31:48 test_pfc_no_congestion_throughput.test_m L0096 INFO   | Running test for testbed subtype: multi-dut-single-asic
19:31:51 snappi_fixtures.__intf_config_multidut   L0933 INFO   | Configuring Dut: ixre-egl-board71 with port Ethernet0 with IP 20.10.1.2/31
19:31:52 snappi_fixtures.__intf_config_multidut   L0933 INFO   | Configuring Dut: ixre-egl-board72 with port Ethernet0 with IP 20.10.1.0/31
19:31:53 test_pfc_no_congestion_throughput.test_m L0129 INFO   | Selected lossless :[3, 4] and lossy priorities:[0, 2, 1] for the test
19:31:53 snappi_fixtures.clear_fabric_counters    L1504 INFO   | Clearing fabric counters for DUT:ixre-egl-board71
19:31:56 snappi_fixtures.clear_fabric_counters    L1504 INFO   | Clearing fabric counters for DUT:ixre-egl-board72
19:33:23 traffic_generation.run_traffic_and_colle L1026 INFO   | Clearing PFC, dropcounters, queuecounters and stats
PASSED                                                                                                                                                                                                          [ 33%]
snappi_tests/pfc/test_pfc_no_congestion_throughput.py::test_multiple_prio_diff_dist[multidut_port_info1-port_map0] 
19:40:41 test_pfc_no_congestion_throughput.test_m L0096 INFO   | Running test for testbed subtype: single-dut-multi-asic
19:40:44 snappi_fixtures.__intf_config_multidut   L0933 INFO   | Configuring Dut: ixre-egl-board71 with port Ethernet8 with IP 20.10.1.0/31
19:40:46 snappi_fixtures.__intf_config_multidut   L0933 INFO   | Configuring Dut: ixre-egl-board71 with port Ethernet152 with IP 20.10.1.2/31
19:40:47 test_pfc_no_congestion_throughput.test_m L0129 INFO   | Selected lossless :[3, 4] and lossy priorities:[6, 1, 5] for the test
19:40:47 snappi_fixtures.clear_fabric_counters    L1504 INFO   | Clearing fabric counters for DUT:ixre-egl-board71
19:40:49 snappi_fixtures.clear_fabric_counters    L1504 INFO   | Clearing fabric counters for DUT:ixre-egl-board72
19:42:00 traffic_generation.run_traffic_and_colle L1026 INFO   | Clearing PFC, dropcounters, queuecounters and stats
PASSED                                                                                                                                                                                                          [ 66%]
snappi_tests/pfc/test_pfc_no_congestion_throughput.py::test_multiple_prio_diff_dist[multidut_port_info2-port_map0] 
19:49:13 test_pfc_no_congestion_throughput.test_m L0096 INFO   | Running test for testbed subtype: single-dut-single-asic
19:49:16 snappi_fixtures.__intf_config_multidut   L0933 INFO   | Configuring Dut: ixre-egl-board72 with port Ethernet8 with IP 20.10.1.0/31
19:49:18 snappi_fixtures.__intf_config_multidut   L0933 INFO   | Configuring Dut: ixre-egl-board72 with port Ethernet16 with IP 20.10.1.2/31
19:49:19 test_pfc_no_congestion_throughput.test_m L0129 INFO   | Selected lossless :[3, 4] and lossy priorities:[2, 6, 0] for the test
19:49:19 snappi_fixtures.clear_fabric_counters    L1504 INFO   | Clearing fabric counters for DUT:ixre-egl-board71
19:49:21 snappi_fixtures.clear_fabric_counters    L1504 INFO   | Clearing fabric counters for DUT:ixre-egl-board72
19:50:32 traffic_generation.run_traffic_and_colle L1026 INFO   | Clearing PFC, dropcounters, queuecounters and stats
PASSED                                   

co-authorized by: [email protected]
Description of PR
As part of the new testcases to be added for the PFC-ECN, this PR addresses the mixed-speed ingress and egress testcases.

Approach
What is the motivation for this PR?
This script addresses the mixed-speed testcases. The topology has a single ingress and a single egress of 400Gbps and 100Gbps respectively. The congestion is caused by three factors:

Oversubscription of the egress.
Pause frames received on the 100Gbps egress link.
Both - oversubscription of the egress and pause frames received on the egress.
The idea is to test the behavior of the DUT under these conditions.

How did you do it?
The port_map defines to choose single ingress of 400Gbps and egress of 100Gbps.

Following test functions are used:

test_mixed_speed_diff_dist_dist_over:
Lossless and lossy traffic are sent at 88% and 12% of the line-rate (400Gbps) respectively, causing normal congestion on the DUT due to oversubscription of the egress. Lossless priorities 3 and 4 are used, whereas lossy priorities are 0, 1 and 2. The expectation is that the lossless priorities will cause the DUT to send PAUSE frames to the IXIA transmitter, so they are rate-limited and see no drops. Lossy priority traffic will see no drops at all. Egress throughput is expected to be around 100Gbps. Lossy ingress and egress throughput does not change.

test_mixed_speed_uni_dist_dist_over:
Lossless and lossy traffic are each sent at 20% of the line-rate (400Gbps), causing normal congestion on the DUT due to oversubscription of the egress. Lossless priorities 3 and 4 are used, whereas lossy priorities are 0, 1 and 2. The expectation is that the lossless priorities will cause the DUT to send PAUSE frames to the IXIA transmitter, so they are rate-limited and see no drops. Lossy priority traffic, however, will see partial drops. Egress throughput is expected to be around 100Gbps, with lossless and lossy traffic in an equal (or close to equal) ratio.

test_mixed_speed_pfcwd_enable:
Lossless and lossy traffic are each sent at 20% of the line-rate (400Gbps), causing normal congestion on the DUT due to oversubscription of the egress. Lossless priorities 3 and 4 are used, whereas lossy priorities are 0, 1 and 2. Additionally, the IXIA receiver sends PAUSE frames to the DUT for lossless priority traffic, causing additional congestion on the DUT.
The expectation is that the DUT sends PFC to the IXIA transmitter for lossless priorities in response to the natural congestion caused by oversubscription of the egress. Lossless priority traffic is rate-limited by IXIA in response to the PFCs from the DUT. Lossy priority traffic is partially dropped on the DUT.
But since the DUT is receiving PFCs on the egress, the rate-limited lossless traffic is eventually dropped on the egress. The IXIA receiver receives ONLY 60Gbps of lossy traffic.

test_mixed_speed_pfcwd_disable:
Lossless and lossy traffic are each sent at 20% of the line-rate (400Gbps), causing normal congestion on the DUT due to oversubscription of the egress. Lossless priorities 3 and 4 are used, whereas lossy priorities are 0, 1 and 2. Additionally, the IXIA receiver sends PAUSE frames to the DUT for lossless priority traffic, causing additional congestion on the DUT.
Since PFCWD is disabled in this scenario, the DUT forwards both lossless and lossy traffic to the IXIA receiver. The DUT sends PFCs in response to the natural congestion as well as to the PFCs received on the egress.
The egress line-rate is 100Gbps, with lossy traffic being partially dropped. Lossy and lossless traffic are in an equal (or close to equal) ratio.

test_mixed_speed_no_congestion:
The purpose of this testcase is to verify that the DUT does not experience congestion when the 400Gbps ingress receives 100Gbps of traffic, which it should move seamlessly to the egress without any drops or congestion.

For all the above testcases, an additional check of the fabric counters is added. The tests clear the fabric counters on the line-cards and the supervisor card (if part of the test). At the end of the test, the counters are checked again for CRC and uncorrectable FEC errors, and the test asserts if the counts are non-zero. The checks are added as part of a separate PR and will need to be merged first; the underlying infra also needs to be in place before these testcases are added.

How did you verify/test it?
Tested on local platform.

16:05:25 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_400Gbps_Ingress_Single_100Gbps_Egress_diff_dist__multiple-dut-mixed-speed_1024B-2024-10-09-16-05.csv
PASSED                                                                                                                                                                                                                                        [ 20%]
16:13:48 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_400Gbps_Ingress_Single_100Gbps_Egress_uni_dist__multiple-dut-mixed-speed_1024B-2024-10-09-16-13.csv
PASSED                                                                                                                                                                                                                                        [ 40%]
16:22:13 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_400Gbps_Ingress_Single_100Gbps_Egress_pause_pfcwd_enable__multiple-dut-mixed-speed_1024B-2024-10-09-16-22.csv
PASSED                                                                                                                                                                                                                                        [ 60%]
16:30:33 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_400Gbps_Ingress_Single_100Gbps_Egress_pause_pfcwd_disable__multiple-dut-mixed-speed_1024B-2024-10-09-16-30.csv
PASSED                                                                                                                                                                                                                                        [ 80%]
16:38:56 traffic_generation.run_sys_traffic       L1190 INFO   | Writing statistics to file : /tmp/Single_400Gbps_Ingress_Single_100Gbps_Egress_no_cong__multiple-dut-mixed-speed_1024B-2024-10-09-16-38.csv
PASSED                                                                                                                                                                                                                                        [100%]
Any platform specific information?
The test is specifically meant for Broadcom-DNX multi-ASIC platforms ONLY.

co-authorized by: [email protected]
* Stabilize `test_snmp_fdb_send_tagged`

Signed-off-by: Longxiang Lyu <[email protected]>
What is the motivation for this PR?
Use a thread pool to run the mux toggles in parallel.
This code is from PR #16164, which was reverted.
Let's bring the change back here.

Signed-off-by: Longxiang [email protected]

How did you do it?
As the motivation.

How did you verify/test it?
Run on dualtor/dualtor-120 testbed.
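The thread-pool idea above can be sketched as follows. This is a hedged illustration only: toggle_mux and the port list are placeholders, not the real dualtor helpers; the point is fanning the per-port toggles out over a pool instead of toggling sequentially.

```python
from concurrent.futures import ThreadPoolExecutor

def toggle_mux(port):
    # the real code would run the mux toggle command for this port on the ToR
    return "active:{}".format(port)

# illustrative mux port list; dualtor testbeds toggle many ports per test
ports = ["Ethernet{}".format(i) for i in range(0, 16, 4)]

# map() preserves input order while the toggles run concurrently
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(toggle_mux, ports))
```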

Signed-off-by: Longxiang <[email protected]>
Description of PR
The reboot process for chassis needs a longer time for BGP to be established. For example, acl/test_acl.py can be flaky if we do not wait for BGP long enough. Therefore, we are introducing a wait_for_bgp option to the reboot() function.

Summary:
Fixes # (issue) Microsoft ADO 30862178

Type of change
 Bug fix
 Testbed and Framework(new/improvement)
 New Test case
 Skipped for non-supported platforms
 Test case improvement
Back port request
 202012
 202205
 202305
 202311
 202405
 202411
Approach
What is the motivation for this PR?
We found that tests like acl/test_acl.py can be flaky if we do not wait for BGP long enough on chassis after a reboot. Therefore, we want to mimic what we have in config_reload() and introduce the same wait_for_bgp option in the reboot() function.

How did you do it?
How did you verify/test it?
I ran the updated acl test code and can make sure it's working well.
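The option can be sketched as below. This is a minimal, self-contained sketch under stated assumptions: the real reboot() in tests/common/reboot.py takes many more parameters, and FakeDut / bgp_established are illustrative stand-ins for the DUT host and its BGP-session check.

```python
import time

def wait_until(timeout, interval, check):
    """Poll `check` until it returns True or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

def reboot(dut, wait_for_bgp=False, bgp_timeout=300):
    """Reboot the DUT; optionally block until BGP sessions are re-established."""
    dut.do_reboot()
    if wait_for_bgp:
        assert wait_until(bgp_timeout, 0.1, dut.bgp_established), \
            "BGP sessions not established after reboot"

class FakeDut:
    def __init__(self):
        self.rebooted = False
    def do_reboot(self):
        self.rebooted = True
    def bgp_established(self):
        return self.rebooted

dut = FakeDut()
reboot(dut, wait_for_bgp=True)
```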

co-authorized by: [email protected]
…6494)

What is the motivation for this PR?
We added wait-for-LLA logic in dhcp6relay with sonic-net/sonic-dhcp-relay#52.
The current PR adds a test for it, to verify that dhcp6relay works well when the LLA is missing.

How did you do it?
Modify test_interface_binding:

Remove the LLA for the Vlans
Restart the dhcp_relay container
Verify that the sockets for the LLA are not established
Add the LLA back
Verify that the sockets for the LLA are established

How did you verify/test it?
Run test_dhcpv6_relay.py
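The remove/restore steps could be driven with commands along these lines. This is a hedged sketch: the Vlan name and link-local address are illustrative, FakeDut stands in for the DUT host object, and the real test's socket check (e.g. inspecting dhcp6relay's bound sockets) is not shown.

```python
class FakeDut:
    """Stand-in for the DUT host; records the commands the test would run."""
    def __init__(self):
        self.cmds = []
    def command(self, cmd):
        self.cmds.append(cmd)

def remove_lla_and_restart_relay(dut, vlan, lla):
    # drop the link-local address so dhcp6relay cannot bind its LLA socket
    dut.command("sudo ip -6 addr del {}/64 dev {}".format(lla, vlan))
    dut.command("sudo docker restart dhcp_relay")

def restore_lla(dut, vlan, lla):
    # add the link-local address back so the LLA socket can be established
    dut.command("sudo ip -6 addr add {}/64 dev {}".format(lla, vlan))

dut = FakeDut()
remove_lla_and_restart_relay(dut, "Vlan1000", "fe80::1")
restore_lla(dut, "Vlan1000", "fe80::1")
```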
Description of PR
Summary:
Fixes an issue where the interface stays down after tests/platform_tests/api/test_sfp.py::sfp_reset(), causing an error during the shutdown_ebgp fixture teardown.

Type of change
 Bug fix
 Testbed and Framework(new/improvement)
 Test case(new/improvement)
Back port request
 202012
 202205
 202305
 202311
 202405
 202411
Approach
What is the motivation for this PR?
Keep the interface up after sfp_reset if the platform is T2 and the SFP is QSFP-DD.

How did you do it?
Flap the interface after sfp_reset to restore the interface state.
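The flap can be sketched with the standard SONiC CLI commands. FakeDut is an illustrative stand-in for the DUT host object, and the helper name is assumed; the real fix lives in the test module.

```python
class FakeDut:
    """Stand-in for the DUT host; records the commands that would be run."""
    def __init__(self):
        self.cmds = []
    def command(self, cmd):
        self.cmds.append(cmd)

def flap_interface(dut, intf):
    """Shut and re-enable the port so it comes back up after the SFP reset."""
    dut.command("sudo config interface shutdown {}".format(intf))
    dut.command("sudo config interface startup {}".format(intf))

dut = FakeDut()
flap_interface(dut, "Ethernet0")
```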

How did you verify/test it?
passed on physical testbed with

admin@svcstr2-8800-lc1-1:~$ sudo sfputil show eeprom -d -p Ethernet0
Ethernet0: SFP EEPROM detected
...
        Application Advertisement: 400GAUI-8 C2M (Annex 120E) - Host Assign (0x1) - Active Cable assembly with BER < 5x10^-5 - Media Assign (0x1)
                                   CAUI-4 C2M (Annex 83E) - Host Assign (0x1) - Active Cable assembly with BER < 5x10^-5 - Media Assign (0x1)
        CMIS Revision: 4.0
        Connector: No separable connector
        Encoding: N/A
        Extended Identifier: Power Class 5 (10.0W Max)
        Extended RateSelect Compliance: N/A
        Hardware Revision: 1.0
        Host Electrical Interface: 400GAUI-8 C2M (Annex 120E)
        Host Lane Assignment Options: 1
        Host Lane Count: 8
        Identifier: QSFP-DD Double Density 8X Pluggable Transceiver
        Length Cable Assembly(m): 1.0
......
platform_tests/api/test_sfp.py::TestSfpApi::test_reset[xxx-lc1-1] PASSED [ 73%]
......
=========================== short test summary info ============================
FAILED platform_tests/api/test_sfp.py::TestSfpApi::test_lpmode[svcstr2-8800-lc1-1]                        <<<< this is separate issue, not related to this PR.
========================= 1 failed, 22 passed, 1 warning in 2104.41s (0:35:04) =========================
Any platform specific information?
Supported testbed topology if it's a new test case?

Co-authored-by: [email protected]
@r12f r12f merged commit 7718486 into Azure:202412 Jan 18, 2025
@r12f r12f deleted the code-sync-202412 branch January 18, 2025 02:32
mssonicbld added a commit to mssonicbld/sonic-mgmt.msft that referenced this pull request Feb 12, 2025
### Description of PR
Add a module-level fixture for temporarily disabling route check for a test module

Summary:
Fixes # (issue) Microsoft ADO 31326413

### Type of change


- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [x] Test case improvement

### Back port request
- [ ] 202012
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [x] 202411

### Approach
#### What is the motivation for this PR?
In our recent Cisco T2 Nightly run, we observed that we would get the following error syslog during some test modules:

```
E               Failed: Processes "['analyze_logs--<MultiAsicSonicHost dut-lc1-1>']" failed with exit code "1"
E               Exception:
E               match: 1
E               expected_match: 0
E               expected_missing_match: 0
E
E               Match Messages:
E               2025 Feb  3 03:03:29.550827 svcstr2-8800-lc1-1 ERR monit[914]: 'routeCheck' status failed (255) -- Failure results: {{#12    "asic1": {#12        "Unaccounted_ROUTE_ENTRY_TABLE_entries": [#12            "100.1.0.22/32",#12
```

After discussion, we decided to add a fixture so users can disable route check for a test module if they think that test tends to have such error syslog.

#### How did you do it?

#### How did you verify/test it?
I ran the updated code and can confirm it's working well.
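The fixture's setup/teardown could look like the context manager below. This is a hedged sketch: the service name routeCheck is taken from the log above, FakeDut stands in for the DUT host, and whether the real fixture uses `monit unmonitor` or another mechanism is an assumption; in sonic-mgmt this would be wrapped in a pytest fixture with `scope="module"`.

```python
import contextlib

class FakeDut:
    """Stand-in for the DUT host; records the commands that would be run."""
    def __init__(self):
        self.cmds = []
    def command(self, cmd):
        self.cmds.append(cmd)

@contextlib.contextmanager
def route_check_disabled(dut):
    """Stop monit from running routeCheck while the module's tests run."""
    dut.command("sudo monit unmonitor routeCheck")
    try:
        yield
    finally:
        # always restore monitoring, even if a test fails
        dut.command("sudo monit monitor routeCheck")

dut = FakeDut()
with route_check_disabled(dut):
    pass  # tests in the module would run here
```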

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

mssonicbld added a commit that referenced this pull request Feb 12, 2025
<!--
Please make sure you've read and understood our contributing guidelines;
https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md

Please provide following information to help code review process a bit easier:
-->
### Description of PR
<!--
- Please include a summary of the change and which issue is fixed.
- Please also include relevant motivation and context. Where should reviewer start? background context?
- List any dependencies that are required for this change.
-->
Add a module-level fixture for temporarily disabling route check for a test module

Summary:
Fixes # (issue) Microsoft ADO 31326413

### Type of change

<!--
- Fill x for your type of change.
- e.g.
- [x] Bug fix
-->

- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [x] Test case improvement

### Back port request
- [ ] 202012
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [x] 202411

### Approach
#### What is the motivation for this PR?
In our recent Cisco T2 Nightly run, we observed that we would get the following error syslog during some test modules:

```
E Failed: Processes "['analyze_logs--<MultiAsicSonicHost dut-lc1-1>']" failed with exit code "1"
E Exception:
E match: 1
E expected_match: 0
E expected_missing_match: 0
E
E Match Messages:
E 2025 Feb 3 03:03:29.550827 svcstr2-8800-lc1-1 ERR monit[914]: 'routeCheck' status failed (255) -- Failure results: {{#12 "asic1": {#12 "Unaccounted_ROUTE_ENTRY_TABLE_entries": [#12 "100.1.0.22/32",#12
```

After discussion, we decided to add a fixture so users can disable route check for a test module if they think that test tends to have such error syslog.

#### How did you do it?

#### How did you verify/test it?
I ran the updated code and can confirm it's working well.

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation
r12f pushed a commit that referenced this pull request Feb 20, 2025
Description of PR
Add a module-level fixture for temporarily disabling route check for a test module

Summary:
Fixes # (issue) Microsoft ADO 31326413

Approach
What is the motivation for this PR?
In our recent Cisco T2 Nightly run, we observed the following error in the syslog during some test modules:

```
E Failed: Processes "['analyze_logs--<MultiAsicSonicHost dut-lc1-1>']" failed with exit code "1"
E Exception:
E match: 1
E expected_match: 0
E expected_missing_match: 0
E
E Match Messages:
E 2025 Feb 3 03:03:29.550827 svcstr2-8800-lc1-1 ERR monit[914]: 'routeCheck' status failed (255) -- Failure results: {{#12 "asic1": {#12 "Unaccounted_ROUTE_ENTRY_TABLE_entries": [#12 "100.1.0.22/32",#12
```
After discussion, we decided to add a fixture so users can disable the route check for a test module that tends to produce this error in the syslog.

How did you do it?
How did you verify/test it?
I ran the updated code and can confirm it's working well.

co-authorized by: [email protected]
echuawu pushed a commit to echuawu/sonic-mgmt.msft that referenced this pull request Jun 12, 2025
github-actions bot pushed a commit that referenced this pull request Jul 23, 2025
What is the motivation for this PR?
Arista 7060CX also has the same SAI issue as 7260CX3, which causes Everflow IPv6 tests to fail (#19096):

ERR syncd#syncd: [none] SAI_API_ACL:_brcm_sai_acl_xgs_create_entry:6142 field entry install failed with bcm error Feature unavailable (0xfffffff0).#12!!!
How did you do it?
Skipped the Everflow IPv6 tests on Arista 7060CX as well.

How did you verify/test it?
Any platform specific information?
Arista 7060CX
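Per-platform skips of this kind usually key off the DUT's hwsku string. A minimal sketch of the pattern, where the hwsku substrings and the helper name are illustrative assumptions rather than the actual sonic-mgmt code:

```python
# Hwsku substrings whose Broadcom SAI rejects the Everflow IPv6 ACL
# fields; both entries are illustrative placeholders.
UNSUPPORTED_EVERFLOW_V6_HWSKUS = ("Arista-7260CX3", "Arista-7060CX")


def should_skip_everflow_v6(hwsku):
    """Return True when Everflow IPv6 tests should be skipped on this hwsku."""
    return any(pattern in hwsku for pattern in UNSUPPORTED_EVERFLOW_V6_HWSKUS)
```

In a test module, this condition would typically drive a `pytest.skip(...)` call during setup.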
bingwang-ms pushed a commit that referenced this pull request Jan 16, 2026