updating the testplan to include the new test case and metrics measur…#21437

Closed
PriyanshTratiya wants to merge 311 commits intosonic-net:masterfrom
PriyanshTratiya:testplan/new-test-cases

Conversation

@PriyanshTratiya
Contributor

@PriyanshTratiya PriyanshTratiya commented Nov 26, 2025

Description of PR

Summary:
Fixes # (issue)
This test plan update adds a test case that checks whether the control and data planes can handle admin flaps of numerous BGP sessions, each holding a large number of routes, and estimates the impact of those flaps.

We also introduce route programming timing measurement.

Related PRs:

  • PR #21335 : Reuse of existing codes for future test additions
  • PR #21415 : Introduction of the new test case - BGP Admin Flap
  • PR #21416 : Route Programming Time measurement
  • PR #21417 : Clean ptf dataplane to fix non linear downtime increase
  • PR #21418 : Fix nexthop non linear downtime increase

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505

Approach

What is the motivation for this PR?

Update the test plan documents with methodology and details of the new test plan.

How did you do it?

How did you verify/test it?

Any platform specific information?

Ran the tests on t0-isolated-d2u510s2 topology

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Collaborator

/azp run

@PriyanshTratiya PriyanshTratiya requested a review from r12f November 26, 2025 00:02
@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@PriyanshTratiya PriyanshTratiya marked this pull request as ready for review November 26, 2025 22:27
@PriyanshTratiya PriyanshTratiya marked this pull request as draft November 26, 2025 22:30
@PriyanshTratiya PriyanshTratiya marked this pull request as ready for review November 26, 2025 22:40
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@github-actions github-actions bot requested a review from mihirpat1 January 21, 2026 21:26
wangxin and others added 17 commits January 21, 2026 13:26
In PR sonic-net#21045, I made mistakes while resolving the conflicts with sonic-net#20292. Some code added in sonic-net#20292 was accidentally reverted.

This change adds back the code introduced in sonic-net#20292 for creating the SNMP UdpTransportTarget, so that the snmp_facts module works in IPv6-only scenarios again.

Signed-off-by: Xin Wang <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
During the sanity check, the `print_logs` function runs `traceroute` to check PTF device reachability. However, the command was executed without the `-n` argument, so it tried to resolve the DNS names of the IP addresses involved. That resolution usually times out because the PTF IP is usually not resolvable. Because of this, the `print_logs` function needs more than 30 seconds to complete.

This change adds the `-n` argument to the traceroute command. With this change, the sanity check usually requires 40 seconds less.
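
As a rough illustration of the change described above, the sketch below builds a traceroute command line with the `-n` flag, which tells traceroute to print hop addresses numerically instead of resolving them to host names. The helper name `build_traceroute_cmd` and the address `10.0.0.1` are hypothetical and are not taken from the sonic-mgmt code.

```python
def build_traceroute_cmd(target_ip, numeric=True):
    """Return the traceroute argument list for target_ip.

    With numeric=True the -n flag is added, so traceroute prints hop
    addresses without attempting reverse DNS lookups.
    """
    cmd = ["traceroute"]
    if numeric:
        cmd.append("-n")
    cmd.append(target_ip)
    return cmd


# Hypothetical PTF address, used only for illustration.
print(" ".join(build_traceroute_cmd("10.0.0.1")))
```

The returned list can be passed to `subprocess.run` or joined into a shell string, depending on how the caller executes commands.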

Signed-off-by: Xin Wang <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
The VsSetup documentation is currently a little unclear on some things. Add small improvements for readability.

Signed-off-by: Xichen Lin <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
…net#21494)

The current isc-dhcp-relay does not support dropping packets with wrong IP and UDP checksums, so this test is skipped. The issue is tracked in sonic-net/sonic-buildimage#24660.

Signed-off-by: Xichen Lin <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
enable test_acl and test_qos_sai for t0-f2-d40u8, t1-f2-d10u8 topo

Signed-off-by: Priyansh Tratiya <[email protected]>
…t#21469)

Skip fast/warm reboot for t0-f2-d40u8 and t1-f2-d10u8 as it's not required.

Signed-off-by: Priyansh Tratiya <[email protected]>
* [202511] Update PR template (sonic-net#21395)

Signed-off-by: vikumarks <[email protected]>

* pretty print tgen stats

Signed-off-by: vikumarks <[email protected]>

* Revert "pretty print tgen stats"

This reverts commit e67d03b.

Signed-off-by: vikumarks <[email protected]>

* Pretty print tgen Stats

Signed-off-by: vikumarks <[email protected]>

---------

Signed-off-by: vikumarks <[email protected]>
Co-authored-by: Vineet  Mittal <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
What is the motivation for this PR?
Skip test_bgp_port_disable on all public branches. This only works for internal.
Signed-off-by: yawenni <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
* Added VxLAN underlay ECMP tests.

Signed-off-by: Mahdi Ramezani <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
…21318)

* Support multi-chip in packet trimming tests

Signed-off-by: Priyansh Tratiya <[email protected]>
Enabling srv6/test_srv6_static_config.py and srv6/test_srv6_dataplane.py test cases for SONiC-VPP on T1 and T1-lag. Dependent on sonic-net/sonic-buildimage#24359 and sonic-net/sonic-sairedis#1673 to pass.

What is the motivation for this PR?
SONiC VPP has some SRv6 features enabled, but currently no sonic-mgmt test cases are covering them.

How did you do it?
How did you verify/test it?
Ran sonic-mgmt locally with the above two PRs applied to SONiC-VPP image and all cases are passing.

Signed-off-by: Priyansh Tratiya <[email protected]>
Skip `test_replace_fec` on dualtor as the test is trying to change
downlink's fec from `none` to `rs`, this is not a valid/supported
scenario, the testcase needs a further improvement.

Signed-off-by: Longxiang Lyu <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
markx-arista and others added 21 commits January 21, 2026 13:27
Skip v4 neighbor checks for v6 topo.
Add new lookback_ipv6 fixture because IPv6 loopback IP is not directly
used for route advertisement. Also add test_bgp_router_id_set_ipv6 for
v6 topo only.
Delete xfail and add skip for
test_bgp_router_id_set/test_bgp_router_id_set_ipv6 based on v6/non-v6
topo.

Signed-off-by: markxiao <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
Skip v4 neighbors and checks for v6 topo
Delete test_bgp_gr_helper.py xfail for v6 topo

Signed-off-by: markxiao <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
sonic-net#21451)

Fix test_bgp_session_flap when memory stat is always 0, e.g., v4 neighbors memory usage in v6 topo.

Signed-off-by: markxiao <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
Description of PR
Summary:
Fixes # (issue) Currently, deploy-l1 uses overwrite mode, which replaces all previous connections and adds the new ones on top.

However, this also prevents us from reusing l1.

The new changes add a parameter, allow_previous_connection, which allows existing connections to be kept.

A previous connection is removed only if it overlaps with a new connection.
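
The behavior described above could be sketched as follows. This is not the deploy-l1 implementation: the function name `merge_connections`, the tuple representation of a connection, and the definition of "overlap" as "shares a port with a new connection" are all assumptions made for illustration. Only the parameter name `allow_previous_connection` comes from the description.

```python
def merge_connections(existing, new, allow_previous_connection=False):
    """Combine existing and new L1 connections.

    Each connection is a (port_a, port_b) tuple. In overwrite mode every
    existing connection is dropped. Otherwise an existing connection is
    dropped only when it shares a port with one of the new connections.
    """
    if not allow_previous_connection:
        return list(new)  # overwrite mode: previous connections are replaced
    new_ports = {port for conn in new for port in conn}
    kept = [conn for conn in existing if not (set(conn) & new_ports)]
    return kept + list(new)


# Hypothetical port names.
existing = [("p1", "p2"), ("p3", "p4")]
new = [("p2", "p5")]
print(merge_connections(existing, new, allow_previous_connection=True))
```

In this sketch the `("p1", "p2")` connection is removed because `p2` is reused, while `("p3", "p4")` survives.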

Signed-off-by: Austin Pham <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
Description of PR
Skip recovering the rate limit on the vs testbed. Otherwise it prevents the swssN container from sending rsyslog messages to the host.

Signed-off-by: Priyansh Tratiya <[email protected]>
What is the motivation for this PR?
Packet trimming verification was failing when using a PortChannel. We found that the current implementation for distributing packets was not working for the th5 chip. Specifically, all the packets that were supposed to fill up the egress queue were going to one of the ports in the LAG, and the second set of packets, which we expected to be trimmed, was going to the other port. Therefore, trimming never happened.

How did you do it?
Instead of setting the UDP source port to DEFAULT_SRC_PORT + interface_index we use DEFAULT_SRC_PORT + 10 * interface_index.
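
The arithmetic of the two schemes can be compared with a small sketch. The value of `DEFAULT_SRC_PORT` below is a placeholder, since the real constant is not shown here, and `udp_src_port` is a hypothetical helper name. Whether a given spacing actually spreads flows across LAG members depends on the chip's hashing, which this sketch does not model.

```python
DEFAULT_SRC_PORT = 1234  # placeholder value; the real constant is not shown here


def udp_src_port(interface_index, stride=10):
    """Return the UDP source port used for a given interface index."""
    return DEFAULT_SRC_PORT + stride * interface_index


# Previous scheme: consecutive source ports (stride of 1).
old_ports = [udp_src_port(i, stride=1) for i in range(4)]
# New scheme: source ports spaced 10 apart.
new_ports = [udp_src_port(i) for i in range(4)]

print(old_ports)
print(new_ports)
```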

How did you verify/test it?
Verified via tcpdump that packets were now filling up the egress queue on both ports in the LAG. Also verified that trimming was occurring when sending the second wave of packets.

Any platform specific information?
This problem was only discovered when testing with th5

Signed-off-by: Priyansh Tratiya <[email protected]>
)

meter_type is platform specific, so the inclusion of it is only
implemented for Broadcom TH5 chip right now. Other chips in the future
that require it will need to modify the code.

Signed-off-by: Ryan Garofano <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
…net#21967)

What is the motivation for this PR?
202511 nightly upgrade has the following image upgrade issue:

2026-01-19 04:47:52,053 ansible_hosts.py#502 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#237: "localhost" -> AnsibleModule::pause | Results =>{"hostname": "localhost", "reachable": true, "failed": false, "changed": false, "rc": 0, "stderr": "", "stdout": "Paused for 60.0 seconds", "start": "2026-01-19 04:46:52.047977", "stop": "2026-01-19 04:47:52.049217", "delta": 60, "echo": true, "user_input": "", "_ansible_no_log": false}
2026-01-19 04:47:52,054 ansible_hosts.py#426 DEBUG - ===== ['bjw3-can-7260-13', 'bjw3-can-7260-14'] -> shell ================================================================
2026-01-19 04:47:52,054 ansible_hosts.py#444 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::shell, {"module_name": "shell", "args": ["sed -i \"s/^ClientAliveInterval [0-9].*/ClientAliveInterval 900/g\" /etc/ssh/sshd_config && systemctl restart sshd"], "kwargs": {}, "module_attrs": {"become": true}}
2026-01-19 04:47:53,211 ansible_hosts.py#502 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::shell | Results =>{"bjw3-can-7260-13": {"hostname": "bjw3-can-7260-13", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added '10.150.238.70' (RSA) to the list of known hosts.\r\nDebian GNU/Linux 13 \\n \\l\n\n/bin/sh: 1: /usr/bin/python3.11: not found\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}, "bjw3-can-7260-14": {"hostname": "bjw3-can-7260-14", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added '10.150.238.72' (RSA) to the list of known hosts.\r\nDebian GNU/Linux 13 \\n \\l\n\n/bin/sh: 1: /usr/bin/python3.11: not found\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}}
2026-01-19 04:47:53,212 sonic.py#265 ERROR - Post upgrade actions failed, devices: ['bjw3-can-7260-13', 'bjw3-can-7260-14'], error: RunAnsibleModuleFailed('/var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::"shell" failed, Results => {"bjw3-can-7260-13": {"hostname": "bjw3-can-7260-13", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added \'10.150.238.70\' (RSA) to the list of known hosts.\\r\\nDebian GNU/Linux 13 \\\\n \\\\l\\n\\n/bin/sh: 1: /usr/bin/python3.11: not found\\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}, "bjw3-can-7260-14": {"hostname": "bjw3-can-7260-14", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added \'10.150.238.72\' (RSA) to the list of known hosts.\\r\\nDebian GNU/Linux 13 \\\\n \\\\l\\n\\n/bin/sh: 1: /usr/bin/python3.11: not found\\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}}')
The issue is that the PREV image and the upgrade-to image have different Python interpreter versions. When Ansible first runs on a target device, it caches the Python interpreter path in memory; the error arises when the device boots the upgrade-to image and Ansible fails to find the Python interpreter at the path cached from the PREV image.

This is observed on a nightly run that tries to upgrade to 20251110.03:

admin@bjw3-can-7260-13:~$ show version | head -n 5

SONiC Software Version: SONiC.20251110.02
SONiC OS Version: 12
Distribution: Debian 12.12
Kernel: 6.1.0-29-2-amd64
admin@bjw3-can-7260-13:~$ python --version
Python 3.11.2

admin@bjw3-can-7260-13:~$ show version | head -n 5

SONiC Software Version: SONiC.20251110.03
SONiC OS Version: 13
Distribution: Debian 13.2
Kernel: 6.12.41+deb13-sonic-amd64
admin@bjw3-can-7260-13:~$ python --version
Python 3.13.5
Signed-off-by: Longxiang [email protected]

How did you do it?
Reset the facts cache in the post-upgrade step before running any Ansible modules on the new image.
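
The failure mode and the fix can be modeled with a toy cache. This is a simplified illustration and not Ansible's API: the class `InterpreterCache`, the host name `dut-1`, and the path `/usr/bin/python3.13` are assumptions. Only `/usr/bin/python3.11` appears in the log above.

```python
class InterpreterCache:
    """Toy per-host cache of a discovered interpreter path (not Ansible's API)."""

    def __init__(self, discover):
        self._discover = discover  # callable: host -> interpreter path
        self._paths = {}

    def get(self, host):
        # Discovery runs only on the first lookup; later calls reuse the cache.
        if host not in self._paths:
            self._paths[host] = self._discover(host)
        return self._paths[host]

    def reset(self, host):
        # Forget the cached path so the next lookup rediscovers it.
        self._paths.pop(host, None)


# Hypothetical host name and paths.
installed = {"dut-1": "/usr/bin/python3.11"}
cache = InterpreterCache(lambda host: installed[host])

before_upgrade = cache.get("dut-1")
installed["dut-1"] = "/usr/bin/python3.13"  # the new image ships a newer Python
stale = cache.get("dut-1")                  # still the PREV image's path
cache.reset("dut-1")
fresh = cache.get("dut-1")                  # rediscovered on the new image
print(before_upgrade, stale, fresh)
```

Without the `reset` call, every lookup after the upgrade keeps returning the PREV image's path, which mirrors the `not found` error in the log.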

How did you verify/test it?
2026-01-19 06:29:24,312 upgrade_image.py#187 INFO - SONiC host bjw3-can-7260-13 current version 20251110.03
2026-01-19 06:29:24,312 upgrade_image.py#187 INFO - SONiC host bjw3-can-7260-14 current version 20251110.03
2026-01-19 06:29:24,312 upgrade_image.py#202 INFO - Skip enabling FIPS
2026-01-19 06:29:24,313 upgrade_image.py#220 INFO - Use default docker folder size
2026-01-19 06:29:24,313 upgrade_image.py#232 INFO - ===== UPGRADE IMAGE DONE =====

Signed-off-by: Priyansh Tratiya <[email protected]>
* Add a test for verifying cpu queue shaper config

 * This test verifies the cpu queue shaper configuration on broadcom
   platforms. This may be extended to other platforms once they implement
   cpu queue shapers.

 * Note that the test itself doesn't run any traffic tests as those may
   not provide a reflection of the cpu queue shaper configuration due to
   trap policer configurations on sonic.

Signed-off-by: Prabhat Aravind <[email protected]>

* Use a different config option than reboot_type which already exists

Signed-off-by: Prabhat Aravind <[email protected]>

* Remove queue1 check as queue1 only has a policer

Signed-off-by: Prabhat Aravind <[email protected]>

---------

Signed-off-by: Prabhat Aravind <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
…e tests (sonic-net#19594)

This PR adds erspan-ip-ver support to the newly added 'test_everflow_fwd_recircle_port_queue_check'
It adds 'erspan_ip_ver' to the mirror session configuration for the respective test.

Signed-off-by: Priyansh Tratiya <[email protected]>
…et#21959)

What is the motivation for this PR?
In the new testbed, the kernel outputs warning messages containing the word “error,” which causes LogAnalyzer to treat them as errors.

How did you do it?
These are warning messages and should not cause a LogAnalyzer failure, so they are now ignored.
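
A minimal sketch of the idea, assuming a simple regex-based matcher: lines containing "error" are flagged unless they also match an ignore pattern. The patterns and log lines below are invented for illustration and are not the actual LogAnalyzer rules or the actual kernel messages.

```python
import re

# Flag any line that mentions "error" (hypothetical match rule).
ERROR_RE = re.compile(r"error", re.IGNORECASE)
# Ignore kernel warnings that merely contain the word "error" (hypothetical rule).
IGNORE_RES = [re.compile(r"kernel:.*\bwarning\b.*error", re.IGNORECASE)]


def find_errors(lines):
    """Return the lines that match ERROR_RE and no ignore pattern."""
    hits = []
    for line in lines:
        if not ERROR_RE.search(line):
            continue
        if any(pattern.search(line) for pattern in IGNORE_RES):
            continue
        hits.append(line)
    return hits


# Made-up syslog lines.
sample = [
    "Jan 19 04:00:00 dut kernel: example_driver: warning: error counter reset",
    "Jan 19 04:00:01 dut swss#orchagent: ERROR: example failure",
]
print(find_errors(sample))
```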

How did you verify/test it?
https://elastictest.org/scheduler/testplan/696d71df4bbe3bd7ad16cdf3
https://elastictest.org/scheduler/testplan/696d71dd4b8aa910b618436d
https://elastictest.org/scheduler/testplan/696d71dd4b8aa910b618436b
https://elastictest.org/scheduler/testplan/696d71dc15026fd4f746b8f0
https://elastictest.org/scheduler/testplan/696d71db4bbe3bd7ad16cdf1
https://elastictest.org/scheduler/testplan/696d71da5e4aaa3c28ac7499

Signed-off-by: Priyansh Tratiya <[email protected]>
This enables VPP data plane testing. For now, we keep it optional.

Signed-off-by: Austin Pham <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
Signed-off-by: Austin Pham <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
* fix: added kwargs arg to console connection to align with netmiko

Signed-off-by: Carl Flottmann <[email protected]>

* explicitly add delay factor with kwargs

Signed-off-by: Carl Flottmann <[email protected]>

---------

Signed-off-by: Carl Flottmann <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
What is the motivation for this PR?
This repo has enabled automatic code reviewer assignment based on the git history. The .github/CODEOWNERS file is now outdated and unnecessary.

Background on automatic code reviewer assignment
It is tedious to manually maintain the CODEOWNERS file. @nikamirrr from Nvidia contributed an automatic code reviewer assignment workflow: https://github.com/sonic-net/sonic-mgmt/actions/workflows/code-reviewer-tagging.yml

This automation checks the git history of the files updated by a PR, finds the previous contributors to those files, and assigns them as reviewers of the PR.

This automation has 2 major benefits:

  • No need to manually maintain the CODEOWNERS file.
  • It automatically finds the most appropriate owners/reviewers. Being unable to find an appropriate owner for code has been a pain point for the community for quite some time.
For details, please refer to https://github.com/sonic-net/sonic-pipelines/tree/main/scripts/code-owners
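
The selection step can be sketched roughly as below. This is not the workflow's actual code: the function `suggest_reviewers`, the `history` mapping (file path to a list of past commit authors), and the ranking by commit count are all assumptions made for illustration.

```python
from collections import Counter


def suggest_reviewers(touched_files, history, pr_author, limit=2):
    """Rank previous contributors to the touched files by commit count.

    history maps a file path to the list of authors of its past commits.
    The PR author is excluded from the suggestions.
    """
    counts = Counter()
    for path in touched_files:
        for author in history.get(path, []):
            if author != pr_author:
                counts[author] += 1
    return [name for name, _ in counts.most_common(limit)]


# Hypothetical history data.
history = {
    "tests/a.py": ["alice", "bob", "alice"],
    "tests/b.py": ["bob", "carol", "alice"],
}
print(suggest_reviewers(["tests/a.py", "tests/b.py"], history, pr_author="carol"))
```

In a real setup the `history` mapping would be derived from `git log` output for each touched file.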

How did you do it?
This change deletes the .github/CODEOWNERS file.

Signed-off-by: Xin Wang <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
Signed-off-by: Dashuai Zhang <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
* ptf dataplane cleaners for in between test runs

Signed-off-by: Priyansh Tratiya <[email protected]>
Description of PR
Summary:
Fixes test_qos_sai failures due to unexpected ipv6 packets coming from fanout device

Type of change
 Bug fix
 Testbed and Framework(new/improvement)
 New Test case
 Skipped for non-supported platforms
 Test case improvement

Approach
What is the motivation for this PR?
Disable ipv6 for broadcom SONiC fanout devices.

How did you do it?
Add broadcom into the list.

How did you verify/test it?
physical testbed.

Signed-off-by: Dashuai Zhang <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
Description of PR
Summary:
Fix the following log analysis error when running generic_config_updater/test_bgp_prefix.py. It occurs because the BGP template used for the isolated topo is different.

2026 Jan 18 05:17:43.204062 xxxx ERR bgp#bgpcfgd: BGPAllowListMgr::Default action community value is not found. route-map 'ALLOW_LIST_DEPLOYMENT_ID_0_V4' entry. seq_no=65535
The same test case has already been skipped for 202412 in sonic-net#18612.
Skip it for 202505 on the isolated topo as well.

Type of change
 Bug fix
 Testbed and Framework(new/improvement)
 New Test case
 Skipped for non-supported platforms
 Test case improvement

Approach
What is the motivation for this PR?
Skip unsupported feature

How did you do it?
mark condition

How did you verify/test it?
local testbed

Signed-off-by: Dashuai Zhang <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
…sonic-net#20410)

Description of PR
Summary:
Fixes # (issue)

This PR fixes issues in 'test_iface_namingmode.py' for "TestShowQueue#test_show_queue_counters" tests.

The above test is failing on multi-asic devices, while retrieving 'interfaces' from 'buffer_queue_keys' values with the following error.

>                   assert (re.search(QUEUE_COUNTERS_RE_FMT.format(alias),
                                      queue_counter) is not None) \
                        and (re.search(QUEUE_COUNTERS_RE_FMT.format(setup['port_alias_map'][alias]),
                             queue_counter) is None)
E                   AssertionError

alias      = 'TestAlias21'

Signed-off-by: Priyansh Tratiya <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
@PriyanshTratiya PriyanshTratiya force-pushed the testplan/new-test-cases branch from 0436abb to b589462 Compare January 21, 2026 21:27
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
