updating the testplan to include the new test case and metrics measur…#21437
Closed
PriyanshTratiya wants to merge 311 commits into sonic-net:master
Collaborator
/azp run
Azure Pipelines could not run because the pipeline triggers exclude this branch/path.
Collaborator
/azp run
Azure Pipelines could not run because the pipeline triggers exclude this branch/path.
In PR sonic-net#21045, I made mistakes while resolving the conflicts with sonic-net#20292. Some code added in sonic-net#20292 was accidentally reverted. This change is to add back the code introduced in sonic-net#20292 for creating SNMP UdpTransportTarget. Then the snmp_facts module will work with IPv6 only scenario again. Signed-off-by: Xin Wang <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
In the sanity check, the `traceroute` command is executed in the `print_logs` function to check PTF device reachability. However, it is executed without the `-n` argument, so it tries to resolve the DNS names of the IP addresses involved. DNS resolution usually times out because the PTF IP is usually not resolvable. Because of this, the `print_logs` function needs more than 30 seconds to complete. This change adds the `-n` argument to the traceroute command. With this change, the sanity check usually takes 40 seconds less to run. Signed-off-by: Xin Wang <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
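The effect of `-n` described in this commit can be sketched with a small command builder. This is an illustration only: `build_traceroute_cmd` and the sample address are hypothetical and are not taken from the sonic-mgmt code. The sketch builds the argument list without running the command.

```python
def build_traceroute_cmd(target_ip, numeric=True):
    """Return the traceroute argument list for target_ip.

    With numeric=True the -n flag is added, which makes traceroute print
    addresses numerically instead of resolving them to host names.
    """
    cmd = ["traceroute"]
    if numeric:
        cmd.append("-n")
    cmd.append(target_ip)
    return cmd


# Hypothetical PTF address, used only for illustration.
print(" ".join(build_traceroute_cmd("10.0.0.1")))
```

Returning a list rather than a shell string keeps the flag handling explicit and lets the caller pass it to whatever runner it already uses.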
The VsSetup documentation is currently a little unclear in places. Make small improvements to its readability. Signed-off-by: Xichen Lin <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
…net#21494) The current isc-dhcp-relay does not support dropping packets with a wrong IP and UDP checksum, so this test is skipped. Tracking issue: sonic-net/sonic-buildimage#24660. Signed-off-by: Xichen Lin <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
enable test_acl and test_qos_sai for t0-f2-d40u8, t1-f2-d10u8 topo Signed-off-by: Priyansh Tratiya <[email protected]>
…t#21469) Skip fast/warm reboot for t0-f2-d40u8 and t1-f2-d10u8 as it's not required. Signed-off-by: Priyansh Tratiya <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
) Signed-off-by: Ryan Garofano <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
* [202511] Update PR template (sonic-net#21395) Signed-off-by: vikumarks <[email protected]> * pretty print tgen stats Signed-off-by: vikumarks <[email protected]> * Revert "pretty print tgen stats" This reverts commit e67d03b. Signed-off-by: vikumarks <[email protected]> * Pretty print tgen Stats Signed-off-by: vikumarks <[email protected]> --------- Signed-off-by: vikumarks <[email protected]> Co-authored-by: Vineet Mittal <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
What is the motivation for this PR? https://github.com/sonic-net/SONiC/blob/master/doc/reboot/Reboot_BlockingMode_HLD.md How did you do it? https://github.com/sonic-net/SONiC/blob/master/doc/reboot/Reboot_BlockingMode_HLD.md Signed-off-by: Litao Yu <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
What is the motivation for this PR? Skip test_bgp_port_disable on all public branches. This only works for internal. Signed-off-by: yawenni <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
* Added VxLAN underlay ECMP tests. Signed-off-by: Mahdi Ramezani <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
…21318) * Support multi-chip in packet trimming tests Signed-off-by: Priyansh Tratiya <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
Enabling srv6/test_srv6_static_config.py and srv6/test_srv6_dataplane.py test cases for SONiC-VPP on T1 and T1-lag. Dependent on sonic-net/sonic-buildimage#24359 and sonic-net/sonic-sairedis#1673 to pass. What is the motivation for this PR? SONiC VPP has some SRv6 features enabled, but currently no sonic-mgmt test cases are covering them. How did you do it? How did you verify/test it? Ran sonic-mgmt locally with the above two PRs applied to SONiC-VPP image and all cases are passing. Signed-off-by: Priyansh Tratiya <[email protected]>
Skip `test_replace_fec` on dualtor because the test tries to change the downlink's FEC from `none` to `rs`, which is not a valid/supported scenario. The test case needs further improvement. Signed-off-by: Longxiang Lyu <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
Skip v4 neighbor checks for v6 topo. Add new lookback_ipv6 fixture because IPv6 loopback IP is not directly used for route advertisement. Also add test_bgp_router_id_set_ipv6 for v6 topo only. Delete xfail and add skip for test_bgp_router_id_set/test_bgp_router_id_set_ipv6 based on v6/non-v6 topo. Signed-off-by: markxiao <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
Skip v4 neighbors and checks for v6 topo Delete test_bgp_gr_helper.py xfail for v6 topo Signed-off-by: markxiao <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
sonic-net#21451) Fix test_bgp_session_flap when memory stat is always 0, e.g., v4 neighbors memory usage in v6 topo. Signed-off-by: markxiao <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
Description of PR Summary: Fixes # (issue) Currently, deploy-l1 uses overwrite mode, which replaces all previous connections with the new ones. However, this prevents us from reusing the l1. This change adds a parameter, allow_previous_connection, that keeps existing connections. An existing connection is removed only if it overlaps with a new connection. Signed-off-by: Austin Pham <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
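The difference between overwrite mode and `allow_previous_connection` can be sketched as below. This is a minimal sketch under stated assumptions, not the deploy-l1 implementation: `merge_connections` is a hypothetical helper, connections are modeled as pairs of port names, and "overlap" is assumed to mean that two connections share a port.

```python
def merge_connections(existing, requested, allow_previous_connection=False):
    """Return the connection list after applying the requested connections.

    Overwrite mode (the default) discards every existing connection.
    With allow_previous_connection=True, an existing connection is dropped
    only if it shares a port with a requested one (assumed meaning of overlap).
    """
    if not allow_previous_connection:
        return list(requested)
    requested_ports = {port for conn in requested for port in conn}
    kept = [conn for conn in existing if not (set(conn) & requested_ports)]
    return kept + list(requested)


# Hypothetical port names, for illustration only.
existing = [("p1", "p2"), ("p3", "p4")]
requested = [("p2", "p5")]
print(merge_connections(existing, requested))
print(merge_connections(existing, requested, allow_previous_connection=True))
```

In the second call, only the connection that uses `p2` is replaced, so the unrelated `("p3", "p4")` connection survives and the L1 can be shared.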
Description of PR Skip recovering the rate limit on the vs testbed. Otherwise, it prevents the swssN container from sending rsyslog messages to the host. Signed-off-by: Priyansh Tratiya <[email protected]>
What is the motivation for this PR? Packet trimming verification was failing when using a PortChannel. We found that the current implementation for distributing packets was not working for the th5 chip. Specifically, all the packets that are supposed to fill up the egress queue were going to one of the ports in the LAG, and the second set of packets, which we expect to be trimmed, was going to the other port. Therefore, trimming never happened. How did you do it? Instead of setting the UDP source port to DEFAULT_SRC_PORT + interface_index, we use DEFAULT_SRC_PORT + 10 * interface_index. How did you verify/test it? Verified via tcpdump that packets were now filling up the egress queue on both ports in the LAG. Also verified that trimming was occurring when sending the second wave of packets. Any platform specific information? This problem was only discovered when testing with th5. Signed-off-by: Priyansh Tratiya <[email protected]>
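The source-port change in this commit can be illustrated with a short comparison of the two formulas. The value of `DEFAULT_SRC_PORT` below is an assumption chosen for the example, and the function names are hypothetical. The sketch only shows the resulting port values; how a given chip hashes them across LAG members is not modeled.

```python
# Assumed base value for illustration; the real constant lives in the test code.
DEFAULT_SRC_PORT = 4321


def src_port_before(interface_index):
    """Old formula: consecutive source ports, one per interface index."""
    return DEFAULT_SRC_PORT + interface_index


def src_port_after(interface_index):
    """New formula: source ports spaced 10 apart per interface index."""
    return DEFAULT_SRC_PORT + 10 * interface_index


for idx in range(4):
    print(idx, src_port_before(idx), src_port_after(idx))
```

The wider spacing changes the UDP source port that feeds the LAG hash; per the commit message, tcpdump confirmed that this was enough to fill the egress queue on both LAG ports on th5.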
) meter_type is platform specific, so the inclusion of it is only implemented for Broadcom TH5 chip right now. Other chips in the future that require it will need to modify the code. Signed-off-by: Ryan Garofano <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
…net#21967) What is the motivation for this PR? 202511 nightly upgrade has the following image upgrade issue: 2026-01-19 04:47:52,053 ansible_hosts.py#502 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#237: "localhost" -> AnsibleModule::pause | Results =>{"hostname": "localhost", "reachable": true, "failed": false, "changed": false, "rc": 0, "stderr": "", "stdout": "Paused for 60.0 seconds", "start": "2026-01-19 04:46:52.047977", "stop": "2026-01-19 04:47:52.049217", "delta": 60, "echo": true, "user_input": "", "_ansible_no_log": false} 2026-01-19 04:47:52,054 ansible_hosts.py#426 DEBUG - ===== ['bjw3-can-7260-13', 'bjw3-can-7260-14'] -> shell ================================================================ 2026-01-19 04:47:52,054 ansible_hosts.py#444 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::shell, {"module_name": "shell", "args": ["sed -i \"s/^ClientAliveInterval [0-9].*/ClientAliveInterval 900/g\" /etc/ssh/sshd_config && systemctl restart sshd"], "kwargs": {}, "module_attrs": {"become": true}} 2026-01-19 04:47:53,211 ansible_hosts.py#502 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::shell | Results =>{"bjw3-can-7260-13": {"hostname": "bjw3-can-7260-13", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added '10.150.238.70' (RSA) to the list of known hosts.\r\nDebian GNU/Linux 13 \\n \\l\n\n/bin/sh: 1: /usr/bin/python3.11: not found\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}, "bjw3-can-7260-14": {"hostname": "bjw3-can-7260-14", "reachable": 
true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added '10.150.238.72' (RSA) to the list of known hosts.\r\nDebian GNU/Linux 13 \\n \\l\n\n/bin/sh: 1: /usr/bin/python3.11: not found\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}} 2026-01-19 04:47:53,212 sonic.py#265 ERROR - Post upgrade actions failed, devices: ['bjw3-can-7260-13', 'bjw3-can-7260-14'], error: RunAnsibleModuleFailed('/var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::"shell" failed, Results => {"bjw3-can-7260-13": {"hostname": "bjw3-can-7260-13", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added \'10.150.238.70\' (RSA) to the list of known hosts.\\r\\nDebian GNU/Linux 13 \\\\n \\\\l\\n\\n/bin/sh: 1: /usr/bin/python3.11: not found\\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}, "bjw3-can-7260-14": {"hostname": "bjw3-can-7260-14", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added \'10.150.238.72\' (RSA) to the list of known hosts.\\r\\nDebian GNU/Linux 13 \\\\n \\\\l\\n\\n/bin/sh: 1: /usr/bin/python3.11: not found\\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}}') The issue is due to the PREV image and the upgrade-to image have different python interpreter versions. 
When Ansible runs first time on a target device, it will cache the python interpreter path in the memory; the error arises when the device boots up with the upgrade-to image and Ansible fails to find the python interpreter using the path that is from the PREV image. This is observed on nightly that tries to upgrade to 20251110.03: admin@bjw3-can-7260-13:~$ show version | head -n 5 SONiC Software Version: SONiC.20251110.02 SONiC OS Version: 12 Distribution: Debian 12.12 Kernel: 6.1.0-29-2-amd64 admin@bjw3-can-7260-13:~$ python --version Python 3.11.2 admin@bjw3-can-7260-13:~$ show version | head -n 5 SONiC Software Version: SONiC.20251110.03 SONiC OS Version: 13 Distribution: Debian 13.2 Kernel: 6.12.41+deb13-sonic-amd64 admin@bjw3-can-7260-13:~$ python --version Python 3.13.5 Signed-off-by: Longxiang [email protected] How did you do it? Let's reset the facts cache in the postupgrade before running any Ansible modules on the new image. How did you verify/test it? 2026-01-19 06:29:24,312 upgrade_image.py#187 INFO - SONiC host bjw3-can-7260-13 current version 20251110.03 2026-01-19 06:29:24,312 upgrade_image.py#187 INFO - SONiC host bjw3-can-7260-14 current version 20251110.03 2026-01-19 06:29:24,312 upgrade_image.py#202 INFO - Skip enabling FIPS 2026-01-19 06:29:24,313 upgrade_image.py#220 INFO - Use default docker folder size 2026-01-19 06:29:24,313 upgrade_image.py#232 INFO - ===== UPGRADE IMAGE DONE ===== Signed-off-by: Priyansh Tratiya <[email protected]>
* Add a test for verifying cpu queue shaper config * This test verifies the cpu queue shaper configuration on broadcom platforms. This may be extended to other platforms once they implement cpu queue shapers. * Note that the test itself doesn't run any traffic tests as those may not provide a reflection of the cpu queue shaper configuration due to trap policer configurations on sonic. Signed-off-by: Prabhat Aravind <[email protected]> * Use a different config option than reboot_type which already exists Signed-off-by: Prabhat Aravind <[email protected]> * Remove queue1 check as queue1 only has a policer Signed-off-by: Prabhat Aravind <[email protected]> --------- Signed-off-by: Prabhat Aravind <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
…e tests (sonic-net#19594) This PR adds erspan-ip-ver support to the newly added 'test_everflow_fwd_recircle_port_queue_check' It adds 'erspan_ip_ver' to the mirror session configuration for the respective test. Signed-off-by: Priyansh Tratiya <[email protected]>
…et#21959) What is the motivation for this PR? In the new testbed, the kernel outputs warning messages containing the word “error,” which causes LogAnalyzer to treat them as errors. How did you do it? These are warning messages and should not cause a LogAnalyzer failure, so they are now ignored. How did you verify/test it? https://elastictest.org/scheduler/testplan/696d71df4bbe3bd7ad16cdf3 https://elastictest.org/scheduler/testplan/696d71dd4b8aa910b618436d https://elastictest.org/scheduler/testplan/696d71dd4b8aa910b618436b https://elastictest.org/scheduler/testplan/696d71dc15026fd4f746b8f0 https://elastictest.org/scheduler/testplan/696d71db4bbe3bd7ad16cdf1 https://elastictest.org/scheduler/testplan/696d71da5e4aaa3c28ac7499 Signed-off-by: Priyansh Tratiya <[email protected]>
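The idea of ignoring warning lines that merely contain the word "error" can be sketched as a match-then-ignore filter. This is not the LogAnalyzer implementation: the patterns, the function name `is_reported`, and the sample log lines are all hypothetical and chosen only to show the mechanism.

```python
import re

# Hypothetical match rule: any line mentioning "error" is a candidate.
ERROR_RE = re.compile(r"error", re.IGNORECASE)

# Hypothetical ignore rule: kernel warning lines that happen to contain "error".
IGNORE_PATTERNS = [
    re.compile(r"kernel:.*\bwarning\b.*error", re.IGNORECASE),
]


def is_reported(line):
    """Return True if the line matches the error rule and no ignore rule."""
    if not ERROR_RE.search(line):
        return False
    return not any(pattern.search(line) for pattern in IGNORE_PATTERNS)


# Made-up log lines, for illustration only.
samples = [
    "kernel: [ 1.0] warning: foo error count 0",
    "swss: ERR orchagent: error creating route",
    "bgp: session established",
]
for line in samples:
    print(is_reported(line), line)
```

Keeping the ignore rules as narrow as possible matters here, since an overly broad pattern would also hide genuine errors.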
This enables VPP data plane testing. For now, it is kept optional. Signed-off-by: Austin Pham <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
Signed-off-by: Austin Pham <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
* fix: added kwargs arg to console connection to align with netmiko Signed-off-by: Carl Flottmann <[email protected]> * explicitly add delay factor with kwargs Signed-off-by: Carl Flottmann <[email protected]> --------- Signed-off-by: Carl Flottmann <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
What is the motivation for this PR? This repo has enabled automatic code reviewer assignment based on git history. The .github/CODEOWNERS file is now outdated and no longer necessary. Background of automatic code reviewer assignment: It is tedious to manually maintain the CODEOWNERS file. @nikamirrr from Nvidia contributed an automatic code reviewer assignment workflow: https://github.com/sonic-net/sonic-mgmt/actions/workflows/code-reviewer-tagging.yml This automation checks the git history of the files updated by a PR, finds the previous contributors to those files, and assigns them as reviewers of the PR. It has two major benefits: there is no need to manually maintain the CODEOWNERS file, and the most appropriate owners/reviewers are found automatically. Being unable to find the appropriate owner of code has been a pain point for the community for quite some time. For details, please refer to https://github.com/sonic-net/sonic-pipelines/tree/main/scripts/code-owners How did you do it? This change deletes the .github/CODEOWNERS file. Signed-off-by: Xin Wang <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
Signed-off-by: Dashuai Zhang <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
* ptf dataplane cleaners for in between test runs Signed-off-by: Priyansh Tratiya <[email protected]>
Description of PR Summary: Fixes test_qos_sai failures due to unexpected ipv6 packets coming from fanout device Type of change Bug fix Testbed and Framework(new/improvement) New Test case Skipped for non-supported platforms Test case improvement Approach What is the motivation for this PR? Disable ipv6 for broadcom SONiC fanout devices. How did you do it? Add broadcom into the list. How did you verify/test it? physical testbed. Signed-off-by: Dashuai Zhang <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
Description of PR Summary: Fix the following log analysis error when running generic_config_updater/test_bgp_prefix.py. It occurs because the BGP template used for the isolated topo is different. 2026 Jan 18 05:17:43.204062 xxxx ERR bgp#bgpcfgd: BGPAllowListMgr::Default action community value is not found. route-map 'ALLOW_LIST_DEPLOYMENT_ID_0_V4' entry. seq_no=65535 The same test case was skipped for 202412 in sonic-net#18612; skip it for 202505 on the isolated topo. Type of change Bug fix Testbed and Framework(new/improvement) New Test case Skipped for non-supported platforms Test case improvement Approach What is the motivation for this PR? Skip an unsupported feature. How did you do it? Added a conditional mark. How did you verify/test it? Local testbed. Signed-off-by: Dashuai Zhang <[email protected]> Signed-off-by: Priyansh Tratiya <[email protected]>
…sonic-net#20410) Description of PR Summary: Fixes # (issue) This PR fixes issues in 'test_iface_namingmode.py' for "TestShowQueue#test_show_queue_counters" tests. The above test is failing on multi-asic devices, while retrieving 'interfaces' from 'buffer_queue_keys' values with the following error. > assert (re.search(QUEUE_COUNTERS_RE_FMT.format(alias), queue_counter) is not None) \ and (re.search(QUEUE_COUNTERS_RE_FMT.format(setup['port_alias_map'][alias]), queue_counter) is None) E AssertionError alias = 'TestAlias21' Signed-off-by: Priyansh Tratiya <[email protected]>
Signed-off-by: Priyansh Tratiya <[email protected]>
0436abb to
b589462
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Description of PR
Summary:
Fixes # (issue)
This test plan update adds a test case that checks whether the control and data planes can handle flapping the admin state of numerous BGP sessions that hold a large number of routes, and estimates the impact of that flapping.
It also introduces a route programming time measurement.
Related PRs:
Type of change
Back port request
Approach
What is the motivation for this PR?
Update the test plan documents with methodology and details of the new test plan.
How did you do it?
How did you verify/test it?
Any platform specific information?
Ran the tests on t0-isolated-d2u510s2 topology
Supported testbed topology if it's a new test case?
Documentation