test_snmp_queue_counters.py/test_telemetry.py config_reload and snmpwalk output time delay fix, test_snmp_queue_counters.py multi-asic KeyError fix #15688

Merged
yejianquan merged 3 commits into sonic-net:master from harjotsinghpawra:snmp_queue_counters_fix on Nov 22, 2024
Conversation

@harjotsinghpawra
Contributor

@harjotsinghpawra harjotsinghpawra commented Nov 21, 2024

test_snmp_queue_counters.py/test_telemetry.py config_reload and snmpwalk output time delay fix, test_snmp_queue_counters.py multi-asic KeyError fix

Description of PR

Scripts:
test_snmp_queue_counters.py
test_telemetry

/////////////////////////////////////////////////
First issue:
When these scripts run, depending on the platform, the image, and other factors, it can take some time for ports to come up, for buffer queues to be created, and then for the SNMP OIDs (or gNMI data) to be generated.

The script runs snmpwalk immediately after all Docker containers are up. At that point the interfaces may still be down, so no OID has been generated.
snmpwalk then reports "No Such Instance currently exists at this OID", which the script counts as one counter even though none was created, and this causes the test case to fail.

enum_rand_one_per_hwsku_frontend_hostname = 'mth64-m5-2'
get_bfr_queue_cntrs_cmd = 'docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1'
hostip = '1.74.23.17'
multicast_expected_diff = 16
queue_counters_cnt_post = 1
queue_counters_cnt_pre = 1
unicast_expected_diff = 8

["docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1"], kwargs={}
12:37:54 base._run L0108 DEBUG | /data/tests/common/devices/multi_asic.py::_run_on_asics#134: [mth64-m5-2] AnsibleModule::shell Result => {"changed": true, "stdout": "iso.3.6.1.4.1.9.9.580.1.5.5.1.4.1 = No Such Instance currently exists at this OID", "stderr": "", "rc": 0, "cmd": "docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1", "start": "2024-08-28 12:37:55.343677", "end": "2024-08-28 12:37:55.452104", "delta": "0:00:00.108427", "msg": "", "invocation": {"module_args": {"_raw_params": "docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1", "_uses_shell": true, "warn": false, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "executable": null, "creates": null, "removes": null, "stdin": null}}, "stdout_lines": ["iso.3.6.1.4.1.9.9.580.1.5.5.1.4.1 = No Such Instance currently exists at this OID"], "stderr_lines": [], "_ansible_no_log": null, "failed": false}
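
The counting problem in the log above can be illustrated with a minimal sketch. This is not the code from this PR: the helper name `count_queue_counters` and the sample `Counter64` line are assumptions made for illustration. Only the "No Such Instance" line is taken from the log.

```python
# Placeholder lines that net-snmp prints when an OID has no value yet.
NO_VALUE_MARKERS = (
    "No Such Instance currently exists at this OID",
    "No Such Object available on this agent at this OID",
)


def count_queue_counters(stdout_lines):
    """Count snmpwalk lines that carry a real value, skipping placeholders."""
    return sum(
        1
        for line in stdout_lines
        if line.strip() and not any(marker in line for marker in NO_VALUE_MARKERS)
    )


# The single line returned in the failing run is a placeholder, not a counter.
not_ready = [
    "iso.3.6.1.4.1.9.9.580.1.5.5.1.4.1 = No Such Instance currently exists at this OID"
]
# Hypothetical line, shown only to illustrate a populated counter.
ready = ["iso.3.6.1.4.1.9.9.580.1.5.5.1.4.1.1.1.1 = Counter64: 0"]

print(count_queue_counters(not_ready))
print(count_queue_counters(ready))
```

Counting raw output lines would give 1 in both cases, which matches the `queue_counters_cnt_pre = 1` and `queue_counters_cnt_post = 1` values in the dump above.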

//////////////////////////////////////////////////
Second issue:
In test_snmp_queue_counters.py, in the multi-asic case, we pick the buffer queue of the first interface listed in the BUFFER_QUEUE config and then try to match it. We also search for asic.namespace in the queue name, which is an invalid check and leaves buffer_queue_to_del as None.

This in turn fails the test case with KeyError: None when we try to delete the buffer queue:
result = testfunction(**testargs)
File "/var/src/sonic-mgmt/tests/snmp/test_snmp_queue_counters.py", line 123, in test_snmp_queue_counters
del data['BUFFER_QUEUE'][buffer_queue_to_del]
KeyError: None

Summary:
Fixes #15683 and #15686

Type of change

Test script delays and condition fix mentioned in How did you do it?

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

How did you do it?

1.) Added the necessary checks so that the command output is taken only after all interfaces are up and the OIDs have been generated.
2.) Fixed the incorrect multi-asic buffer queue selection logic and improved it to work on both single-asic and multi-asic systems.
3.) Added an extra check that matches the OIDs of the counters generated by SNMP against the queuestat output. They should match, because queuestat gives the latest information.
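
The "wait until ready, then read" idea in step 1 can be sketched as a generic polling loop. This is an illustration under assumptions, not the change made in this PR: `poll_until` and its parameters are hypothetical names, and the condition passed in would be whatever readiness check the test uses (for example, "snmpwalk no longer returns the No Such Instance placeholder").

```python
import time


def poll_until(condition, timeout=120.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call condition() repeatedly until it returns True or timeout expires.

    Returns True if the condition was met, False on timeout.
    The sleep and clock parameters exist only to make the helper testable.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage sketch: a stand-in readiness check that succeeds on the third attempt.
attempts = {"count": 0}


def oids_generated():
    attempts["count"] += 1
    return attempts["count"] >= 3


print(poll_until(oids_generated, timeout=10, interval=0))
```

Polling with a deadline avoids a fixed sleep that is too short on slow platforms and needlessly long on fast ones.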

How did you verify/test it?

Ran it on local Cisco platforms, and it passes.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Collaborator

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/snmp/test_snmp_queue_counters.py:17:1: E302 expected 2 blank lines, found 1
tests/snmp/test_snmp_queue_counters.py:19:71: E203 whitespace before ','
tests/snmp/test_snmp_queue_counters.py:21:1: E302 expected 2 blank lines, found 1
tests/snmp/test_snmp_queue_counters.py:24:1: E302 expected 2 blank lines, found 1
tests/snmp/test_snmp_queue_counters.py:31:1: E302 expected 2 blank lines, found 1
tests/snmp/test_snmp_queue_counters.py:90:25: E222 multiple spaces after operator
tests/snmp/test_snmp_queue_counters.py:129:23: E127 continuation line over-indented for visual indent
tests/snmp/test_snmp_queue_counters.py:139:23: E127 continuation line over-indented for visual indent
tests/snmp/test_snmp_queue_counters.py:143:5: E303 too many blank lines (2)
tests/telemetry/test_telemetry.py:54:1: E302 expected 2 blank lines, found 1
tests/telemetry/test_telemetry.py:61:1: E302 expected 2 blank lines, found 1
...
[truncated extra lines, please run pre-commit locally to view full check results]

To run the pre-commit checks locally, you can follow below steps:

  1. Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run
    the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt
    docker container.
  2. Ensure that the pre-commit package is installed:
     sudo pip install pre-commit
  3. Go to repository root folder
  4. Install the pre-commit hooks:
     pre-commit install
  5. Use pre-commit to check staged file:
     pre-commit
  6. Alternatively, you can check committed files using:
     pre-commit run --from-ref <commit_id> --to-ref <commit_id>

@harjotsinghpawra harjotsinghpawra changed the title test_snmp_queue_counters.py/test_telemetry.py config_reload and snmpw… test_snmp_queue_counters.py/test_telemetry.py config_reload and snmpwwalk output time delay fix, test_snmp_queue_counters.py multi-asic KeyError fix Nov 22, 2024
@harjotsinghpawra harjotsinghpawra marked this pull request as ready for review November 22, 2024 00:46
@vperumal
Collaborator

Hi @abdosi @yejianquan - Kindly help review and merge

@vperumal
Collaborator

This helps fix #15683 and #15686.

Collaborator

@yejianquan yejianquan left a comment

LGTM

@mssonicbld
Collaborator

@harjotsinghpawra PR conflicts with 202405 branch

@yejianquan
Collaborator

Hi @harjotsinghpawra , please resolve the conflict and create a separate PR

yejianquan pushed a commit that referenced this pull request Nov 27, 2024
…g_reload and snmpwwalk output time delay fix, test_snmp_queue_counters.py multi-asic KeyError fix (#15735)

Description of PR
Cherry pick PR #15688 for 202405
Resolve conflict issues.

This PR also resolves the following issues, in addition to the ones mentioned in the above PR:
Fixed multi-asic issues where the right buffer queues were not selected.
The asic namespace should not be checked in the queue name.

co-authorized by: jianquanye@microsoft.com
@bingwang-ms
Collaborator

This change caused some regressions.
The code below does not work if the selected port and queue name does not contain a '-', such as Ethernet136|5.

buffer_queues_removed = int(range_str.split('-')[1]) - int(range_str.split('-')[0]) + 1

The error message is

            range_str = str(buffer_queue_to_del.split('|')[-1])
>           buffer_queues_removed = int(range_str.split('-')[1]) - int(range_str.split('-')[0]) + 1
E           IndexError: list index out of range

@bingwang-ms
Collaborator

@harjotsinghpawra , @yejianquan Can you please help check the regression?

@yejianquan
Collaborator

Hi @harjotsinghpawra, please resolve the regression ASAP.
I remember you confirmed it works on single-asic systems.
Why did we hit this regression?
#16072

@harjotsinghpawra
Contributor Author

Hi @yejianquan

This is a completely new issue. The issue I fixed that day was the count mismatch. In all the testbeds and config files I used, the queue names were in a format like "Ethernet128|1-2", with a range. I also looked at the schema, and similar ranges were used there. If you think there will be cases where only one buffer queue is present and we need to test that too, I can make changes accordingly. But then the interface naming standard has to be consistent. Is there a guideline or doc for that? I am new to SONiC, so please let me know which strings are valid in an interface name; otherwise this can fail again in the future.

Wouldn't this have failed on multi-asic as well, if an interface name was picked where only one buffer queue is created?

@yejianquan
Collaborator

Hi @harjotsinghpawra , ok and welcome to the SONiC world!

Here's the doc https://github.com/sonic-net/SONiC/wiki/Configuration#buffer-queue
And your code is not compatible with this configuration:
{
    "Ethernet188|6": {
        "profile": "egress_lossless_profile"
    },
    "Ethernet188|7": {
        "profile": "egress_lossy_profile"
    }
}
I'm not sure whether we have more patterns in use, but I think you can make it compatible with both the '0-1' and '3' formats?
That would be easy and we can resolve the regression.
@bingwang-ms please suggest if you have more advice.

@bingwang-ms
Collaborator

Currently we don't have more patterns, only the two below:

  • Ethernet188|6
  • Ethernet188|0-1
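
A sketch of how both key patterns could be parsed, so that a single-queue key such as `Ethernet188|6` does not raise the IndexError reported above. The function names `parse_buffer_queue_key` and `queues_in_key` are hypothetical, and this is not necessarily how the follow-up PR implements it.

```python
def parse_buffer_queue_key(key):
    """Split a BUFFER_QUEUE key into (interface, first_queue, last_queue).

    Handles both 'Ethernet188|6' (single queue) and 'Ethernet188|0-1' (range).
    """
    interface, _, queues = key.partition("|")
    first, dash, last = queues.partition("-")
    start = int(first)
    # Without a '-', the key names exactly one queue.
    end = int(last) if dash else start
    return interface, start, end


def queues_in_key(key):
    """Number of queues covered by a BUFFER_QUEUE key."""
    _, start, end = parse_buffer_queue_key(key)
    return end - start + 1


print(parse_buffer_queue_key("Ethernet188|6"))
print(queues_in_key("Ethernet188|6"))
print(queues_in_key("Ethernet188|0-1"))
```

Using `str.partition` instead of indexing into `split('-')` means the missing second element is an empty string rather than an out-of-range index.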

When debugging the test, I also saw some other failures. The error message is as below. Not sure if it's caused by this change.

@harjotsinghpawra
Contributor Author

@bingwang-ms the other failure you saw was fixed as part of PR #16072.
@yejianquan I have also raised a PR for the range failure we are seeing:
#16139

raaghavendrakra-arista added a commit to raaghavendrakra-arista/sonic-mgmt that referenced this pull request Jan 27, 2025
Github  issue: https://github.com/aristanetworks/sonic-qual.msft/issues/371
The interface from which the queuestats were fetched was different from the
interface that was deleted from BUFFER_QUEUE.

This issue is seen after PR: sonic-net#15688
StormLiangMS pushed a commit that referenced this pull request Feb 17, 2025
Description of PR
The interface from which the queuestats were fetched was different from the interface that was deleted from BUFFER_QUEUE.

Github issue: aristanetworks/sonic-qual.msft#371
This issue is seen after PR: #15688

The issue was as follows (the XML dump below is for context):

buffer_queue_to_del = 'Ethernet112|6'
buffer_queues = ['Ethernet112|0-1', 'Ethernet112|2-4', 'Ethernet112|5', 'Ethernet112|6', 'Ethernet112|7', 'Ethernet116|0-1', ...]
buffer_queues_removed = 1
interface  = 'Ethernet68'
When the string 'Ethernet112|6' is split on the delimiter "|", the string at index 1, "6", is a substring of "Ethernet68", so 'Ethernet112|6' was picked as a candidate to delete from BUFFER_QUEUE, which is wrong.
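
The mismatch can be reproduced with the values from the dump. The sketch below is illustrative only: `queues_for_interface` is a hypothetical helper, not the code from the fix. It compares the interface part of each key for equality instead of using a substring test.

```python
def queues_for_interface(buffer_queues, interface):
    """Return the BUFFER_QUEUE keys whose interface part equals `interface`."""
    return [key for key in buffer_queues if key.split("|", 1)[0] == interface]


# Values taken from the dump above (the elided tail of the list is left out).
buffer_queues = [
    "Ethernet112|0-1", "Ethernet112|2-4", "Ethernet112|5",
    "Ethernet112|6", "Ethernet112|7", "Ethernet116|0-1",
]

# Substring test: the queue part "6" is found inside "Ethernet68".
print("Ethernet112|6".split("|")[-1] in "Ethernet68")

# Exact match on the interface part: nothing in this list belongs to Ethernet68.
print(queues_for_interface(buffer_queues, "Ethernet68"))
print(queues_for_interface(buffer_queues, "Ethernet112"))
```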

Summary:
Fixes aristanetworks/sonic-qual.msft#371
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Feb 17, 2025
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Feb 17, 2025
mssonicbld pushed a commit that referenced this pull request Feb 17, 2025
mssonicbld pushed a commit that referenced this pull request Feb 17, 2025
wangxin pushed a commit to wangxin/sonic-mgmt that referenced this pull request Feb 21, 2025
nnelluri-cisco pushed a commit to nnelluri-cisco/sonic-mgmt that referenced this pull request Mar 15, 2025
Development

Successfully merging this pull request may close these issues.

[Bug]: test/test_snmp_queue_counters.py and telemetry/test_telemetry.py issue of counter mismatch because of no proper delay after config-reload

5 participants