test_snmp_queue_counters.py/test_telemetry.py config_reload and snmpwalk output time delay fix, test_snmp_queue_counters.py multi-asic KeyError fix #15688
yejianquan merged 3 commits into sonic-net:master from harjotsinghpawra:snmp_queue_counters_fix
The pre-commit check detected issues in the files touched by this pull request. Detailed pre-commit check results: To run the pre-commit checks locally, you can follow below steps:
Hi @abdosi @yejianquan - Kindly help review and merge
@harjotsinghpawra PR conflicts with 202405 branch
Hi @harjotsinghpawra, please resolve the conflict and create a separate PR
…g_reload and snmpwalk output time delay fix, test_snmp_queue_counters.py multi-asic KeyError fix (#15735)
Description of PR
Cherry pick PR #15688 for 202405
Resolve conflict issues
This PR also resolves the following issues along with the ones mentioned in the above PR:
Fixed multi-asic issues where the right buffer queues were not selected
asic namespace should not be checked in queue name
Co-authored-by: jianquanye@microsoft.com
This change caused some regressions. The error message is
@harjotsinghpawra, @yejianquan Can you please help check the regression?
Hi @harjotsinghpawra, please resolve the regression ASAP.
Hi @yejianquan, this is a completely new issue. The issue I fixed that day was the count mismatch. In all the testbeds and config files I used, the queue names were in a format like "Ethernet128|1-2", with a range at the end. I also looked at the schema, and similar ranges were used there. If you think there can be a case where only one buffer queue is present and we need to test that as well, I can make changes accordingly. But then the interface naming standard has to be consistent. Is there a guideline or doc for that? I am new to SONiC, so please let me know which strings are valid in an interface name; otherwise this can fail again in the future. Wouldn't this have failed on multi-asic as well, if an interface had been picked for which only one buffer queue is created?
Hi @harjotsinghpawra, OK, and welcome to the SONiC world! Here's the doc: https://github.com/sonic-net/SONiC/wiki/Configuration#buffer-queue
@bingwang-ms the other failure you saw was fixed as part of this PR #16072
Description of PR
The interface from which the queuestats were fetched was different from the interface that was deleted from BUFFER_QUEUE.
Github issue: aristanetworks/sonic-qual.msft#371 (https://github.com/aristanetworks/sonic-qual.msft/issues/371)
This issue is seen after PR: #15688
The XML dump is below for context:
buffer_queue_to_del = 'Ethernet112|6'
buffer_queues = ['Ethernet112|0-1', 'Ethernet112|2-4', 'Ethernet112|5', 'Ethernet112|6', 'Ethernet112|7', 'Ethernet116|0-1', ...]
buffer_queues_removed = 1
interface = 'Ethernet68'
When the string 'Ethernet112|6' is split on the delimiter "|", the string at index 1, "6", is a substring of "Ethernet68", so 'Ethernet112|6' was picked as the candidate to delete from BUFFER_QUEUE, which is wrong.
Summary:
Fixes # aristanetworks/sonic-qual.msft#371
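The mismatch described above can be reproduced in a few lines. This is a minimal sketch, not the test's actual code: the helper names `pick_by_substring` and `pick_by_port` are hypothetical, and the buggy variant is only an assumption about the kind of check that would produce the values in the dump (any "|"-separated field being a substring of the interface name).

```python
def pick_by_substring(buffer_queues, interface):
    """Assumed buggy check: any '|'-separated field that is a substring
    of the interface name counts as a match."""
    for key in buffer_queues:
        if any(part in interface for part in key.split("|")):
            return key
    return None


def pick_by_port(buffer_queues, interface):
    """Stricter check: the port field (before '|') must equal the
    interface name exactly."""
    for key in buffer_queues:
        port = key.split("|")[0]
        if port == interface:
            return key
    return None


# Keys copied from the dump above (the elided entries are left out).
buffer_queues = ["Ethernet112|0-1", "Ethernet112|2-4", "Ethernet112|5",
                 "Ethernet112|6", "Ethernet112|7", "Ethernet116|0-1"]

wrong = pick_by_substring(buffer_queues, "Ethernet68")
right = pick_by_port(buffer_queues, "Ethernet68")
```

With the substring check, "6" from 'Ethernet112|6' matches inside "Ethernet68", so a queue of a different port is selected. With the exact comparison on the port field, nothing is selected for Ethernet68 because none of the listed keys belongs to that port.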


test_snmp_queue_counters.py/test_telemetry.py config_reload and snmpwalk output time delay fix, test_snmp_queue_counters.py multi-asic KeyError fix
Description of PR
Scripts:
test_snmp_queue_counters.py
test_telemetry.py
/////////////////////////////////////////////////
First issue:
When we run these scripts, depending on the platform, the image and other factors, it can take some time for the ports to come up, for the buffer queues to be generated, and then for the SNMP OIDs or even the gNMI info to be generated.
In the script we run snmpwalk immediately after all dockers are up. But the interfaces are still not up, so no OID has been generated.
snmpwalk then reports "No Such Instance currently exists at this OID", which the script counts as one counter being created when none has been, and this causes the test case to fail.
enum_rand_one_per_hwsku_frontend_hostname = 'mth64-m5-2'
get_bfr_queue_cntrs_cmd = 'docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1'
hostip = '1.74.23.17'
multicast_expected_diff = 16
queue_counters_cnt_post = 1
queue_counters_cnt_pre = 1
unicast_expected_diff = 8
["docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1"], kwargs={}
12:37:54 base._run L0108 DEBUG | /data/tests/common/devices/multi_asic.py::_run_on_asics#134: [mth64-m5-2] AnsibleModule::shell Result => {"changed": true, "stdout": "iso.3.6.1.4.1.9.9.580.1.5.5.1.4.1 = No Such Instance currently exists at this OID", "stderr": "", "rc": 0, "cmd": "docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1", "start": "2024-08-28 12:37:55.343677", "end": "2024-08-28 12:37:55.452104", "delta": "0:00:00.108427", "msg": "", "invocation": {"module_args": {"_raw_params": "docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1", "_uses_shell": true, "warn": false, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "executable": null, "creates": null, "removes": null, "stdin": null}}, "stdout_lines": ["iso.3.6.1.4.1.9.9.580.1.5.5.1.4.1 = No Such Instance currently exists at this OID"], "stderr_lines": [], "_ansible_no_log": null, "failed": false}
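The miscount in the log above can be sketched as follows. This is only an illustration under stated assumptions, not the code in the test: the helper names `count_queue_counters` and `wait_for_counters`, the `fetch_lines` callable and the timeout values are all hypothetical. The only thing taken from the log is that snmpwalk returns a single line containing "No Such Instance currently exists at this OID" when no OID has been generated yet.

```python
import time

NO_INSTANCE = "No Such Instance currently exists at this OID"


def count_queue_counters(stdout_lines):
    """Count OID lines, skipping blank lines and the 'No Such Instance'
    placeholder that snmpwalk prints when nothing exists yet."""
    return sum(1 for line in stdout_lines
               if line.strip() and NO_INSTANCE not in line)


def wait_for_counters(fetch_lines, timeout=60, interval=5, sleep=time.sleep):
    """Poll fetch_lines() until at least one real counter is reported.

    fetch_lines is any callable that returns the snmpwalk stdout lines.
    Returns the number of counters seen, or 0 if the timeout expires.
    """
    waited = 0
    while True:
        count = count_queue_counters(fetch_lines())
        if count > 0 or waited >= timeout:
            return count
        sleep(interval)
        waited += interval
```

A plain `len(stdout_lines)` on the output in the log gives 1, which matches queue_counters_cnt_pre = 1 and queue_counters_cnt_post = 1 in the dump; filtering the placeholder line gives 0 instead, and polling gives the ports time to come up before the counters are compared.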
//////////////////////////////////////////////////
Second issue:
In the test_snmp_queue_counters script, in the multi-asic case we choose a buffer queue of the first interface mentioned in the BUFFER_QUEUE config and then try to match it. We also search for asic.namespace in the queue name, which is an invalid check and causes buffer_queue_to_del to be None.
This in turn fails the test case with KeyError: None when we try to delete the buffer queue:
result = testfunction(**testargs)
File "/var/src/sonic-mgmt/tests/snmp/test_snmp_queue_counters.py", line 123, in test_snmp_queue_counters
del data['BUFFER_QUEUE'][buffer_queue_to_del]
KeyError: None
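A minimal sketch of how a failed selection turns into the `KeyError: None` in the traceback, and of one way to fail with a clearer message. The helpers `select_buffer_queue` and `delete_buffer_queue` and the sample table are hypothetical; the sketch assumes BUFFER_QUEUE keys have the form `<port>|<queue or queue range>`, as in the keys quoted in this conversation, and uses empty dicts as placeholders for the queue profiles.

```python
def select_buffer_queue(buffer_queue_table, interface):
    """Return the first key whose port field equals interface, else None."""
    for key in buffer_queue_table:
        port, _, _queues = key.partition("|")
        if port == interface:
            return key
    return None


def delete_buffer_queue(data, interface):
    """Delete one buffer queue of interface; fail loudly if none matches."""
    key = select_buffer_queue(data["BUFFER_QUEUE"], interface)
    if key is None:
        # Without this guard, `del data['BUFFER_QUEUE'][None]` raises
        # KeyError: None, which hides the real cause.
        raise ValueError("no BUFFER_QUEUE entry for {}".format(interface))
    del data["BUFFER_QUEUE"][key]
    return key


# Placeholder table: a ranged key and a single-queue key for the same port.
data = {"BUFFER_QUEUE": {"Ethernet128|1-2": {}, "Ethernet128|3": {}}}

try:
    del data["BUFFER_QUEUE"][None]   # what happens when selection returned None
except KeyError as exc:
    unguarded_error = exc.args       # (None,)

deleted = delete_buffer_queue(data, "Ethernet128")
```

Comparing only the port field works for both ranged keys such as "Ethernet128|1-2" and single-queue keys such as "Ethernet128|3", and it does not depend on the asic namespace appearing in the key.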
Summary:
Fixes #15683 and #15686
Type of change
Test script delays and condition fix mentioned in How did you do it?
Back port request
Approach
What is the motivation for this PR?
How did you do it?
1.) Added the necessary checks so that the command output is taken only after all the interfaces are up and the OIDs are generated.
2.) Changed the wrong multi-asic buffer queue selection logic and also improved it to work for both single-asic and multi-asic systems.
3.) Also added an extra check that matches the OIDs of the counters generated by SNMP against the queuestat output; they should match, because queuestat gives the latest information.
How did you verify/test it?
Ran it on local Cisco platforms and it is passing.
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation