[crm] Error handler for 'ip neigh show' command#828
Conversation
108c52c to
c454c33
Compare
There was a problem hiding this comment.
I thinks this will never execute because grep will fail in previous task if didn't find anything.
Could you add an error message for this case?
Or better to ignore grep failure in previous task and fail here.
There was a problem hiding this comment.
I split Get Hext Hop IPv6 into two steps:
shell: ip -6 neigh show dev {{crm_intf}} nud reachable nud stale
shell: echo {{ out.stdout }} | grep -v fe80
Now fail: msg="Get Next Hop IP failed. Neighbour not found" definitely means empty grep result (and not ip -6 neigh show dev failure)
a639a6d to
01dfa66
Compare
01dfa66 to
7ccc42d
Compare
Signed-off-by: Roman Kachur <[email protected]>
7ccc42d to
c63ced4
Compare
|
@romankachur-mlnx , if we encounter this, it means the CRM test failed because there is no available neighbor while adding a route. However this is not exactly a test failure. I'm not sure if we must sleep and retry, since this would be a transient state that neighbor entry expired. |
prsunny
left a comment
There was a problem hiding this comment.
Please check the comment!
@prsunny This PR aims to make the failure more informative, i.e. to show the relevant reason of failure. And not to fix or workaround the issue. |
…for IPv6 (sonic-net#828) <!-- Please make sure you've read and understood our contributing guidelines; https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md Please provide following information to help code review process a bit easier: --> ### Description of PR <!-- - Please include a summary of the change and which issue is fixed. - Please also include relevant motivation and context. Where should reviewer start? background context? - List any dependencies that are required for this change. --> Summary: Fix the following error on t0-isolated-d96u32s2 topo. ``` arp/test_arp_extended.py::test_proxy_arp[v6-str5-7060x6-moby-512-1-None] -------------------------------- live log call --------------------------------- 03/10/2025 02:10:49 __init__.pytest_runtest_call L0040 ERROR | Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/_pytest/python.py", line 1788, in runtest self.ihook.pytest_pyfunc_call(pyfuncitem=self) File "/usr/local/lib/python3.8/dist-packages/pluggy/_hooks.py", line 513, in __call__ return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult) File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 120, in _hookexec return self._inner_hookexec(hook_name, methods, kwargs, firstresult) File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 139, in _multicall raise exception.with_traceback(exception.__traceback__) File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 103, in _multicall res = hook_impl.function(*args) File "/usr/local/lib/python3.8/dist-packages/_pytest/python.py", line 194, in pytest_pyfunc_call result = testfunction(**testargs) File "/var/src/sonic-mgmt_vms91-t0-7060x6-moby-512-1/tests/arp/test_arp_extended.py", line 98, in test_proxy_arp testutils.verify_packet(ptfadapter, expected_packet, ptf_intf_index, timeout=10) File "/usr/local/lib/python3.8/dist-packages/ptf/testutils.py", line 3250, in verify_packet test.fail( File "/usr/lib/python3.8/unittest/case.py", line 753, in fail raise self.failureException(msg) AssertionError: Expected packet was not received on device 0, port 32. ``` When ndppd is starting up, it will read the whole ipv6 routes via `/proc/net/ipv6_route`. For topology with large ipv6 routes and next hops, this reading may take long time to finish. ### Type of change <!-- - Fill x for your type of change. - e.g. - [x] Bug fix --> - [ ] Bug fix - [ ] Testbed and Framework(new/improvement) - [ ] New Test case - [ ] Skipped for non-supported platforms - [ ] Test case improvement ### Back port request - [ ] 202205 - [ ] 202305 - [ ] 202311 - [ ] 202405 - [ ] 202411 - [x] 202505 ### Approach #### What is the motivation for this PR? Fix test_arp_extended.py::test_proxy_arp failure for large topo. #### How did you do it? Add an extra delay in test case by reading `/proc/net/ipv6_route` to match up with the behavior in ndppd. #### How did you verify/test it? physical testbed. #### Any platform specific information? #### Supported testbed topology if it's a new test case? ### Documentation <!-- (If it's a new feature, new test case) Did you update documentation/Wiki relevant to your implementation? Link to the wiki page? -->
e438b0db6a8912b50f7acddf93d4dc2157f53ecf (HEAD -> 201911, origin/201911) Increase Syncd operation timeout from 1 min to 6 min. (sonic-net#828) 17974adb369111b44dd56837547806918ed4b1ed Update syncd_flex_counter.cpp (sonic-net#798) Signed-off-by: Abhishek Dosi <[email protected]>
Signed-off-by: Roman Kachur [email protected]
Description of PR
Summary: Make CRM test more informative when failed
Fixes # (issue)
Type of change
Approach
How did you do it?
When Ansible performs ip -4 neigh show, it presumes that the returned value is always valid,
and manipulates on it without validation.
If the returned value is empty, it results in ambiguous error value like
"ERROR! list object has no element 0"
I added conditional fail operator, to make the failure more informative:
How did you verify/test it?
Run CRM test with both: all neighbors are Reachable, and some neighbours are Failed.
Now when the command fails, it returns more informative error:
"msg": "Get Next Hop IP failed. Neighbor not found"Instead of previous:
"ERROR! list object has no element 0"Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation