[arp_update]: Fix IPv6 neighbor race condition #15583
Merged
prsunny merged 3 commits into sonic-net:master on Jun 30, 2023
Signed-off-by: Lawrence Lee <[email protected]>
Signed-off-by: Lawrence Lee <[email protected]>
Contributor
@theasianpianist, can you please add the "Request for" labels and remove the "Approved for" label? "Approved for" labels should be applied by branch owners.
prsunny
reviewed
Jun 26, 2023
if [[ ! -z "$failed_kernel_neighbors" ]]; then
    neigh_replace_template="sed -e 's/^/ip neigh replace /' -e 's/,/ dev /' -e 's/$/ nud incomplete;/'"
    ip_neigh_replace_cmd="echo \"$failed_kernel_neighbors\" | cut -d ' ' -f 1,3 --output-delimiter=',' | $neigh_replace_template"
    eval `eval "$ip_neigh_replace_cmd"`
Contributor
Can you please test this with traffic in two cases: one where the neighbor can be resolved during a ping, and another where the neighbor cannot be resolved?
Contributor
Author
Tested with one resolvable IP, one unresolvable IP, and one unresolvable IP where I manually deleted the APPL_DB entry. The script behaved as expected - the deleted IP was flushed, all three IPs were pinged, and the final state of the IPs was REACHABLE (for the resolvable IP) and INCOMPLETE for the other two.
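To make the reviewed snippet easier to follow, here is a minimal sketch that runs the same cut/sed pipeline over sample input and prints the generated commands instead of evaluating them. The helper name build_replace_cmds and the sample addresses are illustrative assumptions, not part of the PR, and the input is assumed to have the shape "<ip> dev <interface> <state>". Note that --output-delimiter requires GNU cut.

```shell
# Sketch only: prints the commands the reviewed pipeline would generate,
# without running them. build_replace_cmds is a hypothetical helper name.
build_replace_cmds() {
    # $1: neighbor lines, assumed to look like "<ip> dev <iface> <state>"
    printf '%s\n' "$1" \
        | cut -d ' ' -f 1,3 --output-delimiter=',' \
        | sed -e 's/^/ip neigh replace /' -e 's/,/ dev /' -e 's/$/ nud incomplete;/'
}

# Made-up sample data for illustration.
sample="fc02:1000::5 dev Vlan1000 FAILED
fc02:1000::6 dev Vlan1000 FAILED"

build_replace_cmds "$sample"
# prints:
# ip neigh replace fc02:1000::5 dev Vlan1000 nud incomplete;
# ip neigh replace fc02:1000::6 dev Vlan1000 nud incomplete;
```

Printing the commands first is a convenient way to check the generated text before it is handed to eval, as the snippet under review does.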
Signed-off-by: Lawrence Lee <[email protected]>
prsunny
approved these changes
Jun 30, 2023
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request on Jun 30, 2023:
* [arp_update]: Fix IPv6 neighbor race condition on dualtors
  Signed-off-by: Lawrence Lee <[email protected]>
Collaborator
Cherry-pick PR to 202211: #15693
Collaborator
Cherry-pick PR to 202205: #15694
Collaborator
Cherry-pick PR to 202305: #15877
Why I did it
A race condition exists which makes it possible for the kernel to resolve a trapped packet's destination IP at the same time that arp_update is running the ip neigh replace command for that neighbor IP. When this occurs, the kernel's neighbor entry for this IP is in the INCOMPLETE state, and the ip neigh replace command sets it to permanently incomplete. This means no netlink message will be generated for this neighbor, since the kernel doesn't generate netlink messages for INCOMPLETE neighbors (it would only generate a message once the neighbor transitions to FAILED, which doesn't happen due to the ip neigh replace command). As a result, no APPL_DB neighbor table entry is ever created and no tunnel route for the IP is ever installed, leading to dropped traffic.
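As a rough illustration of the states involved, the sketch below filters neighbor listings by state. The helper name neighbors_in_state and the sample lines are assumptions made for this example. The sample mimics the general shape of `ip -6 neigh show` output, where the state is the last field.

```shell
# neighbors_in_state is a hypothetical helper: it reads neighbor lines on stdin
# and prints "<ip> <iface>" for entries whose last field equals the given state.
neighbors_in_state() {
    awk -v state="$1" '$NF == state { print $1, $3 }'
}

# Made-up sample in the shape "<ip> dev <iface> [lladdr <mac>] <state>".
sample_neigh_output="fc02:1000::2 dev Vlan1000 lladdr 00:11:22:33:44:55 REACHABLE
fc02:1000::5 dev Vlan1000 INCOMPLETE
fc02:1000::6 dev Vlan1000 FAILED"

printf '%s\n' "$sample_neigh_output" | neighbors_in_state INCOMPLETE
# prints: fc02:1000::5 Vlan1000
```

On a live device the input would come from `ip -6 neigh show` rather than a sample string. An entry listed as INCOMPLETE at the moment arp_update runs is the one exposed to the race described above.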
How I did it
- [...] pinging the neighbor IPs, wait for any neighbor entries which might be transiently INCOMPLETE to transition to FAILED (so that the subsequent ip neigh replace command can set them to permanently incomplete)
- [...] ip neigh replace command in case they are new neighbors for which no netlink message has been generated yet
How to verify it
- Run arp_update with no FAILED or INCOMPLETE neighbor entries; verify no changes are made to the kernel neighbor table.
- Run arp_update with a FAILED neighbor entry with a corresponding APPL_DB entry; verify the neighbor IP is pinged and set to INCOMPLETE permanently.
- Run arp_update with an INCOMPLETE neighbor entry with a corresponding APPL_DB entry; verify the neighbor IP is pinged and set to INCOMPLETE permanently.
- Run arp_update with a FAILED neighbor entry without a corresponding APPL_DB entry; verify the neighbor is flushed, pinged, and set to INCOMPLETE permanently.
- Run arp_update with an INCOMPLETE neighbor entry without a corresponding APPL_DB entry; verify the neighbor is flushed, pinged, and set to INCOMPLETE permanently.
- Run arp_update with various combinations of the above scenarios; verify that only neighbors missing APPL_DB entries are flushed from the kernel, and that all FAILED/INCOMPLETE neighbors are pinged and set to permanently INCOMPLETE.
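The wait step and the "with/without APPL_DB entry" distinction used in the scenarios above can be sketched as follows. This is not the code from the PR. The function names, the NEIGH_SHOW_CMD override, and the APPL_DB key format "NEIGH_TABLE:<iface>:<ip>" are all assumptions made for illustration. The wait is bounded on purpose: entries that were earlier set to permanently INCOMPLETE would otherwise keep the loop running forever.

```shell
# Sketch only. NEIGH_SHOW_CMD is a hypothetical override so the loop can be
# exercised without a real kernel table; it defaults to "ip -6 neigh show".
wait_for_incomplete_to_settle() {
    # $1: maximum number of polls (default 10)
    # $2: seconds to sleep between polls (default 1)
    local max_tries="${1:-10}"
    local delay="${2:-1}"
    local tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # Succeed as soon as no entry is reported as INCOMPLETE.
        if ! ${NEIGH_SHOW_CMD:-ip -6 neigh show} | grep -q 'INCOMPLETE'; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$delay"
    done
    return 1  # at least one entry stayed INCOMPLETE for the whole wait
}

# Prints the kernel neighbors that have no matching APPL_DB key.
missing_from_appl_db() {
    # $1: kernel neighbors, one "<ip> <iface>" per line
    # $2: APPL_DB keys, one per line, ASSUMED to be "NEIGH_TABLE:<iface>:<ip>"
    printf '%s\n' "$1" | while read -r ip iface; do
        [ -n "$ip" ] || continue
        if ! printf '%s\n' "$2" | grep -Fxq "NEIGH_TABLE:${iface}:${ip}"; then
            echo "$ip $iface"
        fi
    done
}

# Made-up sample data for illustration.
kernel_neighbors="fc02:1000::5 Vlan1000
fc02:1000::6 Vlan1000"
appl_db_keys="NEIGH_TABLE:Vlan1000:fc02:1000::5"

missing_from_appl_db "$kernel_neighbors" "$appl_db_keys"
# prints: fc02:1000::6 Vlan1000
```

In the combined scenario, only the neighbors printed by a check like missing_from_appl_db would be expected to be flushed, while every FAILED or INCOMPLETE neighbor would still be pinged.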