[routesync] Stale neighbor fix (PR #2553) review comment fix #3007
vganesan-nokia wants to merge 1 commit into sonic-net:master
Changes done to the stale neighbor fix in PR #2553 (sonic-net#2553). Changes include adding a check to delete the stale neighbor from ASIC_DB and APPL_DB only if the kernel command is RTM_DELROUTE.
Signed-off-by: vedganes <[email protected]>
      // But still we need to clear the route from the APPL_DB. Otherwise the APPL_DB and data
      // path will be left with stale route entry
    - if(alsv.size() == 1)
    + if((nlmsg_type == RTM_DELROUTE) && (alsv.size() == 1))
@vganesan-nokia, I'm a bit confused by the comments above this line. I thought the whole idea was to handle a route that previously had some nexthops and has now changed to have only 'eth0', in which case we have to delete it from the ASIC even if it's an RTM_ADD command. So with this change, how do we handle that?
@prsunny, before PR #2553, when a delete route update arrived with its only next hop on eth0/docker0, the update was not sent to APPL_DB and ASIC_DB. This left the route table out of sync between the kernel and ASIC_DB. The idea of PR #2553 is to avoid leaving these routes undeleted in APPL_DB and ASIC_DB when the "delete" route update comes with all the next hops removed except the next hop on eth0/docker0. For "add" route updates, due to the timing/sequence of next hop updates, we may get a valid route update with a next hop on interfaces other than eth0/docker0, which is sent to APPL_DB and ASIC_DB. If we later receive an add route update with a next hop on eth0/docker0 (only one next hop), the changes in PR #2553 wrongly delete the already programmed valid route, as commented by @peter-yu. This is fixed by doing the delete only for delete route updates.
Due to PR #2553, there is a case we cannot handle: for a route with multiple nexthops that do not include eth0, FRR has a feature that resolves a nexthop to the default route when that nexthop becomes invalid. Zebra then updates the route with multiple nexthops, one of them via eth0.
What I did
Fixed a review comment for the commits made in PR #2553
Why I did it
The original fix in PR #2553 deleted the stale neighbor from APPL_DB and ASIC_DB even for a new route (if the route had only one next hop, on interface eth0 or docker0). This is incorrect: the stale neighbor should be cleared only for the route delete command. A review comment identified this requirement. To address it, the stale neighbor fix from PR #2553 now includes a check so that the stale neighbor is deleted from ASIC_DB and APPL_DB only if the kernel command is RTM_DELROUTE.
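The condition described above can be sketched in isolation as follows. This is a minimal illustration, not the routesync code itself: `shouldDeleteStaleRoute`, `isSkippedIntf`, and the boolean `isDeleteMsg` (standing in for `nlmsg_type == RTM_DELROUTE`) are hypothetical names introduced here, and the interface list plays the role of `alsv` in the diff.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: interfaces whose next hops routesync does not program,
// per the discussion in this PR (eth0 and docker0).
bool isSkippedIntf(const std::string& ifname)
{
    return ifname == "eth0" || ifname == "docker0";
}

// Hypothetical sketch of the corrected check.
// isDeleteMsg models (nlmsg_type == RTM_DELROUTE); nexthopIntfs models alsv.
// Returns true only when a delete update leaves a single next hop on a
// skipped interface, i.e. when the stale entry should be removed.
bool shouldDeleteStaleRoute(bool isDeleteMsg,
                            const std::vector<std::string>& nexthopIntfs)
{
    if (!isDeleteMsg)
    {
        // An add update must never remove an already programmed route.
        return false;
    }
    return nexthopIntfs.size() == 1 && isSkippedIntf(nexthopIntfs.front());
}
```

Under these assumptions, `shouldDeleteStaleRoute(false, {"eth0"})` is false, which is the case the original fix got wrong, while `shouldDeleteStaleRoute(true, {"eth0"})` is true.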
How I verified it
vs test
Details if related
PR #2553