modified down rules to pre-down rules to ensure that default route is…#3853
modified down rules to pre-down rules to ensure that default route is…#3853prsunny merged 6 commits intosonic-net:masterfrom kannankvs:mvrf_change_pre_down
Conversation
… deleted just before interface is made down
…pre-down instead of down.
|
retest please |
…pre-down instead of down.
| up ip link set dev lo-m master mgmt | ||
| down ip link delete dev lo-m | ||
| down ip link delete dev lo-m | ||
| {% endif %} |
There was a problem hiding this comment.
duplicated condition here if, should remove one.
There was a problem hiding this comment.
@tylerlinp : Which is duplicated condition? We modified only the order.
There was a problem hiding this comment.
The condition from line 8 to 20, so line 12 and 19 is duplicated. And it is better to add a block mgmt.
| pre-down ip {{ '-4' if prefix | ipv4 else '-6' }} route delete {{ prefix | network }}/{{ prefix | prefixlen }} dev eth0 table {{ vrf_table }} | ||
| pre-down ip {{ '-4' if prefix | ipv4 else '-6' }} rule delete from {{ prefix | ip }}/{{ '32' if prefix | ipv4 else '128' }} table {{ vrf_table }} | ||
| {% if (MGMT_VRF_CONFIG) and (MGMT_VRF_CONFIG['vrf_global']['mgmtVrfEnabled'] == "true") %} | ||
| pre-down ip link set dev eth0 nomaster |
There was a problem hiding this comment.
Why add this? It is different from original style. And it may be wrong, in kernel handling set nomaster is including event down, so here do a down in pre-down. I think cgdelete is enough.
There was a problem hiding this comment.
@tylerlinp : How come nomaster is doing an event down? Yes, we also expect the cgdelete to handle to take care of nomaster. But, similar to pre-down for route delete, we thought that "explicitly setting to nomaster" is better/safe for applications like zebra/dhcp_relay to handle this "nomaster" event separately. Anyway, this "nomaster" is not making any difference in the behavior in applications dhcp_relay and hence we think that it is safe to remove this.
@prsunny , @jleveque : Any comments?
There was a problem hiding this comment.
To be exactly, linux kernel do change master just like event down, will flush neighbors and routes. If set nomaster before delete routes in kernel(here they all in pre-down, it make troubles), kernel will clear routes in handling down, and commands to delete doutes will do nothing, and there is no deldoute netlink, and our fix will fail again.
case NETDEV_DOWN:
fib_disable_ip(dev, event, false);
break;
case NETDEV_CHANGE:
flags = dev_get_flags(dev);
if (flags & (IFF_RUNNING | IFF_LOWER_UP))
fib_sync_up(dev, RTNH_F_LINKDOWN);
else
fib_sync_down_dev(dev, event, false);
/* fall through */
case NETDEV_CHANGEMTU:
rt_cache_flush(net);
break;
case NETDEV_CHANGEUPPER:
info = ptr;
/* flush all routes if dev is linked to or unlinked from
* an L3 master device (e.g., VRF)
*/
if (info->upper_dev && netif_is_l3_master(info->upper_dev))
fib_disable_ip(dev, NETDEV_DOWN, true);
break;
At least here cannot use pre-down. But now that there is no up ip link set dev eth0 master mgmt how comes down ip link set dev eth0 nomaster? I think this is not safe but dangerous.
| vrf mgmt | ||
| up cgcreate -g l3mdev:mgmt | ||
| up cgset -r l3mdev.master-device=mgmt mgmt | ||
| pre-down ip link set dev eth0 nomaster |
…wn the interface. cgdelete is enough
|
@tylerlinp : We removed the nomaster as discussed. Assuming that cgdelete takes care of it. Request you to review and approve the changes. |
…wn the interface. cgdelete is enough
|
@tylerlinp , can you check the changes? |
|
@prsunny I had checked a few days ago and a suggestion was not replied.
|
@tylerlinp : Addressed the comment. Please check and approve. |
|
retest vsimage please |
|
Retest mellanox please |
|
Request for 201911 branch |
#3853) * modified down rules to pre-down rules to ensure that default route is deleted just before interface is made down
sonic-net#3853) * modified down rules to pre-down rules to ensure that default route is deleted just before interface is made down
sonic-net#3853) * modified down rules to pre-down rules to ensure that default route is deleted just before interface is made down
…AD automatically (sonic-net#1066) #### Why I did it src/sonic-utilities ``` * ba955a91 - (HEAD -> 202412, origin/202412) Merge pull request sonic-net#172 from mssonicbld/sonicbld/202412-merge (21 hours ago) [mssonicbld] * 4d5c7cb9 - Merge branch '202411' of https://github.com/sonic-net/sonic-utilities into 202412 (23 hours ago) [Sonic Automation] * e6fb8f96 - Stop the hwclock service and timer files before updating CMOS time (sonic-net#3853) (2 days ago) [Saikrishna Arcot] ``` #### How I did it #### How to verify it #### Description for the changelog
- What I did
When management VRF is deleted, zebra was crashing as specified in the gitissue3798. As discussed in the issue, it has got two parts. One part is to fix the crash in FRR, which will happen in the future. Other part is to ensure that default route is deleted just before interface is brought down by using "pre-down" commands instead of "down" commands, which will avoid the zebra crash. Irrespective of this change, FRR will be fixed to avoid the crash in zebra code.
- How I did it
Changed "down" commands to "pre-down" commands.
- How to verify it
Executed management VRF test cases specified in the sonic-mgmt PR1210 and verified that they are passing.
- Description for the changelog
Changed the commands given under "down" to "pre-down"