Warm reboot: Support vlanmgrd process warm restart#550
Warm reboot: Support vlanmgrd process warm restart#550lguohan merged 7 commits intosonic-net:masterfrom
Conversation
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
|
based on our discussion, we need to vs test for the warm reboot work flow. please add. |
| { | ||
| // Don't reset vlan aware bridge upon swss docker warm restart. | ||
| SWSS_LOG_INFO("vlanmgrd warm start, skipping bridge create"); | ||
| return; |
There was a problem hiding this comment.
i am not sure if this approach is bullet proof.
what if we later change vlan_filtering option, or enable more option for the bridge. it could happen that older version does not have that option, but new vlanmgrd will enable that option, but the warm reboot will miss it.
I think the right approach should still remove all of them and add new, this is mainly control plane, then we can still achieve non data plane disruption.
There was a problem hiding this comment.
Removal of vlan doesn't affect data plane directly, but BGP docker and BGP will be affected and cause route flapping.
If there is vlan_filtering option change though unlikely for now, probably the easier way is to handle that explicitly.
There was a problem hiding this comment.
I am not particularly worry about vlan_filtering, I am more worry about future bridge attribute, maybe disable unknown multicast, unknown unicast options.
for bgp docker, I think we can do docker pause.
There was a problem hiding this comment.
Delete/create of bridge and vlan is done in linux kernel, and zebra listen on that, pausing docker in this case may trigger unknown side effect since we don't know the exact timing of netlink message. It also makes interface handling more complex.
For disabling unknown multicast, unknown unicast, probably we should add configuration option for them, that was in the original vlan trunk pull request. We may bring it back and refine the change later.
There was a problem hiding this comment.
my point is that in the future you never know what you are going to add for the bridge. Therefore, we should create exactly the same one as we create in cold boot.
To ensure that, the cleanest approach is to remove and recreate, then we can share the same code path as the cold boot.
if this can cause control plane disruption, we should then stop the bgp container and do the bgp gr.
In reply to: 205924420 [](ancestors = 205924420)
There was a problem hiding this comment.
Besides system level warm reboot, we want to support docker warm restart. We intend to have same code path for cold boot and warm boot whenever possible.
For this case, it is in constructor phase. I don't see why new option has to be put here instead of as a configuration option. Also we use docker to separate the services, stopping other docker when doing operation in one docker doesn't seem that clean.
|
For VS testing, I was thinking of adding them after the basic framework is ready. it looks we could add some basic test like verifying bridge vlan info, restart count, and expand the test later with more verification. Will work on it. |
|
jipan@sonic-build-2:~/warm_reboot/sonic-buildimage/src/sonic-swss/tests$ sudo pytest -v --dvsname=vs test_warm_reboot.py test_warm_reboot.py::test_VlanMgrdWarmRestart PASSED [100%] ==================================================================== 1 passed in 39.69 seconds ==================================================================== |
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
adb8c08 to
c146887
Compare
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
|
retest this please |
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
|
Any reason why it failed on Jenkins server? it works for me locally and CFG_WARM_RESTART_TABLE_NAME has been defined in swss-common schema.h. |
…s PR Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
…ME to avoid jenkins environment error Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
37a4a5f to
9b83c1e
Compare
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
…ic-net#550) * [sairedis] Add SkipRecordAttrContainer class * [sairedis] Start using SkipRecordAttrContainer class
* Support vlanmgrd process warm restart Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com> * [VS]: add test case for vlanmgrd warm restart Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com> * Adapt to the new warm reboot schema Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com> * Update warm_restart common functions Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com> * warm_restart common functions already available, remove them from this PR Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com> * Use fixed CFG_WARM_RESTART_TABLE_NAME and STATE_WARM_RESTART_TABLE_NAME to avoid jenkins environment error Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com> * Remove hardcoded names for warm restart config table and state table Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Signed-off-by: Jipan Yang jipan.yang@alibaba-inc.com
What I did
Add support for vlanmgrd process warm restart.
Why I did it
How I verified it
Check warm restart count of the swss processes
Kill vlanmgrd process and start it again
No traffic loss, also check restart count again, vlanmgrd restart_count incremented by 1
Details if related
Has dependency on
sonic-net/sonic-swss-common#211
#547