[teamsyncd] remove m_stateLagTablePreserved #3782
stepanblyschak wants to merge 10 commits into sonic-net:master from
Once a LAG is created in the kernel, set its state in STATE_DB immediately. The STATE_DB flag triggers the kernel LAG IP configuration, which does not need to be delayed.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@stepanblyschak, can you point to the PR that added this feature? IIRC, it was done for some warm-boot (WB) corner cases, and this seems to be an impactful change. @vaibhavhd for visibility.

@stepanblyschak, is the lack of this change the reason that portchannels come up OPER DOWN after warm-reboot to 202505? We are trying to triage that issue, and we have reason to suspect that the regression started after PR #3563. Issue: sonic-net/sonic-buildimage#23347

What is that corner case? I ran all the relevant tests I know of:
The sad-path LAG cases should cover the teamsyncd reconciliation flow.

@vaibhavhd, no, it's different. I see what the issue is; working on it.

/azpw run

/AzurePipelines run

Azure Pipelines successfully started running 1 pipeline(s).

During continuous warm-reboot testing, it was found that this delay is intended to postpone the establishment of BGP sessions with upstream devices. Otherwise, if some downstream BGP sessions are not yet established, the device sends an incomplete routing table followed by the EOIU marker to the upstream device, causing the upstream device to delete routes. Ideally, this could be solved by postponing the EOIU from FRR, so that the kernel configuration does not have to be delayed.
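The route-deletion scenario described above can be sketched as a toy model of the upstream peer. This is a minimal sketch under stated assumptions, not SONiC or FRR code: `UpstreamPeerModel` and its methods are hypothetical, and the model assumes the upstream handles routes from the rebooting device like BGP graceful-restart stale routes (RFC 4724), keeping them until the end-of-update marker arrives and then deleting any route that was not re-advertised.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical toy model of an upstream BGP peer, not SONiC or FRR code.
// Assumption: routes learned from the warm-rebooting device are treated like
// graceful-restart stale routes (RFC 4724): kept until the end-of-update
// marker arrives, then deleted unless they were re-advertised in the meantime.
class UpstreamPeerModel {
public:
    explicit UpstreamPeerModel(std::set<std::string> routesBeforeReboot)
        : installed_(std::move(routesBeforeReboot)) {}

    // The rebooting device re-advertises a prefix after the reboot.
    void receiveUpdate(const std::string& prefix) {
        installed_.insert(prefix);
        refreshed_.insert(prefix);
    }

    // End-of-update marker: delete every route that was not re-advertised.
    void receiveEndOfUpdate() {
        for (auto it = installed_.begin(); it != installed_.end();) {
            if (refreshed_.count(*it) == 0) {
                it = installed_.erase(it);
            } else {
                ++it;
            }
        }
    }

    const std::set<std::string>& installed() const { return installed_; }

private:
    std::set<std::string> installed_;  // routes currently held by the upstream
    std::set<std::string> refreshed_;  // routes re-advertised since the reboot
};
```

In this model, if a prefix that depends on a still-down downstream session is missing when the marker is sent, the upstream drops it; sending the marker only after all downstream-learned prefixes have been re-advertised (the FRR-side approach suggested above) leaves the upstream's route set intact.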

PR is in draft because of an FRR issue that was found to cause the above.

Wouldn't this mean that, if we remove this delay and the upstream LAGs get their IP addresses assigned first, BGP would be established with the upstream devices first, resulting in the deletion of many routes?

@saiarcot895, the issue described in the comment you refer to is solved by sonic-net/sonic-buildimage#24174.

@yxieca, please help assign someone to review the suggested fastboot improvements. This is required for SKUs with ever-higher port counts, but it will definitely help existing SKUs as well.

@stepanblyschak, @liat-grozovik, both Vaibhav and Saikrishna are already engaged in reviewing. However, this PR is still in DRAFT mode. What is needed to make it ready for review?

@yxieca, this PR depends on the merge of an FRR bug fix: sonic-net/sonic-buildimage#24174.

Once a LAG is created in the kernel, set its state in STATE_DB immediately. The STATE_DB flag triggers the kernel LAG IP configuration, which does not need to be delayed.
Only LAGs experience this behaviour; every other interface type gets its kernel IP configuration as soon as its netdev is created.
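The timing change described here can be illustrated with a simplified model. This is a hypothetical sketch, not teamsyncd code: `TeamSyncModel`, `StateLagTable`, and the `"ok"` state value are assumptions, and a plain `std::map` stands in for the STATE_DB LAG table. With deferral enabled during warm restart (assumed to be the role of the removed `m_stateLagTablePreserved`), the LAG entry reaches STATE_DB only at reconciliation; without it, the entry is written as soon as the kernel LAG exists.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the STATE_DB LAG table: LAG name -> state value.
// The real teamsyncd writes through the swss-common table API, not modeled here.
using StateLagTable = std::map<std::string, std::string>;

// Hypothetical, simplified model of when a new kernel LAG is published to
// STATE_DB. Only the timing of the write is modeled.
class TeamSyncModel {
public:
    TeamSyncModel(StateLagTable& stateLagTable, bool warmRestart, bool deferUntilReconcile)
        : stateLagTable_(stateLagTable),
          warmRestart_(warmRestart),
          deferUntilReconcile_(deferUntilReconcile) {}

    // Called when the LAG netdev appears in the kernel.
    void onKernelLagCreated(const std::string& lagName) {
        if (warmRestart_ && deferUntilReconcile_) {
            // Old behaviour (assumed): hold the entry back until reconciliation,
            // which also delays the kernel LAG IP configuration.
            preserved_[lagName] = "ok";
        } else {
            // New behaviour: publish immediately, like any other interface type.
            stateLagTable_[lagName] = "ok";
        }
    }

    // Called once warm-restart reconciliation completes.
    void onReconcile() {
        for (const auto& entry : preserved_) {
            stateLagTable_[entry.first] = entry.second;
        }
        preserved_.clear();
    }

private:
    StateLagTable& stateLagTable_;
    bool warmRestart_;
    bool deferUntilReconcile_;
    std::map<std::string, std::string> preserved_;  // entries waiting for reconcile
};
```

Constructing the model with `deferUntilReconcile = true` reproduces the old ordering, where the entry is absent from the table until `onReconcile()` runs; with `false`, the entry is present right after `onKernelLagCreated()`.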
What I did
Remove m_stateLagTablePreserved.
Why I did it
To avoid delaying the kernel LAG IP configuration.
Before:
After:
How I verified it
Ran the following tests on Mellanox-SN4600C-C64 (master.0-e7331e3cb_Internal):
Merge after: sonic-net/sonic-buildimage#24174
Details if related