Move ports operational status initial sync to PortInitDone handling code #704
stepanblyschak wants to merge 4 commits into sonic-net:master from
retest this please

During the restore phase of warm restart, data including port oper status should be restored from the DB; changing only the oper status may cause data inconsistencies and disturb data plane traffic. Port oper status sync-up will be done after data restore has finished.

@jipanyang, is this breaking warm reboot?
We had this discussion last week, when the syncd view comparison logic kept bringing down the port oper status during restore. The originally attempted fix was to get the port oper status from the ASIC and then finish the restore, but we realized that the ASIC port oper status may have changed compared with the one saved before shutdown. @qiluo-msft, a port oper status change usually causes changes in related objects such as neighbor, router, LAG and so on. Changing only the port oper status will prevent the restore processing from reaching the pre-shutdown state. For warm reboot, we should perform state sync-up only after restore (apply view).

@jipanyang @qiluo-msft In my understanding, during warm start, port init done is handled before apply view, so sairedis will return the oper status from the restored ASIC DB(?), which should be the same as in the restored APPL DB. At this point portsorch does not inform neighorch about the oper status change. VS tests passed on my VM, including the warm restart tests.

@stepanblyschak The sairedis get operation always reaches the ASIC to get the latest data. We do need to add virtual switch test cases to cover this scenario and other sad-path scenarios such as link flapping and LAG/LAG members going down during the restore phase.

@jipanyang, agree with you. Can you add such a vstest to prevent future regressions? @stepanblyschak, can you make changes based on jipan's comments?
SWSS_LOG_NOTICE("Get port state change notification id:%lx status:%d", id, status);

Port port;
if (!getPort(id, port))
getPort: Is anything wrong with the existing getPort? If so, fix it inside getPort.
orchagent/portsorch.cpp (outdated)
updateDbPortOperStatus(port, status);
bool isUp = status == SAI_PORT_OPER_STATUS_UP;
setHostIntfsOperStatus(port, isUp);
setHostIntfsOperStatus: Don't ignore the return value.
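To illustrate the review comment, here is a minimal sketch of checking the result instead of discarding it. `PortSketch`, `setHostIntfsOperStatusSketch` and `handleOperStatusSketch` are hypothetical stand-ins written for this example; they are not the actual orchagent types or signatures, and the `bool` return type is an assumption.

```cpp
#include <iostream>
#include <string>

// Hypothetical, simplified port record (not the real orchagent Port struct).
struct PortSketch
{
    std::string alias;
    bool hasHostIf;  // whether a host interface exists for this port
};

// Stand-in for setHostIntfsOperStatus. Assumed to return false on failure.
// [[nodiscard]] (C++17) asks the compiler to warn when a caller drops the result.
[[nodiscard]] bool setHostIntfsOperStatusSketch(const PortSketch& port, bool isUp)
{
    if (!port.hasHostIf)
    {
        return false;  // nothing to update, report failure to the caller
    }
    (void)isUp;  // real code would program the host interface here
    return true;
}

// Caller that propagates the failure instead of ignoring it.
bool handleOperStatusSketch(const PortSketch& port, bool isUp)
{
    if (!setHostIntfsOperStatusSketch(port, isUp))
    {
        std::cerr << "Failed to set host interface oper status for "
                  << port.alias << '\n';
        return false;
    }
    return true;
}
```

With this shape, a failed host interface update is logged and reported upward rather than silently leaving the database and the host interface out of sync.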
qiluo-msft left a comment
I believe the "Move ports operational status initial sync in PortInitDone handling code" fix is for fast-fast reboot and may be challenging to implement and review. Could you please isolate this task into another PR, and keep all the others in this PR, since they are simple improvements for both cold and warm reboot?

Agree with @qiluo-msft on separating this pull request into smaller tasks.

@lguohan Yes, I could try to come up with a VS test case that checks port oper status changes during swss warm restart.
Signed-off-by: Stepan Blyschak <[email protected]>
- Change 'p' variable name to 'port' - Pass 'Port' struct object to set/update methods instead of SAI OID to avoid unnecessary 'for' loops that search for the port object in m_portList - minor simple changes Signed-off-by: Stepan Blyschak <[email protected]>
Signed-off-by: Stepan Blyschak <[email protected]>
Signed-off-by: Stepan Blyschak <[email protected]>
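The second commit message above mentions passing the `Port` object instead of the SAI OID to avoid repeated searches. A rough sketch of the difference follows; `ObjectId`, `PortSketch`, `PortListSketch` and all three functions are assumptions made for illustration, and the real `m_portList` layout and method names may differ.

```cpp
#include <cstdint>
#include <map>
#include <string>

using ObjectId = std::uint64_t;  // stand-in for a SAI object id

// Hypothetical, simplified port record.
struct PortSketch
{
    std::string alias;
    ObjectId id;
    bool operUp;
};

// Assumed shape: ports keyed by alias, so a lookup by id needs a scan.
using PortListSketch = std::map<std::string, PortSketch>;

// Linear search by id, the kind of loop the commit wants to avoid repeating.
PortSketch* findById(PortListSketch& portList, ObjectId id)
{
    for (auto& entry : portList)
    {
        if (entry.second.id == id)
        {
            return &entry.second;
        }
    }
    return nullptr;
}

// Id-based update: every call pays for a scan of the port list.
bool updateOperStatusById(PortListSketch& portList, ObjectId id, bool up)
{
    PortSketch* port = findById(portList, id);
    if (port == nullptr)
    {
        return false;
    }
    port->operUp = up;
    return true;
}

// Object-based update: the caller already holds the port, so no search is needed.
void updateOperStatus(PortSketch& port, bool up)
{
    port.operUp = up;
}
```

When a handler has already resolved the port once, calling the object-based variant for each follow-up update keeps the lookup cost to a single scan.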
5f7a619 to 8430def

Closed in favor of #718
What I did
Move ports operational status initial sync in PortInitDone handling code
Throw "runtime_error" instead of "const char*"
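As a small illustration of the second item, the sketch below contrasts the two throw styles. All function names are hypothetical and only serve this example. The point is that a `std::runtime_error` is caught by a generic `std::exception` handler and carries its message in `what()`, while a thrown string literal needs a dedicated `const char*` handler.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical failure path that throws a typed exception.
void failWithRuntimeError(const std::string& what)
{
    throw std::runtime_error(what);
}

// Hypothetical failure path that throws a bare C string (the old style).
void failWithCString(const std::string&)
{
    throw "PortsOrch initialization failure";
}

// Runs fn and reports which handler caught the failure.
std::string describeFailure(void (*fn)(const std::string&), const std::string& msg)
{
    try
    {
        fn(msg);
    }
    catch (const std::exception& e)
    {
        // Reached for std::runtime_error and every other standard exception.
        return std::string("std::exception: ") + e.what();
    }
    catch (const char* s)
    {
        // Only reached for the bare string literal.
        return std::string("const char*: ") + s;
    }
    return "no exception";
}
```

Code that only handles `std::exception` would miss the `const char*` case entirely, which is one reason a typed exception is usually preferred.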
Why I did it
The original motivation was that in the mlnx fast-fast boot solution orchagent starts in cold mode, so orchagent assumes the port operational status is down. On HW, however, the port could already be up, and the SDK will not generate an operational status update event.
I also tried to unify the flow for warm and cold boot.
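The idea can be sketched as a one-time reconciliation when port initialization completes: ask the hardware for each port's current status rather than waiting for an event that may never arrive. Everything below is a simplified assumption for illustration. `OperStatus`, `PortSketch`, `syncOperStatusOnInitDone` and the `queryHw` callback are hypothetical names, and the callback stands in for whatever hardware query the real code performs.

```cpp
#include <functional>
#include <map>
#include <string>

enum class OperStatus { Down, Up };

// Hypothetical, simplified port record. A cold start assumes Down.
struct PortSketch
{
    std::string alias;
    OperStatus oper;
};

// Reconciles the cached oper status with the hardware once init is done.
// queryHw is an assumed callback returning the current hardware status.
// Returns the number of ports whose cached status had to be corrected.
int syncOperStatusOnInitDone(
    std::map<std::string, PortSketch>& ports,
    const std::function<OperStatus(const std::string&)>& queryHw)
{
    int changed = 0;
    for (auto& entry : ports)
    {
        const OperStatus hw = queryHw(entry.first);
        if (hw != entry.second.oper)
        {
            entry.second.oper = hw;  // no event will arrive, so fix it here
            ++changed;
        }
    }
    return changed;
}
```

As the discussion above notes, this kind of direct query is not necessarily safe during the warm restart restore phase, where the hardware state may differ from the state being restored.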
How I verified it
Ran VS tests, installed the swss Debian package on a DUT, and verified that the oper status is in sync with HW.
Details if related