Skip to content

Move ports operational status initial sync to PortInitDone handling code#704

Closed
stepanblyschak wants to merge 4 commits intosonic-net:masterfrom
stepanblyschak:port_oper_sync
Closed

Move ports operational status initial sync to PortInitDone handling code#704
stepanblyschak wants to merge 4 commits intosonic-net:masterfrom
stepanblyschak:port_oper_sync

Conversation

@stepanblyschak
Copy link
Copy Markdown
Contributor

@stepanblyschak stepanblyschak commented Nov 22, 2018

What I did
Move ports operational status initial sync in PortInitDone handling code
Throw "runtime_error" instead of "const char*"

Why I did it
The original motivation was that in mlnx fast-fast boot solution orchagent starts in cold mode, so orchagent assumes port operational status is down, however on HW it could be up and SDK will not generate operational status update event.
Also, tried to unify the flow for warm, cold boot.

How I verified it
Run vs tests, installed swss debian package on DUT, verified oper status is in sync with HW

Details if related

@stepanblyschak stepanblyschak changed the title Move ports operational status initial sync in PortInitDone handling code Move ports operational status initial sync to PortInitDone handling code Nov 22, 2018
@lguohan
Copy link
Copy Markdown
Contributor

lguohan commented Nov 22, 2018

retest this please

@Jipan-Yang
Copy link
Copy Markdown

During the restore phase of warm restart, the data including port oper status should be restored from db, otherwise changing the oper status only may cause data inconsistencies and disturb data plane traffic. Port oper status sync up will be done after data restore has finished.

@lguohan
Copy link
Copy Markdown
Contributor

lguohan commented Nov 23, 2018

@jipanyang , is this breaking warm reboot?

@jipanyang
Copy link
Copy Markdown
Contributor

@jipanyang , is this breaking warm reboot?

We had the discussion last week when syncd view comparison logic kept bringing down the port oper status during restore. The original attempted fix is to get port oper status from ASIC then finish the restore, we realized that the asic port oper status may have changed compared with that saved before shutdown @qiluo-msft , port oper status change usually will cause changes in related objects like neighbor, router, lag and so on. changing port oper status only will prevent the restore processing from reaching the pre-shutdown state.

For warm reboot, we should perform state sync up only after restore (apply view).

@qiluo-msft qiluo-msft self-requested a review November 23, 2018 08:09
@stepanblyschak
Copy link
Copy Markdown
Contributor Author

@jipanyang @qiluo-msft In my understanding, during warm start, port init done is handled before apply view, so sairedis will return oper status from restored ASIC DB(?), which should be the same as in restored APPL DB as I understand. At this point portsorch does not inform neighorch about oper status change.
After apply view refreshPortStatus will be called anyway to sync with real HW oper status and inform neighorch about oper status change.

VS tests passed on my vm, including warm restart tests.
I have no ability to verify system level warm reboot. Could you please test this with system level warm reboot?

@jipanyang
Copy link
Copy Markdown
Contributor

@https://github.com/stepanblyschak sairedis get operation always reaches asic to get latest data. We do need to add virtual switch test cases to cover this scenario and other sad path scenarios like link flapping, lag/lag members down during restore phase.

@sonic-net sonic-net deleted a comment from lguohan Nov 23, 2018
@lguohan
Copy link
Copy Markdown
Contributor

lguohan commented Nov 23, 2018

@jipanyang , agree with you, can you add such vstest to prevent future regression? @stepanblyschak , can you make changes based on jipan's comments?

SWSS_LOG_NOTICE("Get port state change notification id:%lx status:%d", id, status);

Port port;
if (!getPort(id, port))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getPort [](start = 17, length = 7)

anything wrong with existing getPort? if any, fix inside?


updateDbPortOperStatus(port, status);
bool isUp = status == SAI_PORT_OPER_STATUS_UP;
setHostIntfsOperStatus(port, isUp);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setHostIntfsOperStatus [](start = 4, length = 22)

Don't ignore return value

Copy link
Copy Markdown
Contributor

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe

Move ports operational status initial sync in PortInitDone handling code

this fix is for fast-fast reboot and may be challenging to implement and review. Could you please isolate this task into another PR, and keep all others in this PR since they are simple improvements for both cold and warm reboot.

@stcheng
Copy link
Copy Markdown
Contributor

stcheng commented Nov 24, 2018

agree with @qiluo-msft on separating this pull request into smaller tasks

@jipanyang
Copy link
Copy Markdown
Contributor

@lguohan Yes, I could try to come up a vs test case for port oper change check during swss warm restart.

Stepan Blyschak added 4 commits December 3, 2018 13:35
 - Change 'p' varaible name to 'port'
 - Pass 'Port' struct object to set/update methods instead of SAI OID to avoid
   unncessary 'for' loops that search port object in m_portList
 - minor simple changes

Signed-off-by: Stepan Blyschak <[email protected]>
Signed-off-by: Stepan Blyschak <[email protected]>
@stepanblyschak
Copy link
Copy Markdown
Contributor Author

stepanblyschak commented Dec 3, 2018

Closed in favor of #718

@stepanblyschak stepanblyschak deleted the port_oper_sync branch December 3, 2018 15:24
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
oleksandrivantsiv pushed a commit to oleksandrivantsiv/sonic-swss that referenced this pull request Mar 1, 2023
jianyuewu pushed a commit to jianyuewu/sonic-swss that referenced this pull request Dec 24, 2025
*PINS Extension tables support
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants