Warm reboot: Add support for orchagent pre-shutdown warm-restart state check#562
Can you resolve the conflict?
Signed-off-by: Jipan Yang <[email protected]>
…heck Signed-off-by: Jipan Yang <[email protected]>
I believe the current implementation and roadmap will allow an unplanned orchagent warm start, so this PR may not be necessary. I agree it is a conservative and very safe condition for a warm start.
@qiluo-msft With an unplanned restart, the shutdown could happen at any point of orchagent execution. If you are confident that the current implementation can handle all the scenarios, that is great. Supporting unplanned warm restart is the ultimate goal for each component of SONiC, as we discussed at the very beginning of warm reboot development. I'll be happy to see everything work without the planned shutdown, or at least to have all the potential problems exposed and fixed accordingly :)
qiluo-msft
left a comment
As discussed, my review focuses on whether the change is harmful, even though I think it is not necessary.
Code looks good to me, but I would like a model where the tool sends a request and, if orchagent is not ready, waits until orchagent is ready, at which point the tool is unblocked.
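As an illustration only, the blocking model suggested in this comment could be sketched as a polling loop with a timeout. The function name `wait_until_ready`, the `check` callable, and the `poll_interval` parameter are hypothetical and are not part of this PR; the "READY" string and the idea of a wait time come from the PR description and commit notes.

```python
import time


def wait_until_ready(check, wait_time=1.0, poll_interval=0.01):
    """Block until check() reports "READY" or wait_time seconds elapse.

    `check` is a hypothetical callable standing in for one request to
    orchagent; it is assumed to return "READY" or "NOT_READY".
    Returns True when orchagent became ready, False on timeout.
    """
    deadline = time.monotonic() + wait_time
    while True:
        if check() == "READY":
            return True
        if time.monotonic() >= deadline:
            return False
        # Not ready yet: back off briefly before asking again.
        time.sleep(poll_interval)
```

With this shape the caller is unblocked as soon as a check succeeds, and a timeout keeps the tool from hanging forever if orchagent never settles.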
…b_7_pre_warm_restart_check
    assert result == "RESTARTCHECK failed\n"

    # recover for test cases after this one.
    stop_swss(dvs)
stop_swss is not defined. It will break the vs test.
qiluo-msft
left a comment
Please help fix the vs test.
@qiluo-msft if possible, could #557 be merged first? The vs test for this PR uses start_swss(dvs) and stop_swss(dvs), which are defined there. I could also replicate start_swss(dvs) and stop_swss(dvs) in this PR if that is preferred.
Sure. Check #557 first.
Could you help resolve the conflict?
…b_7_pre_warm_restart_check
Done. It looks like the VS test environment is misbehaving.
    swss::Logger::getInstance().setMinPrio(swss::Logger::SWSS_INFO);
    SWSS_LOG_ENTER();

    std::string skipPendingTaskCheck = "fasle";
OK. Fortunately it does not cause a problem, because only the value "true" is checked:
https://github.com/Azure/sonic-swss/blob/master/orchagent/switchorch.cpp#L179
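A minimal sketch of why the misspelled default is harmless, assuming (as the comment states) that the receiving side only compares the value against the literal "true". The helper name `parse_flag` is hypothetical and does not appear in the PR.

```python
def parse_flag(value):
    """Interpret a string flag the way the comment describes:
    only the exact string "true" enables it; anything else is off."""
    return value == "true"


# The misspelled default behaves the same as a correctly spelled "false".
typo_default = parse_flag("fasle")
correct_default = parse_flag("false")
enabled = parse_flag("true")
```

Under that assumption both "fasle" and "false" leave the pending-task check enabled, so the typo changes nothing at runtime, though it is still worth correcting for readability.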
…e check (sonic-net#562)
* Add orchagent pre-warm-restart check mechanism
* Add orchagent_restart_check options: --noFreeze & --skipPendingTaskCheck
* Add waitTime option for response from orchagent
* Fix build issue with latest master
* adapt to new dvs.runcmd() signature
* Move standard header before local headers
Signed-off-by: Jipan Yang [email protected]
What I did
Before stopping orchagent for warm restart, a basic state check is preferred to ensure orchagent is not in a transient state, so that a deterministic state can be restored after restart.
This PR implements the orchagent_restart_check binary, which talks to orchagent and asks it to perform a self-check. If everything is OK, orchagent returns a "READY" signal and freezes; otherwise it returns a "NOT_READY" signal.
The exact conditions that count as restart-ready need further discussion. For now, checking that orchagent has no unmet dependencies is used as an example to verify the end-to-end flow. Another possible check is to make sure all config data has been flushed to ASIC DB.
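The end-to-end flow described here can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class `FakeOrchagent` and its methods are hypothetical stand-ins, not the real orchagent or swss API, and the two keyword arguments merely mirror the `--noFreeze` and `--skipPendingTaskCheck` options named in the commit notes.

```python
class FakeOrchagent:
    """Hypothetical stand-in for orchagent; not the real swss API."""

    def __init__(self):
        # Tasks still waiting on an unmet dependency.
        self.pending_tasks = []
        # Once frozen, no further tasks are accepted before the restart.
        self.frozen = False

    def add_task(self, task):
        if self.frozen:
            raise RuntimeError("orchagent is frozen for warm restart")
        self.pending_tasks.append(task)

    def restart_check(self, no_freeze=False, skip_pending_task_check=False):
        """Self-check: report "READY" (and freeze) or "NOT_READY"."""
        if self.pending_tasks and not skip_pending_task_check:
            # Transient state: restoring after restart may not be deterministic.
            return "NOT_READY"
        if not no_freeze:
            self.frozen = True
        return "READY"


idle = FakeOrchagent()
idle_result = idle.restart_check()

busy = FakeOrchagent()
busy.add_task("route waiting for next hop")
busy_result = busy.restart_check()
```

In this model an idle agent answers "READY" and freezes, while an agent with unmet dependencies answers "NOT_READY" and keeps running, which matches the behaviour the description asks for.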
Why I did it
How I verified it
VS test:
Details if related