Skip to content

Warm reboot: Add support for orchagent pre-shutdown warm-restart state check#562

Merged
qiluo-msft merged 8 commits intosonic-net:masterfrom
jipanyang:warm_reboot_collab_7_pre_warm_restart_check
Sep 15, 2018
Merged

Warm reboot: Add support for orchagent pre-shutdown warm-restart state check#562
qiluo-msft merged 8 commits intosonic-net:masterfrom
jipanyang:warm_reboot_collab_7_pre_warm_restart_check

Conversation

@jipanyang
Copy link
Copy Markdown
Contributor

Signed-off-by: Jipan Yang [email protected]

What I did
Before stopping orchagent for warm restart, basic state check is preferred to ensure orchagent is not in transient state, so a deterministic state may be restored after restart.

Here is to implement orchagent_restart_check binary which may talk to orchagent and ask it to do self-check, return "READY " signal and freeze if everything is ok, otherwise "NOT_READY" signal should be returned.

The exact condition that is treated as restart ready need further discussion. For now, checking no un-met dependency in orchagent is used as example to verify the end to end flow. The other possible check is to make sure all config data to ASIC DB has been flushed.

Why I did it

How I verified it
VS test:

jipan@sonic-build-2:~/warm_reboot/sonic-buildimage/src/sonic-swss/tests$ sudo pytest -v --dvsname=vs  test_warm_reboot.py
[sudo] password for jipan: 
======================================================================= test session starts =======================================================================
platform linux2 -- Python 2.7.12, pytest-3.3.0, py-1.5.4, pluggy-0.6.0 -- /usr/bin/python
cachedir: .cache
rootdir: /home/jipan/warm_reboot/sonic-buildimage/src/sonic-swss/tests, inifile:
collected 1 item                                                                                                                                                  

test_warm_reboot.py::test_OrchagentWarmRestartReadyCheck PASSED                                                                                             [100%]

Details if related

@jipanyang jipanyang changed the title Add orchagent pre-warm-restart check mechanism Warm reboot: Add orchagent pre-warm-restart check mechanism Aug 3, 2018
@jipanyang jipanyang force-pushed the warm_reboot_collab_7_pre_warm_restart_check branch from ab6ff12 to 10c5bd1 Compare August 13, 2018 18:53
@lguohan
Copy link
Copy Markdown
Contributor

lguohan commented Aug 16, 2018

can you resolve conflict?

@jipanyang jipanyang changed the title Warm reboot: Add orchagent pre-warm-restart check mechanism Warm reboot: Add support for orchagent pre-shutdown warm-restart state check Aug 17, 2018
@jipanyang jipanyang force-pushed the warm_reboot_collab_7_pre_warm_restart_check branch 3 times, most recently from edbd459 to 8896686 Compare August 22, 2018 04:12
@jipanyang jipanyang force-pushed the warm_reboot_collab_7_pre_warm_restart_check branch from 8896686 to 53d8b25 Compare August 22, 2018 17:50
@qiluo-msft
Copy link
Copy Markdown
Contributor

qiluo-msft commented Aug 30, 2018

I believe current implementation and roadmap will allow unplanned orchagent warm start, so this PR may be not necessary. I agree it is conservative and super safe condition for a warm start. #Pending

@jipanyang
Copy link
Copy Markdown
Contributor Author

@qiluo-msft With unplanned restart, the program shutdown could happen at any point of orchagent execution, if you are confident that current implementation is able to handle all the scenarios, it is great. Supporting unplanned warm restart is the ultimate goal for each component of SONiC as we have discussed at the very beginning of warm reboot development.

I'll be happy to see everything works without the planned shutdown, or at least all the potential problems could be exposed and fixed accordingly :)

Copy link
Copy Markdown
Contributor

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, my review focuses on harmful or not, even I think it is not necessary.

@lguohan
Copy link
Copy Markdown
Contributor

lguohan commented Sep 12, 2018

code looks good to me. but i would like to model where the tool send request, if orchagent not ready, it will wait till orchagent is ready and then the tool is unblock.

assert result == "RESTARTCHECK failed\n"

# recover for test cases after this one.
stop_swss(dvs)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stop_swss [](start = 4, length = 9)

Not defined. It will break vs test.

Copy link
Copy Markdown
Contributor

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please help fix vs test.

@jipanyang
Copy link
Copy Markdown
Contributor Author

jipanyang commented Sep 14, 2018

@qiluo-msft if possible, could #557 be merged first. The vs test for this PR uses start_swss(dvs) & stop_swss(dvs) defined there.

I could replicate start_swss(dvs) & stop_swss(dvs) to this PR too if this is preferred.

@qiluo-msft
Copy link
Copy Markdown
Contributor

Sure. Check #557 first

@qiluo-msft
Copy link
Copy Markdown
Contributor

Help solve conflict?

@jipanyang
Copy link
Copy Markdown
Contributor Author

done. It looks VS test env has run mad.

@qiluo-msft qiluo-msft merged commit 9fda944 into sonic-net:master Sep 15, 2018
@jipanyang jipanyang deleted the warm_reboot_collab_7_pre_warm_restart_check branch February 9, 2019 02:33
swss::Logger::getInstance().setMinPrio(swss::Logger::SWSS_INFO);
SWSS_LOG_ENTER();

std::string skipPendingTaskCheck = "fasle";
Copy link
Copy Markdown
Contributor

@qiluo-msft qiluo-msft Jun 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please fix the typo? @jipanyang

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Fortunately it is not causing problem due to value "true" is checked
https://github.com/Azure/sonic-swss/blob/master/orchagent/switchorch.cpp#L179

Janetxxx pushed a commit to Janetxxx/sonic-swss that referenced this pull request Nov 10, 2025
…e check (sonic-net#562)

* Add orchagent pre-warm-restart check mechanism
*  Add orchagent_restart_check options: --noFreeze & --skipPendingTaskCheck
* Add waitTime option for response from orchagent
* Fix build issue with latest master
* adapt to new dvs.runcmd() signature
* Move standard header before local headers
jianyuewu pushed a commit to jianyuewu/sonic-swss that referenced this pull request Dec 24, 2025
Updating the README to reflect the new dependency. Also, removing duplicate libgtest-dev from the azure-pipeline script.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants