[scripts/fast-reboot] Shutdown remaining containers through systemd#2133
Merged
liat-grozovik merged 2 commits intosonic-net:masterfrom Apr 13, 2022
Merged
[scripts/fast-reboot] Shutdown remaining containers through systemd#2133liat-grozovik merged 2 commits intosonic-net:masterfrom
liat-grozovik merged 2 commits intosonic-net:masterfrom
Conversation
The current implementation has two issues: 1. In case containers from "docker ps" output are ordered in a way that database is first in the list, the "systemctl stop database" followed by "docker kill database" will stop all other containers through systemd and ruin this optimization 2. After "docker kill database" there are lots of errors from daemons like hostcfgd, system-healthd, caclmgrd, etc. Also it causes those daemons to hang when received SIGTERM making a delay on following "systemctl stop database". In the new implementation, services are implicitelly stopped by systemd in the order that is correct. If a certain container needs an optimization that will kill the container instead of stopping it the container may implement this optimization in its /usr/local/bin/*.sh script. It is also more optimal since independent services might be stopped in parallel. NOTE: This fix is relevant for regular SONiC image and not for Kubernetes enabled SONiC image. Kubernetes integration in SONiC might have this issue still, which might be a design flawn. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
vaibhavhd
reviewed
Apr 12, 2022
scripts/fast-reboot
Outdated
| debug "Stopping all remaining containers ..." | ||
| if test -f /usr/local/bin/ctrmgr_tools.py | ||
| then | ||
| debug "Stopping all remaining containers ..." |
Contributor
There was a problem hiding this comment.
Is this debug flag misplaced, considering the kill/shutdown happens implicitly now through systemd?
Contributor
Author
There was a problem hiding this comment.
IMO this branch is for kubernetes and can be removed. @renukamanavalan
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Collaborator
|
@renukamanavalan could you please refer to the question above? if this can be removed lets go a head |
renukamanavalan
approved these changes
Apr 13, 2022
Contributor
renukamanavalan
left a comment
There was a problem hiding this comment.
The current solution of stopping docker service would work.
qiluo-msft
pushed a commit
that referenced
this pull request
Apr 13, 2022
…2133) The current implementation has two issues: 1. In case containers from "docker ps" output are ordered in a way that database is first in the list, the "systemctl stop database" followed by "docker kill database" will stop all other containers through systemd and ruin this optimization 2. After "docker kill database" there are lots of errors from daemons like hostcfgd, system-healthd, caclmgrd, etc. Also it causes those daemons to hang when received SIGTERM making a delay on following "systemctl stop database". In the new implementation, services are implicitly stopped by systemd in the order that is correct. If a certain container needs an optimization that will kill the container instead of stopping it the container may implement this optimization in its /usr/local/bin/*.sh script. It is also more optimal since independent services might be stopped in parallel. - What I did Stop services using systemd - How I did it Stop services using systemd - How to verify it Run warm-reboot. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
judyjoseph
pushed a commit
that referenced
this pull request
Apr 18, 2022
…2133) The current implementation has two issues: 1. In case containers from "docker ps" output are ordered in a way that database is first in the list, the "systemctl stop database" followed by "docker kill database" will stop all other containers through systemd and ruin this optimization 2. After "docker kill database" there are lots of errors from daemons like hostcfgd, system-healthd, caclmgrd, etc. Also it causes those daemons to hang when received SIGTERM making a delay on following "systemctl stop database". In the new implementation, services are implicitly stopped by systemd in the order that is correct. If a certain container needs an optimization that will kill the container instead of stopping it the container may implement this optimization in its /usr/local/bin/*.sh script. It is also more optimal since independent services might be stopped in parallel. - What I did Stop services using systemd - How I did it Stop services using systemd - How to verify it Run warm-reboot. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
stepanblyschak
pushed a commit
to stepanblyschak/sonic-utilities
that referenced
this pull request
Apr 18, 2022
How I did it Advance swss submodule head to include: c3fb52b 2022-02-04 | Fix for missing lossless PG profile on certain ports (sonic-swss-common update for Vnet tables sonic-net#2133) (HEAD -> 202012, github/202012) [Ying Xie] Signed-off-by: Ying Xie ying.xie@microsoft.com
stepanblyschak
added a commit
to stepanblyschak/sonic-utilities
that referenced
this pull request
Apr 18, 2022
``` 08495be [scripts/fast-reboot] Shutdown remaining containers through systemd (sonic-net#2133) fa07373 [scripts/fast-reboot] stop timers in advance (sonic-net#2131) 00ef80e [show][muxcable] Decrease the timeout for show mux status/hwmode (sonic-net#2130) ``` Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Junchao-Mellanox
added a commit
to Junchao-Mellanox/sonic-utilities
that referenced
this pull request
May 11, 2022
…ystemd (sonic-net#2133)" This reverts commit 23e9398.
liat-grozovik
pushed a commit
that referenced
this pull request
May 11, 2022
…ystemd (#2133)" (#2161) This reverts commit 23e9398. - What I did Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)" This reverted PR is part of a story that refactors warm/fast shutdown sequence to gracefully stop services instead of killing them without any ordering and dependency requirements which creates several issues and is error prone for the future. This PR must come together with sonic-net/sonic-buildimage#10510. However, #10510 is blocked due to an issue in swss-common sonic-net/sonic-swss-common#603 And a fix by MSFT is in review sonic-net/sonic-swss-common#606 I am reverting it because its dependency is still blocked and we cannot update submodule pointer. Once the dependency of the reverted PR is resolved, it shall be re-committed.
dprital
added a commit
to dprital/sonic-utilities
that referenced
this pull request
May 11, 2022
…ystemd (sonic-net#2133)" This reverts commit a5f55aa.
liat-grozovik
pushed a commit
that referenced
this pull request
May 11, 2022
…ystemd (#2133)" (#2166) - What I did This reverts commit a5f55aa. Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)" This reverted PR is part of a story that refactors warm/fast shutdown sequence to gracefully stop services instead of killing them without any ordering and dependency requirements which creates several issues and is error prone for the future. This PR must come together with sonic-net/sonic-buildimage#10510. However, #10510 is blocked due to an issue in swss-common sonic-net/sonic-swss-common#603 And a fix by MSFT is in review sonic-net/sonic-swss-common#606 I am reverting it because its dependency is still blocked and we cannot update submodule pointer. Once the dependency of the reverted PR is resolved, it shall be re-committed. - How I did it git revert a5f55aa - How to verify it Run tests
dprital
added a commit
to dprital/sonic-utilities
that referenced
this pull request
May 11, 2022
…ystemd (sonic-net#2133)" This reverts commit a5f55aa.
6 tasks
liat-grozovik
pushed a commit
to sonic-net/sonic-buildimage
that referenced
this pull request
May 15, 2022
Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (sonic-net/sonic-utilities#2133)" (sonic-net/sonic-utilities#2166)
stepanblyschak
added a commit
to stepanblyschak/sonic-utilities
that referenced
this pull request
May 24, 2022
…hrough systemd (sonic-net#2133)" (sonic-net#2161)" This reverts commit 288c2d8.
vaibhavhd
pushed a commit
that referenced
this pull request
Jul 25, 2022
…hrough systemd (#2133)" (#2161)" (#2184) Reverts #2161 Revert a revert. This must be merged together with sonic-net/sonic-buildimage#10510
yxieca
pushed a commit
that referenced
this pull request
Dec 7, 2022
…2133) The current implementation has two issues: 1. In case containers from "docker ps" output are ordered in a way that database is first in the list, the "systemctl stop database" followed by "docker kill database" will stop all other containers through systemd and ruin this optimization 2. After "docker kill database" there are lots of errors from daemons like hostcfgd, system-healthd, caclmgrd, etc. Also it causes those daemons to hang when received SIGTERM making a delay on following "systemctl stop database". In the new implementation, services are implicitly stopped by systemd in the order that is correct. If a certain container needs an optimization that will kill the container instead of stopping it the container may implement this optimization in its /usr/local/bin/*.sh script. It is also more optimal since independent services might be stopped in parallel. - What I did Stop services using systemd - How I did it Stop services using systemd - How to verify it Run warm-reboot. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The current implementation has two issues:
In the new implementation, services are implicitly stopped by systemd in the order that is correct. If a certain container needs an optimization that will kill the container instead of stopping it the container may implement this optimization in its /usr/local/bin/*.sh script.
It is also more optimal since independent services might be stopped in parallel.
Signed-off-by: Stepan Blyschak stepanb@nvidia.com
What I did
Stop services using systemd
How I did it
Stop services using systemd
How to verify it
Run warm-reboot.
Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)
Required to chery-pick: 202012, 202111.
DEPENDS ON: sonic-net/sonic-buildimage#10510 sonic-net/sonic-buildimage#10511