[hostcfgd] Initialize Restart= in feature's systemd config by the value of auto_restart in CONFIG_DB #10915
Merged
yozhao101 merged 5 commits into sonic-net:master on Jun 2, 2022
… `hostcfgd` was started/restarted. Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
/AzurePipelines run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
@stepanblyschak Can you please help me review this PR?
@alexrallen Can you please help me review this PR?
@yxieca Can you please help me review this PR?
different namespace. Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
yxieca approved these changes on Jun 1, 2022
yxieca pushed a commit that referenced this pull request on Jun 17, 2022
…alue of `auto_restart` in `CONFIG_DB` (#10915)
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Why I did it
Recently the nightly testing pipeline found that the `autorestart` test case failed when it was run against the master image. The reason is that the `Restart=` field in each container's systemd configuration file was set to `Restart=no` even when the value of the `auto_restart` field in the `FEATURE` table of `CONFIG_DB` was `enabled`.
This issue, introduced by #10168, can be reproduced with the following steps:
1. Issue the `config` command to disable the `auto-restart` feature of a container.
2. Run `config reload` or `config reload minigraph` to enable `auto-restart` of the container.
3. Check the `Restart=` field in the systemd config file of the container mentioned in step 1 by running `sudo systemctl cat <container_name>.service`.
PR #10168 was originally intended to revert the changes proposed in #8861. However, it did not fully revert all of them.
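The check in the last reproduction step can be automated roughly as follows. This is only a sketch: `effective_restart` is a hypothetical helper, not part of hostcfgd, and the sample text (including its file paths) is made up to resemble `systemctl cat` output. It relies on the systemd rule that, for a non-list setting such as `Restart=`, the last assignment wins and drop-in files are read after the main unit file.

```python
from typing import Optional


def effective_restart(unit_text: str) -> Optional[str]:
    """Return the last Restart= value assigned in a [Service] section, or None."""
    section = None
    value = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ("# /path" headers printed by systemctl cat).
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            key, _, val = line.partition("=")
            if key.strip() == "Restart":
                value = val.strip()  # a later assignment overrides an earlier one
    return value


# Illustrative text only; paths and contents are assumptions, not real output.
sample = """\
# /usr/lib/systemd/system/swss.service
[Unit]
Description=example service

[Service]
Restart=always

# /etc/systemd/system/swss.service.d/auto_restart.conf
[Service]
Restart=no
"""

result = effective_restart(sample)
```

On a device the input text would come from `sudo systemctl cat <container_name>.service`. With the regression present, the value found would be `no` even though `auto_restart` is `enabled`.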
The full story of how this regression happened is as follows:
Step 1: Initially, `Restart=always` was set in each container's systemd configuration file. The Nvidia team then submitted a PR ([hostcfgd] Configure service auto-restart in hostcfgd. by stepanblyschak · Pull Request #5744 · Azure/sonic-buildimage (github.com)) to change this field dynamically according to the value of the `auto_restart` field in `CONFIG_DB`. I agreed with this proposal.
In that PR, the `Restart=` field in each container's systemd configuration file was set either when the `hostcfgd` service was restarted (https://github.com/stepanblyschak/sonic-buildimage/blob/32df167af7e5c494b4a8585abebbcd65f05ef0a3/src/sonic-host-services/scripts/hostcfgd#L150) or when a user issued a `config` command to change the `auto_restart` field in `CONFIG_DB`.
If the `hostcfgd` service was started or restarted because the device rebooted or for other reasons, the `Restart=` field in the systemd configuration file was reset according to the value of the `auto_restart` field in `CONFIG_DB`. After that, the systemd daemon had to be reloaded because its configuration files had changed.
Step 2: However, reloading the systemd daemon takes around 10 seconds, as stated in this issue: [hostcfgd] hoscfgd doesn't honor CFG DB updates if they arrive in a specific time interval · Issue #8619 · Azure/sonic-buildimage (github.com).
Since the `hostcfgd` service listened for notifications from `CONFIG_DB` only after the systemd daemon had been reloaded, any change to `CONFIG_DB` tables during the reload was lost. Another PR was therefore submitted to address this issue: [hostcfgd] Fixed the brief blackout in hostcfgd using SubscriberStateTable by vivekreddynv · Pull Request #8861 · Azure/sonic-buildimage (github.com).
In that PR, `SubscriberStateTable` and `Selector` were used to send and handle notifications from `CONFIG_DB` instead of `config_db.subscribe()` and `config_db.listen()`. This change has two benefits: both existing data and new changes in `CONFIG_DB` tables are processed, and the `Restart=` field in each container's systemd configuration file does not need to be initialized explicitly.
Step 3: However, `SubscriberStateTable` creates multiple file descriptors against the Redis DB, which is inefficient compared to `ConfigDBConnector`, which opens only a single file descriptor.
As discussed in Step 2, the disadvantage of `config_db.subscribe()` and `config_db.listen()` is that any change to `CONFIG_DB` tables made before `config_db.listen()` is called is lost. The Nvidia team then submitted a PR to fix this issue: Add API endpoints to ConfigDBConnector to support pre-loading data without blackout by alexrallen · Pull Request #587 · Azure/sonic-swss-common (github.com). At the same time, a PR was submitted to revert the change proposed in Step 2: [hostcfgd] Move hostcfgd back to ConfigDBConnector for subscribing to updates by alexrallen · Pull Request #10168 · Azure/sonic-buildimage (github.com). However, the change was not fully reverted.
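The blackout discussed in Steps 2 and 3 comes down to the order of operations. The toy model below illustrates it with an in-memory table. `ToyConfigDb` and both strategy functions are hypothetical and do not use the sonic-swss-common API. Subscribing only after a slow initialization loses a write made in between, while subscribing first, buffering, and replaying afterwards keeps it.

```python
from typing import Callable, Dict, List, Tuple


class ToyConfigDb:
    """In-memory stand-in for a table that notifies subscribers on writes."""

    def __init__(self, initial: Dict[str, str]) -> None:
        self.table = dict(initial)
        self._subscribers: List[Callable[[str, str], None]] = []

    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        self._subscribers.append(callback)

    def set(self, key: str, value: str) -> None:
        self.table[key] = value
        for callback in self._subscribers:
            callback(key, value)


def listen_after_slow_init(db: ToyConfigDb, slow_init: Callable[[], None]) -> Dict[str, str]:
    """Blackout-prone order: take a snapshot, do slow work, then subscribe."""
    seen = dict(db.table)
    slow_init()  # stands in for a long systemd daemon reload
    db.subscribe(lambda key, value: seen.__setitem__(key, value))
    return seen


def subscribe_before_snapshot(db: ToyConfigDb, slow_init: Callable[[], None]) -> Dict[str, str]:
    """Blackout-free order: subscribe and buffer, snapshot, slow work, replay."""
    pending: List[Tuple[str, str]] = []
    db.subscribe(lambda key, value: pending.append((key, value)))
    seen = dict(db.table)
    slow_init()
    for key, value in pending:  # apply everything that arrived in the meantime
        seen[key] = value
    return seen


def run(strategy: Callable[[ToyConfigDb, Callable[[], None]], Dict[str, str]]) -> Dict[str, str]:
    db = ToyConfigDb({"swss": "enabled"})
    # The write lands while the daemon is still busy initializing.
    return strategy(db, lambda: db.set("swss", "disabled"))


lost = run(listen_after_slow_init)
kept = run(subscribe_before_snapshot)
```

In the first strategy the daemon keeps acting on the stale value `enabled`. In the second it ends up with `disabled`, matching what was written during initialization.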
Specifically, in this PR the `Restart=` field in each container's systemd configuration file only needs to be initialized according to the value of the `auto_restart` field in `CONFIG_DB`. However, the change (lines 269 ~ 282) proposed in Step 2 was not removed.
How I did it
When `hostcfgd` starts or is restarted, the `Restart=` field in each container's systemd configuration file is initialized according to the value of the `auto_restart` field in the `FEATURE` table of `CONFIG_DB`.
How to verify it
I verified this change by running the `auto-restart` test case against a newly built `master` image and by running the unit test:
Which release branch to backport (provide reason below if selected)
N/A
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)