[config reload] Fixing config reload when timer based delayed services are disabled #1967
…abled Signed-off-by: Sudharsan Dhamal Gopalarathnam <sudharsand@nvidia.com>
Correcting indentation
…ies into config_rel_fix2
  # Verify "systemctl reset-failed" is called for services under sonic-delayed.target
  mock_run_command.assert_any_call('systemctl reset-failed snmp')
- assert mock_run_command.call_count == 10
+ assert mock_run_command.call_count == 11
@dgsudharsan what does this hard-coded number stand for? Should we count it in a better way than just a number?
I believe the test checks the number of calls to clicommon.run_command, which is an internal implementation detail. There is no way of knowing that number from the test script (or even from the code). If an additional command gets added tomorrow, this value needs to be changed again. I agree this is not a deterministic method, but it is existing code and I am not sure of the motivation.
For example, these are the calls invoked internally:
sudo systemctl stop sonic.target --job-mode replace-irreversibly
/usr/local/bin/sonic-cfggen -H -m --write-to-db
config qos reload --no-dynamic-buffer
pfcwd start_default
systemctl list-dependencies --plain sonic.target | sed '1d'
systemctl list-dependencies --plain sonic-delayed.target | sed '1d'
systemctl is-enabled snmp.timer
systemctl reset-failed swss
systemctl reset-failed snmp
sudo systemctl restart sonic.target
sudo monit reload
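One way to make the test less brittle, sketched here under assumptions, is to assert on the specific commands that matter instead of the total call count. In this sketch `reload_services` is a hypothetical stand-in for the reload path (it is not the real `config reload` code) and the mock is passed in directly rather than patched over `clicommon.run_command`.

```python
from unittest import mock


def reload_services(run_command):
    """Hypothetical stand-in for the reload path; issues a few of the commands listed above."""
    run_command("systemctl reset-failed swss")
    run_command("systemctl reset-failed snmp")
    run_command("sudo systemctl restart sonic.target")


mock_run_command = mock.Mock()
reload_services(mock_run_command)

# Check that the command we care about was issued, regardless of how many others ran.
mock_run_command.assert_any_call("systemctl reset-failed snmp")

# Collect only the reset-failed calls so unrelated new commands do not break the test.
reset_calls = [
    c.args[0]
    for c in mock_run_command.call_args_list
    if c.args[0].startswith("systemctl reset-failed")
]
print(reset_calls)
```

Filtering `call_args_list` by command prefix keeps the assertion tied to the behaviour under test, so adding an unrelated command to the reload path would not require touching the expected count.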
config/main.py
Outdated
  def _get_delayed_sonic_services():
      out = clicommon.run_command("systemctl list-dependencies --plain sonic-delayed.target | sed '1d'", return_cmd=True)
-     return (unit.strip().rstrip('.timer') for unit in out.splitlines())
+     timers = [unit for unit in out.splitlines() if(clicommon.run_command("systemctl is-enabled {}".format(unit),
config/main.py
Outdated
    timers = []
    for unit in out.splitlines():
        state = clicommon.run_command("systemctl is-enabled {}".format(unit), return_cmd=True)
        if(state.strip() == "enabled"):
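The snippet above is cut off, so the following is only a minimal sketch of the filtering idea, not the merged code. It assumes the command runner is injected as a parameter (`run_command` here is a hypothetical callable that returns the command output as a string, similar in spirit to `clicommon.run_command(..., return_cmd=True)`), and `fake_run_command` is made-up test data standing in for systemctl.

```python
def get_delayed_sonic_services(run_command):
    """Return service names for timers under sonic-delayed.target that are enabled."""
    out = run_command("systemctl list-dependencies --plain sonic-delayed.target | sed '1d'")
    services = []
    for unit in out.splitlines():
        unit = unit.strip()
        if not unit:
            continue
        state = run_command("systemctl is-enabled {}".format(unit))
        if state.strip() != "enabled":
            # Masked or disabled timer: skip it so reset-failed is never tried on its service.
            continue
        # Remove the exact ".timer" suffix. str.rstrip('.timer') strips a set of
        # characters rather than a suffix, so it can eat trailing letters of some names.
        if unit.endswith(".timer"):
            unit = unit[:-len(".timer")]
        services.append(unit)
    return services


def fake_run_command(cmd):
    """Hypothetical stand-in for systemctl: only snmp.timer is enabled."""
    if cmd.startswith("systemctl list-dependencies"):
        return "  snmp.timer\n  telemetry.timer\n  mgmt-framework.timer\n"
    if cmd == "systemctl is-enabled snmp.timer":
        return "enabled\n"
    return "masked\n"


services = get_delayed_sonic_services(fake_run_command)
print(services)  # ['snmp']
```

With the fake runner, only `snmp` survives the filter, which mirrors the intent described in the PR: services whose timers are not enabled are left out of the reset-failed step.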
@dgsudharsan is this also required for 202111? If so, I will add the label.
…s are disabled (#1967) #### What I did When timer-based delayed services such as mgmt-framework, telemetry and snmp are disabled and config reload is executed, it fails with: Failed to reset failed state of unit mgmt-framework.service: Unit mgmt-framework.service not loaded. The reason is that these services do not get masked like regular services, and their names are derived from timers. So when reset-failed is tried on these services, it leads to an exception. #### How I did it When the feature related to one of these services is disabled, its timer is masked and is not "enabled". So when deriving the services from timers, the services whose timers are not enabled are skipped. #### How to verify it Disable services such as mgmt-framework, snmp and telemetry, then execute config reload. The config reload should complete without failure.
… services are disabled (sonic-net#1967)" This reverts commit 055ed4f.
c31a362 - 2021-11-18 : [202012][Mux orch] set default as standby, change mux orch priority (sonic-net#2015) [Prince Sunny]
9a9e8e6 - 2021-11-18 : [202012] Check VS test failure (sonic-net#2033) [Prince Sunny]
7eaabca - 2021-11-11 : [202012] Fix random failure in PR/CI build. (sonic-net#2016) [Shilong Liu]
85230fe - 2021-11-04 : [orchagent] Fix group name of port-buffer-drop in flexcounterorch.cpp (sonic-net#1967) [Junchao-Mellanox]
a55c2ca - 2021-11-03 : [teammgrd]: Handle LAGs cleanup gracefully on Warm/Fast reboot. (sonic-net#1934) [Nazarii Hnydyn]
Signed-off-by: Sudharsan Dhamal Gopalarathnam <sudharsand@nvidia.com>
What I did
When timer-based delayed services such as mgmt-framework, telemetry and snmp are disabled and config reload is executed, it fails with: Failed to reset failed state of unit mgmt-framework.service: Unit mgmt-framework.service not loaded.
The reason is that these services do not get masked like regular services, and their names are derived from timers. So when reset-failed is tried on these services, it leads to an exception.
How I did it
When the feature related to one of these services is disabled, its timer is masked and is not "enabled". So when deriving the services from timers, the services whose timers are not enabled are skipped.
How to verify it
Disable services such as mgmt-framework, snmp and telemetry, then execute config reload. The config reload should complete without failure.