Skip to content

[config reload] Fixing config reload when timer based delayed services are disabled#1967

Merged
qiluo-msft merged 11 commits intosonic-net:masterfrom
dgsudharsan:config_rel_fix2
Jan 6, 2022
Merged

[config reload] Fixing config reload when timer based delayed services are disabled#1967
qiluo-msft merged 11 commits intosonic-net:masterfrom
dgsudharsan:config_rel_fix2

Conversation

@dgsudharsan
Copy link
Collaborator

Signed-off-by: Sudharsan Dhamal Gopalarathnam sudharsand@nvidia.com

What I did

When timer based delayed services like mgmt-framework, telemetry and snmp are disabled and config reload is execute it fails Failed to reset failed state of unit mgmt-framework.service: Unit mgmt-framework.service not loaded.
The reason is these services don't get masked like regular services and these are derived from timers. So when reset-failed is tried on these services it leads to exception.

How I did it

When the feature related to these services are disabled their timers would be masked and wouldn't be "enabled". So when deriving the services from timers the services which are not enabled will be skipped.

How to verify it

Disable services like mgmt-framework, snmp and telemetry and execute config reload. The config reload should execute without failure

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

dgsudharsan and others added 4 commits December 9, 2021 02:16
…abled

Signed-off-by: Sudharsan Dhamal Gopalarathnam <sudharsand@nvidia.com>
Correcting indentation
# Verify "systemctl reset-failed" is called for services under sonic-delayed.target
mock_run_command.assert_any_call('systemctl reset-failed snmp')
assert mock_run_command.call_count == 10
assert mock_run_command.call_count == 11
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dgsudharsan what this hard coded number stands for? should we count it in a better way than just a number?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the logic is testing for number of calls to clicommon.run_command which is internal implementation. There is no way of knowing it in test script (even in the code). If tomorrow an additional command gets added this needs to be changed again. I agree this is not a deterministic method but its an existing code and I am not sure of the motivation.
For e.g below are the calls internally invoked.
sudo systemctl stop sonic.target --job-mode replace-irreversibly
/usr/local/bin/sonic-cfggen -H -m --write-to-db
config qos reload --no-dynamic-buffer
pfcwd start_default
systemctl list-dependencies --plain sonic.target | sed '1d'
systemctl list-dependencies --plain sonic-delayed.target | sed '1d'
systemctl is-enabled snmp.timer
systemctl reset-failed swss
systemctl reset-failed snmp
sudo systemctl restart sonic.target
sudo monit reload

@qiluo-msft qiluo-msft changed the title [config reload]Fixing config reload when timer based delayed services are disabled [config reload] Fixing config reload when timer based delayed services are disabled Dec 15, 2021
config/main.py Outdated
def _get_delayed_sonic_services():
out = clicommon.run_command("systemctl list-dependencies --plain sonic-delayed.target | sed '1d'", return_cmd=True)
return (unit.strip().rstrip('.timer') for unit in out.splitlines())
timers = [unit for unit in out.splitlines() if(clicommon.run_command("systemctl is-enabled {}".format(unit),
Copy link
Contributor

@qiluo-msft qiluo-msft Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

systemctl

you can use one command to check all the units. You may use xargs to bypass the command line too long issue. #Pending

config/main.py Outdated
def _get_delayed_sonic_services():
out = clicommon.run_command("systemctl list-dependencies --plain sonic-delayed.target | sed '1d'", return_cmd=True)
return (unit.strip().rstrip('.timer') for unit in out.splitlines())
timers = [unit for unit in out.splitlines() if(clicommon.run_command("systemctl is-enabled {}".format(unit),
Copy link
Contributor

@qiluo-msft qiluo-msft Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unit

This statement is too complex. Could you rewrite in traditional control flow, like for-loop or if-block? #Closed

config/main.py Outdated
timers = []
for unit in out.splitlines():
state = clicommon.run_command("systemctl is-enabled {}".format(unit), return_cmd=True)
if(state.strip() == "enabled"):
Copy link
Contributor

@qiluo-msft qiluo-msft Dec 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(

Add a blank after if, you can remove the outer (). #Closed

@liat-grozovik
Copy link
Collaborator

@dgsudharsan is this also required for 202111? if so, i will add the label

@liat-grozovik
Copy link
Collaborator

/azp run Azure.sonic-utilities

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@liat-grozovik
Copy link
Collaborator

/azp run Azure.sonic-utilities

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@dgsudharsan
Copy link
Collaborator Author

/azpw run Azure.sonic-utilities (Test vstest)

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-utilities (Test vstest)

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@dgsudharsan
Copy link
Collaborator Author

/azpw run Azure.sonic-utilities

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-utilities

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@qiluo-msft qiluo-msft merged commit 055ed4f into sonic-net:master Jan 6, 2022
judyjoseph pushed a commit that referenced this pull request Jan 9, 2022
…s are disabled (#1967)

#### What I did
When timer based delayed services like mgmt-framework, telemetry and snmp are disabled and config reload is execute it fails Failed to reset failed state of unit mgmt-framework.service: Unit mgmt-framework.service not loaded.
The reason is these services don't get masked like regular services and these are derived from timers. So when reset-failed is tried on these services it leads to exception.


#### How I did it
When the feature related to these services are disabled their timers would be masked and wouldn't be "enabled". So when deriving the services from timers the services which are not enabled will be skipped.


#### How to verify it
Disable services like mgmt-framework, snmp and telemetry and execute config reload. The config reload should execute without failure
qiluo-msft pushed a commit that referenced this pull request Jan 10, 2022
…s are disabled (#1967)

#### What I did
When timer based delayed services like mgmt-framework, telemetry and snmp are disabled and config reload is execute it fails Failed to reset failed state of unit mgmt-framework.service: Unit mgmt-framework.service not loaded.
The reason is these services don't get masked like regular services and these are derived from timers. So when reset-failed is tried on these services it leads to exception.


#### How I did it
When the feature related to these services are disabled their timers would be masked and wouldn't be "enabled". So when deriving the services from timers the services which are not enabled will be skipped.


#### How to verify it
Disable services like mgmt-framework, snmp and telemetry and execute config reload. The config reload should execute without failure
qiluo-msft added a commit to qiluo-msft/sonic-utilities that referenced this pull request Jan 16, 2022
arlakshm pushed a commit that referenced this pull request Mar 1, 2022
…s are disabled (#1967)

#### What I did
When timer based delayed services like mgmt-framework, telemetry and snmp are disabled and config reload is execute it fails Failed to reset failed state of unit mgmt-framework.service: Unit mgmt-framework.service not loaded.
The reason is these services don't get masked like regular services and these are derived from timers. So when reset-failed is tried on these services it leads to exception.


#### How I did it
When the feature related to these services are disabled their timers would be masked and wouldn't be "enabled". So when deriving the services from timers the services which are not enabled will be skipped.


#### How to verify it
Disable services like mgmt-framework, snmp and telemetry and execute config reload. The config reload should execute without failure
stepanblyschak pushed a commit to stepanblyschak/sonic-utilities that referenced this pull request Apr 18, 2022
c31a362 - 2021-11-18 : [202012][Mux orch] set default as standby, change mux orch priority (sonic-net#2015) [Prince Sunny]
9a9e8e6 - 2021-11-18 : [202012] Check VS test failure (sonic-net#2033) [Prince Sunny]
7eaabca - 2021-11-11 : [202012] Fix random failure in PR/CI build. (sonic-net#2016) [Shilong Liu]
85230fe - 2021-11-04 : [orchagent] Fix group name of port-buffer-drop in flexcounterorch.cpp (sonic-net#1967) [Junchao-Mellanox]
a55c2ca - 2021-11-03 : [teammgrd]: Handle LAGs cleanup gracefully on Warm/Fast reboot. (sonic-net#1934) [Nazarii Hnydyn]
@dgsudharsan dgsudharsan deleted the config_rel_fix2 branch March 9, 2023 02:06
@dgsudharsan dgsudharsan restored the config_rel_fix2 branch March 9, 2023 02:06
@dgsudharsan dgsudharsan deleted the config_rel_fix2 branch March 9, 2023 02:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants