[config] Enable/disable container monitoring when starting/stopping the services. #1471

Merged
yozhao101 merged 8 commits into sonic-net:master from yozhao101:prevent_false_alarming
Mar 3, 2021
@yozhao101 yozhao101 commented Mar 2, 2021

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

What I did

When we run sudo config load, sudo config reload or sudo config load_minigraph, the containers swss, snmp, lldp, teamd, syncd, bgp, radv, pmon, dhcp_relay, telemetry and restapi are stopped and then restarted. During this stop-and-restart window, the container_checker script run by Monit writes false alert messages to syslog reporting that some containers are not running. This PR prevents Monit from generating those false alarms.

How I did it

Before stopping the services, we stop Monit from monitoring the running status of the containers. After the services have been restarted, we let Monit monitor the running status of the containers again.
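
The disable/stop/restart/enable sequence can be sketched as a context manager. This is only an illustration, not the code merged in this PR: the helper names `run_monit` and `container_monitoring_paused` are hypothetical, and the Monit check name `container_checker` is assumed to match the script name. `monit unmonitor <name>` and `monit monitor <name>` are standard Monit CLI actions.

```python
import subprocess
from contextlib import contextmanager

# Assumption: the Monit check is registered under the same name as the script.
MONIT_CHECK = "container_checker"


def run_monit(action, check=MONIT_CHECK):
    """Run `sudo monit <action> <check>` and return True if it exited with 0."""
    result = subprocess.run(["sudo", "monit", action, check], check=False)
    return result.returncode == 0


@contextmanager
def container_monitoring_paused(run=run_monit, echo=print):
    """Pause container monitoring for the duration of the `with` block.

    `run` and `echo` are injectable so the sketch can be exercised
    without a real Monit daemon.
    """
    echo("Disabling container monitoring ...")
    run("unmonitor")
    try:
        yield
    finally:
        # Re-enable monitoring even if stopping or restarting a service fails.
        echo("Enabling container monitoring ...")
        run("monitor")
```

A caller would wrap the stop and restart steps in `with container_monitoring_paused(): ...`. Re-enabling in a `finally` clause is a design choice of this sketch, so that a failed restart does not leave the containers unmonitored.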

How to verify it

I deliberately reduced the Monit monitoring interval from 60 seconds to 10 seconds to ensure that alert messages from the container_checker script were generated during sudo config reload, sudo config load and sudo config load_minigraph. After adding this change to _stop_services(...) and _restart_services(...), I confirmed that the alert messages from container_checker no longer appeared in syslog.

I verified this change on the device str-a7050-acs-3.
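
The syslog check can be automated with a small scan. This is a rough sketch under assumptions: the helper names are hypothetical, the log path `/var/log/syslog` is the usual Debian location and may differ on a given image, and matching on the substring `container_checker` is assumed to be enough to identify the alerts.

```python
def find_checker_alerts(lines, marker="container_checker"):
    """Return the syslog lines that mention the container_checker script."""
    return [line.rstrip("\n") for line in lines if marker in line]


def read_syslog(path="/var/log/syslog"):
    """Read all lines from a syslog file; the default path is an assumption."""
    with open(path, errors="replace") as handle:
        return handle.readlines()
```

After running one of the config commands, `find_checker_alerts(read_syslog())` should return no entries newer than the start of the command if the change works as intended.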

Previous command output (if the output of a command-line utility has changed)

admin@vlab-01:~$ sudo config reload -y
Executing stop of service telemetry...
Warning: Stopping telemetry.service, but it can still be activated by:
  telemetry.timer
Executing stop of service swss...
Executing stop of service lldp...
Executing stop of service pmon...
Executing stop of service bgp...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Executing reset-failed of service bgp...
Executing reset-failed of service dhcp_relay...
Executing reset-failed of service hostname-config...
Executing reset-failed of service interfaces-config...
Executing reset-failed of service lldp...
Executing reset-failed of service ntp-config...
Executing reset-failed of service pmon...
Executing reset-failed of service radv...
Executing reset-failed of service rsyslog-config...
Executing reset-failed of service snmp...
Executing reset-failed of service swss...
Executing reset-failed of service syncd...
Executing reset-failed of service teamd...
Executing reset-failed of service telemetry...
Executing restart of service hostname-config...
Executing restart of service interfaces-config...
Executing restart of service ntp-config...
Executing restart of service rsyslog-config...
Executing restart of service swss...
Executing restart of service bgp...
Executing restart of service pmon...
Executing restart of service lldp...
Executing restart of service telemetry...
Reloading Monit configuration ...
Reinitializing monit daemon

New command output (if the output of a command-line utility has changed)

admin@vlab-01:~$ sudo config reload -y
Disabling container monitoring ...
Executing stop of service telemetry...
Warning: Stopping telemetry.service, but it can still be activated by:
  telemetry.timer
Executing stop of service swss...
Executing stop of service lldp...
Executing stop of service pmon...
Executing stop of service bgp...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Executing reset-failed of service bgp...
Executing reset-failed of service dhcp_relay...
Executing reset-failed of service hostname-config...
Executing reset-failed of service interfaces-config...
Executing reset-failed of service lldp...
Executing reset-failed of service ntp-config...
Executing reset-failed of service pmon...
Executing reset-failed of service radv...
Executing reset-failed of service rsyslog-config...
Executing reset-failed of service snmp...
Executing reset-failed of service swss...
Executing reset-failed of service syncd...
Executing reset-failed of service teamd...
Executing reset-failed of service telemetry...
Executing restart of service hostname-config...
Executing restart of service interfaces-config...
Executing restart of service ntp-config...
Executing restart of service rsyslog-config...
Executing restart of service swss...
Executing restart of service bgp...
Executing restart of service pmon...
Executing restart of service lldp...
Executing restart of service telemetry...
Enabling container monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon

stopping services and monitor it again after restarting services when
running the commands `sudo config reload`, `sudo config load` and `sudo
config load_minigraph`.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@yozhao101 yozhao101 requested a review from jleveque March 2, 2021 06:29
@jleveque jleveque left a comment

PR title is deceptive. This change doesn't completely disable Monit, it only disables the monitoring of containers. Please update accordingly.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@yozhao101 yozhao101 changed the title from "[config] Enable/disable Monit when starting/stopping the services." to "[config] Enable/disable container monitoring when starting/stopping the services." Mar 2, 2021
@yozhao101 yozhao101 requested a review from lguohan March 2, 2021 17:51
@yozhao101

> PR title is deceptive. This change doesn't completely disable Monit, it only disables the monitoring of containers. Please update accordingly.

Great suggestion, updated!

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
jleveque previously approved these changes Mar 2, 2021
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@yozhao101 yozhao101 merged commit 4a78c01 into sonic-net:master Mar 3, 2021
@yozhao101

@jleveque I will create a separate PR to update the submodule.

yxieca commented Mar 4, 2021

@yozhao101, @jleveque this change cannot be cherry-picked to 202012 cleanly. Is there a dependency that needs to be cherry-picked?

I think this PR depends on this one: #1199.

@jleveque

This commit has been reverted from the master branch, so I am removing the "Request for 202012 branch" label.
