[generic-config-updater] Handle failed service restarts#2020
[generic-config-updater] Handle failed service restarts#2020renukamanavalan merged 26 commits intosonic-net:masterfrom
Conversation
2) Read conf file from install dir 3) Drop empty keys & tables upon jsonpatch.JsonPatch.apply to be in sync with redis update 4) Prefix service_validator module path with "generic_updater"
2) Added vlan validator 3) Added test code for vlan validator
|
This pull request introduces 1 alert when merging 3bd91b5 into 5cc9dd5 - view on LGTM.com new alerts:
|
|
This pull request introduces 1 alert when merging f520752 into d9f3afe - view on LGTM.com new alerts:
|
| rc = os.system(f"systemctl restart {svc_name}") | ||
| logger.log(logger.LOG_PRIORITY_NOTICE, | ||
| f"Restarted {svc_name}", print_to_console) | ||
| if rc != 0: |
There was a problem hiding this comment.
rc == or != 0 is stable comparison for success/failure
There was a problem hiding this comment.
Sorry, I mean is there a specific exit code for "too many restarts" failure? if yes, we can check that code specifically.
There was a problem hiding this comment.
There are no specific error code. 1 is a generic error code, which is most commonly we see. But this could change in future. Moreover if there is any inherent service related error, the monitor catches it and leads to ICM.
In our case, for any failure, we try our best, before giving up.
ref: link
"In case of an error while processing any init-script action except for status, the init script shall print an error message and exit with a non-zero status code:"
"1 generic or unspecified error (current practice)"
| print_to_console) | ||
|
|
||
| rc = os.system(f"systemctl restart {svc_name}") | ||
| if rc != 0: |
There was a problem hiding this comment.
I don't get your comments. But rc is the absolute way to check success / failure.
|
|
||
|
|
||
| def rsyslog_validator(old_config, upd_config, keys): | ||
| return _service_restart("rsyslog-config") |
There was a problem hiding this comment.
What we need is to run /usr/bin/rsyslog-config.sh.
What it does is, update /etc/rsyslog.conf from CONFG-DB & restart rsyslog.
We do the same and additionally handle the possibility of rsyslog restart failure.
rsyslog-config is not a service, but a one shot wrapper to run /usr/bin/rsyslog-config.sh, after updategraph.service, at the startup.
What I did Missed update from review comments in PR #2020 s/os.system("sleep 10s")/time.sleep(10)/
What I did During config update, update of certain tables do demand service restart. With multiple related updates are not grouped together, this might result in too many service restarts, which could fail with "hitting start limit". When that happens, call reset-failed, try to restart. If it fails again, take a pause and try to restart again. How I did it When service restart fails, call reset-failed, try, pause and then call service restart again.
What I did Missed update from review comments in PR #2020 s/os.system("sleep 10s")/time.sleep(10)/
What I did
During config update, update of certain tables do demand service restart.
With multiple related updates are not grouped together, this might result in too many service restarts, which could fail with "hitting start limit". When that happens, call reset-failed, try to restart. If it fails again, take a pause and try to restart again.
How I did it
When service restart fails, call reset-failed, try, pause and then call service restart again.
How to verify it
Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)