[generic-config-updater] Handle failed service restarts #2020
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,13 +17,42 @@ def set_verbose(verbose=False): | |
|
|
||
| def _service_restart(svc_name): | ||
| rc = os.system(f"systemctl restart {svc_name}") | ||
| logger.log(logger.LOG_PRIORITY_NOTICE, | ||
| f"Restarted {svc_name}", print_to_console) | ||
| if rc != 0: | ||
| # This failure is likely due to too many restarts | ||
| # | ||
| rc = os.system(f"systemctl reset-failed {svc_name}") | ||
| logger.log(logger.LOG_PRIORITY_ERROR, | ||
| f"Service has been reset. rc={rc}; Try restart again...", | ||
| print_to_console) | ||
|
|
||
| rc = os.system(f"systemctl restart {svc_name}") | ||
| if rc != 0: | ||
|
Contributor, Author:
I don't follow your comment, but rc is the definitive way to check success or failure. |
||
| # Even with reset-failed, restart fails. | ||
| # Give a pause before retry. | ||
| # | ||
| logger.log(logger.LOG_PRIORITY_ERROR, | ||
| f"Restart failed for {svc_name} rc={rc} after reset; Pause for 10s & retry", | ||
| print_to_console) | ||
| os.system("sleep 10s") | ||
renukamanavalan marked this conversation as resolved.
|
||
| rc = os.system(f"systemctl restart {svc_name}") | ||
|
|
||
| if rc == 0: | ||
| logger.log(logger.LOG_PRIORITY_NOTICE, | ||
| f"Restart succeeded for {svc_name}", | ||
| print_to_console) | ||
| else: | ||
| logger.log(logger.LOG_PRIORITY_ERROR, | ||
| f"Restart failed for {svc_name} rc={rc}", | ||
| print_to_console) | ||
| return rc == 0 | ||
|
|
||
|
|
||
| def rsyslog_validator(old_config, upd_config, keys): | ||
| return _service_restart("rsyslog-config") | ||
|
Contributor, Author:
What we need is to run /usr/bin/rsyslog-config.sh. rsyslog-config is not a service but a one-shot wrapper that runs /usr/bin/rsyslog-config.sh after updategraph.service at startup. |
||
| rc = os.system("/usr/bin/rsyslog-config.sh") | ||
| if rc != 0: | ||
| return _service_restart("rsyslog") | ||
| else: | ||
| return True | ||
|
|
||
|
|
||
| def dhcp_validator(old_config, upd_config, keys): | ||
|
|
||
Is the `rc` a stable code when too many restarts happen? If yes, just check `if rc == <code>:`
Comparing `rc == 0` or `rc != 0` is a stable check for success/failure.
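As a side note on what `rc` holds: on Unix, `os.system` returns a wait status rather than the command's exit code, so an exit code of 1 typically shows up as 256 in the `rc=...` log messages. The zero / non-zero comparison is unaffected. A minimal sketch, assuming a Unix host; `exit_code_of` is a hypothetical helper and not part of this PR:

```python
import os


def exit_code_of(command):
    """Run a shell command via os.system and return its decoded exit code.

    Hypothetical helper for illustration only. Returns None if the
    command did not exit normally (for example, killed by a signal).
    """
    status = os.system(command)
    # On Unix the return value is a wait status; decode it before
    # comparing against a specific exit code.
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None


# Success / failure can still be tested directly on the raw status.
failed = os.system("exit 1") != 0
```

If a specific exit code ever needed to be checked, decoding the status first would avoid comparing against the shifted value.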
Sorry, I mean: is there a specific exit code for the "too many restarts" failure? If yes, we can check for that code specifically.
There is no specific error code. 1 is a generic error code and is the one we see most commonly, but this could change in the future. Moreover, if there is any inherent service-related error, the monitor catches it and it leads to an ICM.
In our case, for any failure, we try our best before giving up.
ref: link
"In case of an error while processing any init-script action except for status, the init script shall print an error message and exit with a non-zero status code:"
"1 generic or unspecified error (current practice)"
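For reference, the "try our best before giving up" flow discussed in this thread can be sketched as below. This is only an illustration under stated assumptions, not the code in this PR: `restart_with_retry` and its `run`, `pause_s`, and `sleep` parameters are hypothetical names, and `subprocess.run` plus `time.sleep` are swapped in for the `os.system` calls so the logic can be exercised without systemd.

```python
import subprocess
import time


def restart_with_retry(svc_name, run=None, pause_s=10, sleep=time.sleep):
    """Restart a systemd unit, retrying on any non-zero return code.

    Hypothetical sketch: `run` takes an argument list and returns an
    integer return code; by default it shells out via subprocess.run.
    """
    if run is None:
        def run(args):
            return subprocess.run(args, check=False).returncode

    def restart():
        return run(["systemctl", "restart", svc_name])

    rc = restart()
    if rc != 0:
        # Likely too many restarts: clear the failed state, then retry.
        run(["systemctl", "reset-failed", svc_name])
        rc = restart()
    if rc != 0:
        # Still failing after reset-failed: pause before a final attempt.
        sleep(pause_s)
        rc = restart()
    # Any non-zero rc is treated as failure; no specific code is checked.
    return rc == 0
```

Passing `run` and `sleep` as parameters is a design choice made here purely so the retry order can be tested with fakes; the PR itself calls `os.system` directly.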