[201803][monit] Restart rsyslog service if rsyslogd consumes > 800 MB memory #2963
lguohan merged 1 commit into sonic-net:201803 from jleveque:rsyslog_mem_limit_201803
if memory usage > 50% for 5 times within 10 cycles then alert
if cpu usage (user) > 90% for 5 times within 10 cycles then alert
if cpu usage (system) > 90% for 5 times within 10 cycles then alert
check process rsyslog with pidfile /var/run/rsyslogd.pid
Regarding /var/run/rsyslogd.pid: how about the rsyslog processes inside the Docker containers? Do they matter?
We have not seen the rsyslogd memory leak occur in an rsyslogd process inside any Docker container. The assumption is that those rsyslogd processes carry a very light load, whereas the rsyslogd process in the host image also acts as the rsyslog server for all of them, so it handles a much higher volume of messages.
rsyslog inside a container is better managed within that container, for example by using superlance (memmon) with supervisord, or by using a container option to limit the memory consumption of the whole container.
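To make the per-container idea concrete, here is a minimal Python sketch of the kind of resident-memory check a watchdog such as memmon performs. It is not memmon's actual implementation. The helper names (`parse_vmrss_bytes`, `read_rss_bytes`, `over_limit`) are hypothetical, the sketch assumes Linux `/proc/<pid>/status`, and it assumes "800 MB" means 800 MiB.

```python
from typing import Optional

# Hypothetical limit mirroring the 800 MB monit rule (assumption: MiB, not MB).
LIMIT_BYTES = 800 * 1024 * 1024


def parse_vmrss_bytes(status_text: str) -> Optional[int]:
    """Return VmRSS from /proc/<pid>/status text, converted to bytes."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:   123456 kB"; the kernel reports kB (KiB).
            return int(line.split()[1]) * 1024
    return None  # Kernel threads and zombies have no VmRSS line.


def read_rss_bytes(pid: int) -> Optional[int]:
    """Read the resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_bytes(f.read())


def over_limit(rss_bytes: Optional[int], limit: int = LIMIT_BYTES) -> bool:
    """True only if the RSS is known and strictly above the limit."""
    return rss_bytes is not None and rss_bytes > limit
```

A watchdog would call `over_limit(read_rss_bytes(pid))` once per polling interval and restart the process only after repeated hits, rather than on a single sample.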
check process rsyslog with pidfile /var/run/rsyslogd.pid
start program = "/bin/systemctl start rsyslog.service"
stop program = "/bin/systemctl stop rsyslog.service"
if totalmem > 800 MB for 5 times within 10 cycles then restart
Regarding restart: do we need to keep a restart counter somewhere?
Good question. What is the log message for such a restart? We can search the syslog for such cases.
In each cycle in which monit detects that memory usage has exceeded the threshold, it logs the following:
ERR monit[607]: 'rsyslog' total mem amount of 1.6 GB matches resource limit [total mem amount>800.0 MB]
If the criteria are met (5 such matches within 10 cycles), it logs the following when it attempts to restart the service:
INFO monit[607]: 'rsyslog' trying to restart
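Since there is no dedicated counter, restarts can be tallied after the fact from syslog. The following is a minimal Python sketch built only from the two monit messages quoted above. The function name `count_rsyslog_events` is hypothetical, and the sketch assumes the unit suffixes look like `kB`/`MB`/`GB` and that lines may carry a timestamp and hostname prefix (hence `re.search`).

```python
import re
from typing import Iterable, Tuple

# Patterns derived from the quoted monit messages; prefixes are tolerated.
LIMIT_RE = re.compile(
    r"monit\[\d+\]: 'rsyslog' total mem amount of ([\d.]+) ([kKMG]B) "
    r"matches resource limit"
)
RESTART_RE = re.compile(r"monit\[\d+\]: 'rsyslog' trying to restart")

# Convert the reported amount to MB (assumption: 1024-based units).
SCALE_TO_MB = {"KB": 1 / 1024, "MB": 1.0, "GB": 1024.0}


def count_rsyslog_events(lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (limit_matches, restart_attempts, peak_mb) found in the lines."""
    limit_matches = restarts = 0
    peak_mb = 0.0
    for line in lines:
        m = LIMIT_RE.search(line)
        if m:
            limit_matches += 1
            mb = float(m.group(1)) * SCALE_TO_MB[m.group(2).upper()]
            peak_mb = max(peak_mb, mb)
        elif RESTART_RE.search(line):
            restarts += 1
    return limit_matches, restarts, peak_mb
```

For example, `with open("/var/log/syslog") as f: print(count_rsyslog_events(f))`, where the log path is an assumption; rotated, compressed logs would need to be decompressed first.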
Get this change into the 201811 branch until we have a better memory resource monitor/mitigation in place.
Configure monit to monitor the resident memory consumption of rsyslogd. If memory usage exceeds 800 MB in 5 out of 10 checks (with the 2-minute cycle interval, roughly 10 minutes over the limit within a 20-minute window), restart the rsyslog service, because rsyslogd is most likely leaking memory.
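The "5 times within 10 cycles" condition can be modeled as a sliding window over the most recent check results. The Python sketch below is a simplified model, not monit's source; the class name `NOfMCycles` is hypothetical, and clearing the window after the rule fires is an assumption about how the counter behaves after a restart.

```python
from collections import deque

CYCLE_SECONDS = 120  # 2-minute monit cycle, per the description
N, M = 5, 10         # "for 5 times within 10 cycles"


class NOfMCycles:
    """Fires when the condition held in at least n of the last m cycles."""

    def __init__(self, n: int, m: int) -> None:
        self.n = n
        self.window = deque(maxlen=m)  # oldest result drops off automatically

    def observe(self, exceeded: bool) -> bool:
        """Record one cycle's result; return True if the action should run."""
        self.window.append(exceeded)
        if sum(self.window) >= self.n:
            self.window.clear()  # assumption: count restarts after the action
            return True
        return False


# Window arithmetic: 10 cycles * 2 min = 20 min; 5 hits * 2 min = 10 min.
WINDOW_MINUTES = M * CYCLE_SECONDS // 60
MIN_OVER_LIMIT_MINUTES = N * CYCLE_SECONDS // 60
```

Counting hits in a window, rather than requiring consecutive hits, means a leak that only intermittently pushes usage over 800 MB still triggers a restart, while a single transient spike does not.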