Syslog rate limit design #1049
Hi @wen587, the comment is fixed. Could you please review again?
LGTM
wen587 left a comment:
LGTM. Please ask for more approvals to merge.
@venkatmahalingam, @lguohan, @madhupalu, @tzack000, could you please review and sign off?
> Note: according to testing, the syslog rate limit configuration on the host side does not affect the container side.
@Junchao-Mellanox IMHO, it's a bit confusing compared to how it's done in AUTO TECHSUPPORT, where the global configuration overrides the per-feature rate limit configuration. Here, using 'GLOBAL' does not reflect the actual intention.
How about HOST?
@Junchao-Mellanox unfortunately, I have neither a strong opinion here nor a good suggestion.
#### Host side flow
@Junchao-Mellanox this approach introduces double host-side configuration: first by the rsyslog-config service and then again by hostcfgd.
No. If the configuration does not change, hostcfgd will not restart rsyslog.
@Junchao-Mellanox how do you plan to achieve that? The only reliable way I see is to parse config_db.json (also taking init_cfg.json into consideration) and fill the daemon's internal cache on start.
On first start, we will parse /etc/rsyslog.conf and compare its configuration with the CONFIG DB values.
@Junchao-Mellanox what guarantees that rsyslog.conf will be up to date before hostcfgd is started?
Please check the hostcfgd logic: https://github.com/sonic-net/sonic-host-services/blob/bc8698d1d760fefedaeb4742ad19b25ef2b3c17b/scripts/hostcfgd#L1661. Hostcfgd delays its own start by waiting for other system services, so rsyslog-config.service and rsyslog.service should already be started before hostcfgd.
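The reply above says hostcfgd parses /etc/rsyslog.conf once and compares it with CONFIG DB, restarting rsyslog only when the values differ. Below is a minimal Python sketch of that comparison, not the hostcfgd implementation. It assumes the settings appear in rsyslog.conf as legacy `$SystemLogRateLimitInterval N` / `$SystemLogRateLimitBurst N` directives and that the CONFIG DB entry is a hash with `rate_limit_interval` and `rate_limit_burst` fields; those field names and the helpers `parse_rate_limit` / `needs_restart` are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed directive format: "$SystemLogRateLimitInterval 300" / "$SystemLogRateLimitBurst 20000".
_DIRECTIVE_RE = re.compile(
    r"^\s*\$(SystemLogRateLimitInterval|SystemLogRateLimitBurst)\s+(\d+)\s*$"
)


def parse_rate_limit(conf_text: str) -> Dict[str, Optional[int]]:
    """Read the rate limit currently written in rsyslog.conf (None = not set)."""
    found: Dict[str, Optional[int]] = {"interval": None, "burst": None}
    for line in conf_text.splitlines():
        match = _DIRECTIVE_RE.match(line)
        if match:
            key = "interval" if match.group(1).endswith("Interval") else "burst"
            found[key] = int(match.group(2))
    return found


def _to_int(value: Optional[str]) -> Optional[int]:
    # CONFIG DB stores values as strings; a missing field stays None.
    return None if value is None else int(value)


def needs_restart(conf_text: str, db_entry: Dict[str, str]) -> bool:
    """True only if CONFIG DB asks for something rsyslog.conf does not already have.

    db_entry stands in for a CONFIG DB hash; the field names
    rate_limit_interval / rate_limit_burst are assumptions.
    """
    desired = {
        "interval": _to_int(db_entry.get("rate_limit_interval")),
        "burst": _to_int(db_entry.get("rate_limit_burst")),
    }
    return parse_rate_limit(conf_text) != desired
```

In this sketch the daemon would regenerate the file and restart rsyslog only when `needs_restart` returns True, so an unchanged configuration does not cause a second restart.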
#### Container side flow
@Junchao-Mellanox this approach introduces double container-side configuration: first by the container preStartAction and then again by containercfgd.
No. If the configuration does not change, hostcfgd will not restart rsyslog.
@Junchao-Mellanox how do you plan to achieve that? The only reliable way I see is to parse config_db.json (also taking init_cfg.json into consideration) and fill the daemon's internal cache on start.
On first start, we will parse /etc/rsyslog.conf and compare its configuration with the CONFIG DB values.
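On the container side the values come from a per-service entry rather than a single host entry. As a rough sketch of rendering the rate limit directives for one container, the Python below builds the rsyslog.conf lines from a dict keyed by service name. The table layout, the `rate_limit_interval` / `rate_limit_burst` field names and `render_rate_limit_lines` are hypothetical, and the real flow renders rsyslog templates rather than calling a helper like this. Mapping a 0 value to an interval of 0 is one way to honor the HLD's "0 disables rate limit" rule, relying on rsyslog treating a zero interval as "rate limiting off".

```python
from typing import Dict, List


def render_rate_limit_lines(service: str, feature_table: Dict[str, Dict[str, str]]) -> List[str]:
    """Build imuxsock rate limit directives for one container (hypothetical helper).

    feature_table stands in for a CONFIG DB table keyed by service name.
    """
    entry = feature_table.get(service)
    if not entry:
        return []  # nothing configured for this service: emit no directive

    interval = entry.get("rate_limit_interval")
    burst = entry.get("rate_limit_burst")

    # HLD rule: interval or burst set to 0 disables rate limiting.
    # Assumption: expressed here as a zero interval, which rsyslog treats as "off".
    if 0 in [int(v) for v in (interval, burst) if v is not None]:
        return ["$SystemLogRateLimitInterval 0"]

    lines = []
    if interval is not None:
        lines.append(f"$SystemLogRateLimitInterval {int(interval)}")
    if burst is not None:
        lines.append(f"$SystemLogRateLimitBurst {int(burst)}")
    return lines
```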
#### CLI change
Config rate limit:
@Junchao-Mellanox why not structure it like this:
config
|--- syslog
     |--- rate-limit
          |--- host --interval <interval> --burst <burst>
          |--- container <service_name> --interval <interval> --burst <burst>
Good suggestion, but I don't see an obvious benefit compared to the current design. Since the code has already been implemented for a long time, I would like to keep it as rate-limit-host and rate-limit-container.
> Note: setting interval or burst to 0 disables rate limiting.
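To make the agreed command names and the "0 disables" rule concrete, here is a small sketch of the argument handling for `rate-limit-host` and `rate-limit-container`. It uses the standard-library `argparse` instead of the click framework used by the SONiC CLI so that it stays self-contained. The returned CONFIG DB keys (`SYSLOG_CONFIG|GLOBAL`, `SYSLOG_CONFIG_FEATURE|<service_name>`), the field names and the `to_config_db_update` helper are assumptions for illustration only.

```python
import argparse
from typing import Dict, List, Tuple


def non_negative_int(text: str) -> int:
    value = int(text)
    if value < 0:
        raise argparse.ArgumentTypeError("must be >= 0 (0 disables rate limiting)")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="config syslog")
    sub = parser.add_subparsers(dest="command", required=True)

    host = sub.add_parser("rate-limit-host")
    container = sub.add_parser("rate-limit-container")
    container.add_argument("service_name")
    # Both commands accept the same optional knobs.
    for cmd in (host, container):
        cmd.add_argument("--interval", type=non_negative_int)
        cmd.add_argument("--burst", type=non_negative_int)
    return parser


def to_config_db_update(argv: List[str]) -> Tuple[str, Dict[str, str]]:
    """Translate CLI arguments into a (hypothetical) CONFIG DB key and field set."""
    args = build_parser().parse_args(argv)
    if args.command == "rate-limit-host":
        key = "SYSLOG_CONFIG|GLOBAL"
    else:
        key = f"SYSLOG_CONFIG_FEATURE|{args.service_name}"
    fields: Dict[str, str] = {}
    if args.interval is not None:
        fields["rate_limit_interval"] = str(args.interval)
    if args.burst is not None:
        fields["rate_limit_burst"] = str(args.burst)
    return key, fields
```

For example, `to_config_db_update(["rate-limit-container", "bgp", "--burst", "0"])` returns `("SYSLOG_CONFIG_FEATURE|bgp", {"rate_limit_burst": "0"})`, i.e. rate limiting disabled for that container.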
Show rate limit:
@Junchao-Mellanox why not structure it like this:
show
|--- syslog
     |--- rate-limit
          |--- host
          |--- container <service_name>
Good suggestion, but I don't see an obvious benefit compared to the current design. Since the code has already been implemented for a long time, I would like to keep it as rate-limit-host and rate-limit-container.
Last call on the HLD review: @tzack000 and @venkatmahalingam, if you have further comments, now is the time. Otherwise, I will proceed with merging it.
@qiluo-msft can you please merge this PR? Thanks.
@Junchao-Mellanox @liat-grozovik The document seems to contain special characters, and hence the page deployment workflow is failing. Can you check whether the page deployment workflow failure is caused by this merge?
Hi, what do you mean by "page deployment workflow"? Where can I find the log?
@StormLiangMS can you please backport this to 202211? Thanks.
Overview:
Logging in SONiC is organized around rsyslogd. Each container runs its own rsyslogd instance, and another instance runs on the host. The host's rsyslogd instance collects the messages from the containers and stores them in /var/log/syslog. The rsyslog config files are generated from templates:
Currently, each container has hardcoded message rate limiting to avoid being flooded with log messages:
No rate limiting is configured on the host side yet.
SystemLogRateLimitInterval determines the length of the time window, in seconds, that is measured for rate limiting. SystemLogRateLimitBurst defines the number of messages that must occur within SystemLogRateLimitInterval to trigger rate limiting. For example, with SystemLogRateLimitInterval=300 and SystemLogRateLimitBurst=20000, if a single daemon generates more than 20000 messages within 300 seconds, rsyslogd starts dropping the subsequent messages (FIFO).
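The interval/burst semantics described above can be modelled as a fixed-window counter kept per sending daemon. This is a toy Python sketch for illustration, not rsyslogd's implementation; the `RateLimiter` class is hypothetical, and treating a 0 interval or burst as "no limit" follows the CLI note in this HLD.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class RateLimiter:
    """Toy fixed-window limiter (hypothetical; not rsyslogd's code)."""

    interval: int  # window length in seconds (SystemLogRateLimitInterval)
    burst: int     # messages allowed per daemon within one window (SystemLogRateLimitBurst)
    # Per-daemon state: (window start time, messages counted in that window).
    _windows: Dict[str, Tuple[float, int]] = field(default_factory=dict, init=False, repr=False)

    def allow(self, source: str, now: float) -> bool:
        # Per the HLD, 0 for either setting disables rate limiting.
        if self.interval == 0 or self.burst == 0:
            return True
        start, count = self._windows.get(source, (now, 0))
        if now - start >= self.interval:
            start, count = now, 0  # previous window expired: open a new one
        count += 1
        self._windows[source] = (start, count)
        return count <= self.burst  # messages beyond the burst are dropped
```

With `RateLimiter(interval=300, burst=20000)`, the 20001st message from one daemon inside the same 300-second window is rejected, while other daemons keep their own counters.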
This feature allows the user to configure SystemLogRateLimitInterval and SystemLogRateLimitBurst for the host and for containers.
Related PR: