
[rsyslog] Disable rsyslog rate limit in pre_test and do a recover in post_test#2378

Merged
bingwang-ms merged 2 commits into sonic-net:master from bingwang-ms:disable_rate_limit_for_rsyslog
Oct 22, 2020


@bingwang-ms
Collaborator

@bingwang-ms bingwang-ms commented Oct 21, 2020

Description of PR

Summary:
This is a workaround for sonic-net/sonic-buildimage#5667
Syslog messages are sometimes rate-limited for no apparent reason. The rate-limiting threshold is 20k messages in 5 minutes, but rsyslog sometimes starts rate-limiting messages from orchagent when fewer than 20k messages have been logged in the past 5 minutes.
Some ERROR or WARNING logs are suppressed once rsyslog begins rate-limiting, so errors might go undetected. This PR adds a workaround for this issue.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

What is the motivation for this PR?

This PR disables rsyslog rate limiting during test runs so that all logs are recorded in syslog.

How did you do it?

  1. Add a new case test_disable_rsyslog_rate_limit in test_pretest.py that updates the rsyslog configuration to disable rate limiting, then reloads the rsyslogd service in each container;
  2. Add a new case test_recover_rsyslog_rate_limit in test_posttest.py that restores the original configuration after all tests finish.
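
The configuration change in step 1 is not shown on this page, so the following is only a minimal sketch of one way it could work. It assumes the container's rsyslog.conf uses the legacy imuxsock directives `$SystemLogRateLimitInterval` and `$SystemLogRateLimitBurst` with values matching the 20k-in-5-minutes threshold above, and it relies on rsyslog treating an interval of 0 as "rate limiting off". `SAMPLE_CONF` and `disable_rate_limit` are hypothetical names, not code from this PR.

```python
import re

# Hypothetical rsyslog.conf fragment (assumption): 20000 messages per 300 s.
SAMPLE_CONF = (
    "$ModLoad imuxsock\n"
    "$SystemLogRateLimitInterval 300\n"
    "$SystemLogRateLimitBurst 20000\n"
)

# Match only the interval directive; [ \t] keeps the match on a single line.
_INTERVAL_RE = re.compile(
    r"^(\$SystemLogRateLimitInterval)[ \t]+\d+[ \t]*$", re.MULTILINE
)


def disable_rate_limit(conf_text):
    """Return conf_text with the rate-limit interval set to 0 (disabled)."""
    return _INTERVAL_RE.sub(r"\1 0", conf_text)


# Keep the original text so the post-test step can write it back unchanged.
original = SAMPLE_CONF
patched = disable_rate_limit(original)
print(patched)
```

In a real run the patched text would be written back into each container and rsyslogd reloaded; the recover step would then restore `original` and reload again.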

How did you verify/test it?

Verified on dx010 with a sample script, run in one of the containers, that generates logs:

import syslog

# Emit 99,999 INFO messages, well past the 20k rate-limiting threshold.
for index in range(1, 100000):
    msg = "Test message at INFO priority index = {}".format(index)
    syslog.syslog(syslog.LOG_INFO, msg)

Before disabling the rate limit, rsyslog starts dropping messages after index 20000:

Oct 21 05:23:40.128727 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 19995
Oct 21 05:23:40.128766 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 19996
Oct 21 05:23:40.128766 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 19997
Oct 21 05:23:40.128791 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 19998
Oct 21 05:23:40.128791 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 19999
Oct 21 05:23:40.128829 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 20000
Oct 21 05:23:40.128829 str-dx010-acs-4 INFO swss#rsyslogd: imuxsock[pid: 2210, name: python] from <str-dx010-acs-4:run.py>: begin to drop messages due to rate-limiting

After disabling the rate limit, all logs are recorded in syslog:

Oct 21 05:20:08.373985 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 99994
Oct 21 05:20:08.374006 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 99995
Oct 21 05:20:08.374023 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 99996
Oct 21 05:20:08.374023 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 99997
Oct 21 05:20:08.374065 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 99998
Oct 21 05:20:08.374065 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 99999

After the recovery step, rate limiting takes effect again.
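
The before/after comparison above could also be checked programmatically. The sketch below is an illustration only (the helper name `summarize` is hypothetical and not part of this PR): it scans syslog lines for the imuxsock rate-limiting notice and for the highest test-message index, using lines copied from the output above.

```python
import re

RATE_LIMIT_MARKER = "begin to drop messages due to rate-limiting"
INDEX_RE = re.compile(r"index = (\d+)$")


def summarize(lines):
    """Return (highest test-message index seen, whether rsyslog reported drops)."""
    last_index = 0
    dropped = False
    for line in lines:
        if RATE_LIMIT_MARKER in line:
            dropped = True
            continue
        match = INDEX_RE.search(line.rstrip())
        if match:
            last_index = max(last_index, int(match.group(1)))
    return last_index, dropped


before = [
    "Oct 21 05:23:40.128829 str-dx010-acs-4 INFO swss#run.py: "
    "Test message at INFO priority index = 20000",
    "Oct 21 05:23:40.128829 str-dx010-acs-4 INFO swss#rsyslogd: "
    "imuxsock[pid: 2210, name: python] from <str-dx010-acs-4:run.py>: "
    "begin to drop messages due to rate-limiting",
]
after = [
    "Oct 21 05:20:08.374065 str-dx010-acs-4 INFO swss#run.py: "
    "Test message at INFO priority index = 99999",
]

print(summarize(before))  # (20000, True)
print(summarize(after))   # (99999, False)
```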

Any platform specific information?

No.

Supported testbed topology if it's a new test case?

No.

Documentation

No.

@theasianpianist
Contributor

LGTM

Should we revert the fix in sonic-buildimage after this gets merged?

Signed-off-by: bingwang <[email protected]>
@bingwang-ms
Collaborator Author

Updated. Thanks @daall

@bingwang-ms bingwang-ms merged commit 1b5539d into sonic-net:master Oct 22, 2020
@bingwang-ms
Collaborator Author

> LGTM
>
> Should we revert the fix in sonic-buildimage after this gets merged?

I don't think so. This is only a workaround for testing. The underlying issue still needs to be addressed for production environments.

kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
swss:
* 7841930 2022-07-15 | [vxlan]Fixing L2MC vlan member caching issue (sonic-net#2378) (HEAD -> 202205) [Sudharsan Dhamal Gopalarathnam]
* b8cd435 2022-07-14 | [muxorch] Always use direct link for SoC IPs (sonic-net#2369) [Longxiang Lyu]
* 6158d5c 2022-07-08 | Add BGP profile to Vnet routes (sonic-net#2337) [Prince Sunny]
* bdb7ffd 2022-07-06 | [teammgr]: Waiting MACsec ready before doLagMemberTask (sonic-net#2286) [Ze Gan]

sairedis:
* 58359d4 2022-06-30 | [sairedis] Perform log rotate on request (sonic-net#1058) (HEAD -> 202205, github/202205) [Kamil Cudnik]
* cad0268 2022-07-13 | Enable cisco debug shell by default (sonic-net#1078) [VenkatCisco]

Signed-off-by: Ying Xie <[email protected]>
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
…aemons] advance submodule head (sonic-net#13755)

linkmgrd:
* e191338 2023-02-10 | Fix the warning of unused variables (sonic-net#167) (HEAD -> 202205) [Longxiang Lyu]

utilities:
* 2c933b0a 2023-02-07 | [sai_failure_dump]Invoking dump during SAI failure (sonic-net#2633) (HEAD -> 202205) [Sudharsan Dhamal Gopalarathnam]
* e949f318 2023-02-07 | [show] add support for gRPC show commands for `active-active` (sonic-net#2629) [vdahiya12]
* 77723927 2023-01-30 | Fixed admin state config CLI for Backport interfaces (sonic-net#2557) [anamehra]
* 32b1d4d6 2023-02-01 | [masic support] 'show run bgp' support for multi-asic (sonic-net#2427) [wenyiz2021]
* a2252d8a 2022-10-11 | Filter port invalid MTU configuration (sonic-net#2378) [pettershao-ragilenetworks]
* 0ffb4b6a 2023-02-09 | Add Transceiver PM basic CLI support to show output from TRANSCEIVER_PM table for ZR (sonic-net#2655) (github/202205) [longhuan-cisco]
* 496a0774 2023-02-09 | Add asic id for linecards so "show fabric counters queue/port" can work for single chip systems (sonic-net#2656) [jfeng-arista]
* 2591e8b5 2023-02-03 | multi asic support for show queue counter (sonic-net#2647) [zhixzhu]

swss:
* e0373a4 2023-02-07 | [autoneg]Fixing adv interface types to be set when AN is disabled (sonic-net#2638) (HEAD -> 202205, github/202205) [Sudharsan Dhamal Gopalarathnam]
* 62a09a0 2023-02-09 | [sai_failure_dump]Invoking dump during SAI failure (sonic-net#2644) (sonic-net#2661) [Sudharsan Dhamal Gopalarathnam]
* 076f63e 2023-02-08 | [202205] Revert "Revert "[voq][chassis]Add show fabric counters port/queue commands (sonic-net#2522)" (sonic-net#2612)" (sonic-net#2655) [kenneth-arista]
* a35e074 2023-02-06 | [202205][voq][chassis] Remove created ports from the default vlan. (sonic-net#2651) [arista-nwolfe]

swss-common:
* b9d4284 2023-02-08 | [202205] Fix epoll and socket resource leak issue. (sonic-net#651) (sonic-net#741) (github/202205) [Kevin Petremann]

sairedis:
* 9d8e731 2023-02-08 | [Mellanox] Enable DSCP remapping by using SAI attribute (sonic-net#1188) (HEAD -> 202205, github/202205) [Stephen Sun]
* 272a8bd 2023-02-10 | Fixing race condition for rif counters sonic-net#1136 (sonic-net#1202) [Suman Kumar]
* 211365a 2023-02-08 | [202205][submodule][SAI]Advance SAI header (sonic-net#1207) [Richard.Yu]
* 939c14b 2023-02-08 | [Submodule][upgrade]Upgrade SAI submodule (sonic-net#1203) [Richard.Yu]

platform-daemons:
* e5ccd40 2022-10-03 | [ycabled] fix naming error for error condition for CLI handling (sonic-net#302) (HEAD -> 202205, github/202205) [vdahiya12]
* cdd354d 2022-09-29 | [ycabled] add some exception catching logic to some vendor specific API's (sonic-net#301) [vdahiya12]
* cf58c08 2023-02-01 | Chassisd do an explicit stop of the config_manager (sonic-net#328) (sonic-net#336) [judyjoseph]

Signed-off-by: Ying Xie <[email protected]>