diff --git a/tests/acl/test_acl.py b/tests/acl/test_acl.py index 73711ead0a9..e17656c979a 100644 --- a/tests/acl/test_acl.py +++ b/tests/acl/test_acl.py @@ -35,6 +35,7 @@ pytest.mark.acl, pytest.mark.disable_loganalyzer, # Disable automatic loganalyzer, since we use it for the test pytest.mark.topology("t0", "t1", "t2", "m0", "mx", "m1", "m2", "m3"), + pytest.mark.disable_memory_utilization ] MAX_WAIT_TIME_FOR_INTERFACES = 360 diff --git a/tests/common/plugins/memory_utilization/README.md b/tests/common/plugins/memory_utilization/README.md index 306216a6769..22208255eba 100644 --- a/tests/common/plugins/memory_utilization/README.md +++ b/tests/common/plugins/memory_utilization/README.md @@ -1,722 +1,395 @@ -### Scope -This document provides a high level description of the memory utilization verification design and instructions on using the "memory_utilization" fixture in SONiC testing. - -### Overview -During testing, the memory usage of the DUT can vary due to different configurations, environment setups, and test steps. To ensure the safe use of memory resources, it is necessary to check the memory usage after the test to confirm that it has not exceeded the high memory usage threshold and that no memory leaks have occurred. - -The purpose of the current feature is to verify that memory resources do not increase during test runs and do not exceed the high memory usage threshold. - -### Module Design -Newly introduced a plugin "memory_utilization" and config files. - -#### Config files -Config files, including both common file and platform dependence file. -- **memory_utilization_common.json**: Common configurations for memory utilization. - - The file is in JSON format and is defined in the public branch. - - It includes "COMMON" key which holds all publicly defined common memory items. -- **memory_utilization_dependence.json**: Dependency configruations for memory utilization, currently supporting special configurations for specific HwSku devices. - - The dependency file is also in JSON format and is used in internal branch. - - It also includes a 'COMMON' key which holds all internal common memory items. If an internal memory item has the same name as an item in the public common configuration file, it will overwrite the public common item. - - The dependency file also could includes a "HWSKU" key for all hwsku related memory items. If the hwsku memory item is same as a common one, it will overwrite it. - - This dependency file should be left empty in the public branch, with special configurations added in internal branch. - -Memory utilization config files include memory items which need to check. -Each memory item include "name", "cmd", "memory_params" and "memory_check". -- **name**: The name of the memory check item. -- **cmd**: The shell command is run on the DUT to collect memory information. -- **memory_params**: The items and thresholds for memory usage, defined as a Dict type in the configuration JSON file. It could be modified in the test case. -- **memory_check**: The function used to parse the output of the shell command takes two input parameters: cmd's output string and memory_params. It returns the parsered memory inforamtion, which will be compared with "memory_params" to check for memory threshold. - -#### Workflow -1. Collect memory information based on memory utilization config files before running a test case. - - Execute the "cmd" and use the "memory_check" function to collect the memory information before running the test case. -2. Collect memory information based on memory utilization config files after the test case is completed. - - Execute the "cmd" and use the "memory_check" function to collect the memory information after running the test case. -3. Compare the memory information from before and after the test run. - - Compare the collected memory information with the thresholds in "memory_params". -4. Raise an alarm if there is any memory leak or if the memory usage exceeds the high memory threshold based on the memory utilization config files. - ``` - > pytest.fail(message) - E Failed: [ALARM]: monit:memory_usage memory usage 77.8 exceeds high threshold 70.0 - ``` - - -### Memory Utilization usage example - -Below is a description of the possible uses for the "memory_utilization" fixture/module. - -##### memory_utilization fixture -In the root conftest there is an implemented "memory_utilization" pytest fixture that starts automatically for all test cases. -The main flow of the fixture is as follows: -- memory_utilization collects memory information before the test case starts. -- memory_utilization collects memory information after the test case finishes. -- memory_utilization compares DUT memory usage and displays the results. -- if memory_utilization finds any exceeded thresholds for high memory usage or memory increase, it will display the result and pytest will generate an 'error'. - -#### To skip memory_utilization for: - -memory_utilization is enabled by default, if you want to skip the memory_utilization, please follow below steps -- For all test cases - use pytest command line option ```--disable_memory_utilization``` -- Per test case: mark test case with ```@pytest.mark.disable_memory_utilization``` decorator. Example is shown below. - ```python - pytestmark = [ - pytest.mark.disable_memory_utilization - ] - ``` - -#### Example of memory items configuration in json file -Current we support the "monit", "free", "docker" and "free" memory items. "monit" is already defined in common file, we can also define "top", "free" or "docker" in internal branch with below example. -The value in the following examples are for demonstration purposes only and are not actual values. -Please adjust the corrrsponding values according to your specific platform. - -##### "monit" -"monit" uses the command "sudo monit status" to get the memory item's information. -"monit" has three configurations in the example below, the first in the common file, the second in the dependency common configuration, and the third in the hwsku configuration. Therefore, the configuration from the hwsku should be use. -The threshold for high memory is 80%, and for an increase is 5%. -The function "parse_monit_status_output" parses the output of the command "sudo monit status" and returns the memory information 'monit':{'memory_usage':41.2}. -The Memory utilization fixture uses the function "parse_monit_status_output" to parse the output of "sudo monit status" before and after the test case. It then compares the value with the threshold. If the value exceeds the threshold, an 'error' will be raised. - - -###### "monit" configuration in memory_utilization_common.json -```json - "COMMON": [ - { - "name": "monit", - "cmd": "sudo monit status", - "memory_params": { - "memory_usage": { - "memory_increase_threshold": 5, - "memory_high_threshold": 70 - } - }, - "memory_check": "parse_monit_status_output" - } +# Memory Utilization Plugin for SONiC Testing + +## Table of Contents +- [Overview](#overview) +- [Plugin Design](#plugin-design) + - [Configuration Files](#configuration-files) + - [Memory Item Structure](#memory-item-structure) + - [Threshold Types](#threshold-types) + - [Behavior for Combined Thresholds](#behavior-for-combined-thresholds) + - [Workflow](#workflow) +- [Usage Guide](#usage-guide) + - [Enabling and Disabling](#enabling-and-disabling) + - [Configuration Examples](#configuration-examples) +- [Supported Memory Monitors](#supported-memory-monitors) + - [Monit Status Monitor](#monit-status-monitor) + - [Free Memory Monitor](#free-memory-monitor) + - [Docker Stats Monitor](#docker-stats-monitor) + - [Top Monitor](#top-monitor) + - [FRR Memory Monitor](#frr-memory-monitor) +- [Advanced Usage](#advanced-usage) + - [Global Memory Items](#global-memory-items) + - [HWSKU-Specific Memory Items](#hwsku-specific-memory-items) + - [Test-Specific Memory Items](#test-specific-memory-items) +- [Troubleshooting](#troubleshooting) + +## Overview + +During testing, memory usage on the Device Under Test (DUT) can vary due to different configurations, environment setups, and test operations. To ensure safe memory resource utilization, this plugin checks memory usage before and after tests to verify that: + +1. Memory usage doesn't exceed high memory thresholds +2. No memory leaks occur during test execution +3. Memory increases stay within acceptable limits + +The memory utilization plugin automatically monitors memory resources and generates test failures when thresholds are exceeded. + +## Plugin Design + +### Configuration Files + +The plugin uses two JSON configuration files: + +1. **memory_utilization_common.json** - Common configurations for all platforms + - Used in public branches + - Contains the `COMMON` section with default memory items + +2. **memory_utilization_dependence.json** - Platform-specific configurations + - Used primarily in internal branches + - Contains a `COMMON` section that overrides common.json if needed + - Contains `HWSKU` section for hardware-specific thresholds + +### Memory Item Structure + +Each memory item in the configuration includes: + +| Field | Description | +|-------|-------------| +| `name` | Identifier for the memory check item | +| `cmd` | Shell command to execute on the DUT | +| `memory_params` | Dictionary of items to monitor with their thresholds | +| `memory_check` | Function name used to parse command output | + +Each `memory_params` entry can define: +- `memory_high_threshold`: Maximum acceptable value (fails if exceeded) +- `memory_increase_threshold`: Maximum acceptable increase (fails if exceeded) + +### Threshold Types + +Memory thresholds must be defined using the following structured formats: + +1. **Absolute value threshold**: + ```json + "memory_high_threshold": { + "type": "value", + "value": 128 + } + ``` + +2. **Percentage threshold**: + ```json + "memory_increase_threshold": { + "type": "percentage", + "value": "10%" + } + ``` + +3. **Combined thresholds** (when you want both types): + ```json + "memory_high_threshold": [ + {"type": "value", "value": 128}, + {"type": "percentage", "value": "75%"} + ] + ``` + +4. **Disabled threshold**: + ```json + "memory_high_threshold": null + ``` + This explicitly disables high threshold checking for this item. + +### Behavior for Combined Thresholds + +When thresholds include both **value** and **percentage** types, the plugin calculates both thresholds and applies the most restrictive one (i.e., the smallest value). + +For example: +- If `memory_increase_threshold` includes: + ```json + [ + {"type": "value", "value": 128}, + {"type": "percentage", "value": "10%"} ] -``` -###### "monit" configuration in memory_utilization_dependence.json + ``` + and the baseline memory usage is `1000`, the plugin will calculate: + - Absolute value threshold: `128` + - Percentage threshold: `10% of 1000 = 100` + - The plugin will use `100` as the threshold since it is the smaller value. + +This ensures that the plugin enforces the strictest memory usage limits. + +### Workflow + +1. **Before Test**: + - Executes each command in the configuration + - Parses output using the specified function + - Stores baseline memory values + +2. **After Test**: + - Executes the same commands again + - Parses output to get current memory values + - Compares with baseline and thresholds + +3. **Validation**: + - Checks if current values exceed high thresholds + - Checks if increases exceed increase thresholds + - Fails the test if any threshold is exceeded + +## Usage Guide + +### Enabling and Disabling + +Memory utilization checking is enabled by default for all tests. To disable it: + +1. **For all test cases**: + - Use the command line option `--disable_memory_utilization` + +2. **For specific test cases**: + - Add the `disable_memory_utilization` marker: + ```python + pytestmark = [ + pytest.mark.disable_memory_utilization + ] + ``` + +## Supported Memory Monitors and Configuration Examples + +### Monit Status Monitor + +Monitors system memory usage via Monit. + +- **Command**: `sudo monit status` +- **Parser Function**: `parse_monit_status_output` +- **Monitored Parameters**: + - `memory_usage`: System memory utilization percentage + +Example configuration: ```json - "HWSKU" : { - "Arista-7050QX": ["Arista-7050-QX-32S", "Arista-7050QX32S-Q32"] - }, - "COMMON": [ - { - "name": "monit", - "cmd": "sudo monit status", - "memory_params": { - "memory_usage": { - "memory_increase_threshold": 5, - "memory_high_threshold": 60 - } +{ + "name": "monit", + "cmd": "sudo monit status", + "memory_params": { + "memory_usage": { + "memory_increase_threshold": { + "type": "value", + "value": 5 }, - "memory_check": "parse_monit_status_output" - } - ], - "Arista-7050QX": [ - { - "name": "monit", - "cmd": "sudo monit status", - "memory_params": { - "memory_usage": { - "memory_increase_threshold": 5, - "memory_high_threshold": 80 - } - }, - "memory_check": "parse_monit_status_output" + "memory_high_threshold": { + "type": "value", + "value": 75 + } } - ] + }, + "memory_check": "parse_monit_status_output" +} ``` -###### "sudo monit status" output -```shell -System 'sonic' - status Running - monitoring status Monitored - monitoring mode active - on reboot start - load average [1.44] [1.10] [1.04] - cpu 22.7%us 3.3%sy 0.0%wa - memory usage 3.2 GB [41.2%] - swap usage 0 B [0.0%] - uptime 4d 3h 55m - boot time Thu, 11 Jul 2024 06:41:46 - data collected Mon, 15 Jul 2024 10:36:45 -``` -###### "parse_monit_status_output" return value -```shell - 'monit': { - 'memory_usage': 41.2 - } -``` +### Free Memory Monitor + +Monitors available system memory. -##### "free" -"free" uses the command "free -m" to get the memory item's information. -The threshold for high memory is 1500, and for an increase is 100. -The function "parse_free_output" parses the output of the command "free -m" and returns the memory information 'free':{'used':2533}. -The Memory utilization fixture uses the function "parse_free_output" to parse the output of "free -m" before and after the test case. It then compares the value with the threshold. If the value exceeds the threshold, an 'error' will be raised. +- **Command**: `free -m` +- **Parser Function**: `parse_free_output` +- **Monitored Parameters**: + - `used`: Used memory in MB -###### "free" configuration +Example configuration: ```json - { - "name": "free", - "cmd": "free -m", - "memory_params": { - "used": { - "memory_increase_threshold": 100, - "memory_high_threshold": 3000 - } +{ + "name": "free", + "cmd": "free -m", + "memory_params": { + "used": { + "memory_increase_threshold": { + "type": "percentage", + "value": "10%" }, - "memory_check": "parse_free_output" + "memory_high_threshold": null } + }, + "memory_check": "parse_free_output" +} ``` -###### "free -m" output -```shell - total used free shared buff/cache available -Mem: 3897 2533 256 183 1108 956 -Swap: 0 0 0 -``` -###### "parse_free_output" return value -```shell - 'free': { - 'used': 2533 - } -``` -##### "docker" -"docker" uses the command "docker stats --no-stream" to get the memory item's information. -30The memory_params could have several sub memory items, the example below is "snmp", "pmon" and other dockers. -The "snmp" threshold for high memory is 30, and for an increase is 3. -The function "parse_docker_stats_output" parses the output of the command "docker stats --no-stream" and returns the memory information {'snmp': '2.07', 'pmon': '5.64', 'lldp': '1.74', 'gnmi': '3.46', 'radv': '1.02', 'syncd': '13.16', 'teamd': '1.67', 'bgp': '8.98', 'swss': '3.75', 'acms': '2.22', 'database': '3.54'} -The Memory utilization fixture uses the function "parse_docker_stats_output" to parse the output of "docker stats --no-stream" before and after the test case. It then compares the value with the threshold. If the value exceeds the threshold, an 'error' will be raised. +### Docker Stats Monitor + +Monitors memory usage of Docker containers. + +- **Command**: `docker stats --no-stream` +- **Parser Function**: `parse_docker_stats_output` +- **Monitored Parameters**: + - Multiple container names (snmp, pmon, lldp, etc.) + - Each container's memory usage as a percentage -###### "docker" configuration +Example configuration: ```json - { - "name": "docker", - "cmd": "docker stats --no-stream", - "memory_params": { - "snmp": { - "memory_increase_threshold": 3, - "memory_high_threshold": 30 - }, - "pmon": { - "memory_increase_threshold": 3, - "memory_high_threshold": 30 - }, - "lldp": { - "memory_increase_threshold": 3, - "memory_high_threshold": 30 - }, - "gnmi": { - "memory_increase_threshold": 3, - "memory_high_threshold": 30 - }, - "radv": { - "memory_increase_threshold": 3, - "memory_high_threshold": 30 - }, - "syncd": { - "memory_increase_threshold": 3, - "memory_high_threshold": 30 - }, - "bgp": { - "memory_increase_threshold": 3, - "memory_high_threshold": 30 - }, - "teamd": { - "memory_increase_threshold": 3, - "memory_high_threshold": 30 - }, - "swss": { - "memory_increase_threshold": 3, - "memory_high_threshold": 30 - }, - "acms": { - "memory_increase_threshold": 3, - "memory_high_threshold": 30 - }, - "database": { - "memory_increase_threshold": 3, - "memory_high_threshold": 30 - } +{ + "name": "docker", + "cmd": "docker stats --no-stream", + "memory_params": { + "snmp": { + "memory_increase_threshold": { + "type": "value", + "value": 2 + }, + "memory_high_threshold": { + "type": "value", + "value": 4 + } + }, + "swss": { + "memory_increase_threshold": { + "type": "value", + "value": 2 }, - "memory_check": "parse_docker_stats_output" + "memory_high_threshold": { + "type": "value", + "value": 8 + } } + }, + "memory_check": "parse_docker_stats_output" +} ``` -###### "docker stats --no-stream" output -```shell -CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS -52958334481e snmp 5.27% 80.8MiB / 3.807GiB 2.07% 0B / 0B 4.76MB / 180kB 10 -5acbd9b57da7 pmon 2.51% 220MiB / 3.807GiB 5.64% 0B / 0B 11.6MB / 123kB 19 -dc281fd005f8 lldp 0.36% 67.84MiB / 3.807GiB 1.74% 0B / 0B 5.9MB / 98.3kB 13 -9f7c67031b80 gnmi 2.05% 134.7MiB / 3.807GiB 3.46% 0B / 0B 22.8MB / 823kB 28 -e5b96437643b radv 0.42% 39.84MiB / 3.807GiB 1.02% 0B / 0B 713kB / 65.5kB 8 -48de97a88314 syncd 16.26% 513.2MiB / 3.807GiB 13.16% 0B / 0B 273MB / 778kB 48 -26a72f50a90a teamd 1.38% 65.09MiB / 3.807GiB 1.67% 0B / 0B 1.4MB / 90.1kB 22 -2995cc0130a7 bgp 16.07% 350.2MiB / 3.807GiB 8.98% 0B / 0B 20.2MB / 537MB 27 -45c759cb9770 swss 0.66% 146.2MiB / 3.807GiB 3.75% 0B / 0B 27.2MB / 209kB 41 -91903b37d1cd acms 0.35% 86.38MiB / 3.807GiB 2.22% 0B / 0B 266kB / 1.06MB 11 -5ffa57081cb8 database 35.64% 138.1MiB / 3.807GiB 3.54% 0B / 0B 50.2MB / 73.7kB 13 -``` -###### "parse_docker_stats_output" return value -```shell - 'docker': { - 'snmp': '2.07', - 'pmon': '5.64', - 'lldp': '1.74', - 'gnmi': '3.46', - 'radv': '1.02', - 'syncd': '13.16', - 'teamd': '1.67', - 'bgp': '8.98', - 'swss': '3.75', - 'acms': '2.22', - 'database': '3.54' - } -``` +### Top Monitor +Monitors memory usage of specific processes. -##### "top" -"top" uses the command "top -b -n 1" to get the memory item's information. -The memory_params could have several sub memory items, the example below is "bgpd" and "zebra". -The "bgpd" threshold for high memory is 200000, and for an increase is 10000. -The "zebra" threshold for high memory is 200000, and for an increase is 10000. -The function "parse_top_output" parses the output of the command "top -b -n 1" and returns the memory information 'top':{'zebra':67780,'bgpd':197416}. -The Memory utilization fixture uses the function "parse_top_output" to parse the output of "top -b -n 1" before and after the test case. It then compares the value with the threshold. If the value exceeds the threshold, an 'error' will be raised. +- **Command**: `top -b -n 1` +- **Parser Function**: `parse_top_output` +- **Monitored Parameters**: + - Process names (bgpd, zebra, etc.) + - Each process's memory usage in MB (from RES column) -###### "json" configuration +Example configuration: ```json - { - "name": "top", - "cmd": "top -b -n 1", - "memory_params": { - "bgpd": { - "memory_increase_threshold": 10000, - "memory_high_threshold": 200000 - }, - "zebra": { - "memory_increase_threshold": 10000, - "memory_high_threshold": 200000 - } +{ + "name": "top", + "cmd": "top -b -n 1", + "memory_params": { + "bgpd": { + "memory_increase_threshold": { + "type": "value", + "value": 128 + }, + "memory_high_threshold": null + }, + "zebra": { + "memory_increase_threshold": { + "type": "value", + "value": 64 }, - "memory_check": "parse_top_output" + "memory_high_threshold": null } + }, + "memory_check": "parse_top_output" +} ``` -###### "top -b -n 1" output -```shell -top - 03:01:19 up 4 days, 11 min, 0 users, load average: 1.56, 1.38, 1.16 -Tasks: 252 total, 2 running, 246 sleeping, 0 stopped, 4 zombie -%Cpu(s): 23.5 us, 8.6 sy, 0.0 ni, 66.7 id, 0.0 wa, 0.0 hi, 1.2 si, 0.0 st -MiB Mem : 3897.9 total, 254.8 free, 2534.7 used, 1108.4 buff/cache -MiB Swap: 0.0 total, 0.0 free, 0.0 used. 954.9 avail Mem - - PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND - 1063 message+ 20 0 125476 73676 8184 R 35.3 1.8 2254:29 redis-s+ - 6205 root 20 0 60392 41044 17348 S 29.4 1.0 553:32.09 python3 -2049149 root 20 0 11276 4116 3504 R 17.6 0.1 0:00.05 top - 2808 root 20 0 2701756 639364 283328 S 11.8 16.0 1041:03 syncd - 2529 root 20 0 95860 11568 10196 S 5.9 0.3 36:50.14 tlm_tea+ - 2536 root 20 0 19208 3824 2476 S 5.9 0.1 5:08.00 teamd - 5588 root 20 0 129120 32012 15944 S 5.9 0.8 19:47.47 python3 - 5730 root 20 0 1417896 78608 28512 S 5.9 2.0 54:33.37 telemet+ - 6082 root 20 0 38956 31736 10536 S 5.9 0.8 3:07.13 supervi+ - 6193 root 20 0 282292 48244 17000 S 5.9 1.2 90:43.80 python3 - 1 root 20 0 166628 13728 10192 S 0.0 0.3 10:58.80 systemd - 2 root 20 0 0 0 0 S 0.0 0.0 0:00.12 kthreadd - 3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp - 4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par+ - 6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker+ - 8 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_perc+ - 9 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_tas+ - 10 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_tas+ - 11 root 20 0 0 0 0 S 0.0 0.0 0:26.79 ksoftir+ - 12 root 20 0 0 0 0 I 0.0 0.0 15:09.90 rcu_sch+ - 13 root rt 0 0 0 0 S 0.0 0.0 0:02.01 migrati+ - 15 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/0 - 16 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/1 - 17 root rt 0 0 0 0 S 0.0 0.0 0:02.30 migrati+ - 18 root 20 0 0 0 0 S 0.0 0.0 0:26.84 ksoftir+ - 20 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker+ - 21 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/2 - 22 root rt 0 0 0 0 S 0.0 0.0 0:02.20 migrati+ - 23 root 20 0 0 0 0 S 0.0 0.0 0:24.26 ksoftir+ - 25 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker+ - 26 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/3 - 27 root rt 0 0 0 0 S 0.0 0.0 0:03.03 migrati+ - 28 root 20 0 0 0 0 S 0.0 0.0 0:25.04 ksoftir+ - 30 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker+ - 33 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kdevtmp+ - 34 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 netns - 35 root 20 0 0 0 0 S 0.0 0.0 0:00.07 kauditd - 36 root 20 0 0 0 0 S 0.0 0.0 0:00.23 khungta+ - 37 root 20 0 0 0 0 S 0.0 0.0 0:00.00 oom_rea+ - 38 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 writeba+ - 39 root 20 0 0 0 0 S 0.0 0.0 0:17.86 kcompac+ - 40 root 25 5 0 0 0 S 0.0 0.0 0:00.00 ksmd - 41 root 39 19 0 0 0 S 0.0 0.0 0:14.38 khugepa+ - 60 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kintegr+ - 61 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kblockd - 62 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 blkcg_p+ - 63 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 edac-po+ - 64 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 devfreq+ - 66 root 0 -20 0 0 0 I 0.0 0.0 0:04.09 kworker+ - 67 root 20 0 0 0 0 S 0.0 0.0 0:00.59 kswapd0 - 68 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kthrotld - 69 root -51 0 0 0 0 S 0.0 0.0 0:00.00 irq/24-+ - 70 root -51 0 0 0 0 S 0.0 0.0 0:00.00 irq/25-+ - 71 root -51 0 0 0 0 S 0.0 0.0 0:00.00 irq/26-+ - 72 root -51 0 0 0 0 S 0.0 0.0 0:00.00 irq/27-+ - 73 root -51 0 0 0 0 S 0.0 0.0 0:00.00 irq/28-+ - 75 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 acpi_th+ - 76 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 ipv6_ad+ - 85 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kstrp - 88 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 zswap-s+ - 89 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker+ - 111 root 0 -20 0 0 0 I 0.0 0.0 0:04.06 kworker+ - 141 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 ata_sff - 142 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh+ - 143 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 scsi_tm+ - 144 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 sdhci - 145 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh+ - 146 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 scsi_tm+ - 147 root -51 0 0 0 0 S 0.0 0.0 0:00.00 irq/16-+ - 151 root 0 -20 0 0 0 I 0.0 0.0 0:04.33 kworker+ - 168 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 nvme-wq - 169 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 nvme-re+ - 170 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 nvme-de+ - 184 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh+ - 185 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 scsi_tm+ - 186 root 20 0 0 0 0 S 0.0 0.0 0:19.38 usb-sto+ - 187 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 uas - 191 root 0 -20 0 0 0 I 0.0 0.0 0:04.10 kworker+ - 263 root 20 0 0 0 0 S 0.0 0.0 0:11.26 jbd2/sd+ - 264 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 ext4-rs+ - 278 root 0 -20 0 0 0 S 0.0 0.0 0:08.12 loop0 - 360 root 20 0 48024 17748 14244 S 0.0 0.4 0:13.98 systemd+ - 361 root 20 0 22932 7344 5764 S 0.0 0.2 0:01.14 systemd+ - 412 root 16 -4 91408 3132 2200 S 0.0 0.1 0:00.53 auditd - 417 root 16 -4 7244 5104 4432 S 0.0 0.1 0:00.08 audisp-+ - 479 root 20 0 8180 6796 1752 S 0.0 0.2 0:13.09 haveged - 494 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 cryptd - 495 root 20 0 7372 2968 2708 S 0.0 0.1 0:01.75 cron - 496 message+ 20 0 7988 4416 3848 S 0.0 0.1 0:08.29 dbus-da+ - 523 root 20 0 11152 5320 4376 S 0.0 0.1 0:00.13 smartd - 532 root 20 0 14552 9028 8048 S 0.0 0.2 0:04.42 systemd+ - 534 root 20 0 1579288 65448 33076 S 0.0 1.6 16:40.26 contain+ - 629 root 20 0 84096 3416 2388 S 0.0 0.1 3:46.33 monit - 650 root 20 0 1836628 100396 44680 S 0.0 2.5 41:36.89 dockerd - 654 root 20 0 6472 1732 1624 S 0.0 0.0 0:00.00 agetty - 655 root 20 0 6104 1976 1864 S 0.0 0.0 0:00.00 agetty - 658 root 20 0 14368 8700 7500 S 0.0 0.2 0:00.34 sshd - 780 root 20 0 720760 17092 6500 S 0.0 0.4 2:13.22 contain+ - 801 root 20 0 32148 28064 10316 S 0.0 0.7 2:47.49 supervi+ - 859 root 20 0 37176 27352 10344 S 0.0 0.7 0:43.02 arista - 1062 root 20 0 125080 27668 15768 S 0.0 0.7 18:21.61 python3 - 1152 root 20 0 12040 6920 6128 S 0.0 0.2 0:00.01 databas+ - 1157 root 20 0 12304 7200 6240 S 0.0 0.2 0:00.02 databas+ - 1160 root 20 0 1254740 33820 16584 S 0.0 0.8 0:17.96 docker - 1202 root 20 0 222220 6212 3664 S 0.0 0.2 0:00.06 rsyslogd - 1240 root 20 0 236476 61884 20724 S 0.0 1.6 12:58.16 healthd - 1311 root 20 0 720504 17896 6312 S 0.0 0.4 2:13.54 contain+ - 1333 root 20 0 36256 32032 10276 S 0.0 0.8 3:13.31 supervi+ - 1339 root 20 0 386980 48952 8248 S 0.0 1.2 0:59.09 healthd - 1344 root 20 0 175300 52568 10300 S 0.0 1.3 0:34.17 healthd - 1347 root 20 0 180060 52160 11056 S 0.0 1.3 0:01.20 healthd - 1350 root 20 0 170468 47420 7020 S 0.0 1.2 0:53.54 healthd - 1411 admin 20 0 12172 6484 5720 S 0.0 0.2 0:00.02 acms.sh - 1412 root 20 0 121940 27460 15048 S 0.0 0.7 0:40.33 caclmgrd - 1416 root 20 0 46296 25168 14756 S 0.0 0.6 8:52.39 procdoc+ - 1422 admin 20 0 69388 42444 18400 S 0.0 1.1 0:00.95 python3 - 1730 root 20 0 129144 31712 15616 S 0.0 0.8 19:57.12 python3 - 1732 root 20 0 44784 29552 15108 S 0.0 0.7 0:00.47 start.py - 1734 root 20 0 45820 29696 14840 S 0.0 0.7 0:00.49 CA_cert+ - 1735 root 20 0 44572 27912 13496 S 0.0 0.7 0:01.74 cert_co+ - 1869 root 20 0 222220 4188 3684 S 0.0 0.1 0:00.81 rsyslogd - 1960 root 20 0 720760 16332 6500 S 0.0 0.4 2:11.96 contain+ - 1981 root 20 0 36224 32124 10320 S 0.0 0.8 6:54.24 supervi+ - 2051 root 20 0 292104 5940 3028 S 0.0 0.1 0:07.35 rsyslogd - 2088 root 20 0 12304 7180 6200 S 0.0 0.2 0:00.03 swss.sh - 2272 root 20 0 720504 17260 6500 S 0.0 0.4 2:15.91 contain+ - 2299 root 20 0 720504 17048 6500 S 0.0 0.4 2:13.86 contain+ - 2307 root 20 0 36292 32128 10340 S 0.0 0.8 2:59.93 supervi+ - 2332 root 20 0 720760 18280 6500 S 0.0 0.5 2:15.46 contain+ - 2346 root 20 0 36292 32076 10268 S 0.0 0.8 3:23.37 supervi+ - 2361 root 20 0 35612 31380 10236 S 0.0 0.8 4:19.78 supervi+ - 2398 root 20 0 12172 7128 6212 S 0.0 0.2 0:00.02 syncd.sh - 2403 root 20 0 12172 6656 5888 S 0.0 0.2 0:00.02 syncd.sh - 2408 root 20 0 69388 42304 18268 S 0.0 1.1 0:00.94 python3 - 2411 admin 20 0 12172 6288 5520 S 0.0 0.2 0:00.02 teamd.sh - 2421 admin 20 0 12172 6452 5684 S 0.0 0.2 0:00.02 teamd.sh - 2423 admin 20 0 69388 42348 18312 S 0.0 1.1 0:00.93 python3 - 2430 admin 20 0 12040 6572 5804 S 0.0 0.2 0:00.02 bgp.sh - 2436 admin 20 0 12172 6440 5672 S 0.0 0.2 0:00.02 bgp.sh - 2439 admin 20 0 69388 42252 18216 S 0.0 1.1 0:00.93 python3 - 2443 root 20 0 125104 27596 15668 S 0.0 0.7 18:41.56 python3 - 2488 root 20 0 222220 6160 3660 S 0.0 0.2 0:04.46 rsyslogd - 2493 root 20 0 129144 31796 15700 S 0.0 0.8 18:19.70 python3 - 2500 root 20 0 129144 32524 15628 S 0.0 0.8 20:00.72 python3 - 2503 root 20 0 92180 11028 9824 S 0.0 0.3 0:30.90 portsyn+ - 2520 root 20 0 222220 4216 3716 S 0.0 0.1 0:00.84 rsyslogd - 2524 root 20 0 222220 6348 3716 S 0.0 0.2 3:18.10 rsyslogd - 2528 root 20 0 92608 11256 9776 S 0.0 0.3 0:35.48 teammgrd - 2571 root 20 0 19212 3804 2444 S 0.0 0.1 4:55.60 teamd - 2580 root 20 0 19212 3828 2480 S 0.0 0.1 5:02.54 teamd - 2600 root 20 0 19216 3908 2560 S 0.0 0.1 4:56.09 teamd - 2602 root 20 0 620592 87972 22340 S 0.0 2.2 16:54.01 orchage+ - 2611 root 20 0 19212 3800 2444 S 0.0 0.1 5:04.60 teamd - 2621 root 20 0 19212 3884 2532 S 0.0 0.1 4:56.33 teamd - 2632 root 20 0 87704 1540 1372 S 0.0 0.0 0:00.02 dsserve - 2633 root 20 0 92992 11580 9716 S 0.0 0.3 0:35.32 teamsyn+ - 2643 root 20 0 19212 3832 2484 S 0.0 0.1 5:10.30 teamd - 2655 root 20 0 19212 3912 2564 S 0.0 0.1 5:27.09 teamd - 2664 root 20 0 720760 18556 6436 S 0.0 0.5 2:14.23 contain+ - 2691 root 20 0 36224 32148 10320 S 0.0 0.8 2:37.63 supervi+ - 2709 admin 20 0 12040 7024 6232 S 0.0 0.2 0:00.02 radv.sh - 2718 admin 20 0 12172 6640 5872 S 0.0 0.2 0:00.02 radv.sh - 2725 admin 20 0 69388 42464 18424 S 0.0 1.1 0:00.94 python3 - 2742 root 20 0 293620 42828 18256 S 0.0 1.1 0:00.93 python3 - 2755 root 20 0 92336 10048 8928 S 0.0 0.3 0:31.47 coppmgrd - 2777 root 20 0 125104 27796 15796 S 0.0 0.7 20:01.54 python3 - 2813 root 20 0 222220 4224 3720 S 0.0 0.1 0:00.23 rsyslogd - 2826 300 20 0 665588 67780 6916 S 0.0 1.7 0:23.12 zebra - 2864 300 20 0 44084 13616 5320 S 0.0 0.3 0:07.68 staticd - 2882 root 20 0 125080 27604 15700 S 0.0 0.7 20:04.30 python3 - 2895 ntp 20 0 74488 3648 2912 S 0.0 0.1 0:43.96 ntpd - 2907 root 20 0 92172 9892 8768 S 0.0 0.2 0:16.61 neighsy+ - 2908 root 20 0 92336 11124 9904 S 0.0 0.3 0:31.36 vlanmgrd - 2909 root 20 0 92496 11268 9936 S 0.0 0.3 0:32.85 intfmgrd - 2910 root 20 0 92308 11128 9864 S 0.0 0.3 0:31.84 portmgrd - 2911 root 20 0 92544 11068 9768 S 0.0 0.3 0:34.04 bufferm+ - 2914 root 20 0 92328 11044 9772 S 0.0 0.3 0:31.67 vrfmgrd - 2916 root 20 0 92216 9988 8868 S 0.0 0.3 0:31.45 nbrmgrd - 2917 root 20 0 92360 11112 9876 S 0.0 0.3 0:31.86 vxlanmg+ - 2920 root 20 0 92108 10136 9016 S 0.0 0.3 0:15.75 fdbsyncd - 2924 root 20 0 92308 11024 9812 S 0.0 0.3 0:32.53 tunnelm+ - 3034 root 20 0 222220 4208 3708 S 0.0 0.1 0:00.06 rsyslogd - 3327 root 20 0 50524 30004 15580 S 0.0 0.8 0:47.96 featured - 3328 root 20 0 61100 40044 16576 S 0.0 1.0 0:31.69 hostcfgd - 3331 root 20 0 5472 3020 2772 S 0.0 0.1 0:00.02 rasdaem+ - 4346 300 20 0 498388 197416 8156 S 0.0 4.9 3:00.32 bgpd - 4351 root 20 0 123448 33264 16476 S 0.0 0.8 0:40.03 bgpcfgd - 4359 root 20 0 37472 22380 15156 S 0.0 0.6 2:10.32 bgpmon - 4361 root 20 0 93460 11936 9548 S 0.0 0.3 0:18.78 fpmsyncd - 4377 root 20 0 37664 22160 14580 S 0.0 0.6 0:35.31 staticr+ - 5413 root 20 0 32180 19580 10260 S 0.0 0.5 0:00.29 python3 - 5465 root 20 0 720760 17684 6372 S 0.0 0.4 3:19.35 contain+ - 5486 root 20 0 36284 32056 10232 S 0.0 0.8 3:03.55 supervi+ - 5499 admin 20 0 12040 6984 6192 S 0.0 0.2 0:00.02 gnmi.sh - 5501 admin 20 0 12172 6396 5632 S 0.0 0.2 0:00.02 gnmi.sh - 5503 admin 20 0 69388 42216 18168 S 0.0 1.1 0:00.87 python3 - 5591 root 20 0 222220 6212 3688 S 0.0 0.2 0:00.07 rsyslogd - 5614 root 20 0 720760 16616 6372 S 0.0 0.4 2:08.40 contain+ - 5640 root 20 0 36224 32148 10328 S 0.0 0.8 3:28.26 supervi+ - 5668 admin 20 0 12040 6440 5676 S 0.0 0.2 0:00.01 lldp.sh - 5671 admin 20 0 12172 6540 5772 S 0.0 0.2 0:00.02 lldp.sh - 5673 admin 20 0 69388 42228 18176 S 0.0 1.1 0:00.89 python3 - 5738 root 20 0 1418152 73376 28400 S 0.0 1.8 54:30.22 telemet+ - 5877 root 20 0 125104 27600 15608 S 0.0 0.7 19:59.83 python3 - 5884 root 20 0 721016 16468 6308 S 0.0 0.4 2:19.75 contain+ - 5915 root 20 0 36232 32132 10336 S 0.0 0.8 4:06.02 supervi+ - 5938 admin 20 0 12172 6568 5808 S 0.0 0.2 0:00.02 pmon.sh - 5940 admin 20 0 69388 42344 18304 S 0.0 1.1 0:00.88 python3 - 5963 root 20 0 222220 4128 3624 S 0.0 0.1 0:00.08 rsyslogd - 6035 tcpdump 20 0 20252 8844 7644 S 0.0 0.2 0:00.31 lldpd - 6037 tcpdump 20 0 20384 3720 2388 S 0.0 0.1 4:04.87 lldpd - 6061 root 20 0 720760 17832 6500 S 0.0 0.4 3:15.76 contain+ - 6089 root 20 0 115064 28844 14628 S 0.0 0.7 32:49.56 python3 - 6098 root 20 0 12120 7108 6324 S 0.0 0.2 0:00.02 snmp.sh - 6100 root 20 0 12252 7096 6304 S 0.0 0.2 0:00.02 snmp.sh - 6103 root 20 0 69468 42572 18512 S 0.0 1.1 0:00.89 python3 - 6128 root 20 0 42324 25320 15492 S 0.0 0.6 0:04.51 python3 - 6164 root 20 0 125104 27568 15636 S 0.0 0.7 18:36.36 python3 - 6166 root 20 0 132408 33104 15988 S 0.0 0.8 19:49.98 python3 - 6176 root 20 0 222224 4016 3516 S 0.0 0.1 0:01.39 rsyslogd - 6180 root 20 0 222220 8280 3712 S 0.0 0.2 0:00.18 rsyslogd - 6188 tcpdump 20 0 26616 15068 9736 S 0.0 0.4 6:39.60 snmpd - 6191 root 20 0 52408 35208 16700 S 0.0 0.9 0:36.22 python3 - 6194 root 20 0 59796 42796 16824 S 0.0 1.1 26:20.79 python3 - 6195 root 20 0 60552 43180 16556 S 0.0 1.1 0:02.77 python3 - 6197 root 20 0 61708 44256 16576 S 0.0 1.1 5:20.56 python3 - 6198 root 20 0 72120 55252 16952 S 0.0 1.4 1:32.29 pcied - 6204 root 20 0 55332 1140 0 S 0.0 0.0 0:22.51 sensord - 6206 root 20 0 61708 33988 6308 S 0.0 0.9 3:52.65 python3 - 110653 root 20 0 4160 3392 2896 S 0.0 0.1 0:00.06 bash -2002226 root 20 0 0 0 0 I 0.0 0.0 0:01.10 kworker+ -2016391 root 20 0 0 0 0 I 0.0 0.0 0:00.40 kworker+ -2026996 root 20 0 0 0 0 I 0.0 0.0 0:00.44 kworker+ -2030519 root 20 0 0 0 0 I 0.0 0.0 0:00.00 kworker+ -2034114 root 20 0 0 0 0 I 0.0 0.0 0:00.24 kworker+ -2036072 root 20 0 0 0 0 I 0.0 0.0 0:00.00 kworker+ -2036243 root 20 0 0 0 0 I 0.0 0.0 0:00.06 kworker+ -2041185 root 20 0 0 0 0 I 0.0 0.0 0:00.17 kworker+ -2044722 root 20 0 0 0 0 I 0.0 0.0 0:00.10 kworker+ -2044751 root 20 0 0 0 0 I 0.0 0.0 0:00.00 kworker+ -2047210 root 20 0 0 0 0 I 0.0 0.0 0:00.00 kworker+ -2048525 root 20 0 0 0 0 I 0.0 0.0 0:00.00 kworker+ -2048528 root 20 0 14852 9136 7744 S 0.0 0.2 0:00.22 sshd -2048534 admin 20 0 15308 6584 4672 S 0.0 0.2 0:00.18 sshd -2048690 root 20 0 0 0 0 Z 0.0 0.0 0:01.06 python3 -2048691 root 20 0 0 0 0 Z 0.0 0.0 0:00.19 memory_+ -2048692 root 20 0 0 0 0 Z 0.0 0.0 0:00.94 python3 -2048693 root 20 0 0 0 0 Z 0.0 0.0 0:00.95 python3 -2049145 admin 20 0 2480 512 448 S 0.0 0.0 0:00.02 sh -2049146 root 20 0 15008 8240 7260 S 0.0 0.2 0:00.03 sudo -2049147 root 20 0 2480 516 448 S 0.0 0.0 0:00.00 sh -2049148 root 20 0 33600 23920 11384 S 0.0 0.6 0:00.43 python -``` +### FRR Memory Monitor + +Monitors memory usage of FRR routing daemons. + +- **Commands**: + - `vtysh -c "show memory bgp"` + - `vtysh -c "show memory zebra"` +- **Parser Function**: `parse_frr_memory_output` +- **Monitored Parameters**: + - `used`: Memory usage in MB -###### "parse_top_output" return value -```shell - 'top': { - 'zebra': 67780, - 'bgpd': 197416 - } +Example configuration: +```json +{ + "name": "frr_bgp", + "cmd": "vtysh -c \"show memory bgp\"", + "memory_params": { + "used": { + "memory_increase_threshold": [ + {"type": "percentage", "value": "10%"}, + {"type": "value", "value": 5} + ], + "memory_high_threshold": { + "type": "value", + "value": 128 + } + } + }, + "memory_check": "parse_frr_memory_output" +} ``` -#### Example Use Cases for memory items +## Advanced Usage -##### Define a memory item globally -- We can define a global memory item in "memory_utilization_common.json" which is usually used in public branch. This memory item will apply to all test cases. -- If we prefer not to define it in the public branch, we can alternatively define it in "memory_utilization_dependence.json" which is usually used in the internal branch. This memory item will also apply to all test cases. -- If a memory item is defined with the same name in both "memory_utilization_common.json" and "memory_utilization_dependence.json" under their "COMMON" sections, the definition in the "memory_utilization_dependence.json" will take priority. +### HWSKU-Specific Memory Items +To define HWSKU-specific configurations in memory_utilization_dependence.json: +1. Define the `HWSKU` section mapping collection names to specific SKUs +2. Create sections using collection names containing memory items for those SKUs - ```python - # define in "memory_utilization_common.json" - "COMMON": [ - { - "name": "monit", - "cmd": "sudo monit status", - "memory_params": { - "memory_usage": { - "memory_increase_threshold": 5, - "memory_high_threshold": 70 +Example: +```json +{ + "HWSKU": { + "Arista-7050QX": ["Arista-7050-QX-32S", "Arista-7050-QX32", "Arista-7050QX-32S-S4Q31", "Arista-7050QX32S-Q32"] + }, + "Arista-7050QX": [ + { + "name": "monit", + "cmd": "sudo monit status", + "memory_params": { + "memory_usage": { + "memory_increase_threshold": { + "type": "value", + "value": 5 + }, + "memory_high_threshold": { + "type": "value", + "value": 85 } - }, - "memory_check": "parse_monit_status_output" - } - ] - ``` + } + }, + "memory_check": "parse_monit_status_output" + } + ] +} +``` - ```python - # define in memory_utilization_dependence.json - # "memory_high_threshold" overwrited to 60 - "COMMON": [ - { - "name": "monit", - "cmd": "sudo monit status", - "memory_params": { - "memory_usage": { - "memory_increase_threshold": 5, - "memory_high_threshold": 60 - } - }, - "memory_check": "parse_monit_status_output" - } - ] - ``` +In this example, specific Arista SKUs will use a higher memory_high_threshold (85 instead of 75). +### Test-Specific Memory Items -##### Define a memory item per HWSKU -We can define a memory item per HwSku in "memory_utilization_dependence.json". -- First, define a dict named "HWSKU" to manage all HwSku collections. -- Second, specify each HwSku collections. use the collection name as the key and a list of HwSku names included in that HwSku collection as the value. -- Finally, define the memory items for each HwSku collection, the configuration will take priority. +You can modify or add memory items within individual tests by: +1. Modifying thresholds of existing items +2. Registering new memory monitors - ```python - # memory_utilization_dependence.json - "HWSKU" : { - "Arista-7050QX": ["Arista-7050-QX-32S", "Arista-7050QX32S-Q32"] - }, - "COMMON": [ - { - "name": "monit", - "cmd": "sudo monit status", - "memory_params": { - "memory_usage": { - "memory_increase_threshold": 5, - "memory_high_threshold": 60 - } - }, - "memory_check": "parse_monit_status_output" - } - ], - # the "memory_high_threshold" value would be overwrite to "80" for HwSku "Arista-7050-QX-32S", "Arista-7050QX32S-Q32" - "Arista-7050QX": [ - { - "name": "monit", - "cmd": "sudo monit status", - "memory_params": { - "memory_usage": { - "memory_increase_threshold": 5, - "memory_high_threshold": 80 - } - }, - "memory_check": "parse_monit_status_output" - } - ] - ``` +```python +def test_case_example(duthosts, enum_frontend_dut_hostname, memory_utilization): + # Get memory monitors and values + memory_monitors, memory_values = memory_utilization + # Get memory monitor for the current DUT + duthost = duthosts[enum_frontend_dut_hostname] + memory_monitor = memory_monitors[duthost.hostname] -##### Define a memory item per test case -We can modify the threshold of existing memory items within the test case, but we cannot change the cmd and function of the memory items. -However, we can add new memory items within the test case by using memory_utilization fixture and then registering them. + # Update existing monitor thresholds or register new monitors + # ...existing code... +``` -This functionality has not been verified yet, the following examples are provided for reference only. -Updates will be made once the verification process is complete. +## Troubleshooting - ```python - # memory item config per test case - per_test_case_config = [ - # exist memory item - { - "name": "monit", - "cmd": "sudo monit status", - "memory_params": { - "memory_usage": { - "memory_increase_threshold": 10, - "memory_high_threshold": 90 - } - }, - "memory_check": "parse_monit_status_output" - }, - # new memory item per test case - { - "name": "memory_item_per_test_case", - "cmd": "cmd per test case", - "memory_params": { - "used": { - "memory_increase_threshold": 100, - "memory_high_threshold": 1500 - } - }, - "memory_check": "parse_output_per_test_case" - } - ] +When a memory threshold is exceeded, the plugin will fail the test with a detailed message showing: +- The name of the memory item that triggered the failure +- The specific memory parameter that exceeded the threshold +- The current value compared to the threshold value +- Additional contextual information about memory usage - # use the fixture memory_utilization - def test_case_example(duthosts, enum_frontend_dut_hostname, memory_utilization): - ... - for memory_item in per_test_case_config: - is_exist = False - # for exist memory item - for i, exist_commands in enumerate(memory_monitor.commands): - exist_name, exist_cmd, exist_memory_params, exist_memory_check = exist_commands - if memory_item["name"] == exist_name: - memory_monitor.commands[i] = ( - exist_name, - exist_cmd, - memory_item["memory_params"], - exist_memory_check - ) - is_exist = True - break - # if memory item not exist, register a new memory item. - if not is_exist: - memory_monitor.register_command(memory_item["name"], memory_item["cmd"], memory_item["memory_params"], memory_item["memory_check"]) - output = memory_monitor.execute_command(memory_item["cmd"]) - - initial_memory_value[memory_item["name"]] = memory_monitor.run_command_parser_function(memory_item["name"], output) - ``` +Example failure message: +``` +[ALARM]: frr_bgp:used memory usage increased by 15.5, exceeds increase threshold 5% +``` + +Common troubleshooting steps: +1. Check if the thresholds are appropriate for your test environment +2. Verify if the memory increase is expected due to test operations +3. Analyze the DUT logs for potential memory leaks or resource issues +4. For persistent issues, consider increasing the threshold or disabling the specific check diff --git a/tests/common/plugins/memory_utilization/__init__.py b/tests/common/plugins/memory_utilization/__init__.py index a0dfe0bb4bd..cf5f56d93a3 100644 --- a/tests/common/plugins/memory_utilization/__init__.py +++ b/tests/common/plugins/memory_utilization/__init__.py @@ -2,6 +2,9 @@ import pytest from tests.common.plugins.memory_utilization.memory_utilization import MemoryMonitor +logger = logging.getLogger(__name__) +logger.setLevel(logging.INFO) + def pytest_addoption(parser): parser.addoption( @@ -17,7 +20,7 @@ def store_fixture_values(request, duthosts, memory_utilization): if request.config.getoption("--disable_memory_utilization") or "disable_memory_utilization" in request.keywords: return - logging.info("store memory_utilization {}".format(request.node.name)) + logger.debug("store memory_utilization {}".format(request.node.name)) request.config.store_duthosts = duthosts request.config.store_memory_utilization = memory_utilization @@ -31,7 +34,7 @@ def pytest_runtest_setup(item): "disable_memory_utilization" in request.keywords: return - logging.info("collect memory before test {}".format(item.name)) + logger.debug("collect memory before test {}".format(item.name)) duthosts = getattr(item.config, 'store_duthosts', None) memory_utilization = getattr(item.config, 'store_memory_utilization', None) @@ -39,8 +42,8 @@ def pytest_runtest_setup(item): return memory_monitors, memory_values = memory_utilization - - logging.debug("memory_values {} ".format(memory_values)) + logger.debug("Memory monitors ready: {}".format(list(memory_monitors.keys()) if memory_monitors else "None")) + logger.debug("memory_values {} ".format(memory_values)) for duthost in duthosts: if duthost.topo_type == 't2': @@ -48,10 +51,14 @@ def pytest_runtest_setup(item): # Initial memory check for all registered commands for name, cmd, memory_params, memory_check in memory_monitors[duthost.hostname].commands: - output = memory_monitors[duthost.hostname].execute_command(cmd) - memory_values["before_test"][duthost.hostname][name] = memory_check(output, memory_params) + try: + output = memory_monitors[duthost.hostname].execute_command(cmd) + memory_values["before_test"][duthost.hostname][name] = memory_check(output, memory_params) + except Exception as e: + logger.warning("Error collecting initial memory data for {}: {}".format(name, str(e))) + memory_values["before_test"][duthost.hostname][name] = {} - logging.info("Before test: collected memory_values {}".format(memory_values)) + logger.info("Before test: collected memory_values {}".format(memory_values)) @pytest.hookimpl(tryfirst=True) @@ -63,7 +70,7 @@ def pytest_runtest_teardown(item, nextitem): "disable_memory_utilization" in request.keywords: return - logging.info("collect memory after test {}".format(item.name)) + logger.debug("collect memory after test {}".format(item.name)) duthosts = getattr(item.config, 'store_duthosts', None) memory_utilization = getattr(item.config, 'store_memory_utilization', None) @@ -72,27 +79,37 @@ def pytest_runtest_teardown(item, nextitem): memory_monitors, memory_values = memory_utilization - logging.debug("memory_values {} ".format(memory_values)) - for duthost in duthosts: if duthost.topo_type == 't2': continue # memory check for all registered commands for name, cmd, memory_params, memory_check in memory_monitors[duthost.hostname].commands: - output = memory_monitors[duthost.hostname].execute_command(cmd) - memory_values["after_test"][duthost.hostname][name] = memory_check(output, memory_params) - - memory_monitors[duthost.hostname].check_memory_thresholds( - memory_values["after_test"][duthost.hostname], memory_values["before_test"][duthost.hostname]) - - logging.info("After test: collected memory_values {}".format(memory_values)) + try: + output = memory_monitors[duthost.hostname].execute_command(cmd) + memory_values["after_test"][duthost.hostname][name] = memory_check(output, memory_params) + except Exception as e: + logger.warning("Error collecting final memory data for {}: {}".format(name, str(e))) + memory_values["after_test"][duthost.hostname][name] = {} + + # Only check thresholds if we have data to compare + if any(memory_values["before_test"][duthost.hostname]) and any(memory_values["after_test"][duthost.hostname]): + try: + memory_monitors[duthost.hostname].check_memory_thresholds( + memory_values["after_test"][duthost.hostname], memory_values["before_test"][duthost.hostname]) + except Exception as e: + logger.error("Error checking memory thresholds: {}".format(str(e))) + # Don't fail the test for threshold checking errors + else: + logger.warning("Skipping memory threshold check for {} due to missing data".format(duthost.hostname)) + + logger.info("After test: collected memory_values {}".format(memory_values)) @pytest.fixture(autouse=True) def memory_utilization(duthosts, request): if request.config.getoption("--disable_memory_utilization") or "disable_memory_utilization" in request.keywords: - logging.info("Memory utilization is disabled") + logger.info("Memory utilization monitoring is disabled") yield None, None return @@ -105,9 +122,11 @@ def memory_utilization(duthosts, request): memory_monitor = MemoryMonitor(ansible_host=duthost) memory_values["before_test"][duthost.hostname] = {} memory_values["after_test"][duthost.hostname] = {} - logging.info("Hostname: {}, Hwsku: {}, Platform: {}".format( + logger.info("Hostname: {}, Hwsku: {}, Platform: {}".format( duthost.hostname, duthost.sonichost._facts["hwsku"], duthost.sonichost._facts["platform"])) memory_monitor.parse_and_register_commands(hwsku=duthost.sonichost._facts["hwsku"]) memory_monitors[duthost.hostname] = memory_monitor yield memory_monitors, memory_values + + logger.debug("Memory utilization fixture cleanup") diff --git a/tests/common/plugins/memory_utilization/memory_utilization.py b/tests/common/plugins/memory_utilization/memory_utilization.py index 4e1338f19dc..f83a0944ad0 100755 --- a/tests/common/plugins/memory_utilization/memory_utilization.py +++ b/tests/common/plugins/memory_utilization/memory_utilization.py @@ -5,6 +5,7 @@ import pytest logger = logging.getLogger(__name__) +logger.setLevel(logging.INFO) MEMORY_UTILIZATION_COMMON_JSON_FILE = join(split(__file__)[0], "memory_utilization_common.json") MEMORY_UTILIZATION_DEPENDENCE_JSON_FILE = join(split(__file__)[0], "memory_utilization_dependence.json") @@ -12,29 +13,52 @@ class MemoryMonitor: def __init__(self, ansible_host): + logger.debug("Initializing MemoryMonitor for host: {}".format(ansible_host.hostname)) self.ansible_host = ansible_host self.commands = [] self.memory_values = {} def register_command(self, name, cmd, memory_params, memory_check_fn): """Register a command with its associated memory parameters and check function.""" + logger.info( + "Registering command: name={}, cmd={}, memory_params={}, " + "memory_check={}".format(name, cmd, memory_params, memory_check_fn) + ) self.commands.append((name, cmd, memory_params, memory_check_fn)) self.memory_values[name] = {} def execute_command(self, cmd): """Execute a shell command and return its output.""" - response = self.ansible_host.command(cmd, module_ignore_errors=True) - stdout = response.get('stdout', None) - # logger.debug("Command '{}' response: {}".format(cmd, stdout)) - return stdout + logger.debug("Executing command: {}".format(cmd)) + try: + response = self.ansible_host.command(cmd, module_ignore_errors=True) + stdout = response.get('stdout', '') + if stdout: + logger.debug("Command output length: {} bytes".format(len(stdout))) + else: + logger.warning("Command '{}' returned no output".format(cmd)) + return stdout or "" # Ensure we always return at least an empty string + except Exception as e: + logger.warning("Error executing command '{}': {}".format(cmd, str(e))) + return "" # Return empty string on error def check_memory_thresholds(self, current_values, previous_values): """Check memory usage against thresholds. """ + logger.debug("Starting memory threshold check") logger.debug("Previous values: {}".format(previous_values)) logger.debug("Current values: {}".format(current_values)) for name, cmd, memory_params, memory_check_fn in self.commands: + logger.debug("Checking thresholds for command: {}".format(name)) + for mem_item, thresholds in memory_params.items(): + logger.debug("Processing memory item: {}".format(mem_item)) + + # Convert thresholds to structured format for consistency + logger.debug("Original thresholds: {}".format(thresholds)) + normalized_thresholds = self._normalize_thresholds(thresholds) + logger.debug("Normalized thresholds: {}".format(normalized_thresholds)) + current_value = float(current_values.get(name, {}).get(mem_item, 0)) previous_value = float(previous_values.get(name, {}).get(mem_item, 0)) @@ -42,55 +66,233 @@ def check_memory_thresholds(self, current_values, previous_values): logger.warning("Skipping memory check for {}-{} due to zero value".format(name, mem_item)) continue - high_threshold = float(thresholds.get("memory_high_threshold", float('inf'))) - increase_threshold = float(thresholds.get("memory_increase_threshold", float('inf'))) - - if previous_value > high_threshold: - self.handle_memory_threshold_exceeded( - name, mem_item, previous_value, high_threshold, - previous_values, current_values, is_current=False - ) - - if current_value > high_threshold: - self.handle_memory_threshold_exceeded( - name, mem_item, current_value, high_threshold, - previous_values, current_values, is_current=True - ) + logger.debug("Processing thresholds for {}:{} - previous: {}, current: {}".format( + name, mem_item, previous_value, current_value)) + + # Skip high threshold check if explicitly set to null + high_threshold_raw = normalized_thresholds.get("memory_high_threshold", None) + if high_threshold_raw is not None: + logger.debug("Raw high threshold for {}:{}: {}".format(name, mem_item, high_threshold_raw)) + high_threshold = self._parse_threshold(high_threshold_raw, previous_value) + logger.debug("Calculated high threshold for {}:{}: {}".format(name, mem_item, high_threshold)) + + if previous_value > high_threshold: + self._handle_memory_threshold_exceeded( + name, mem_item, previous_value, high_threshold_raw, + previous_values, current_values, is_current=False + ) + + if current_value > high_threshold: + self._handle_memory_threshold_exceeded( + name, mem_item, current_value, high_threshold_raw, + previous_values, current_values, is_current=True + ) + + # Get increase threshold and determine if it's a percentage or absolute value + increase_threshold_raw = normalized_thresholds.get("memory_increase_threshold", float('inf')) + logger.debug("Raw increase threshold for {}:{}: {}".format(name, mem_item, increase_threshold_raw)) + increase_threshold = self._parse_threshold(increase_threshold_raw, previous_value) + logger.info("Calculated increase threshold for {}:{}: {}".format(name, mem_item, increase_threshold)) increase = current_value - previous_value if increase > increase_threshold: - self.handle_memory_threshold_exceeded( - name, mem_item, increase, increase_threshold, + self._handle_memory_threshold_exceeded( + name, mem_item, increase, increase_threshold_raw, previous_values, current_values, is_increase=True ) - def handle_memory_threshold_exceeded(self, name, mem_item, value, threshold, - previous_values, current_values, is_current=False, is_increase=False): + def _normalize_thresholds(self, thresholds): + """Normalize threshold values to structured format.""" + normalized = {} + + for key, value in thresholds.items(): + if isinstance(value, (int, float)): + logger.warning("Legacy threshold format detected: {}={}. Please use structured format.".format( + key, value)) + normalized[key] = {"type": "value", "value": value} + logger.debug("Converted simple value {} to structured format: {}".format( + value, normalized[key])) + elif isinstance(value, str) and value.endswith('%'): + logger.warning("Legacy threshold format detected: {}={}. Please use structured format.".format( + key, value)) + normalized[key] = {"type": "percentage", "value": value} + logger.debug("Converted percentage string {} to structured format: {}".format( + value, normalized[key])) + else: + normalized[key] = value + + return normalized + + def _parse_threshold(self, threshold, base_value): + """ + Parse threshold value which can be either: + 1. A dict with type and value fields + 2. A list of dicts for multiple threshold types + + Returns the most restrictive calculated threshold value. + """ + logger.debug("Parsing threshold: {} (type: {}) with base value: {}".format( + threshold, type(threshold).__name__, base_value)) + + # Handle structured format + if isinstance(threshold, dict) and 'type' in threshold and 'value' in threshold: + threshold_type = threshold['type'] + threshold_value = threshold['value'] + logger.debug("Structured threshold - type: {}, value: {}".format(threshold_type, threshold_value)) + + if threshold_type == 'percentage': + try: + # Strip % if present + if isinstance(threshold_value, str) and threshold_value.endswith('%'): + percentage = float(threshold_value.rstrip('%')) + else: + percentage = float(threshold_value) + + # Validate percentage is in reasonable range + if percentage < 0 or percentage > 100: + logger.warning("Percentage threshold outside normal range (0-100): {}%".format(percentage)) + + calculated = (percentage / 100.0) * base_value + logger.debug("Calculated percentage threshold: {}% of {} = {}".format( + percentage, base_value, calculated)) + return calculated + except (ValueError, TypeError) as e: + logger.error("Failed to process percentage threshold: {}".format(str(e)), exc_info=True) + return float('inf') + elif threshold_type == 'value': + try: + value = float(threshold_value) + if value < 0: + logger.warning("Negative threshold value: {}".format(value)) + logger.debug("Using absolute value: {}".format(value)) + return value + except (ValueError, TypeError) as e: + logger.error("Failed to process value threshold: {}".format(str(e)), exc_info=True) + return float('inf') + else: + logger.error("Unknown threshold type: {}".format(threshold_type)) + return float('inf') + + # Handle list format (multiple threshold types) + elif isinstance(threshold, list): + logger.debug("Processing a list of {} thresholds".format(len(threshold))) + thresholds = [] + for i, t in enumerate(threshold): + if isinstance(t, dict) and 'type' in t and 'value' in t: + parsed = self._parse_threshold(t, base_value) + logger.debug("List item {}: parsed value = {}".format(i, parsed)) + thresholds.append(parsed) + else: + logger.warning("Skipping invalid threshold list item: {}".format(t)) + + # Return the most restrictive (smallest) threshold + min_value = min(thresholds) if thresholds else float('inf') + logger.debug("Selected minimum threshold from list: {} (from values: {})".format( + min_value, thresholds)) + return min_value + + # Handle deprecated formats with warning + elif isinstance(threshold, (int, float, str)): + logger.warning( + "Using deprecated threshold format: {}. Please update to structured format.".format(threshold)) + + # For backward compatibility + if isinstance(threshold, str) and threshold.endswith('%'): + try: + percentage = float(threshold.rstrip('%')) + calculated = (percentage / 100.0) * base_value + logger.debug("Calculated legacy percentage threshold: {}% of {} = {}".format( + percentage, base_value, calculated)) + return calculated + except (ValueError, TypeError) as e: + logger.error("Invalid percentage threshold string: {} - Error: {}".format(threshold, e)) + return float('inf') + else: + # Simple value + try: + value = float(threshold) + logger.debug("Using legacy absolute threshold: {}".format(value)) + return value + except (ValueError, TypeError) as e: + logger.error("Invalid threshold value: {} - Error: {}".format(threshold, e)) + return float('inf') + else: + logger.warning("Unsupported threshold format: {}".format(threshold)) + return float('inf') + + def _handle_memory_threshold_exceeded(self, name, mem_item, value, threshold, + previous_values, current_values, is_current=False, is_increase=False): """Handle memory threshold or increase exceeded.""" logger.info("{}:{}, previous_values: {}".format(name, mem_item, previous_values)) logger.info("{}:{}, current_values: {}".format(name, mem_item, current_values)) + # Format threshold for display in a more readable format + threshold_str = self._format_threshold_for_display(threshold) + logger.debug("Threshold exceeded - measured value: {}, formatted threshold: {}".format( + value, threshold_str)) + if is_increase: message = ( "[ALARM]: {}:{} memory usage increased by {}, " "exceeds increase threshold {}".format( - name, mem_item, value, threshold + name, mem_item, value, threshold_str ) ) else: message = ( "[ALARM]: {}:{}, {} memory usage {} exceeds " "high threshold {}".format( - name, mem_item, "Current" if is_current else "Previous", value, threshold + name, mem_item, "Current" if is_current else "Previous", value, threshold_str ) ) - logger.warning(message) - pytest.fail(message) + # Not return failure on Virtual Switch + asic_type = self.ansible_host.facts['asic_type'] + if asic_type == "vs": + logger.warning(message) + else: + logger.error(message) + pytest.fail(message) + + def _format_threshold_for_display(self, threshold): + """Format a threshold value for better readability in messages.""" + logger.debug("Formatting threshold for display: {} (type: {})".format(threshold, type(threshold).__name__)) + + if isinstance(threshold, dict) and 'type' in threshold and 'value' in threshold: + if threshold['type'] == 'percentage': + # Ensure the percentage has a % symbol + value = threshold['value'] + if isinstance(value, str) and value.endswith('%'): + formatted = value + else: + formatted = "{}%".format(value) + logger.debug("Formatted percentage threshold as: {}".format(formatted)) + return formatted + else: + formatted = str(threshold['value']) + logger.debug("Formatted value threshold as: {}".format(formatted)) + return formatted + elif isinstance(threshold, list): + # Format a list of thresholds + logger.debug("Formatting list of {} thresholds".format(len(threshold))) + formatted = [] + for t in threshold: + item_format = self._format_threshold_for_display(t) + logger.debug("Formatted list item as: {}".format(item_format)) + formatted.append(item_format) + result = ", ".join(formatted) + logger.debug("Joined list thresholds as: {}".format(result)) + return result + else: + # Simple value or percentage string + result = str(threshold) + logger.debug("Formatted simple threshold as: {}".format(result)) + return result def parse_and_register_commands(self, hwsku=None): """Initialize the MemoryMonitor by reading commands from JSON files and registering them.""" + logger.debug("Loading memory monitoring commands for hwsku: {}".format(hwsku)) parameter_dict = {} with open(MEMORY_UTILIZATION_COMMON_JSON_FILE, 'r') as file: @@ -129,7 +331,6 @@ def parse_and_register_commands(self, hwsku=None): for key, value in data["HWSKU"].items(): if hwsku in value: for item in data[key]: - logger.info("#### CMD {} ".format(item)) name = item["name"] command = item["cmd"] memory_params = item["memory_params"] @@ -142,18 +343,21 @@ def parse_and_register_commands(self, hwsku=None): } for param in parameter_dict.values(): - logger.debug( - "Registering command: name={}, cmd={}, memory_params={}, " - "memory_check={}".format( - param['name'], param['cmd'], param['memory_params'], param['memory_check_fn'] - ) - ) + # Normalize thresholds in memory_params to ensure consistent behavior + for mem_item, thresholds in param['memory_params'].items(): + param['memory_params'][mem_item] = self._normalize_thresholds(thresholds) + self.register_command(param['name'], param['cmd'], param['memory_params'], eval(param['memory_check_fn'])) def parse_top_output(output, memory_params): """Parse the 'top' command output to extract memory usage information.""" memory_values = {} + + if not output: + logger.warning("Empty output for top command, returning empty values") + return memory_values + headers = [] length = 0 for line in output.split('\n'): @@ -169,9 +373,9 @@ def parse_top_output(output, memory_params): for mem_item, thresholds in memory_params.items(): if mem_item in process_info["COMMAND"]: if mem_item in memory_values: - memory_values[mem_item] += int(process_info["RES"]) + memory_values[mem_item] += float(int(process_info["RES"]) / 1024) else: - memory_values[mem_item] = int(process_info["RES"]) + memory_values[mem_item] = float(int(process_info["RES"]) / 1024) logger.debug("Parsed memory values: {}".format(memory_values)) return memory_values @@ -180,6 +384,11 @@ def parse_top_output(output, memory_params): def parse_free_output(output, memory_params): """Parse the 'free' command output to extract memory usage information.""" memory_values = {} + + if not output: + logger.warning("Empty output for free command, returning empty values") + return memory_values + headers, Mem, Swap = [], [], [] for line in output.split('\n'): if "total" in line: @@ -202,6 +411,11 @@ def parse_free_output(output, memory_params): def parse_monit_status_output(output, memory_params): """Parse the 'monit status' command output to extract memory usage information.""" memory_values = {} + + if not output: + logger.warning("Empty output for monit status command, returning empty values") + return memory_values + memory_pattern = r"memory usage\s+([\d\.]+ \w+)\s+\[(\d+\.\d+)%\]" swap_pattern = r"swap usage\s+([\d\.]+ \w+)\s+\[(\d+\.\d+)%\]" @@ -227,7 +441,13 @@ def parse_monit_status_output(output, memory_params): def parse_docker_stats_output(output, memory_params): + """Parse the 'docker stats' command output to extract memory usage information.""" memory_values = {} + + if not output: + logger.warning("Empty output for docker stats command, returning empty values") + return memory_values + length = 0 pattern = r"(\d+\.\d+)%.*?(\d+\.\d+)%" @@ -251,3 +471,42 @@ def parse_docker_stats_output(output, memory_params): logger.debug("Parsed memory values: {}".format(memory_values)) return memory_values + + +def parse_frr_memory_output(output, memory_params): + """Parse the 'vtysh -c "show memory bgp/zebra"' output to extract FRR daemon memory usage.""" + memory_values = {} + + if not output: + logger.warning("Empty output for FRR memory command, returning empty values") + return memory_values + + unit_multipliers = { + 'bytes': 1, + 'KiB': 1024, + 'MiB': 1024 * 1024, + 'GiB': 1024 * 1024 * 1024 + } + + for line in output.split('\n'): + if "Used ordinary blocks:" in line: + parts = line.split() + if len(parts) >= 5: + try: + value = float(parts[3]) + unit = parts[4] + + if unit in unit_multipliers: + memory_bytes = value * unit_multipliers[unit] + # Convert to MB for consistent measurement + memory_values['used'] = memory_bytes / (1024 * 1024) + else: + logger.warning("Unknown memory unit: {}, treating as bytes".format(unit)) + memory_values['used'] = value / (1024 * 1024) + + except (ValueError, TypeError) as e: + logger.error("Failed to parse FRR memory value: {}".format(e)) + break + + logger.debug("Parsed FRR memory values: {}".format(memory_values)) + return memory_values diff --git a/tests/common/plugins/memory_utilization/memory_utilization_common.json b/tests/common/plugins/memory_utilization/memory_utilization_common.json index b4137abcb00..8c0fa0649c7 100755 --- a/tests/common/plugins/memory_utilization/memory_utilization_common.json +++ b/tests/common/plugins/memory_utilization/memory_utilization_common.json @@ -5,8 +5,14 @@ "cmd": "sudo monit status", "memory_params": { "memory_usage": { - "memory_increase_threshold": 50, - "memory_high_threshold": 90 + "memory_increase_threshold": { + "type": "value", + "value": 50 + }, + "memory_high_threshold": { + "type": "value", + "value": 90 + } } }, "memory_check": "parse_monit_status_output" diff --git a/tests/common/plugins/memory_utilization/memory_utilization_dependence.json b/tests/common/plugins/memory_utilization/memory_utilization_dependence.json index 0967ef424bc..8d22c97af06 100755 --- a/tests/common/plugins/memory_utilization/memory_utilization_dependence.json +++ b/tests/common/plugins/memory_utilization/memory_utilization_dependence.json @@ -1 +1,219 @@ -{} +{ + "HWSKU": { + "Arista-7050QX": ["Arista-7050-QX-32S", "Arista-7050-QX32", "Arista-7050QX-32S-S4Q31", "Arista-7050QX32S-Q32"] + }, + "COMMON": [ + { + "name": "top", + "cmd": "top -b -n 1", + "memory_params": { + "bgpd": { + "memory_increase_threshold": { + "type": "value", + "value": 128 + }, + "memory_high_threshold": null + }, + "zebra": { + "memory_increase_threshold": { + "type": "value", + "value": 64 + }, + "memory_high_threshold": null + } + }, + "memory_check": "parse_top_output" + }, + { + "name": "free", + "cmd": "free -m", + "memory_params": { + "used": { + "memory_increase_threshold": { + "type": "percentage", + "value": "20%" + }, + "memory_high_threshold": null + } + }, + "memory_check": "parse_free_output" + }, + { + "name": "monit", + "cmd": "sudo monit status", + "memory_params": { + "memory_usage": { + "memory_increase_threshold": { + "type": "value", + "value": 10 + }, + "memory_high_threshold": { + "type": "value", + "value": 70 + } + } + }, + "memory_check": "parse_monit_status_output" + }, + { + "name": "docker", + "cmd": "docker stats --no-stream", + "memory_params": { + "snmp": { + "memory_increase_threshold": { + "type": "value", + "value": 2 + }, + "memory_high_threshold": { + "type": "value", + "value": 4 + } + }, + "pmon": { + "memory_increase_threshold": { + "type": "value", + "value": 2 + }, + "memory_high_threshold": { + "type": "value", + "value": 8 + } + }, + "lldp": { + "memory_increase_threshold": { + "type": "value", + "value": 2 + }, + "memory_high_threshold": { + "type": "value", + "value": 4 + } + }, + "gnmi": { + "memory_increase_threshold": { + "type": "value", + "value": 2 + }, + "memory_high_threshold": { + "type": "value", + "value": 6 + } + }, + "radv": { + "memory_increase_threshold": { + "type": "value", + "value": 2 + }, + "memory_high_threshold": { + "type": "value", + "value": 3 + } + }, + "syncd": { + "memory_increase_threshold": { + "type": "value", + "value": 5 + }, + "memory_high_threshold": { + "type": "value", + "value": 18 + } + }, + "bgp": { + "memory_increase_threshold": { + "type": "value", + "value": 4 + }, + "memory_high_threshold": { + "type": "value", + "value": 14 + } + }, + "teamd": { + "memory_increase_threshold": { + "type": "value", + "value": 2 + }, + "memory_high_threshold": { + "type": "value", + "value": 5 + } + }, + "swss": { + "memory_increase_threshold": { + "type": "value", + "value": 3 + }, + "memory_high_threshold": { + "type": "value", + "value": 8 + } + }, + "database": { + "memory_increase_threshold": { + "type": "value", + "value": 2 + }, + "memory_high_threshold": { + "type": "value", + "value": 6 + } + } + }, + "memory_check": "parse_docker_stats_output" + }, + { + "name": "frr_bgp", + "cmd": "vtysh -c \"show memory bgp\"", + "memory_params": { + "used": { + "memory_increase_threshold": [ + {"type": "percentage", "value": "50%"}, + {"type": "value", "value": 32} + ], + "memory_high_threshold": { + "type": "value", + "value": 128 + } + } + }, + "memory_check": "parse_frr_memory_output" + }, + { + "name": "frr_zebra", + "cmd": "vtysh -c \"show memory zebra\"", + "memory_params": { + "used": { + "memory_increase_threshold": [ + {"type": "percentage", "value": "50%"}, + {"type": "value", "value": 16} + ], + "memory_high_threshold": { + "type": "value", + "value": 128 + } + } + }, + "memory_check": "parse_frr_memory_output" + } + ], + "Arista-7050QX": [ + { + "name": "monit", + "cmd": "sudo monit status", + "memory_params": { + "memory_usage": { + "memory_increase_threshold": { + "type": "value", + "value": 10 + }, + "memory_high_threshold": { + "type": "value", + "value": 85 + } + } + }, + "memory_check": "parse_monit_status_output" + } + ] +}