From 1b267d609d4d61ff3c379e0c0034afa635394ac8 Mon Sep 17 00:00:00 2001 From: Arham-Nasir Date: Thu, 10 Apr 2025 20:24:46 +0500 Subject: [PATCH 1/5] Add HLD for Memory Statistics enhancement with new metrics, leak detection, and gNMI access Signed-off-by: Arham-Nasir --- .../images/gnmi_sequence_diagram.svg | 14 + .../mem_stats_architecture_diagram_v2.svg | 4 + .../images/mem_stats_configuration.svg | 4 - .../memory_statistics_enhancement.md | 377 ++++++++++++++++++ 4 files changed, 395 insertions(+), 4 deletions(-) create mode 100644 doc/memory_statistics/images/gnmi_sequence_diagram.svg create mode 100644 doc/memory_statistics/images/mem_stats_architecture_diagram_v2.svg delete mode 100644 doc/memory_statistics/images/mem_stats_configuration.svg create mode 100644 doc/memory_statistics/memory_statistics_enhancement.md diff --git a/doc/memory_statistics/images/gnmi_sequence_diagram.svg b/doc/memory_statistics/images/gnmi_sequence_diagram.svg new file mode 100644 index 00000000000..724ced5842a --- /dev/null +++ b/doc/memory_statistics/images/gnmi_sequence_diagram.svg @@ -0,0 +1,14 @@ + + + + + + + + + + + + + + diff --git a/doc/memory_statistics/images/mem_stats_architecture_diagram_v2.svg b/doc/memory_statistics/images/mem_stats_architecture_diagram_v2.svg new file mode 100644 index 00000000000..1ce37245ec4 --- /dev/null +++ b/doc/memory_statistics/images/mem_stats_architecture_diagram_v2.svg @@ -0,0 +1,4 @@ + + + +
[Diagram text, mem_stats_architecture_diagram_v2.svg. Regions: Outside SONiC, SONiC System. Components: External User, gNMI Monitoring Client (Monitoring System/User), gNMI Server (Processes gNMI Requests), CLI Tools, Unix Socket Communication, MemoryStatsd Process, Data Storage, Config_db (memory_statistics_table), HostConfigDaemon. Flow labels: User Query via gNMI, Send gNMI Request, Send gNMI Response, Extract Memory Stats, Fetch Memory Data, Input Commands, Display Results, Stores User-specific Configurations, Fetches Memory Statistics Table, Monitors changes in Config_db, Signals Daemon to Reload its Configurations, Requests Data Query for Show Commands, Responds to Data Query for Show Commands, Socket Communication, Logs Memory Data, Fetches Memory Data, Memory Stats.]
\ No newline at end of file diff --git a/doc/memory_statistics/images/mem_stats_configuration.svg b/doc/memory_statistics/images/mem_stats_configuration.svg deleted file mode 100644 index 25ca673bef2..00000000000 --- a/doc/memory_statistics/images/mem_stats_configuration.svg +++ /dev/null @@ -1,4 +0,0 @@ - - - - \ No newline at end of file diff --git a/doc/memory_statistics/memory_statistics_enhancement.md b/doc/memory_statistics/memory_statistics_enhancement.md new file mode 100644 index 00000000000..bf70ac124f0 --- /dev/null +++ b/doc/memory_statistics/memory_statistics_enhancement.md @@ -0,0 +1,377 @@ +# Memory Statistics Enhancement with New Metrics, Leak Detection, and gNMI Access + +## Revision History + +| Revision No. | Description | Editor | Date | +|-------------|------------------|--------------------------------------------|---------------| +| 1.0 | Document Creation | Hamza Hashmi, Arham Nasir and Kanza Latif | 09 April 2025 | + +## Table of Contents + +- [Scope](#scope) +- [Definitions/Abbreviations](#definitionsabbreviations) +- [Overview](#overview) +- [Functional Requirements](#functional-requirements) +- [Architecture Design](#architecture-design) +- [High-Level Design](#high-level-design) + - [Core Functionalities](#core-functionalities) + - [Sequence Diagrams](#sequence-diagrams) +- [SAI API](#sai-api) +- [Configuration and Management](#configuration-and-management) + - [CLI/YANG Model Enhancements](#cliyang-model-enhancements) + - [CLI Commands](#cli-commands) + - [Config Commands](#config-commands) + - [Show Commands](#show-commands) + - [Daemon Configuration Management](#daemon-configuration-management) + - [Config DB Enhancements](#config-db-enhancements) +- [Warmboot and Fastboot Design Impact](#warmboot-and-fastboot-design-impact) +- [Testing Requirements/Design](#testing-requirementsdesign) + - [Unit Test Cases](#unit-test-cases) + - [System Test Cases](#system-test-cases) +- [Future Work](#future-work) + +--- + +## Scope + +This 
High-Level Design (HLD) document enhances the existing Memory Statistics feature in SONiC by extending memory metrics to include Docker, process, and CPU memory, adding memory leak detection, and enabling remote log access via gNMI.
+
+---
+
+## Definitions/Abbreviations
+
+| Sr No. | Term | Definition |
+|--------|-------------|------------|
+| 1 | gNMI | gRPC Network Management Interface, a protocol for network management and telemetry |
+| 2 | gRPC | gRPC Remote Procedure Calls, a high-performance RPC framework for communication between network components |
+| 3 | Telemetry | A mechanism to collect and export real-time operational data from the system |
+| 4 | Log Processing | The method of formatting and organizing log files for efficient retrieval and analysis |
+
+---
+
+## Overview
+This High-Level Design (HLD) enhances SONiC’s Memory Statistics feature, originally limited to system-level memory monitoring via CLI, into a comprehensive solution that also covers Docker containers, individual processes, and CPU memory metrics. It introduces memory leak detection to help prevent resource exhaustion and integrates gNMI for remote log access, reducing the need for local access to the device. This upgrade provides visibility across all memory metrics, early memory leak detection, customizable sampling and retention periods, and efficient remote retrieval.
+
+---
+
+## Functional Requirements
+This section outlines the functional requirements necessary for implementing this HLD in SONiC.
+
+- **Monitoring Capabilities:** The system must monitor memory metrics for the system (Total, Used, Free, Available, Cached, Buffers, Shared), Docker containers, individual processes, and CPU memory.
+- **Memory Leak Detection:** The feature must analyze memory usage trends over time to detect potential leaks and report them via CLI.
+- **Configurability:** The system must allow configuration of sampling intervals (3–15 minutes) and retention periods (1–30 days) via CLI.
+- **CLI Enhancements:** The CLI must support displaying new metrics and leak analysis outputs with filtering options.
+- **Remote Log Access:** gNMI integration must enable retrieval of memory logs from remote systems.
+
+---
+
+## Architecture Design
+
+The enhancement fits within the existing framework without altering its core structure. The memorystatsd daemon is extended to collect additional metrics and detect leaks, interfacing with hostcfgd for ConfigDB updates. A new gNMI server processes logs into JSON and serves them remotely. This integrates seamlessly with SONiC’s modular design, leveraging existing daemons and adding gNMI capabilities.
+
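The configurable ranges in the Functional Requirements (3–15 minutes, 1–30 days) could be enforced with a small validation step before values are written to ConfigDB. The following is a minimal sketch only: `MemoryStatsConfig` and `check_range` are hypothetical names used for illustration, not part of the existing SONiC code, and the defaults (disabled, 5 minutes, 15 days) are the ones this HLD lists for the v1 config commands.

```python
from dataclasses import dataclass
from typing import Tuple

# Allowed ranges as stated in the Functional Requirements section.
SAMPLING_INTERVAL_MINUTES: Tuple[int, int] = (3, 15)
RETENTION_PERIOD_DAYS: Tuple[int, int] = (1, 30)


def check_range(name: str, value: int, bounds: Tuple[int, int]) -> None:
    """Raise ValueError if value falls outside the inclusive bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass(frozen=True)
class MemoryStatsConfig:
    """Hypothetical in-memory view of the memory-stats settings."""
    enabled: bool = False        # default: disabled
    sampling_interval: int = 5   # minutes
    retention_period: int = 15   # days

    def __post_init__(self) -> None:
        # Validate on construction so an invalid config never exists.
        check_range("sampling_interval", self.sampling_interval,
                    SAMPLING_INTERVAL_MINUTES)
        check_range("retention_period", self.retention_period,
                    RETENTION_PERIOD_DAYS)
```

Validating at construction time keeps range checks in one place, so the CLI and the daemon would reject the same values; whether the real implementation relies on YANG constraints instead is not stated in this document.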

+ Architecture Diagram v2 +
+ Figure 1: Architecture Diagram +

+
+---
+
+## High-Level Design
+
+### Core Functionalities
+
+#### Data Collection and Storage
+The `memorystatsd` daemon collects system (Total, Used, Free, Available, Cached, Buffers, Shared), Docker, process, and CPU memory metrics using `psutil` and the Docker API, storing them as compressed log files for efficiency.
+
+#### Log Processing and Storage
+Logs are processed into JSON for gNMI retrieval, with low overhead.
+
+#### Memory Leak Detection
+The daemon analyzes memory trends by comparing usage over time to detect steady increases. Results are reported via `show memory-stats --type process --leak-analysis` (e.g., "Potential Leak Detected").
+
+#### Remote Access via gNMI
+The gNMI server, potentially within `memorystatsd`, serves JSON logs via `Get` requests, supporting snapshots or intervals based on time range parameters.
+
+#### User Interaction
+Users view stats, configure settings (reusing `enable/disable`, `sampling_interval` and `retention_period` from [v1](https://github.com/Arham-Nasir/SONiC/blob/4cf0b5d0bc973cf3a72f91b7f0a9567fd42eeccd/doc/memory_statistics/memory_statistics_hld.md)), and analyze leaks via CLI; logs are fetched remotely via gNMI.
+
+### Sequence Diagrams
+- **View Memory Usage**:
+  - **Description**: Shows the CLI-based retrieval of memory metrics (system, Docker, process, CPU).
+  - **Diagram**:
+

+ View Memory Usage +
+ Figure 2: View Memory Usage +

+- **Memory Leak Detection**: + - **Description**: Depicts the process of trend analysis and leak reporting. + - **Diagram**: +

+ Leak Detection Sequence +
+ Figure 3: Sequence for memory leak detection +

+- **gNMI Log Retrieval**: + - **Description**: Outlines remote log access. + - **Diagram**: +

+ gNMI Log Retrieval Sequence +
+ Figure 4: Sequence for gNMI log retrieval and cleanup +

+ +--- + +## SAI API + +No SAI API changes are required. + +--- + +## Configuration and Management +### CLI/YANG Model Enhancements + +#### CLI Commands + + +#### Config Commands + +##### Config Commands +The following configuration commands are reused from [v1](https://github.com/Arham-Nasir/SONiC/blob/4cf0b5d0bc973cf3a72f91b7f0a9567fd42eeccd/doc/memory_statistics/memory_statistics_hld.md) without modification: +1. **`config memory-stats enable/disable`** + - Enables or disables monitoring (default: disabled). + - Example: `config memory-stats enable` → "Memory statistics monitoring enabled." +2. **`config memory-stats sampling-interval `** + - Sets sampling interval (3–15 minutes, default: 5). + - Example: `config memory-stats sampling-interval 10` → "Sampling interval set to 10 minutes." +3. **`config memory-stats retention-period `** + - Sets retention period (1–30 days, default: 15). + - Example: `config memory-stats retention-period 20` → "Retention period set to 20 days." + +##### Show Commands +Below are all upgraded CLI commands with their definitions and sample outputs, covering system, Docker, process, CPU metrics, and memory leak analysis. + +1. **View Memory Statistics** + - **Command:** `show memory-stats [--type ] [--from ] [--to ] [--select ] [--leak-analysis]` + - **Description:** + - Displays memory statistics for the specified type (system, Docker, process, or CPU; default: system, last 15 days). + - `--type `: Specifies the metric type. + - `--from/--to`: Defines the time range (ISO format or relative, e.g., "5 days ago"). + - `--select `: Filters specific metrics (e.g., total_memory, used_memory, or process/container name/ID). + - `--leak-analysis`: Enables leak detection mode. 
+ - **Sample Outputs:** + + **System Memory Statistics (Default):** + ``` + admin@sonic:~$ show memory-stats --type system + Memory Statistics (System): + Codes: M - minutes, H - hours, D - days + -------------------------------------------------------------------------------- + Report Generated: 2025-03-10 14:30:00 + Analysis Period: From 2025-02-23 14:30:00 to 2025-03-10 14:30:00 + Interval: 2 Days + -------------------------------------------------------------------------------------------------------------------------------------------------- + Metric Current High Low D23-D25 D25-D27 D27-D01 D01-D03 D03-D05 D05-D07 D07-D09 D09-D10 + Value Value Value 23Feb25 25Feb25 27Feb25 01Mar25 03Mar25 05Mar25 07Mar25 09Mar25 + -------------------------------------------------------------------------------------------------------------------------------------------------- + total_memory 15.29GB 15.29GB 15.29GB 15.29GB 15.29GB 15.29GB 15.29GB 15.29GB 15.29GB 15.29GB 15.29GB + used_memory 8.87GB 9.35GB 8.15GB 8.15GB 9.10GB 8.20GB 8.30GB 9.05GB 8.40GB 9.35GB 8.87GB + free_memory 943.92MB 906.28MB 500.00MB 800.00MB 750.00MB 906.28MB 650.00MB 600.00MB 550.00MB 500.00MB 943.92MB + available_memory 4.78GB 4.74GB 4.35GB 4.65GB 4.60GB 4.55GB 4.74GB 4.45GB 4.40GB 4.35GB 4.78GB + cached_memory 5.17GB 5.08GB 4.96GB 5.08GB 5.06GB 5.04GB 5.02GB 5.00GB 4.98GB 4.96GB 5.17GB + buffers_memory 337.83MB 333.59MB 295.00MB 325.00MB 320.00MB 315.00MB 333.59MB 305.00MB 300.00MB 295.00MB 337.83MB + shared_memory 1.31GB 1.22GB 1.08GB 1.22GB 1.20GB 1.18GB 1.15GB 1.12GB 1.10GB 1.08GB 1.31GB + ``` + + **System Memory Statistics (Filtered with Time Range):** + ``` + admin@sonic:~$ show memory-stats --type system --from "5 days ago" --to "now" --select used_memory + Memory Statistics (System): + Codes: M - minutes, H - hours, D - days + -------------------------------------------------------------------------------- + Report Generated: 2025-03-10 14:30:00 + Analysis Period: From 2025-03-05 14:30:00 to 
2025-03-10 14:30:00 + Interval: 1 Day + -------------------------------------------------------------------------------------------------------------------------------------------------- + Metric Current High Low D05-D06 D06-D07 D07-D08 D08-D09 D09-D10 + Value Value Value 05Mar25 06Mar25 07Mar25 08Mar25 09Mar25 + -------------------------------------------------------------------------------------------------------------------------------------------------- + used_memory 8.87GB 9.35GB 8.40GB 8.40GB 8.50GB 9.35GB 8.90GB 8.87GB + ``` + + **Docker Memory Statistics (Default):** + ``` + admin@sonic:~$ show memory-stats --type docker + Memory Statistics (Docker): + Codes: M - minutes, H - hours, D - days + -------------------------------------------------------------------------------- + Report Generated: 2025-03-10 14:30:00 + Analysis Period: From 2025-02-23 14:30:00 to 2025-03-10 14:30:00 + Interval: 2 Days + -------------------------------------------------------------------------------------------------------------------------------------------------- + Container Current High Low D23-D25 D25-D27 D27-D01 D01-D03 D03-D05 D05-D07 D07-D09 D09-D10 + Value Value Value 23Feb25 25Feb25 27Feb25 01Mar25 03Mar25 05Mar25 07Mar25 09Mar25 + -------------------------------------------------------------------------------------------------------------------------------------------------- + swss(7ef9d6a) 52.3MB 52.5MB 44.7MB 44.7MB 44.8MB 49.4MB 46.4MB 52.3MB 52.4MB 52.5MB 52.3MB + stp(adf43d2) 22.8MB 23.3MB 22.5MB 22.5MB 22.6MB 22.7MB 22.8MB 22.9MB 23.0MB 23.3MB 22.8MB + telemetry(b752462) 31.9MB 32.8MB 31.9MB 31.9MB 32.0MB 32.1MB 32.2MB 32.3MB 32.5MB 32.8MB 31.9MB + ``` + + **Docker Memory Statistics (Filtered with Time Range):** + ``` + admin@sonic:~$ show memory-stats --type docker --from "23 hours ago" --to "now" --select swss + Memory Statistics (Docker): + Codes: M - minutes, H - hours, D - days + -------------------------------------------------------------------------------- 
+ Report Generated: 2025-03-10 14:30:00 + Analysis Period: From 2025-03-09 15:30:00 to 2025-03-10 14:30:00 + Interval: 3 Hours + -------------------------------------------------------------------------------------------------------------------------------------------------- + Container Current High Low H15-H18 H18-H21 H21-H00 H00-H03 H03-H06 H06-H09 H09-H12 H12-H14 + Value Value Value 09Mar25 09Mar25 09Mar25 10Mar25 10Mar25 10Mar25 10Mar25 10Mar25 + -------------------------------------------------------------------------------------------------------------------------------------------------- + swss(7ef9d6a) 52.3MB 52.5MB 52.2MB 52.2MB 52.3MB 52.4MB 52.5MB 52.4MB 52.3MB 52.3MB 52.3MB + ``` + + **Process Memory Statistics (Default):** + ``` + admin@sonic:~$ show memory-stats --type process + Memory Statistics (Process): + Codes: M - minutes, H - hours, D - days + -------------------------------------------------------------------------------- + Report Generated: 2025-03-10 14:30:00 + Analysis Period: From 2025-02-23 14:30:00 to 2025-03-10 14:30:00 + Interval: 2 Days + -------------------------------------------------------------------------------------------------------------------------------------------------- + Process Current High Low D23-D25 D25-D27 D27-D01 D01-D03 D03-D05 D05-D07 D07-D09 D09-D10 + Value Value Value 23Feb25 25Feb25 27Feb25 01Mar25 03Mar25 05Mar25 07Mar25 09Mar25 + -------------------------------------------------------------------------------------------------------------------------------------------------- + bgpd(6284) 19.2MB 19.2MB 18.4MB 18.4MB 18.5MB 18.6MB 18.7MB 18.8MB 18.9MB 19.2MB 19.2MB + syncd(7889) 504.8MB 504.9MB 504.8MB 504.8MB 504.8MB 504.8MB 504.8MB 504.9MB 504.9MB 504.9MB 504.8MB + python3(14573) 20.3MB 20.3MB 20.3MB 20.3MB 20.3MB 20.3MB 20.3MB 20.3MB 20.3MB 20.3MB 20.3MB + ``` + + **Process Memory Statistics (Filtered with Time Range):** + ``` + admin@sonic:~$ show memory-stats --type process --from "12 hours ago" --to "now" 
--select bgpd + Memory Statistics (Process): + Codes: M - minutes, H - hours, D - days + -------------------------------------------------------------------------------- + Report Generated: 2025-03-10 14:30:00 + Analysis Period: From 2025-03-10 02:30:00 to 2025-03-10 14:30:00 + Interval: 2 Hours + -------------------------------------------------------------------------------------------------------------------------------------------------- + Process Current High Low H02-H04 H04-H06 H06-H08 H08-H10 H10-H12 H12-H14 + Value Value Value 10Mar25 10Mar25 10Mar25 10Mar25 10Mar25 10Mar25 + -------------------------------------------------------------------------------------------------------------------------------------------------- + bgpd(6284) 19.2MB 19.2MB 19.1MB 19.1MB 19.1MB 19.2MB 19.2MB 19.2MB 19.2MB + ``` + + **CPU Memory Statistics (Default):** + ``` + admin@sonic:~$ show memory-stats --type cpu + Memory Statistics (CPU): + Codes: M - minutes, H - hours, D - days + -------------------------------------------------------------------------------- + Report Generated: 2025-03-10 14:30:00 + Analysis Period: From 2025-02-23 14:30:00 to 2025-03-10 14:30:00 + Interval: 2 Days + -------------------------------------------------------------------------------------------------------------------------------------------------- + Metric Current High Low D23-D25 D25-D27 D27-D01 D01-D03 D03-D05 D05-D07 D07-D09 D09-D10 + Value Value Value 23Feb25 25Feb25 27Feb25 01Mar25 03Mar25 05Mar25 07Mar25 09Mar25 + -------------------------------------------------------------------------------------------------------------------------------------------------- + cpu_memory 2.5GB 2.7GB 2.2GB 2.2GB 2.3GB 2.4GB 2.5GB 2.6GB 2.7GB 2.6GB 2.5GB + ``` + + **CPU Memory Statistics (Filtered with Time Range):** + ``` + admin@sonic:~$ show memory-stats --type cpu --from "20 hours ago" --to "now" + Memory Statistics (CPU): + Codes: M - minutes, H - hours, D - days + 
-------------------------------------------------------------------------------- + Report Generated: 2025-03-10 14:30:00 + Analysis Period: From 2025-03-09 18:30:00 to 2025-03-10 14:30:00 + Interval: 3 Hours + -------------------------------------------------------------------------------------------------------------------------------------------------- + Metric Current High Low H18-H21 H21-H00 H00-H03 H03-H06 H06-H09 H09-H12 H12-H14 + Value Value Value 09Mar25 09Mar25 10Mar25 10Mar25 10Mar25 10Mar25 10Mar25 + -------------------------------------------------------------------------------------------------------------------------------------------------- + cpu_memory 2.5GB 2.7GB 2.4GB 2.4GB 2.5GB 2.6GB 2.7GB 2.6GB 2.5GB 2.5GB + ``` + + - **Process Memory Statistics with Leak Analysis (Default):** + + ``` + admin@sonic:~$ show memory-stats --type process --leak-analysis + Memory Statistics (Process Leak Analysis): + Codes: M - minutes, H - hours, D - days + -------------------------------------------------------------------------------- + Report Generated: 2025-03-10 14:30:00 + Analysis Period: From 2025-02-23 14:30:00 to 2025-03-10 14:30:00 + Interval: 2 Days + Threshold: 2048 KB + ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ + Process Current High Low D23-D25 D25-D27 D27-D01 D01-D03 D03-D05 D05-D07 D07-D09 D09-D10 Diff Status + Value Value Value 23Feb25 25Feb25 27Feb25 01Mar25 03Mar25 05Mar25 07Mar25 09Mar25 (Total) + ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ + bgpd(7437) 19.2MB 19.2MB 18.4MB 18.4MB 18.5MB 18.6MB 18.7MB 18.8MB 18.9MB 19.2MB 19.2MB 850 KB No Leak + syncd(32734) 5.6MB 5.6MB 752.0KB 752.0KB 1.2MB 2.0MB 3.0MB 4.0MB 4.8MB 5.2MB 5.6MB 4.8 MB Potential Leak Detected + ``` + + - 
**Process Memory Statistics with Leak Analysis (Filtered with Time Range):** + ``` + admin@sonic:~$ show memory-stats --type process --from "20 minutes ago" --to "now" --select syncd --leak-analysis + Memory Statistics (Process Leak Analysis): + Codes: M - minutes, H - hours, D - days + -------------------------------------------------------------------------------- + Report Generated: 2025-03-10 14:30:00 + Analysis Period: From 2025-03-10 14:10:00 to 2025-03-10 14:30:00 + Interval: 2 Minutes + Threshold: 2048 KB + -------------------------------------------------------------------------------------------------------------------------------------------------- + Process Current High Low M10-M12 M12-M14 M14-M16 M16-M18 M18-M20 Diff Status + Value Value Value 14:10 14:12 14:14 14:16 14:18 (Total) + -------------------------------------------------------------------------------------------------------------------------------------------------- + syncd(32734) 5.6MB 5.6MB 752.0KB 752.0KB 2.0MB 3.5MB 4.8MB 5.6MB 4.8 MB Potential Leak Detected + -------------------------------------------------------------------------------------------------------------------------------------------------- + ``` + +### Daemon Configuration Management +(Reuse original configuration management from [v1](https://github.com/Arham-Nasir/SONiC/blob/4cf0b5d0bc973cf3a72f91b7f0a9567fd42eeccd/doc/memory_statistics/memory_statistics_hld.md)) + +### Config DB Enhancements + +No Config DB enhancements required (Reused from [v1](https://github.com/Arham-Nasir/SONiC/blob/4cf0b5d0bc973cf3a72f91b7f0a9567fd42eeccd/doc/memory_statistics/memory_statistics_hld.md)) + +--- + +## Warmboot and Fastboot Design Impact + +No impact on warmboot/fastboot functionalities. 
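The leak-analysis sample outputs under Show Commands report a `Diff (Total)` and a `Status` against a `Threshold: 2048 KB`. The document does not spell out the exact rule, so the sketch below is an assumption: it treats `Diff` as last sample minus first sample and flags a potential leak only when the samples never decrease and the diff exceeds the threshold. The function name `analyze_leak` is hypothetical.

```python
from typing import Sequence, Tuple

# Matches the "Threshold: 2048 KB" line in the sample output.
LEAK_THRESHOLD_KB = 2048.0


def analyze_leak(samples_kb: Sequence[float],
                 threshold_kb: float = LEAK_THRESHOLD_KB) -> Tuple[float, str]:
    """Return (diff_kb, status) for one process's per-interval memory samples.

    Assumed rule (not confirmed by the HLD): a potential leak is a
    non-decreasing series whose total growth exceeds the threshold.
    """
    if len(samples_kb) < 2:
        return 0.0, "No Leak"  # not enough data to see a trend
    diff_kb = samples_kb[-1] - samples_kb[0]
    # "Steady increase": every sample is at least as large as the previous one.
    steady = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    if steady and diff_kb > threshold_kb:
        return diff_kb, "Potential Leak Detected"
    return diff_kb, "No Leak"
```

Fed the `syncd(32734)` row from the default leak-analysis sample (752.0KB rising to 5.6MB), this rule yields a diff of roughly 4.8 MB and a potential-leak status; the `bgpd` row grows by under 1 MB and stays below the threshold.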
+ +--- + +## Testing Requirements/Design + +### Unit Test Cases +| Test Case ID | Description | +|--------------|-----------------------------------------------------------------------------| +| UT11 | Verify CLI to show Docker memory stats | +| UT12 | Verify CLI to show process memory stats with leak analysis | +| UT13 | Verify CLI to show CPU memory stats | +| UT14 | Verify leak detection for a process exceeding threshold | +| UT15 | Verify gNMI log retrieval for all metric types | + +### System Test Cases +| Test Case ID | Description | +|--------------|-----------------------------------------------------------------------------| +| ST2 | Validate end-to-end functionality including gNMI retrieval and log cleanup | +| ST3 | Validate memory leak detection accuracy over a 7-day period | + +--- + +## Future Work +- Add alerting for memory leaks via email/syslog. +- Extend metrics to include GPU memory (if applicable). From 148bfa4c14fd0c7db771006969d302a164a09d76 Mon Sep 17 00:00:00 2001 From: hamzahashmideveloper Date: Thu, 24 Apr 2025 13:45:41 +0500 Subject: [PATCH 2/5] Format changes --- doc/memory_statistics/images/architecture_diagram.svg | 4 ---- doc/memory_statistics/images/architecture_diagram_v2.svg | 4 ++++ .../images/mem_stats_architecture_diagram_v2.svg | 4 ---- 3 files changed, 4 insertions(+), 8 deletions(-) delete mode 100644 doc/memory_statistics/images/architecture_diagram.svg create mode 100644 doc/memory_statistics/images/architecture_diagram_v2.svg delete mode 100644 doc/memory_statistics/images/mem_stats_architecture_diagram_v2.svg diff --git a/doc/memory_statistics/images/architecture_diagram.svg b/doc/memory_statistics/images/architecture_diagram.svg deleted file mode 100644 index f19fffaaf72..00000000000 --- a/doc/memory_statistics/images/architecture_diagram.svg +++ /dev/null @@ -1,4 +0,0 @@ - - - -
[Diagram text, architecture_diagram.svg (deleted). Region: SONiC System. Components: User, CLI Tools, Unix Socket Communication, MemoryStatsd Process, Data Storage, Config_db (memory_statistics_table), HostConfigDaemon, Config File. Flow labels: Input Commands, Display Results, Stores User-specific Configurations, Fetches Memory Statistics Table, Monitors changes in Config_db, Signals Daemon to Reload its Configurations, Requests Data Query for Show Commands, Responds to Data Query for Show Commands, Socket Communication, Logs Memory Data, Fetches Memory Data, System Memory Data, Reads Defaults from Config File on Startup.]
\ No newline at end of file diff --git a/doc/memory_statistics/images/architecture_diagram_v2.svg b/doc/memory_statistics/images/architecture_diagram_v2.svg new file mode 100644 index 00000000000..37362ee8643 --- /dev/null +++ b/doc/memory_statistics/images/architecture_diagram_v2.svg @@ -0,0 +1,4 @@ + + + +
[Diagram text, architecture_diagram_v2.svg. Regions: Outside SONiC, Inside SONiC, SONiC System. Components: External User, gNMI Monitoring Client (Monitoring System/User), gNMI Server (Processes gNMI Requests), CLI Tools, Unix Socket Communication, MemoryStatsd Process, Data Storage, Config_db (memory_statistics_table), HostConfigDaemon. Flow labels: User Query via gNMI, Send gNMI Request, Send gNMI Response, Extract Memory Stats, Fetch Memory Data, Input Commands, Display Results, Stores User-specific Configurations, Fetches Memory Statistics Table, Monitors changes in Config_db, Signals Daemon to Reload its Configurations, Requests Data Query for Show Commands, Responds to Data Query for Show Commands, Socket Communication, Logs Memory Data, Fetches Memory Data, Memory Stats.]
\ No newline at end of file diff --git a/doc/memory_statistics/images/mem_stats_architecture_diagram_v2.svg b/doc/memory_statistics/images/mem_stats_architecture_diagram_v2.svg deleted file mode 100644 index 1ce37245ec4..00000000000 --- a/doc/memory_statistics/images/mem_stats_architecture_diagram_v2.svg +++ /dev/null @@ -1,4 +0,0 @@ - - - -
[Diagram text, mem_stats_architecture_diagram_v2.svg (deleted). Regions: Outside SONiC, SONiC System. Components: External User, gNMI Monitoring Client (Monitoring System/User), gNMI Server (Processes gNMI Requests), CLI Tools, Unix Socket Communication, MemoryStatsd Process, Data Storage, Config_db (memory_statistics_table), HostConfigDaemon. Flow labels: User Query via gNMI, Send gNMI Request, Send gNMI Response, Extract Memory Stats, Fetch Memory Data, Input Commands, Display Results, Stores User-specific Configurations, Fetches Memory Statistics Table, Monitors changes in Config_db, Signals Daemon to Reload its Configurations, Requests Data Query for Show Commands, Responds to Data Query for Show Commands, Socket Communication, Logs Memory Data, Fetches Memory Data, Memory Stats.]
\ No newline at end of file From ca111cdebb3cf8290381c1d209cae81728710f5f Mon Sep 17 00:00:00 2001 From: hamzahashmideveloper Date: Thu, 24 Apr 2025 14:02:52 +0500 Subject: [PATCH 3/5] New Format changes --- .../memory_statistics_enhancement.md | 70 +++++++++---------- 1 file changed, 34 insertions(+), 36 deletions(-) diff --git a/doc/memory_statistics/memory_statistics_enhancement.md b/doc/memory_statistics/memory_statistics_enhancement.md index bf70ac124f0..398b2d076b7 100644 --- a/doc/memory_statistics/memory_statistics_enhancement.md +++ b/doc/memory_statistics/memory_statistics_enhancement.md @@ -4,7 +4,7 @@ | Revision No. | Description | Editor | Date | |-------------|------------------|--------------------------------------------|---------------| -| 1.0 | Document Creation | Hamza Hashmi, Arham Nasir and Kanza Latif | 09 April 2025 | +| 1.0 | Document Creation | Kanza Latif, Hamza Hashmi and Arham Nasir | 09 April 2025 | ## Table of Contents @@ -70,9 +70,9 @@ This section outlines the functional requirements necessary for implementing thi The enhancement fits within the existing framework without altering its core structure. The memorystatsd daemon is extended to collect additional metrics and detect leaks, interfacing with hostcfgd for ConfigDB updates. A new gNMI server processes logs into JSON and serves them remotely. This integrates seamlessly with SONiC’s modular design, leveraging existing daemons and adding gNMI capabilities.

- Architecture Diagram v2 -
- Figure 1: Architecture Diagram + architecture diagram for memory data +
+ Figure 1: Feature architecture diagram showing the unix socket, daemon, ConfigDB, data file and gNMI

--- @@ -100,26 +100,26 @@ Users view stats, configure settings (reusing `enable/disable`, `sampling_interv - **View Memory Usage**: - **Description**: Shows the CLI-based retrieval of memory metrics (system, Docker, process, CPU). - **Diagram**: -

- View Memory Usage +

+ View Memory Usage
Figure 2: View Memory Usage

- **Memory Leak Detection**: - **Description**: Depicts the process of trend analysis and leak reporting. - **Diagram**: -

- Leak Detection Sequence -
- Figure 3: Sequence for memory leak detection -

+

+ Leak Detection Sequence +
+ Figure 3: Sequence for memory leak detection +

- **gNMI Log Retrieval**: - **Description**: Outlines remote log access. - **Diagram**: -

+

gNMI Log Retrieval Sequence
- Figure 4: Sequence for gNMI log retrieval and cleanup + Figure 4: Sequence for gNMI log retrieval

--- @@ -131,39 +131,37 @@ No SAI API changes are required. --- ## Configuration and Management -### CLI/YANG Model Enhancements - -#### CLI Commands +## CLI/YANG Model Enhancements +### CLI Commands #### Config Commands - -##### Config Commands The following configuration commands are reused from [v1](https://github.com/Arham-Nasir/SONiC/blob/4cf0b5d0bc973cf3a72f91b7f0a9567fd42eeccd/doc/memory_statistics/memory_statistics_hld.md) without modification: -1. **`config memory-stats enable/disable`** +1. ##### config memory-stats enable/disable - Enables or disables monitoring (default: disabled). - Example: `config memory-stats enable` → "Memory statistics monitoring enabled." -2. **`config memory-stats sampling-interval `** +2. ##### config memory-stats sampling-interval - Sets sampling interval (3–15 minutes, default: 5). - Example: `config memory-stats sampling-interval 10` → "Sampling interval set to 10 minutes." -3. **`config memory-stats retention-period `** +3. ##### config memory-stats retention-period - Sets retention period (1–30 days, default: 15). - Example: `config memory-stats retention-period 20` → "Retention period set to 20 days." -##### Show Commands +#### Show Commands Below are all upgraded CLI commands with their definitions and sample outputs, covering system, Docker, process, CPU metrics, and memory leak analysis. -1. **View Memory Statistics** - - **Command:** `show memory-stats [--type ] [--from ] [--to ] [--select ] [--leak-analysis]` - - **Description:** +1. ##### View Memory Statistics + ##### Command: + `show memory-stats [--type ] [--from ] [--to ] [--select ] [--leak-analysis]` + - ##### Description: - Displays memory statistics for the specified type (system, Docker, process, or CPU; default: system, last 15 days). - `--type `: Specifies the metric type. - `--from/--to`: Defines the time range (ISO format or relative, e.g., "5 days ago"). 
- `--select `: Filters specific metrics (e.g., total_memory, used_memory, or process/container name/ID). - `--leak-analysis`: Enables leak detection mode. - - **Sample Outputs:** + - ##### Sample Outputs: - **System Memory Statistics (Default):** + ##### System Memory Statistics (Default): ``` admin@sonic:~$ show memory-stats --type system Memory Statistics (System): @@ -185,7 +183,7 @@ Below are all upgraded CLI commands with their definitions and sample outputs, c shared_memory 1.31GB 1.22GB 1.08GB 1.22GB 1.20GB 1.18GB 1.15GB 1.12GB 1.10GB 1.08GB 1.31GB ``` - **System Memory Statistics (Filtered with Time Range):** + ##### System Memory Statistics (Filtered with Time Range): ``` admin@sonic:~$ show memory-stats --type system --from "5 days ago" --to "now" --select used_memory Memory Statistics (System): @@ -201,7 +199,7 @@ Below are all upgraded CLI commands with their definitions and sample outputs, c used_memory 8.87GB 9.35GB 8.40GB 8.40GB 8.50GB 9.35GB 8.90GB 8.87GB ``` - **Docker Memory Statistics (Default):** + ##### Docker Memory Statistics (Default): ``` admin@sonic:~$ show memory-stats --type docker Memory Statistics (Docker): @@ -219,7 +217,7 @@ Below are all upgraded CLI commands with their definitions and sample outputs, c telemetry(b752462) 31.9MB 32.8MB 31.9MB 31.9MB 32.0MB 32.1MB 32.2MB 32.3MB 32.5MB 32.8MB 31.9MB ``` - **Docker Memory Statistics (Filtered with Time Range):** + ##### Docker Memory Statistics (Filtered with Time Range): ``` admin@sonic:~$ show memory-stats --type docker --from "23 hours ago" --to "now" --select swss Memory Statistics (Docker): @@ -235,7 +233,7 @@ Below are all upgraded CLI commands with their definitions and sample outputs, c swss(7ef9d6a) 52.3MB 52.5MB 52.2MB 52.2MB 52.3MB 52.4MB 52.5MB 52.4MB 52.3MB 52.3MB 52.3MB ``` - **Process Memory Statistics (Default):** + ##### Process Memory Statistics (Default): ``` admin@sonic:~$ show memory-stats --type process Memory Statistics (Process): @@ -253,7 +251,7 @@ Below are 
all upgraded CLI commands with their definitions and sample outputs, c python3(14573) 20.3MB 20.3MB 20.3MB 20.3MB 20.3MB 20.3MB 20.3MB 20.3MB 20.3MB 20.3MB 20.3MB ``` - **Process Memory Statistics (Filtered with Time Range):** + ##### Process Memory Statistics (Filtered with Time Range): ``` admin@sonic:~$ show memory-stats --type process --from "12 hours ago" --to "now" --select bgpd Memory Statistics (Process): @@ -269,7 +267,7 @@ Below are all upgraded CLI commands with their definitions and sample outputs, c bgpd(6284) 19.2MB 19.2MB 19.1MB 19.1MB 19.1MB 19.2MB 19.2MB 19.2MB 19.2MB ``` - **CPU Memory Statistics (Default):** + ##### CPU Memory Statistics (Default): ``` admin@sonic:~$ show memory-stats --type cpu Memory Statistics (CPU): @@ -285,7 +283,7 @@ Below are all upgraded CLI commands with their definitions and sample outputs, c cpu_memory 2.5GB 2.7GB 2.2GB 2.2GB 2.3GB 2.4GB 2.5GB 2.6GB 2.7GB 2.6GB 2.5GB ``` - **CPU Memory Statistics (Filtered with Time Range):** + ##### CPU Memory Statistics (Filtered with Time Range): ``` admin@sonic:~$ show memory-stats --type cpu --from "20 hours ago" --to "now" Memory Statistics (CPU): @@ -301,7 +299,7 @@ Below are all upgraded CLI commands with their definitions and sample outputs, c cpu_memory 2.5GB 2.7GB 2.4GB 2.4GB 2.5GB 2.6GB 2.7GB 2.6GB 2.5GB 2.5GB ``` - - **Process Memory Statistics with Leak Analysis (Default):** + ##### Process Memory Statistics with Leak Analysis (Default): ``` admin@sonic:~$ show memory-stats --type process --leak-analysis @@ -320,7 +318,7 @@ Below are all upgraded CLI commands with their definitions and sample outputs, c syncd(32734) 5.6MB 5.6MB 752.0KB 752.0KB 1.2MB 2.0MB 3.0MB 4.0MB 4.8MB 5.2MB 5.6MB 4.8 MB Potential Leak Detected ``` - - **Process Memory Statistics with Leak Analysis (Filtered with Time Range):** + ##### Process Memory Statistics with Leak Analysis (Filtered with Time Range) ``` admin@sonic:~$ show memory-stats --type process --from "20 minutes ago" --to "now" --select 
syncd --leak-analysis Memory Statistics (Process Leak Analysis): From 6fb392278ab665865426c488d3efe95f3f5a6b13 Mon Sep 17 00:00:00 2001 From: Arham-Nasir Date: Mon, 5 May 2025 18:43:43 +0500 Subject: [PATCH 4/5] Update HLD with diagrams and text enhancements Signed-off-by: Arham-Nasir --- .../images/architecture_diagram.svg | 4 ++ .../images/mem_stats_configuration.svg | 4 ++ .../memory_statistics_enhancement.md | 47 +++++++++---------- 3 files changed, 29 insertions(+), 26 deletions(-) create mode 100644 doc/memory_statistics/images/architecture_diagram.svg create mode 100644 doc/memory_statistics/images/mem_stats_configuration.svg diff --git a/doc/memory_statistics/images/architecture_diagram.svg b/doc/memory_statistics/images/architecture_diagram.svg new file mode 100644 index 00000000000..f19fffaaf72 --- /dev/null +++ b/doc/memory_statistics/images/architecture_diagram.svg @@ -0,0 +1,4 @@ + + + +
[Text labels recovered from architecture_diagram.svg; the image itself is not reproduced here.]
Components: User; SONiC System; CLI Tools (User Interface for Custom Memory Data Display and Configuration); Config_db (memory_statistics_table stores memory-stats configuration); HostConfigDaemon (Listens to changes in ConfigDB, detects updates, and signals MemoryStatsd process); Config File (Stores default configuration settings that the memstats process reads at restart); MemoryStatsd Process (Collects and Logs Memory Data); Data Storage (Stores Memory Statistics Compressed Data in Persistent Memory); Unix Socket Communication (Handles IPC for Data Transmission); System Memory Data.
Arrow labels: Input Commands; Display Results; Stores User-specific Configurations; Fetches Memory Statistics Table; Monitors changes in Config_db; Signals Daemon to Reload its Configurations; Requests Data Query for Show Commands; Responds to Data Query for Show Commands; Socket Communication; Logs Memory Data; Fetches Memory Data; Reads Defaults from Config File on Startup.
\ No newline at end of file diff --git a/doc/memory_statistics/images/mem_stats_configuration.svg b/doc/memory_statistics/images/mem_stats_configuration.svg new file mode 100644 index 00000000000..25ca673bef2 --- /dev/null +++ b/doc/memory_statistics/images/mem_stats_configuration.svg @@ -0,0 +1,4 @@ + + + + \ No newline at end of file diff --git a/doc/memory_statistics/memory_statistics_enhancement.md b/doc/memory_statistics/memory_statistics_enhancement.md index 398b2d076b7..cbc6ef891e7 100644 --- a/doc/memory_statistics/memory_statistics_enhancement.md +++ b/doc/memory_statistics/memory_statistics_enhancement.md @@ -50,24 +50,24 @@ This High-Level Design (HLD) document enhances the existing Memory Statistics fe --- ## Overview -This High-Level Design (HLD) enhances SONiC’s Memory Statistics feature, originally limited to system-level memory monitoring via CLI, into a comprehensive solution that now includes Docker containers, individual processes, and CPU memory metrics. It introduces memory leak detection to prevent resource exhaustion and integrates gNMI for remote log access, reducing local dependency. This upgrade provides complete visibility across memory metrics, early memory leak detection, with customizable sampling and retention periods and efficient remote retrieval. +This High-Level Design (HLD) enhances SONiC’s Memory Statistics feature, originally limited to system-level memory monitoring via CLI, into a comprehensive solution that now includes Docker containers, individual processes, and CPU memory metrics. It introduces memory leak detection to prevent resource exhaustion and integrates gNMI for remote log access, reducing local dependency. This upgrade provides complete visibility across memory metrics with user-defined sampling and retention periods. Moreover, it also helps with early memory leak detection and efficient remote retrieval. 
--- ## Functional Requirements -This section outlines the functional requirements necessary for implementing this HLD in SONiC. +This section outlines the functional requirements necessary for implementing this HLD in SONiC: - **Monitoring Capabilities:** The system must monitor memory metrics for system (Total, Used, Free, Available, Cached, Buffers, Shared), Docker containers, individual processes, and CPU usage. -- **Memory Leak Detection:** The feature must analyze memory usage trends over time to detect potential leaks, reporting them via CLI. +- **Memory Leak Detection:** The feature must analyze memory usage trends over time to detect potential leaks and report them via CLI. - **Configurability:** The system must allow configuration of sampling intervals (3–15 minutes) and retention periods (1–30 days) via CLI. -- **CLI Enhancements:** The CLI must support displaying new metrics and leak analysis outputs with filtering options. +- **CLI Enhancements:** The CLI must support displaying new metrics and leak analysis with filtering options. - **Remote Log Access:** gNMI integration must enable retrieval of memory logs from remote systems. --- ## Architecture Design -The enhancement fits within the existing framework without altering its core structure. The memorystatsd daemon is extended to collect additional metrics and detect leaks, interfacing with hostcfgd for ConfigDB updates. A new gNMI server processes logs into JSON and serves them remotely. This integrates seamlessly with SONiC’s modular design, leveraging existing daemons and adding gNMI capabilities. +The enhancement fits within the existing framework without altering its core structure. The memorystatsd is extended to collect additional metrics and detect leaks, interfacing with hostcfgd for ConfigDB updates. Enhancements were made to the existing gNMI server which processes logs into JSON and makes them remotely accessible. 
This integrates seamlessly with SONiC’s modular design, leveraging existing daemons and adding gNMI capabilities.
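The leak detection that `memorystatsd` is extended with is described only as trend analysis over time. A minimal sketch of one way such a check could work is shown below. It is not the HLD's algorithm: the function names, the 1 MB growth threshold, the 0.8 rising-fraction threshold, and the "No Leak Detected" and "Insufficient Data" labels are all assumptions. Only the "Potential Leak Detected" label and the sample values (approximated in MB) come from the document's CLI output.

```python
MB = 1024 * 1024


def trend_slope(samples):
    """Least-squares slope of memory usage (bytes) per sampling interval."""
    n = len(samples)
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(samples))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def analyze_leak(samples, min_growth=1 * MB, min_rising_fraction=0.8):
    """Flag a steady increase: enough total growth and mostly rising steps.

    Both thresholds are hypothetical defaults, not values from the HLD.
    """
    if len(samples) < 3:
        return {"growth": 0, "slope": 0.0, "status": "Insufficient Data"}
    growth = samples[-1] - samples[0]
    steps = list(zip(samples, samples[1:]))
    rising_fraction = sum(1 for prev, cur in steps if cur > prev) / len(steps)
    slope = trend_slope(samples)
    leaking = (
        growth >= min_growth
        and rising_fraction >= min_rising_fraction
        and slope > 0
    )
    status = "Potential Leak Detected" if leaking else "No Leak Detected"
    return {"growth": growth, "slope": slope, "status": status}


# Values approximated from the syncd and python3 rows of the sample output.
syncd = [int(v * MB) for v in (0.73, 1.2, 2.0, 3.0, 4.0, 4.8, 5.2, 5.6)]
python3 = [int(20.3 * MB)] * 8

print("syncd:", analyze_leak(syncd)["status"])
print("python3:", analyze_leak(python3)["status"])
```

Requiring both a minimum total growth and a high fraction of rising steps is one way to avoid flagging a process whose usage merely fluctuates around a stable level.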

architecture diagram for memory data @@ -82,23 +82,23 @@ The enhancement fits within the existing framework without altering its core str ### Core Functionalities #### Data Collection and Storage -The `memorystatsd` daemon collects system (Total, Used, Free, Available, Cached, Buffers, Shared), Docker, process, and CPU memory metrics using `psutil` and Docker API, storing them as compressed log files for efficiency. +The `memorystatsd` collects system, Docker, process, and CPU memory metrics using `psutil` and Docker APIs, storing them as compressed log files for optimized memory usage. #### Log Processing and Storage -Logs are processed into JSON for gNMI retrieval, with low overhead. +Logs are processed into JSON for gNMI retrieval with low overhead. #### Memory Leak Detection -Analyzes memory trends by comparing usage over time to detect steady increases, reported via `show memory-stats --type process --leak-analysis` (e.g., "Potential Leak Detected"). +This feature analyzes memory trends by comparing usage over time to detect steady increases reported via `show memory-stats --type process --leak-analysis` (e.g., "Potential Leak Detected"). #### Remote Access via gNMI -The gNMI server, potentially within `memorystatsd`, serves JSON logs via `Get` requests, supporting snapshots or intervals based on time range parameters. +The gNMI server, running inside SONiC’s built-in gNMI container, retrieves JSON-formatted memory logs placed by memorystatsd and serves them to the clients. #### User Interaction -Users view stats, configure settings (reusing `enable/disable`, `sampling_interval` and `retention_period` from [v1](https://github.com/Arham-Nasir/SONiC/blob/4cf0b5d0bc973cf3a72f91b7f0a9567fd42eeccd/doc/memory_statistics/memory_statistics_hld.md)), and analyze leaks via CLI; logs are fetched remotely via gNMI. 
+Users view statistics, configure settings (reusing `enable/disable`, `sampling_interval` and `retention_period` from [v1](https://github.com/Arham-Nasir/SONiC/blob/4cf0b5d0bc973cf3a72f91b7f0a9567fd42eeccd/doc/memory_statistics/memory_statistics_hld.md)) and analyze leaks via CLI. Logs can also be fetched remotely via gNMI. ### Sequence Diagrams - **View Memory Usage**: - - **Description**: Shows the CLI-based retrieval of memory metrics (system, Docker, process, CPU). + - **Description**: It shows the CLI-based retrieval of memory metrics (system, Docker, process, CPU). - **Diagram**:

Leak Detection Sequence @@ -106,7 +106,7 @@ Users view stats, configure settings (reusing `enable/disable`, `sampling_interv Figure 2: View Memory Usage

- **Memory Leak Detection**: - - **Description**: Depicts the process of trend analysis and leak reporting. + - **Description**: It tracks memory usage over time, detects unusual growth trends and warns about possible leaks. - **Diagram**:

Leak Detection Sequence @@ -114,7 +114,7 @@ Users view stats, configure settings (reusing `enable/disable`, `sampling_interv Figure 3: Sequence for memory leak detection

- **gNMI Log Retrieval**: - - **Description**: Outlines remote log access. + - **Description**: This diagram outlines remote log access. - **Diagram**:

gNMI Log Retrieval Sequence @@ -148,13 +148,13 @@ The following configuration commands are reused from [v1](https://github.com/Arh - Example: `config memory-stats retention-period 20` → "Retention period set to 20 days." #### Show Commands -Below are all upgraded CLI commands with their definitions and sample outputs, covering system, Docker, process, CPU metrics, and memory leak analysis. +Below are all the upgraded CLI commands with their definitions and sample outputs. These commands cover memory metrics for the system, Docker, process, CPU and leak analysis. 1. ##### View Memory Statistics ##### Command: `show memory-stats [--type ] [--from ] [--to ] [--select ] [--leak-analysis]` - ##### Description: - - Displays memory statistics for the specified type (system, Docker, process, or CPU; default: system, last 15 days). + - Displays memory statistics for the specified type (system, Docker, process, or CPU | default: system, last 15 days). - `--type `: Specifies the metric type. - `--from/--to`: Defines the time range (ISO format or relative, e.g., "5 days ago"). - `--select `: Filters specific metrics (e.g., total_memory, used_memory, or process/container name/ID). @@ -336,9 +336,6 @@ Below are all upgraded CLI commands with their definitions and sample outputs, c -------------------------------------------------------------------------------------------------------------------------------------------------- ``` -### Daemon Configuration Management -(Reuse original configuration management from [v1](https://github.com/Arham-Nasir/SONiC/blob/4cf0b5d0bc973cf3a72f91b7f0a9567fd42eeccd/doc/memory_statistics/memory_statistics_hld.md)) - ### Config DB Enhancements No Config DB enhancements required (Reused from [v1](https://github.com/Arham-Nasir/SONiC/blob/4cf0b5d0bc973cf3a72f91b7f0a9567fd42eeccd/doc/memory_statistics/memory_statistics_hld.md)) @@ -356,20 +353,18 @@ No impact on warmboot/fastboot functionalities. 
### Unit Test Cases | Test Case ID | Description | |--------------|-----------------------------------------------------------------------------| -| UT11 | Verify CLI to show Docker memory stats | -| UT12 | Verify CLI to show process memory stats with leak analysis | -| UT13 | Verify CLI to show CPU memory stats | -| UT14 | Verify leak detection for a process exceeding threshold | -| UT15 | Verify gNMI log retrieval for all metric types | +| UT1 | Verify CLI to show Docker, process and CPU memory stats | +| UT2 | Verify CLI to show Docker, process and CPU memory stats with leak analysis | | +| UT3 | Verify leak detection for a process exceeding threshold | +| UT4 | Verify gNMI log retrieval for all metric types | ### System Test Cases | Test Case ID | Description | |--------------|-----------------------------------------------------------------------------| -| ST2 | Validate end-to-end functionality including gNMI retrieval and log cleanup | -| ST3 | Validate memory leak detection accuracy over a 7-day period | +| ST1 | Validate end-to-end functionality, including gNMI and CLI retrieval | --- ## Future Work - Add alerting for memory leaks via email/syslog. -- Extend metrics to include GPU memory (if applicable). +- Extend support to additional memory-related metrics. 
From 7c036c4fcff04b36b4bfb2a7fbc8a7d54f505393 Mon Sep 17 00:00:00 2001 From: Arham-Nasir Date: Tue, 6 May 2025 12:24:19 +0500 Subject: [PATCH 5/5] updated mem-stats-enhancement hld Signed-off-by: Arham-Nasir --- .../memory_statistics_enhancement.md | 56 ++++--------------- 1 file changed, 12 insertions(+), 44 deletions(-) diff --git a/doc/memory_statistics/memory_statistics_enhancement.md b/doc/memory_statistics/memory_statistics_enhancement.md index cbc6ef891e7..26f060d4823 100644 --- a/doc/memory_statistics/memory_statistics_enhancement.md +++ b/doc/memory_statistics/memory_statistics_enhancement.md @@ -34,7 +34,7 @@ ## Scope -This High-Level Design (HLD) document enhances the existing Memory Statistics feature in SONiC by extending memory metrics to include Docker, process, and CPU memory, adding memory leak detection, and enabling remote log access via gNMI. +This High-Level Design (HLD) document enhances the existing Memory Statistics feature in SONiC by extending memory metrics to include Docker and process memory, adding memory leak detection, and enabling remote log access via gNMI. --- @@ -50,14 +50,14 @@ This High-Level Design (HLD) document enhances the existing Memory Statistics fe --- ## Overview -This High-Level Design (HLD) enhances SONiC’s Memory Statistics feature, originally limited to system-level memory monitoring via CLI, into a comprehensive solution that now includes Docker containers, individual processes, and CPU memory metrics. It introduces memory leak detection to prevent resource exhaustion and integrates gNMI for remote log access, reducing local dependency. This upgrade provides complete visibility across memory metrics with user-defined sampling and retention periods. Moreover, it also helps with early memory leak detection and efficient remote retrieval. 
+This High-Level Design (HLD) enhances SONiC’s Memory Statistics feature, originally limited to system-level memory monitoring via CLI, into a comprehensive solution that now includes Docker containers and individual processes memory metrics. It introduces memory leak detection to prevent resource exhaustion and integrates gNMI for remote log access, reducing local dependency. This upgrade provides complete visibility across memory metrics with user-defined sampling and retention periods. Moreover, it also helps with early memory leak detection and efficient remote retrieval. --- ## Functional Requirements This section outlines the functional requirements necessary for implementing this HLD in SONiC: -- **Monitoring Capabilities:** The system must monitor memory metrics for system (Total, Used, Free, Available, Cached, Buffers, Shared), Docker containers, individual processes, and CPU usage. +- **Monitoring Capabilities:** The system must monitor memory metrics for system (Total, Used, Free, Available, Cached, Buffers, Shared), Docker containers and individual processes. - **Memory Leak Detection:** The feature must analyze memory usage trends over time to detect potential leaks and report them via CLI. - **Configurability:** The system must allow configuration of sampling intervals (3–15 minutes) and retention periods (1–30 days) via CLI. - **CLI Enhancements:** The CLI must support displaying new metrics and leak analysis with filtering options. @@ -82,7 +82,7 @@ The enhancement fits within the existing framework without altering its core str ### Core Functionalities #### Data Collection and Storage -The `memorystatsd` collects system, Docker, process, and CPU memory metrics using `psutil` and Docker APIs, storing them as compressed log files for optimized memory usage. +The `memorystatsd` collects system, Docker and process memory metrics using `psutil` and Docker APIs, storing them as compressed log files for optimized memory usage. 
#### Log Processing and Storage Logs are processed into JSON for gNMI retrieval with low overhead. @@ -98,7 +98,7 @@ Users view statistics, configure settings (reusing `enable/disable`, `sampling_i ### Sequence Diagrams - **View Memory Usage**: - - **Description**: It shows the CLI-based retrieval of memory metrics (system, Docker, process, CPU). + - **Description**: It shows the CLI-based retrieval of memory metrics (system, Docker and process). - **Diagram**:

Leak Detection Sequence @@ -148,14 +148,14 @@ The following configuration commands are reused from [v1](https://github.com/Arh - Example: `config memory-stats retention-period 20` → "Retention period set to 20 days." #### Show Commands -Below are all the upgraded CLI commands with their definitions and sample outputs. These commands cover memory metrics for the system, Docker, process, CPU and leak analysis. +Below are all the upgraded CLI commands with their definitions and sample outputs. These commands cover memory metrics and leak analysis for the system, Docker and process. 1. ##### View Memory Statistics ##### Command: - `show memory-stats [--type ] [--from ] [--to ] [--select ] [--leak-analysis]` + `show memory-stats [--type ] [--from ] [--to ] [--select ] [--leak-analysis]` - ##### Description: - - Displays memory statistics for the specified type (system, Docker, process, or CPU | default: system, last 15 days). - - `--type `: Specifies the metric type. + - Displays memory statistics for the specified type (system, Docker or process | default: system, last 15 days). + - `--type `: Specifies the metric type. - `--from/--to`: Defines the time range (ISO format or relative, e.g., "5 days ago"). - `--select `: Filters specific metrics (e.g., total_memory, used_memory, or process/container name/ID). - `--leak-analysis`: Enables leak detection mode. 
@@ -267,38 +267,6 @@ Below are all the upgraded CLI commands with their definitions and sample output bgpd(6284) 19.2MB 19.2MB 19.1MB 19.1MB 19.1MB 19.2MB 19.2MB 19.2MB 19.2MB ``` - ##### CPU Memory Statistics (Default): - ``` - admin@sonic:~$ show memory-stats --type cpu - Memory Statistics (CPU): - Codes: M - minutes, H - hours, D - days - -------------------------------------------------------------------------------- - Report Generated: 2025-03-10 14:30:00 - Analysis Period: From 2025-02-23 14:30:00 to 2025-03-10 14:30:00 - Interval: 2 Days - -------------------------------------------------------------------------------------------------------------------------------------------------- - Metric Current High Low D23-D25 D25-D27 D27-D01 D01-D03 D03-D05 D05-D07 D07-D09 D09-D10 - Value Value Value 23Feb25 25Feb25 27Feb25 01Mar25 03Mar25 05Mar25 07Mar25 09Mar25 - -------------------------------------------------------------------------------------------------------------------------------------------------- - cpu_memory 2.5GB 2.7GB 2.2GB 2.2GB 2.3GB 2.4GB 2.5GB 2.6GB 2.7GB 2.6GB 2.5GB - ``` - - ##### CPU Memory Statistics (Filtered with Time Range): - ``` - admin@sonic:~$ show memory-stats --type cpu --from "20 hours ago" --to "now" - Memory Statistics (CPU): - Codes: M - minutes, H - hours, D - days - -------------------------------------------------------------------------------- - Report Generated: 2025-03-10 14:30:00 - Analysis Period: From 2025-03-09 18:30:00 to 2025-03-10 14:30:00 - Interval: 3 Hours - -------------------------------------------------------------------------------------------------------------------------------------------------- - Metric Current High Low H18-H21 H21-H00 H00-H03 H03-H06 H06-H09 H09-H12 H12-H14 - Value Value Value 09Mar25 09Mar25 10Mar25 10Mar25 10Mar25 10Mar25 10Mar25 - -------------------------------------------------------------------------------------------------------------------------------------------------- - 
cpu_memory 2.5GB 2.7GB 2.4GB 2.4GB 2.5GB 2.6GB 2.7GB 2.6GB 2.5GB 2.5GB - ``` - ##### Process Memory Statistics with Leak Analysis (Default): ``` @@ -353,9 +321,9 @@ No impact on warmboot/fastboot functionalities. ### Unit Test Cases | Test Case ID | Description | |--------------|-----------------------------------------------------------------------------| -| UT1 | Verify CLI to show Docker, process and CPU memory stats | -| UT2 | Verify CLI to show Docker, process and CPU memory stats with leak analysis | | -| UT3 | Verify leak detection for a process exceeding threshold | +| UT1 | Verify CLI to show Docker and process memory stats | +| UT2 | Verify CLI to show Docker and process memory stats with leak analysis | | +| UT3 | Verify leak detection for a Docker and process memory exceeding threshold | | UT4 | Verify gNMI log retrieval for all metric types | ### System Test Cases