Add HLD for Memory Statistics Enhancement with New Metrics, Leak Detection, and gNMI Access#1962
Add HLD for Memory Statistics Enhancement with New Metrics, Leak Detection, and gNMI Access#1962Arham-Nasir wants to merge 5 commits intosonic-net:masterfrom
Conversation
…ction, and gNMI access Signed-off-by: Arham-Nasir <[email protected]>
|
/azp run |
|
No pipelines are associated with this pull request. |
|
/azp run |
|
No pipelines are associated with this pull request. |
|
/azp run |
|
No pipelines are associated with this pull request. |
Signed-off-by: Arham-Nasir <[email protected]>
|
/azp run |
|
No pipelines are associated with this pull request. |
Signed-off-by: Arham-Nasir <[email protected]>
|
/azp run |
|
No pipelines are associated with this pull request. |
|
|
||
| ## Architecture Design | ||
|
|
||
| The enhancement fits within the existing framework without altering its core structure. The memorystatsd is extended to collect additional metrics and detect leaks, interfacing with hostcfgd for ConfigDB updates. Enhancements were made to the existing gNMI server which processes logs into JSON and makes them remotely accessible. This integrates seamlessly with SONiC’s modular design, leveraging existing daemons and adding gNMI capabilities. |
There was a problem hiding this comment.
Comment raised in community review meeting: why develop a new daemon MemoryStatsd for monitor memory data, which is overlapping with existing procdockerd? Could this new feature design reusing existing daemon or combined them into one daemon?
I see some code PRs are already merged, but this comment is still applicable.
There was a problem hiding this comment.
Thanks for your feedback. We’ll carefully review procdockerd for overlap with MemoryStatsd and explore options for reusing or consolidating them into a single daemon. We appreciate your suggestion and will provide an update after our evaluation.
Can you please replace link of memory_statistics_hld.md with original sonic-net/SONiC repo link instead of personal fork link? |
Thanks for pointing it out. Updated |
| This section outlines the functional requirements necessary for implementing this HLD in SONiC: | ||
|
|
||
| - **Monitoring Capabilities:** The system must monitor memory metrics for system (Total, Used, Free, Available, Cached, Buffers, Shared), Docker containers and individual processes. | ||
| - **Memory Leak Detection:** The feature must analyze memory usage trends over time to detect potential leaks and report them via CLI. |
There was a problem hiding this comment.
What is the practical benefit of storing memory histograms on the switch itself? Would it be more efficient to pull this data externally and analyze it off-box?
| ## Functional Requirements | ||
| This section outlines the functional requirements necessary for implementing this HLD in SONiC: | ||
|
|
||
| - **Monitoring Capabilities:** The system must monitor memory metrics for system (Total, Used, Free, Available, Cached, Buffers, Shared), Docker containers and individual processes. |
There was a problem hiding this comment.
Memory leaks are not specific to SONiC—they’re a broader Linux-level concern. It might be better to explore native Linux tooling for leak detection instead of custom mechanisms.
| ### Core Functionalities | ||
|
|
||
| #### Data Collection and Storage | ||
| The `memorystatsd` collects system, Docker and process memory metrics using `psutil` and Docker APIs, storing them as compressed log files for optimized memory usage. |
There was a problem hiding this comment.
What is the intent behind the memory stats daemon? Is it based on existing sources like /proc/meminfo or /proc/dockerstats, or is it introducing a completely new daemon?
| The `memorystatsd` collects system, Docker and process memory metrics using `psutil` and Docker APIs, storing them as compressed log files for optimized memory usage. | ||
|
|
||
| #### Log Processing and Storage | ||
| Logs are processed into JSON for gNMI retrieval with low overhead. |
There was a problem hiding this comment.
Log rotation and retention need to be defined. Is logrotate being used to manage stored memory metrics?
This PR adds a High-Level Design (HLD) to doc/memory_statistics/memory_statistics_enhancement.md, enhancing the Memory Statistics feature in SONiC. Key additions:
This HLD builds on v1 (memory_statistics_hld).