Support for Memory Statisticsd Process and Configurations #20355
Arham-Nasir wants to merge 27 commits into sonic-net:master from
Conversation
…sage data Signed-off-by: Arham-Nasir <[email protected]>
Hi @qiluo-msft, @prgeor, and @FengPan-Frank, could you please help review the HLD and the linked code PRs for our feature? Thank you!
memory_statistics = self.collect_memory_statistics()
self.store_memory_statistics(memory_statistics)
except Exception as e:
    self.logger.log(f"Error collecting or storing memory statistics: {e}", logging.ERROR)
Will this be repeated in a dead loop and generate a bunch of logs?
Thank you for your feedback. The memory statistics daemon has explicit loop control to prevent a dead loop: it collects data at a configurable sampling interval, stores it in compressed form as historical data, and shuts down gracefully when triggered. Between collections the loop waits for the full sampling interval, and it runs only until a shutdown signal is received. A misconfiguration (e.g., a zero or negative sampling interval) is the main risk here, and the daemon is designed to validate and handle such cases.
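A minimal sketch of the loop-control pattern described above; the class and attribute names are assumptions, not the PR's actual code, and the fallback default is illustrative:

```python
import threading

class MemoryStatisticsLoop:
    """Sketch of the daemon's loop control: collect at a sampling
    interval, stop promptly on a shutdown signal, and fall back to a
    sane default when the configured interval is zero or negative."""

    DEFAULT_INTERVAL = 60  # seconds; assumed fallback for bad config

    def __init__(self, sampling_interval):
        if sampling_interval <= 0:
            sampling_interval = self.DEFAULT_INTERVAL
        self.sampling_interval = sampling_interval
        self.shutdown_event = threading.Event()
        self.samples = []

    def collect_once(self):
        self.samples.append("snapshot")  # stand-in for real collection

    def run(self, max_iterations=None):
        done = 0
        while not self.shutdown_event.is_set():
            self.collect_once()
            done += 1
            if max_iterations is not None and done >= max_iterations:
                break
            # wait() returns immediately once shutdown is signalled,
            # so the daemon never sleeps through a stop request.
            self.shutdown_event.wait(self.sampling_interval)
```

Using `Event.wait()` instead of `time.sleep()` is what keeps the loop responsive to a shutdown signal mid-interval.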
Signed-off-by: Arham-Nasir <[email protected]>
…:Arham-Nasir/sonic-buildimage into feature/memory-statistics-daemon-process Merge branch 'feature/memory-statistics-daemon-process' - Integrated changes from the latest upstream. - Added SyslogLogger class for syslog message logging.
…handling, time delta, and memory size formatting. Signed-off-by: Arham-Nasir <[email protected]>
…d retention calculations Signed-off-by: Arham-Nasir <[email protected]>
…stics reports Signed-off-by: Arham-Nasir <[email protected]>
Signed-off-by: Arham-Nasir <[email protected]>
Signed-off-by: Arham-Nasir <[email protected]>
Signed-off-by: Arham-Nasir <[email protected]>
…or and MemoryStatisticsCollector class Signed-off-by: Arham-Nasir <[email protected]>
@qiluo-msft @xincunli-sonic please help review this feature.
Signed-off-by: Arham-Nasir <[email protected]>
Hi @FengPan-Frank and @qiluo-msft,
Thanks for the work. Could you add test cases to cover the collector service code?
Thank you for the review and suggestion. I'll add test cases for the collector service code. Please let me know if there are any specific scenarios you'd like me to cover.
Just a common sanity check to cover the daemon functions.
…ully Signed-off-by: Arham-Nasir <[email protected]>
@@ -0,0 +1,10 @@
# Memory Statistics Daemon Configuration
Would it be better if these configurations were put into config_db instead of a static conf file? #Closed
We have stored configurations in both config_db and a static conf file for flexibility and reliability.
At startup, the daemon reads default settings from the conf file to ensure a consistent state after a restart. During runtime, the daemon dynamically updates its configuration from config_db without needing a restart. This approach combines the stability of predefined defaults with the convenience of live updates, ensuring predictable behavior and seamless adaptability.
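The precedence described above (conf-file defaults at startup, live config_db overrides at runtime) can be sketched as follows; the section and key names here are made up for illustration:

```python
import configparser

# Hypothetical defaults, standing in for the daemon's static conf file.
DEFAULT_CONF = """
[default]
sampling_interval = 5
retention_period = 15
enabled = false
"""

def load_defaults(conf_text):
    """Parse the static conf file into a flat settings dict."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return dict(parser["default"])

def apply_config_db(settings, config_db_entry):
    """Overlay live config_db values on the defaults; config_db wins,
    and missing keys silently keep their conf-file values."""
    merged = dict(settings)
    merged.update({k: v for k, v in config_db_entry.items() if v is not None})
    return merged

settings = load_defaults(DEFAULT_CONF)
# A runtime update from config_db changes only the keys it carries.
settings = apply_config_db(settings, {"sampling_interval": "3"})
```

This split gives a consistent state after restart (defaults always exist) while still allowing restart-free reconfiguration.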
@@ -0,0 +1,1877 @@
#!/usr/bin/env python3
import psutil
Can you sort the imports? (nit) #Closed
Thank you for the review. I have organized the imports following the standard Python import grouping practice: standard library imports, third-party imports, and local application imports, sorted alphabetically within each group for improved readability and maintainability.
    """Logs a message with the 'DEBUG' level."""
    self.log(syslog.LOG_DEBUG, message)

def close_logger(self):
I have addressed it by adding context manager methods (enter, exit) to manage the syslog connection lifecycle, ensuring it opens and closes cleanly. Additionally, the close method now ensures proper resource management when used explicitly.
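A sketch of the context-manager pattern described above, applied to a syslog-backed logger; the class name matches the `SyslogLogger` mentioned earlier in the thread, but the internals are assumptions:

```python
import syslog

class SyslogLogger:
    """Syslog logger whose connection lifecycle is managed by a
    context manager, so it opens and closes deterministically."""

    def __init__(self, ident="memorystatsd"):  # ident is an assumption
        self.ident = ident
        self.is_open = False

    def __enter__(self):
        syslog.openlog(ident=self.ident, facility=syslog.LOG_DAEMON)
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close_logger()
        return False  # never suppress exceptions from the body

    def log_debug(self, message):
        syslog.syslog(syslog.LOG_DEBUG, message)

    def close_logger(self):
        """Explicit close for callers not using the `with` form."""
        if self.is_open:
            syslog.closelog()
            self.is_open = False

with SyslogLogger() as logger:
    logger.log_debug("daemon starting")
```

Keeping `close_logger()` idempotent means the explicit-close and context-manager paths can coexist safely.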
logger.log_info(f"Removing outdated entry: {old_entry}")
del total_dict['system_memory'][memory_type][old_entry]

total_dict['system_memory']['count'] = len(total_dict['system_memory'].get('system', {}))
logger.log_info("Accepted new connection")
self.handle_connection(connection)
except socket.timeout:
    continue
Thanks for the feedback! I've added a logger.log_debug for socket.timeout to improve debuggability.
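A sketch of that listener loop with the timeout logged at DEBUG; the function names and the shutdown-callback shape are assumptions, not the PR's exact code:

```python
import socket
import logging

logger = logging.getLogger("memorystatsd")

def accept_loop(server_sock, should_shutdown, handle_connection, timeout=1.0):
    """Accept connections until shutdown. A socket timeout is an
    expected, periodic event used only to re-check the shutdown flag,
    so it is logged at DEBUG rather than treated as an error."""
    server_sock.settimeout(timeout)
    while not should_shutdown():
        try:
            connection, _ = server_sock.accept()
        except socket.timeout:
            logger.debug("Accept timed out; re-checking shutdown flag")
            continue
        logger.info("Accepted new connection")
        handle_connection(connection)
```

The short accept timeout is what bounds how long a shutdown request can go unnoticed.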
"""
self.pid_file = pid_file

def daemonize(self):
Hi @xincunli-sonic ,
The enhanced Daemonizer class is designed to ensure reliable daemon creation, with robust validation for process isolation, session management, and file descriptor redirection.
Thank you.
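For reference, the classic double-fork pattern that such a Daemonizer builds on can be sketched as below; the PID-file path and helper names are assumptions, and the real class adds the validation described above:

```python
import atexit
import os
import sys

def write_pid_file(pid_file, pid):
    """Record the daemon's PID so service tooling can find it."""
    with open(pid_file, "w") as f:
        f.write(f"{pid}\n")

def daemonize(pid_file="/tmp/memorystatsd.pid"):  # path is an assumption
    """Classic double-fork daemonization: detach from the controlling
    terminal, start a new session, and redirect stdio to /dev/null."""
    if os.fork() > 0:
        sys.exit(0)           # first parent exits
    os.setsid()               # new session; drop the controlling terminal
    if os.fork() > 0:
        sys.exit(0)           # second fork: child can never reacquire a tty
    os.chdir("/")             # don't pin any mounted filesystem
    os.umask(0)
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)  # redirect stdin/stdout/stderr
    write_pid_file(pid_file, os.getpid())
    # Best-effort PID-file cleanup on normal exit.
    atexit.register(lambda: os.path.exists(pid_file) and os.unlink(pid_file))
```

The second fork is the step that guarantees the process is not a session leader, which is what prevents it from ever reattaching to a terminal.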
if slot < num_columns:
    self.add_entry_to_time_group_list(time_entry_summary, slot, memory_entry)

def aggregate_data(self, request_data, time_entry_summary, num_columns):
Thank you for your valuable feedback and suggestion to optimize the aggregate_data function. I have updated the code by implementing batch processing with dynamic sizing to reduce redundant operations, minimizing nested calls for better performance, and adding robust error handling with detailed logging. Let me know if further adjustments are needed.
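One common shape for this kind of slot-based aggregation is to bucket samples into equal-width time slots in a single pass, then aggregate each slot once. This is a hypothetical sketch, not the PR's `aggregate_data` implementation:

```python
def aggregate_into_slots(entries, start, end, num_columns):
    """Bucket (timestamp, value) pairs into num_columns equal-width
    time slots in one pass, then average each slot. Slots with no
    samples yield None instead of a fabricated value."""
    slot_width = (end - start) / num_columns
    slots = [[] for _ in range(num_columns)]
    for timestamp, value in entries:
        if not (start <= timestamp < end):
            continue  # drop out-of-range samples explicitly
        slot = int((timestamp - start) / slot_width)
        slots[slot].append(value)
    # Aggregate each slot exactly once, after all bucketing is done.
    return [sum(s) / len(s) if s else None for s in slots]
```

Separating the bucketing pass from the aggregation pass is what removes the redundant nested lookups the reviewer flagged.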
logger.log_info(f"Service initialized with name: {self.name}")

self.config_file_path = config_file_path
self.memory_statistics_lock = threading.Lock()
Thank you for pointing this out. I have carefully reviewed the code and ensured that locks are applied consistently for all shared resource access.
Added ThreadSafeConfig class for safe configuration access and consistently used memory_statistics_lock during memory statistics collection to prevent race conditions.
Let me know if further adjustments are needed.
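A minimal sketch of the pattern described above; the class name matches the `ThreadSafeConfig` mentioned in the reply, but its internals are assumptions:

```python
import threading

class ThreadSafeConfig:
    """Configuration holder whose reads and updates both go through
    one lock, so a runtime config_db update can never be observed
    half-applied by the collection thread."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._config = dict(initial)

    def get(self, key, default=None):
        with self._lock:
            return self._config.get(key, default)

    def update(self, **changes):
        # All changes land atomically with respect to get().
        with self._lock:
            self._config.update(changes)
```

The same idea applies to `memory_statistics_lock`: every access to the shared statistics structure, not just writes, takes the lock.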
if os.path.exists(pid_file):
    os.unlink(pid_file)
    logger.log_info(f"Removed PID file: {pid_file}")
except Exception as e:
Replaced except Exception with specific exceptions for improved clarity and control.
try:
    with gzip.open(memory_statistics_config['TOTAL_MEMORY_STATISTICS_LOG_FILENAME'], 'wt', encoding='utf-8') as jfile:
        json.dump(total_dict, jfile)
except Exception as e:
Thank you for the feedback.
I’ve replaced all instances of the generic except Exception with more specific exceptions to ensure better error handling and clarity. Please let me know if you have any further suggestions.
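For the snapshot writer quoted above, narrowing the handler might look like this; a sketch under the assumption that the relevant failures are file-system errors and unserializable data, with the function name invented for illustration:

```python
import gzip
import json
import logging

logger = logging.getLogger("memorystatsd")

def store_memory_statistics(total_dict, filename):
    """Write the statistics dict as gzipped JSON. OSError covers
    file-system failures (permissions, disk full, bad path);
    TypeError/ValueError cover data json.dump cannot serialize.
    Anything else propagates instead of being silently swallowed."""
    try:
        with gzip.open(filename, "wt", encoding="utf-8") as jfile:
            json.dump(total_dict, jfile)
    except OSError as e:
        logger.error("Failed to write %s: %s", filename, e)
        return False
    except (TypeError, ValueError) as e:
        logger.error("Statistics not JSON-serializable: %s", e)
        return False
    return True
```

Letting unexpected exceptions propagate is the point of the change: a blanket `except Exception` hides programming errors behind the same log line as routine I/O failures.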
/azp run Azure.sonic-buildimage

Azure Pipelines successfully started running 1 pipeline(s).
Signed-off-by: Arham-Nasir <[email protected]>
Need to rethink the approach; this looks like reinventing the wheel of a time-series database on SONiC itself.
Signed-off-by: Arham-Nasir <[email protected]>
Signed-off-by: Arham-Nasir <[email protected]>
… feature/memory-statistics-daemon-process
Signed-off-by: Arham-Nasir <[email protected]>
Signed-off-by: Arham-Nasir <[email protected]>
We want to clarify that the purpose of memorystatsd is not to build a time-series database (TSDB) inside the SONiC Redis DB. memorystatsd is designed to be a simple, lightweight solution that fits well in SONiC environments. To keep the system efficient and reduce complexity, it avoids using Redis for storing data and instead saves compressed JSON snapshots to /var/log/memorystats/memory-stats.json.gz. Each snapshot is very compact (approximately 225 bytes), for a total of around 63 KB per day. Depending on the configured retention period, the storage footprint remains minimal: old data is automatically deleted once it ages out, helping to meet platform limits and keep resource usage low.
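A back-of-the-envelope check of the figures quoted above; the 5-minute interval is an assumption chosen because it reproduces the ~63 KB/day number from ~225-byte snapshots:

```python
# Storage-footprint arithmetic for the quoted snapshot size.
SNAPSHOT_BYTES = 225         # approximate compressed snapshot size (from the PR)
SAMPLING_INTERVAL_MIN = 5    # assumed interval consistent with ~63 KB/day

samples_per_day = 24 * 60 // SAMPLING_INTERVAL_MIN   # 288 snapshots/day
bytes_per_day = samples_per_day * SNAPSHOT_BYTES     # 64,800 bytes ~= 63.3 KiB

def footprint_kib(retention_days):
    """Total on-disk footprint (KiB) for a given retention period."""
    return retention_days * bytes_per_day / 1024
```

At these rates even a 15-day retention period stays under 1 MiB, which is why retention-based deletion alone is enough to bound resource usage.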
Dear @qiluo-msft, this is the only PR left to be merged per the scope of the HLD; hoping for your support.
I left a comment in "Add HLD for Memory Statistics Enhancement with New Metrics, Leak Detection, and gNMI Access" (sonic-net/SONiC #1962). Please resolve this issue at high priority.
7a31207
The Memory_Statisticsd Process was added to enhance system monitoring by collecting memory usage data at configurable intervals. This will allow the system to store historical data, making it easier to analyze memory trends and debug memory-related issues.