Implementation of a Monitoring Daemon for storage devices in SONiC switches #433
prgeor merged 31 commits into sonic-net:master
…ge*' to include all disk types
Added to the PR.
/azpw run

/AzurePipelines run

Azure Pipelines successfully started running 1 pipeline(s).
sonic-stormond/scripts/stormond

```python
STORAGEUTIL_LOAD_ERROR = 127

log = syslogger.SysLogger(SYSLOG_IDENTIFIER)
```
@assrinivasan can we move this inside the daemon class?
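A minimal sketch of what the reviewer is suggesting, assuming the daemon is a plain Python class: the logger becomes an instance attribute created in the constructor instead of a module-level global. Python's standard `logging` module stands in for `syslogger.SysLogger` here, and the class name `StorageMonitorDaemon` is hypothetical.

```python
import logging

# Hypothetical identifier; the real daemon defines its own SYSLOG_IDENTIFIER.
SYSLOG_IDENTIFIER = "stormond"


class StorageMonitorDaemon:
    """Hypothetical daemon class that owns its logger instead of using a global."""

    def __init__(self, log_identifier=SYSLOG_IDENTIFIER):
        # Created per instance (stand-in for SysLogger), so a test can build
        # the daemon with a different identifier without patching module globals.
        self.log = logging.getLogger(log_identifier)

    def log_warning(self, msg):
        self.log.warning(msg)


daemon = StorageMonitorDaemon()
test_daemon = StorageMonitorDaemon("stormond-test")
```

Keeping the logger on the instance also keeps module import free of side effects, which tends to make unit tests simpler.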
```python
if value is None:
    self.log_warning("{}:{} value = None in StateDB".format(storage_device, field))

self.statedb_storage_info_loaded = True
```
@assrinivasan what if the value is None? In that case we should fall back to the .json file on disk.

Fixed this in the latest revision. Also added a None check for values in the _load_fsio_rw_json function. If both StateDB and the JSON file hold junk values, it is treated as an init case.
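The None check described above can be sketched as a loader that refuses to trust the backup file if anything in it is missing or None. This is a hypothetical stand-in for `_load_fsio_rw_json`, not the PR's code; the JSON layout (`{"<device>": {"<field>": <value>}}`) and the field names `total_fsio_reads` / `total_fsio_writes` are assumptions for illustration.

```python
import json
import os
import tempfile


def load_fsio_rw_json(path):
    """Return the saved FSIO counters as a dict, or None if they cannot be trusted.

    Hypothetical stand-in for _load_fsio_rw_json: a missing file, unparseable
    JSON, a non-dict payload, or any None value all count as "not loaded".
    """
    if not os.path.exists(path):
        return None
    try:
        with open(path) as f:
            data = json.load(f)
    except (OSError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return None
    if not isinstance(data, dict):
        return None
    # Assumed layout: {"sda": {"total_fsio_reads": ..., "total_fsio_writes": ...}, ...}
    for fields in data.values():
        if not isinstance(fields, dict) or any(v is None for v in fields.values()):
            return None
    return data


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.json")
    with open(good, "w") as f:
        json.dump({"sda": {"total_fsio_reads": 1200, "total_fsio_writes": 3400}}, f)
    bad = os.path.join(tmp, "bad.json")
    with open(bad, "w") as f:
        json.dump({"sda": {"total_fsio_reads": None, "total_fsio_writes": 3400}}, f)
    good_result = load_fsio_rw_json(good)
    bad_result = load_fsio_rw_json(bad)
```

Returning None for every failure mode lets the caller treat "no file" and "junk values" the same way, which matches the init case described in the reply.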
sonic-stormond/scripts/stormond

```python
if self.statedb_storage_info_loaded == False and self.fsio_json_file_loaded == True:
    self.use_fsio_json_baseline = True
    self.use_statedb_baseline = False

# If stormond is coming back up after a daemon crash, storage information would be saved in the
# STATE_DB. In that scenario, we use the STATE_DB information as the SoT and reconcile the FSIO
# reads and writes values.
elif self.statedb_storage_info_loaded == True:
    self.use_fsio_json_baseline = False
    self.use_statedb_baseline = True
```
@assrinivasan can you make the logic clearer? That is, if the stats are available in STATE_DB, use them, and fall back to the .json values from the backup otherwise.
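One way to express the requested priority order is a single function that returns which baseline to use, instead of setting two mutually dependent boolean flags. This is a sketch only: the parameters mirror the `statedb_storage_info_loaded` and `fsio_json_file_loaded` flags in the excerpt above, while `Baseline` and `choose_baseline` are hypothetical names.

```python
from enum import Enum


class Baseline(Enum):
    STATEDB = "statedb"  # STATE_DB still holds the counters (e.g. daemon restarted after a crash)
    JSON = "json"        # STATE_DB unusable, but the on-disk .json backup loaded cleanly
    INIT = "init"        # neither source is usable; start from scratch


def choose_baseline(statedb_info_loaded, fsio_json_file_loaded):
    """Pick the FSIO baseline in priority order: STATE_DB, then .json backup, then init."""
    if statedb_info_loaded:
        return Baseline.STATEDB
    if fsio_json_file_loaded:
        return Baseline.JSON
    return Baseline.INIT
```

Because exactly one enum value comes back, the "both flags set" and "neither flag set" states that two booleans allow cannot occur.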
/azpw run

/AzurePipelines run

Azure Pipelines successfully started running 1 pipeline(s).
Description
This commit adds a monitoring daemon for storage device attributes on a device running SONiC.
SONiC Storage Monitoring Daemon HLD
Motivation and Context
Storage devices degrade in performance over time due to a variety of factors, such as total disk writes, bad-block management, lack of free space, suboptimal operating temperature, and ordinary wear and tear, all of which reflect the overall health of the disk.
The goal of the Storage Monitoring Daemon (stormond) is to provide meaningful metrics for these issues and to enable streaming telemetry for these attributes, so that the necessary preventive measures can be triggered when performance degrades.
How Has This Been Tested?
Manually tested on the following platforms:
7050cx3.txt
S6100.txt
SN2700.txt
Additional Information (Optional)