Add support for liquid cooling inside hardware checker#24124
Add support for liquid cooling inside hardware checker#24124qiluo-msft merged 2 commits intosonic-net:masterfrom
Conversation
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
9ca9219 to
814ae84
Compare
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
814ae84 to
4c6f08c
Compare
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@judyjoseph would you please help to review this PR? |
|
@qiluo-msft are we still ok to use the eventd infrastructure, remember there was some memory leaks we found ? |
@qiluo-msft can you comment, thanks |
There was a problem hiding this comment.
Pull request overview
This PR adds support for monitoring liquid cooling devices in the system health checker, implementing functionality defined in the liquid cooling HLD. The implementation follows existing patterns for hardware monitoring (ASIC, PSU, fan) and introduces an opt-in configuration mechanism through include_devices.
Key Changes:
- Added liquid cooling leak detection monitoring in the hardware checker with event publishing capabilities
- Introduced
include_devicesconfiguration field to enable optional device monitoring features - Refactored event publisher lifecycle management to initialize/deinitialize within function scope
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/system-health/health_checker/hardware_checker.py | Adds _check_liquid_cooling_status() method to monitor leak sensors and publish events; includes event publisher helper method |
| src/system-health/health_checker/config.py | Adds include_devices configuration field to support opt-in device monitoring |
| src/system-health/health_checker/service_checker.py | Refactors event publisher to initialize/deinitialize within method scope instead of as instance variable |
| src/system-health/health_checker/system_health_monitoring_config.json | Adds include_devices field to production configuration |
| src/system-health/tests/test_system_health.py | Adds test coverage for liquid cooling status checking with multiple leak scenarios and configuration validation |
| src/system-health/tests/system_health_monitoring_config.json | Adds include_devices field with liquid_cooling enabled for test configuration |
| src/system-health/health_checker/utils.py | Removes extraneous blank line (code cleanup) |
Comments suppressed due to low confidence (1)
src/system-health/health_checker/hardware_checker.py:12
- The class docstring is outdated and should include liquid cooling as part of the hardware checks being performed. Consider updating it to: "Check system hardware status. For now, it checks ASIC, PSU, fan and liquid cooling status."
"""
Check system hardware status. For now, it checks ASIC, PSU and fan status.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
4c6f08c to
ebaa0f6
Compare
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@qiluo-msft could you review again |
|
@yuazhe can you update the branch |
Update the branch will cause checker to be re-ran, this should be unnecessary as the change can be cleanly merged. |
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@yuazhe Could you please reduce the force-push in PR? We need to code review between commits to understand what are the new changes. Force-push will destroy history and make code review inefficient. |
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Signed-off-by: Yuanzhe, Liu <yualiu@nvidia.com>
63e2bf8 to
1efe5e1
Compare
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Why I did it Following this HLD for liquid cooling sonic-net/SONiC#2032, I created this pr to add checker inside hardware checker to monitor liquid cooling device status
Why I did it Following this HLD for liquid cooling sonic-net/SONiC#2032, I created this pr to add checker inside hardware checker to monitor liquid cooling device status
Why I did it Following this HLD for liquid cooling sonic-net/SONiC#2032, I created this pr to add checker inside hardware checker to monitor liquid cooling device status Signed-off-by: xiaweijiang <xiaweijiang@microsoft.com>
Why I did it Following this HLD for liquid cooling sonic-net/SONiC#2032, I created this pr to add checker inside hardware checker to monitor liquid cooling device status
Why I did it Following this HLD for liquid cooling sonic-net/SONiC#2032, I created this pr to add checker inside hardware checker to monitor liquid cooling device status Signed-off-by: Feng Pan <fenpan@microsoft.com>
Why I did it
Following this HLD for liquid cooling sonic-net/SONiC#2032, I created this pr to add checker inside hardware checker to monitor liquid cooling device status
Work item tracking
How I did it
How to verify it
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)