
[Mellanox] Fix thermal control bugs #4298

Merged
jleveque merged 26 commits into sonic-net:master from Junchao-Mellanox:thermal-fix
Mar 25, 2020


@Junchao-Mellanox
Collaborator

- What I did

  1. Fix issue: add the PSU fan regardless of whether sysfs exists
  2. Fix issue: get fan direction per drawer index instead of per fan index
  3. Fix issue: clarify logs for PSU absence and PSU power-off
  4. Add 2100/2010 support for getting fan presence status
  5. Fix issue: pmon docker stops on 3800
  6. Add unit tests for the code changes
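
As a rough illustration of item 2 (and of the related commit that returns N/A for an absent fan), the sketch below maps a fan index to its drawer and skips the direction read when the drawer is not present. It is not the code from this PR: `drawer_index`, `get_fan_direction`, the `fans_per_drawer` parameter, and the two callables standing in for the presence check and the sysfs read are all hypothetical names chosen for the example.

```python
# Hypothetical sketch: none of these names come from mlnx-platform-api.
NOT_AVAILABLE = "N/A"


def drawer_index(fan_index, fans_per_drawer):
    """Return the 0-based drawer index that holds the given 0-based fan index."""
    return fan_index // fans_per_drawer


def get_fan_direction(fan_index, fans_per_drawer, is_drawer_present, read_drawer_direction):
    """Look up direction by drawer, not by fan.

    is_drawer_present and read_drawer_direction are assumed callables that
    take a drawer index; the second stands in for a sysfs read.
    """
    drawer = drawer_index(fan_index, fans_per_drawer)
    if not is_drawer_present(drawer):
        # Absent fan: return N/A directly without touching sysfs.
        return NOT_AVAILABLE
    return read_drawer_direction(drawer)


if __name__ == "__main__":
    present = [True, False]            # assumed: drawer 1 is unplugged
    directions = ["intake", "exhaust"]  # assumed per-drawer values
    for fan in range(4):
        print(fan, get_fan_direction(fan, 2, lambda d: present[d], lambda d: directions[d]))
```

Passing the presence check and the reader in as callables keeps the sketch testable without hardware; the real platform API may structure this quite differently.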

- How I did it

- How to verify it

Verified manually and by running the existing regression cases.

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

Junchao-Mellanox and others added 25 commits March 11, 2020 20:39
Conflicts:
	device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py
	platform/mellanox/mlnx-platform-api/sonic_platform/thermal.py
Conflicts:
	platform/mellanox/mlnx-platform-api/sonic_platform/psu.py
Conflicts:
	platform/mellanox/mlnx-platform-api/sonic_platform/psu.py
Conflicts:
	device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py
	platform/mellanox/mlnx-platform-api/sonic_platform/thermal.py
Conflicts:
	platform/mellanox/mlnx-platform-api/sonic_platform/psu.py
Conflicts:
	platform/mellanox/mlnx-platform-api/sonic_platform/psu.py
@Junchao-Mellanox
Collaborator Author

I closed review #4252 and created this new one. I rebased to fix the build issue in the previous review.

Junchao-Mellanox marked this pull request as ready for review March 20, 2020 09:54
@jleveque
Contributor

Retest this please

@Junchao-Mellanox
Collaborator Author

Retest vsimage please

@jleveque
Contributor

Looks good to me. I'd just like someone else from Mellanox to review and approve.

@Junchao-Mellanox
Collaborator Author

@kebol could you please help review?

jleveque merged commit 80bf061 into sonic-net:master Mar 25, 2020
pphuchar pushed a commit to pphuchar/sonic-buildimage that referenced this pull request Apr 20, 2020
* [thermal control] Fix pmon docker stop issue on 3800
* [thermal fix] Fix QA test issue
* [thermal fix] change psu._get_power_available_status to psu.get_power_available_status
* [thermal fix] adjust log for PSU absence and power absence
* [thermal fix] add unit test for loading thermal policy file with duplicate conditions in different policies
* [thermal] fix fan.get_presence for non-removable SKU
* [thermal fix] fix issue: fan direction is based on drawer
* Fix issue: when fan is not present, should not read fan direction from sysfs but directly return N/A
* [thermal fix] add unit test for get_direction for absent FAN
* Unplugable PSU has no FAN, no need add a FAN object for this PSU
* Update submodules
Junchao-Mellanox deleted the thermal-fix branch May 7, 2020 06:06
@abdosi
Contributor

abdosi commented May 28, 2020

@Junchao-Mellanox Please create a PR for 201911. There is a merge conflict.
Removing the "Request for 201911" label.

@rlhui @liat-grozovik

Junchao-Mellanox added a commit to Junchao-Mellanox/sonic-buildimage that referenced this pull request Jun 1, 2020
* [thermal control] Fix pmon docker stop issue on 3800
* [thermal fix] Fix QA test issue
* [thermal fix] change psu._get_power_available_status to psu.get_power_available_status
* [thermal fix] adjust log for PSU absence and power absence
* [thermal fix] add unit test for loading thermal policy file with duplicate conditions in different policies
* [thermal] fix fan.get_presence for non-removable SKU
* [thermal fix] fix issue: fan direction is based on drawer
* Fix issue: when fan is not present, should not read fan direction from sysfs but directly return N/A
* [thermal fix] add unit test for get_direction for absent FAN
* Unplugable PSU has no FAN, no need add a FAN object for this PSU
* Update submodules

Co-authored-by: Stephen Sun <[email protected]>
Conflicts:
	platform/mellanox/mlnx-platform-api/sonic_platform/chassis.py
	platform/mellanox/mlnx-platform-api/sonic_platform/fan.py
	platform/mellanox/mlnx-platform-api/sonic_platform/psu.py
	platform/mellanox/mlnx-platform-api/sonic_platform/thermal.py
	src/sonic-platform-common
	src/sonic-platform-daemons
mssonicbld added a commit that referenced this pull request Mar 13, 2026
…atically (#25254)

#### Why I did it
src/sonic-utilities
```
* 20a7131b - (HEAD -> master, origin/master, origin/HEAD) clear: make --namespace optional for arp and ndp commands (#4355) (5 minutes ago) [Oleksandr Ivantsiv]
* f56e4a78 - show version: replace --verbose with --brief flag (#4350) (20 hours ago) [Ashwin Srinivasan]
* 5e50cf3d - Wait for monit monitor <service> operation to complete during config (#4295) (23 hours ago) [Hemanth Kumar Tirupati]
* 0306ea20 - Change sensorshow conn to use TCP socket (#4343) (2 days ago) [Chenyang Wang]
* cb5b3e82 - Fix route_check.py redis client memory usage (#4217) (2 days ago) [Roee Bar]
* e93a5c3c - config: allow golden config to override mac, platform, asic_id (#4348) (2 days ago) [securely1g]
* 0024c8d4 - Add non -B- hwsku names as well (#4331) (2 days ago) [dakotac-arista]
* eb7301cc - Fix unit tests (#4345) (3 days ago) [william8545]
* 052199c0 - [Arista] Add Arista-7050CX3-32C-C28S4 to generic_config_updater (#4257) (4 days ago) [byu343]
* ed68290a - Add multi-ASIC namespace support for show/config subinterface(s) command (#4298) (4 days ago) [william8545]
* 9c9f099d - New CLI proposal for PHY diagnostics (#4214) (4 days ago) [Prince George]
* 9e3373df - Fix generate_dump to preserve per-ASIC subdirectory structure for sdk_dbg collection (#4334) (4 days ago) [william8545]
* 3fe8972f - Add multi-ASIC namespace support for ARP/NDP show and clear commands (#4231) (4 days ago) [Oleksandr Ivantsiv]
* be5fe2aa - Add multi-ASIC namespace support for VLAN and FDB operations (#4230) (4 days ago) [Oleksandr Ivantsiv]
* e74fca78 - Add multi-ASIC namespace support for static route commands (#4269) (4 days ago) [Oleksandr Ivantsiv]
* 599e7c71 - Add multi-ASIC namespace support for ACL table add/remove commands (#4270) (4 days ago) [Oleksandr Ivantsiv]
* d09d6cd6 - Add CLI support for "show interfaces <intf> <phy-signal/phy-serdes>" commands (#4312) (4 days ago) [prajjwal-arista]
* 345f5686 - Add multi-asic namespace support for IPv6 link-local commands (#4289) (4 days ago) [william8545]
* edd4b190 - Add multi-asic namespace support for crm show resources command (#4290) (4 days ago) [william8545]
* 2b52a051 - [multi-asic] Add namespace support for vxlan and vnet show/config commands (#4299) (4 days ago) [william8545]
* 03160905 - [fast-reboot][cosmetic] Fixed debug/error prints with the correct reboot type (#4285) (4 days ago) [Yair Raviv]
* 6eedf8a7 - [warm-reboot][multi-asic] Added error-handling for faulty ASIC/s after orchagent freeze (#4297) (4 days ago) [Yair Raviv]
* 2330bab5 - [BMC] Add new BMC CLIs for manual session management and reset root password (#4238) (4 days ago) [Ben Levi]
* 4d0cc933 - Fix issue: pmon services's restart count is not cleared during config reload (#4314) (4 days ago) [Stephen Sun]
* 0a1bbc55 - Fix the generate_dump for BCM Asic Q3D (#4326) (6 days ago) [saksarav-nokia]
* 1580ccce - GCU generates suboptimal plan for CreateOnly paths (#4335) (6 days ago) [Brad House - Nexthop]
* 369e703e - GCU: Add path tracing support (#4317) (7 days ago) [Brad House - Nexthop]
* bc05e1a4 - [GCU]: Restart telemetry container on port speed change via GCU to handle OID update (#4248) (7 days ago) [Xincun Li]
* 73f1ea51 - Fix warning messages due to nose test deprecation (#4322) (8 days ago) [Brad House - Nexthop]
* ebfefbd8 - [Arista] Add TH5 HWSKU to list for pfcwd support (#4329) (8 days ago) [dakotac-arista]
* 0d969b85 - [DPU] Add support for HA Set Counters (#4283) (8 days ago) [Connor Roos]
* 44f8c37b - [DPU] Add CLI to trigger and dump flows (#4278) (8 days ago) [Vivek]
* 76bf567e - [show interfaces] "show interfaces flap" command does not support multi-ASIC platforms (#4316) (9 days ago) [pnakka28]
* 2ec21e19 - Limit PFC WD Detection time to maximum value of 1000ms (#4306) (9 days ago) [Hemanth Kumar Tirupati]
* 99b1b76a - Modified dualtor_neighbor_check to use mux neighbor_mode (#4227) (10 days ago) [manamand2020]
* 5dfd11ed - Fix 'show version' KeyError when sonic_version.yml has missing fields (#4324) (10 days ago) [securely1g]
* 4c77f9d4 - fix: skip PORT_INGRESS/EGRESS_MIRROR_CAPABLE check for ERSPAN mirror sessions (#4323) (11 days ago) [bingwang-ms]
* d8d2a39e - fix scapy delayed import when we have large routes (#4315) (11 days ago) [Hemanth Kumar Tirupati]
* c6601cda - [LACP retry-count] Syntax Fix for Trixie (#4274) (11 days ago) [Yair Raviv]
* f54d0a7c - Add fsync to config save to persist config across power cycle (#4313) (11 days ago) [Jianyue Wu]
* e5f77f61 - Fix unit test assertions broken by spelling typo PRs (#4321) (13 days ago) [rustiqly]
* 7660b19f - Fix spelling typos in muxcable modules (#4259) (2 weeks ago) [rustiqly]
* f7d820f3 - Fix spelling typos in config/main.py (#4261) (2 weeks ago) [rustiqly]
* 244942bd - Fix spelling typos in scripts/ (#4262) (2 weeks ago) [rustiqly]
* 89001b10 - Fix spelling typos in show/ and clear/ modules (#4263) (2 weeks ago) [rustiqly]
* d6e646c2 - Fix spelling typos in config/config_mgmt.py (#4260) (2 weeks ago) [rustiqly]
* e244129c - Fix spelling typos in config/nat.py (#4258) (2 weeks ago) [rustiqly]
* 5a0c48f0 - In route_check.py, Convey the IJSON Backend using an env variable (#4294) (2 weeks ago) [venkit-nexthop]
* e2712fc1 - Fix spelling typos across utilities_common, config plugins, and misc modules (#4264) (2 weeks ago) [rustiqly]
* 4211edee - Fixed show vxlan remotemac ambiguity (#4121) (2 weeks ago) [Gnanapriya [Marvell]]
* cfd23f97 - Add FEC histograms to generate_dump output (#4244) (2 weeks ago) [Fraser Gordon]
* 8882a633 - [storm-control] Fixed show storm-control interface command display (#4122) (2 weeks ago) [Gnanapriya [Marvell]]
* 7a1e656e - [fibshow]: Fix exception when blackhole routes are present (#4189) (2 weeks ago) [Ravi Minnikanti(Marvell)]
* 2b3f14de - [marvell-teralynx] Enhance techsupport to include HWSKU configs (#4161) (3 weeks ago) [Naveen-Rampuram]
* 9cb7b3e6 - Merge pull request #4275 from tirupatihemanth/fix_scapy_lagkeepalive (3 weeks ago) [Ying Xie]
|\ 
| * 7e54ddff - Fix delayed scapy import when we have a lot of routes (3 weeks ago) [Hemanth Kumar Tirupati]
* | cbb31f0d - [multi-asic] fix utilities_common Db helper (#4273) (3 weeks ago) [Yakiv Huryk]
* | f65ddfa2 - Prevent early exit of reboot status (#4282) (3 weeks ago) [Gagan Punathil Ellath]
* | 14840074 - [fast-reboot] Remove teamsyncd timer override by fast-boot (#4233) (3 weeks ago) [Yair Raviv]
* | a3085380 - [lag_keepalive] add `--namespace` option (#4194) (4 weeks ago) [Yair Raviv]
* | abc8bba1 - [teamd_retry_count] Add support for --namespace parameter (#4195) (4 weeks ago) [Yair Raviv]
* | c05d995c - [warm/fast-reboot] check per-ASIC FW upgrade status (#4196) (4 weeks ago) [Yair Raviv]
* | 433d01c1 - [check_db_integrity] Add NETNS environment (#4197) (4 weeks ago) [Yair Raviv]
* | 441595c7 - [centralize_database] Add --namespace option (#4198) (4 weeks ago) [Yair Raviv]
* | 0f3b5291 - [multi-asic][warm-reboot] Support warm-reboot on Multi-ASIC systems (#4199) (4 weeks ago) [Yair Raviv]
* | 28623ca9 - [multi-asic][warm_restart] add Multi-ASIC support for warm_restart commands (#4200) (4 weeks ago) [Yair Raviv]
* | 3cd228af - Add filesystem sync after plugin installation (#4251) (4 weeks ago) [Jianyue Wu]
* | 1d78c210 - Add .github/copilot-instructions.md for AI-assisted development (#4271) (4 weeks ago) [rustiqly]
* | 7895da57 - Fix dump port state CLI command crash on multi-asic platforms (#4229) (4 weeks ago) [Setu Patel]
|/ 
* bcb1d4bb - Clearing /tmp/tmp* is unsafe with parallel builds (#4268) (4 weeks ago) [Brad House - NextHop]
* 8103627e - Fix sonic-utilities submodule update failure due to ijson library (#4256) (4 weeks ago) [venkit-nexthop]
* 85becedc - [Mellanox] Add restricted sysfs to fw control list (#4240) (4 weeks ago) [Noa Or]
* 275bdc6c - Add multi-asic support for sonic-clear queue wredcounters and counter poll , --nonzero support for show queue wredcounters (#4152) (5 weeks ago) [saksarav-nokia]
* fbc85ee4 - Fix j2 files not getting packaged (#4250) (5 weeks ago) [Saikrishna Arcot]
* a9543cba - Fix route_check.py to not hog a lot of memory (#4205) (5 weeks ago) [venkit-nexthop]
* 40260d5b - Fix JsonMove._get_value to Support Both String and Integer List Indices (#4237) (5 weeks ago) [Xincun Li]
* 0a3ef184 - refactor: enhance show bfd summary command (#4242) (5 weeks ago) [Chenyang Wang]
* 7c6dfdc2 - Update the error message for sfputil debug loopback command (#4224) (5 weeks ago) [Ariz Zubair]
* f246da25 - [Fast-linkup] Added CLIs for config/show (#4182) (6 weeks ago) [Yair Raviv]
* 87703c1 - Use Singleton PlatformDataProvider to reduce module import time (#4183) (6 weeks ago) [Hemanth Kumar Tirupati]
* 0dae5f2 - [sfputil] Fix issue: should not do low power mode or reset for non-present ports (#4206) (6 weeks ago) [Junchao-Mellanox]
* 5f56518 - generate_dump: add interface FEC stats (#4093) (6 weeks ago) [Fraser Gordon]
* 2e9e81c - [GCU] Update WRED_PROFILE and BUFFER_POOL validators for GCU (#4219) (6 weeks ago) [Dev Ojha]
* 2350203 - Update bash completions for sonic-utilities commands (#4163) (6 weeks ago) [Saikrishna Arcot]
* 5052e02 - Fix the PSU show command error message on platform without psu at all (#4151) (6 weeks ago) [Yuanzhe]
* 7d9ec5d - Fix issue that namespace is not correctly fetched in Multi ASIC environment for mirror capability checking (#4159) (6 weeks ago) [Stephen Sun]
* f473b4f - Fix multi asic initialization for dump command (#4108) (6 weeks ago) [Gagan Punathil Ellath]
* 0f45e43 - Add current and configured frequency to DOM CLI (#4209) (7 weeks ago) [Ariz Zubair]
* 6f0b181 - Added counterpoll CLI support (#4106) (7 weeks ago) [Dhanasekar Rathinavel]
* 3d5bef9 - [multi-asic][Mellanox] Add multi-ASIC support for generate_dump and update FW upgrade script (#4192) (7 weeks ago) [Oleksandr Ivantsiv]
* 8451f01 - sonic-utilities: Support for clearing aggregate VOQ counters(#2001) (#4044) (8 weeks ago) [manish1-arista]
* 21f013f - Add q3d SKUs to gcu_field_operation_validators.conf.json (#4201) (8 weeks ago) [HP]
* 1a15091 - Fix multi asic connection creation (#4109) (8 weeks ago) [Gagan Punathil Ellath]
```
#### How I did it
#### How to verify it
#### Description for the changelog

6 participants