
Commit 548e8e0

[mellanox]: Backport patches to increase critical threshold for ASIC and validate transceiver temperature (#185)
Backport new patches to increase the ASIC critical threshold from 110C to 140C and to validate the transceiver critical threshold temperature:
1. 0022-mlxsw-core-Increase-critical-threshold-for-ASIC-ther.patch torvalds/linux@b06ca3d
2. 0023-mlxsw-core-Add-validation-of-transceiver-temperature.patch torvalds/linux@57726eb

This change has been verified on all Mellanox devices based on Spectrum-1, Spectrum-2, and Spectrum-3 ASICs.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
1 parent a7c1af7 commit 548e8e0


3 files changed: +98 -0 lines

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
From f79a25e99568d19ed2cd39de4650ced66de4ab5d Mon Sep 17 00:00:00 2001
From: Vadim Pasternak <vadimp@nvidia.com>
Date: Thu, 31 Dec 2020 19:27:02 +0200
Subject: [PATCH mlxsw/net-next 1/1] mlxsw: core: Increase critical threshold
for ASIC thermal zone

Increase the critical threshold for the ASIC thermal zone from 110C to 140C
according to the system hardware requirements. All the supported ASICs
(SX, Spectrum1, Spectrum2, Spectrum3) can still operate with
ASIC temperatures below 140C.

According to the system requirements, software thermal protection is the
second level of protection, while the first level of protection should
be performed by firmware. Firmware can therefore decide to perform a
system thermal shutdown while the temperature is still below 140C. If
firmware does not do so and the ASIC temperature reaches 140C, the
second level of thermal protection is performed by software.

Fixes: 41e760841d26 ("mlxsw: core: Replace thermal temperature trips with defines")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
index 141e3655e211..d575aa469517 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
@@ -19,7 +19,7 @@
#define MLXSW_THERMAL_ASIC_TEMP_NORM 75000 /* 75C */
#define MLXSW_THERMAL_ASIC_TEMP_HIGH 85000 /* 85C */
#define MLXSW_THERMAL_ASIC_TEMP_HOT 105000 /* 105C */
-#define MLXSW_THERMAL_ASIC_TEMP_CRIT 110000 /* 110C */
+#define MLXSW_THERMAL_ASIC_TEMP_CRIT 140000 /* 140C */
#define MLXSW_THERMAL_MODULE_TEMP_NORM 60000 /* 60C */
#define MLXSW_THERMAL_MODULE_TEMP_HIGH 70000 /* 70C */
#define MLXSW_THERMAL_MODULE_TEMP_HOT 80000 /* 80C */
--
2.11.0
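To make the effect of the new value concrete, here is a minimal sketch that classifies an ASIC reading (in millidegrees Celsius, the unit used by the defines above) against the four ASIC trip points as they stand after this patch. The enum `asic_trip_level` and the helper `asic_trip_reached()` are hypothetical names for illustration, not mlxsw code. The sketch assumes a trip is reached when the temperature is greater than or equal to its trip value, and it ignores hysteresis and trip types.

```c
/* ASIC trip points in millidegrees Celsius, copied from core_thermal.c
 * after this patch (CRIT raised from 110000 to 140000). */
#define MLXSW_THERMAL_ASIC_TEMP_NORM 75000  /* 75C */
#define MLXSW_THERMAL_ASIC_TEMP_HIGH 85000  /* 85C */
#define MLXSW_THERMAL_ASIC_TEMP_HOT  105000 /* 105C */
#define MLXSW_THERMAL_ASIC_TEMP_CRIT 140000 /* 140C */

/* Hypothetical names for illustration only; not mlxsw identifiers. */
enum asic_trip_level {
	ASIC_BELOW_NORM,
	ASIC_TRIP_NORM,
	ASIC_TRIP_HIGH,
	ASIC_TRIP_HOT,
	ASIC_TRIP_CRIT, /* software (second-level) protection acts here */
};

/* Return the highest trip point reached by a reading in millidegrees,
 * assuming a trip is reached when temp >= trip temperature. */
static enum asic_trip_level asic_trip_reached(int temp_mc)
{
	if (temp_mc >= MLXSW_THERMAL_ASIC_TEMP_CRIT)
		return ASIC_TRIP_CRIT;
	if (temp_mc >= MLXSW_THERMAL_ASIC_TEMP_HOT)
		return ASIC_TRIP_HOT;
	if (temp_mc >= MLXSW_THERMAL_ASIC_TEMP_HIGH)
		return ASIC_TRIP_HIGH;
	if (temp_mc >= MLXSW_THERMAL_ASIC_TEMP_NORM)
		return ASIC_TRIP_NORM;
	return ASIC_BELOW_NORM;
}
```

Under these assumptions, a 115C reading (`asic_trip_reached(115000)`) now lands at the hot trip rather than the software critical trip, which it would have crossed with the old 110000 value. This leaves the range between 105C and 140C to the firmware's first level of protection.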
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
From 8845c46138eca323a3fe9cd1332190f092f35888 Mon Sep 17 00:00:00 2001
From: Vadim Pasternak <vadimp@nvidia.com>
Date: Thu, 7 Jan 2021 12:56:21 +0200
Subject: [PATCH mlxsw/backport 2/2] mlxsw: core: Add validation of transceiver
temperature thresholds

Validate thresholds to avoid a single failure caused by transceiver
unreliability. Ignore the latest readouts if the warning temperature is
above the alarm temperature, since this can cause an unexpected thermal
shutdown. Keep the previous values and refresh the thresholds on the
next iteration.

This is a rare scenario, but it has been observed once at a customer
site.

Fixes: 6a79507cfe94 ("mlxsw: core: Extend thermal module with per QSFP module thermal zones")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
index 54d0e8b8d..477c3ed53 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
@@ -183,6 +183,12 @@ mlxsw_thermal_module_trips_update(struct device *dev, struct mlxsw_core *core,
if (err)
return err;

+ if (crit_temp > emerg_temp) {
+ dev_warn(dev, "%s : Critical threshold %d is above emergency threshold %d\n",
+ tz->tzdev->type, crit_temp, emerg_temp);
+ return 0;
+ }
+
/* According to the system thermal requirements, the thermal zones are
* defined with four trip points. The critical and emergency
* temperature thresholds, provided by QSFP module are set as "active"
@@ -197,11 +203,8 @@ mlxsw_thermal_module_trips_update(struct device *dev, struct mlxsw_core *core,
tz->trips[MLXSW_THERMAL_TEMP_TRIP_NORM].temp = crit_temp;
tz->trips[MLXSW_THERMAL_TEMP_TRIP_HIGH].temp = crit_temp;
tz->trips[MLXSW_THERMAL_TEMP_TRIP_HOT].temp = emerg_temp;
- if (emerg_temp > crit_temp)
- tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp +
+ tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp +
MLXSW_THERMAL_MODULE_TEMP_SHIFT;
- else
- tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp;

return 0;
}
--
2.11.0
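The validation added by the second patch can be modeled outside the driver. The sketch below mirrors only the logic visible in the two hunks: ignore the readout when the critical (warning) threshold is above the emergency (alarm) threshold, otherwise program the four trips. `struct module_trips`, `module_trips_update()` and `TRIP_COUNT` are hypothetical stand-ins, the trip enum merely reuses the names from the patch, and the value of `MLXSW_THERMAL_MODULE_TEMP_SHIFT` is an assumption (take the real one from `core_thermal.c`). Unlike the driver, which returns 0 in both cases, the sketch returns 1 or 0 so the caller can tell whether the trips were refreshed.

```c
/* Trip indices reusing the names from the patch; this enum is a local
 * stand-in, not the driver's own definition. */
enum {
	MLXSW_THERMAL_TEMP_TRIP_NORM,
	MLXSW_THERMAL_TEMP_TRIP_HIGH,
	MLXSW_THERMAL_TEMP_TRIP_HOT,
	MLXSW_THERMAL_TEMP_TRIP_CRIT,
	TRIP_COUNT, /* hypothetical */
};

/* ASSUMPTION: 10C shift for illustration only; the real value of
 * MLXSW_THERMAL_MODULE_TEMP_SHIFT lives in core_thermal.c. */
#define MLXSW_THERMAL_MODULE_TEMP_SHIFT 10000

/* Hypothetical stand-in for a per-module thermal zone trip table. */
struct module_trips {
	int temp[TRIP_COUNT]; /* millidegrees Celsius */
};

/* Returns 1 if the trips were refreshed, or 0 if the readout was
 * ignored and the previous values were kept for the next iteration. */
static int module_trips_update(struct module_trips *t, int crit_temp,
			       int emerg_temp)
{
	/* Warning threshold above alarm threshold: treat the readout as
	 * unreliable and leave the existing trips untouched. */
	if (crit_temp > emerg_temp)
		return 0;

	t->temp[MLXSW_THERMAL_TEMP_TRIP_NORM] = crit_temp;
	t->temp[MLXSW_THERMAL_TEMP_TRIP_HIGH] = crit_temp;
	t->temp[MLXSW_THERMAL_TEMP_TRIP_HOT] = emerg_temp;
	/* After the patch the shift is applied unconditionally. */
	t->temp[MLXSW_THERMAL_TEMP_TRIP_CRIT] =
		emerg_temp + MLXSW_THERMAL_MODULE_TEMP_SHIFT;
	return 1;
}
```

One consequence worth noting: when `crit_temp` equals `emerg_temp`, the patched code (and this sketch) still adds the shift to the critical trip, whereas the removed `else` branch set it to `emerg_temp` exactly.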

patch/series

Lines changed: 2 additions & 0 deletions
@@ -70,6 +70,8 @@ driver-ixgbe-external-phy.patch
0019-mlxsw-i2c-Allow-flexible-setting-of-I2C-transactions.patch
0020-mlxsw-core-Set-different-thermal-polling-time-based.patch
0021-platform-x86-mlx-platform-Remove-PSU-EEPROM-configur.patch
+0022-mlxsw-core-Increase-critical-threshold-for-ASIC-ther.patch
+0023-mlxsw-core-Add-validation-of-transceiver-temperature.patch
############################################################
#
# Internal patches will be added below (placeholder)
