[Mellanox] feed module info to hw-management#22

Closed
jianyuewu wants to merge 1 commit into master from master_innolight_thermal_algo

@jianyuewu

Why I did it

At 40°C ambient temperature with the current FW and SW, some modules have a greater than 7.6% probability of reaching 75°C, which triggers false temperature warnings.
This PR adds vendor-specific temperature threshold support to eliminate those false warnings while keeping temperature telemetry accurate for monitoring.

How I did it

Implemented a new API for vendor-specific temperature offset adjustments:

  1. New API:

    • Add get_vendor_info() API with caching support.
  2. Smart Module Detection:

    • Cache vendor information (Manufacturer + Part Number) for each module.
    • Skip redundant vendor info updates when the same module is replugged.
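
The caching behaviour described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `VendorInfoCache` class, its `update()` method, and the `publish` callback are hypothetical names, and the sketch assumes vendor info is a (manufacturer, part number) pair keyed by module index.

```python
from typing import Callable, Dict, Optional, Tuple

# (manufacturer, part_number); this tuple layout is an assumption for the sketch.
VendorInfo = Tuple[str, str]


class VendorInfoCache:
    """Hypothetical helper: remembers the last vendor info published per module."""

    def __init__(self, publish: Callable[[int, str, str], None]) -> None:
        # `publish` stands in for whatever forwards the info to hw-management.
        self._publish = publish
        self._last: Dict[int, VendorInfo] = {}

    def update(self, module_index: int, info: Optional[VendorInfo]) -> bool:
        """Publish `info` only if it differs from the cached value.

        Returns True when an update was published, False when it was skipped.
        """
        if info is None:
            # Module absent or unreadable: keep the cached entry so that
            # replugging the same module does not trigger a redundant update.
            return False
        if self._last.get(module_index) == info:
            return False
        self._last[module_index] = info
        self._publish(module_index, info[0], info[1])
        return True
```

Keeping the cached entry while the module is unplugged is the design choice that makes the "same module replugged" case a no-op; clearing it on removal would instead publish on every insertion.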

How to verify it

  1. Plug in an optical module -> Verify that vendor info is sent to HW-MGMT.
  2. Unplug and replug the same module -> Verify that no redundant vendor info update is sent.
  3. Replace it with a different module -> Verify that the new vendor info is sent.

Which release branch to backport (provide reason below if selected)

  • 202412
  • 202511

Tested branch (Please provide the tested image version)

202412

A picture of a cute animal (not mandatory but encouraged)

    /\_/\  
   ( o.o ) 
    > ^ <
   /|   |\
  (_|   |_)
   Cool Cat~

@jianyuewu jianyuewu changed the title from "[Mellanox] Feed module info to hw-management" to "[Mellanox] feed module info to hw-management" Dec 30, 2025
@jianyuewu jianyuewu force-pushed the master_innolight_thermal_algo branch 2 times, most recently from 809a338 to 2eb140a, December 31, 2025 03:19
On first detection or module replacement, if the serial number (SN) has changed,
call vendor_data_set_module() with the manufacturer (MFG) and part number (PN)
to send the vendor info to hw-management.

Sample output:
NOTICE pmon#thermalctld: Module 0 vendor info updated \
- manufacturer: NVIDIA part_number: MCP4Y10-N001

Signed-off-by: Jianyue Wu <[email protected]>
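
The serial-number rule in the commit message above might look roughly like the sketch below. It is only an illustration under stated assumptions: `vendor_data_set_module()` is named in the commit message, but its real signature is not shown there, so the stub here (module index, manufacturer, part number) is a guess; `ModuleTracker` and `on_module_present()` are hypothetical names.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("thermalctld-sketch")


def vendor_data_set_module(module_index: int, manufacturer: str, part_number: str) -> None:
    """Stand-in for the hw-management call; the real signature is assumed."""
    logger.info(
        "Module %d vendor info updated - manufacturer: %s part_number: %s",
        module_index, manufacturer, part_number,
    )


class ModuleTracker:
    """Hypothetical tracker that sends vendor info only when the SN changes."""

    def __init__(self, send: Callable[[int, str, str], None] = vendor_data_set_module) -> None:
        self._send = send
        self._last_sn: Dict[int, str] = {}

    def on_module_present(self, module_index: int, serial: str,
                          manufacturer: str, part_number: str) -> bool:
        """Return True if vendor info was sent, False if the SN was unchanged."""
        if self._last_sn.get(module_index) == serial:
            # Same physical module as last time: nothing to send.
            return False
        # First detection or a replaced module: forward MFG and PN.
        self._send(module_index, manufacturer, part_number)
        self._last_sn[module_index] = serial
        return True
```

Recording the serial number only after the send call returns means a failed send (an exception) is retried on the next detection rather than silently cached.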
@jianyuewu jianyuewu force-pushed the master_innolight_thermal_algo branch from dda7387 to 3250495, December 31, 2025 06:49
@jianyuewu jianyuewu closed this Dec 31, 2025