[Nvidia-Bluefield] Change sonic-bfb-installer reboot flow to fix pmon sensor errors#24783

Merged
liat-grozovik merged 1 commit into sonic-net:master from tirupatihemanth:bfb_fix
Jan 14, 2026

Conversation

@tirupatihemanth
Contributor

Why I did it

Fix transient pmon sensor errors that appear during BFB installation on the SmartSwitch platform, such as:

ERR pmon#sensord: Error getting sensor data: mp2975/#16: Kernel interface error

Work item tracking
  • Microsoft ADO (number only):

How I did it

Run the pre-shutdown procedure before rebooting the DPU.

How to verify it

Install a BFB image on a DPU from the switch. The installation should complete without logging the sensord error shown above.
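
One way to check for the error after an install is to scan the collected log lines for the sensord message quoted in this description. The snippet below is a minimal sketch, not part of this PR: the function name `find_sensord_errors` is hypothetical, and the timestamps and hostname in the sample lines are made up for illustration. Only the error text itself comes from the description.

```python
import re

# Pattern built from the error line quoted in the PR description.
SENSORD_ERROR = re.compile(
    r"ERR pmon#sensord: Error getting sensor data: (\S+): Kernel interface error"
)


def find_sensord_errors(log_lines):
    """Return the sensor identifiers (e.g. 'mp2975/#16') found in matching lines."""
    hits = []
    for line in log_lines:
        match = SENSORD_ERROR.search(line)
        if match:
            hits.append(match.group(1))
    return hits


# Sample log lines; the timestamps and hostname are invented for this example.
sample = [
    "Dec  9 17:54:01 sonic INFO unrelated message",
    "Dec  9 17:54:02 sonic ERR pmon#sensord: Error getting sensor data: "
    "mp2975/#16: Kernel interface error",
]

if __name__ == "__main__":
    errors = find_sensord_errors(sample)
    print("sensord errors found:", errors)
```

An empty result after a BFB install would be consistent with the fix working, although how and where logs are collected depends on the setup.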

Which release branch to backport (provide reason below if selected)

  • 202205
  • 202211
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505

Copilot AI review requested due to automatic review settings December 9, 2025 17:54
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

Copilot AI left a comment

Pull request overview

This PR modifies the sonic-bfb-installer script to fix transient sensor errors during BFB installation on Mellanox smartswitch platforms by implementing a more graceful DPU reset procedure.

Key Changes:

  • Introduces a new reset flow that runs pre-shutdown and post-startup procedures around the DPU reboot
  • Adds fallback logic to maintain backward compatibility with the existing dpuctl-based reset method
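
The graceful-reset-with-fallback pattern described in these bullets can be sketched as follows. This is an illustration only, under stated assumptions: the installer's actual commands and failure conditions are not shown on this page, so every step here (`pre_shutdown`, `reboot`, `post_startup`, `legacy_reset`) is a hypothetical callable supplied by the caller, and falling back on any failed step is an assumed policy.

```python
from typing import Callable

Step = Callable[[str], bool]  # takes a DPU name, returns True on success


def reset_dpu(dpu: str, pre_shutdown: Step, reboot: Step,
              post_startup: Step, legacy_reset: Step) -> str:
    """Try the graceful flow first; fall back to the legacy reset on failure.

    Returns "graceful" or "fallback" to report which path succeeded.
    All step callables are hypothetical placeholders, not real SONiC APIs.
    """
    try:
        # `and` short-circuits, so later steps are skipped after a failure.
        if pre_shutdown(dpu) and reboot(dpu) and post_startup(dpu):
            return "graceful"
    except Exception:
        # Assumption: any error in the graceful path triggers the fallback.
        pass
    if legacy_reset(dpu):
        return "fallback"
    raise RuntimeError(f"could not reset {dpu}")


def make_step(name: str, ok: bool, trace: list) -> Step:
    """Build a fake step that records its call and returns a fixed result."""
    def step(dpu: str) -> bool:
        trace.append(name)
        return ok
    return step


if __name__ == "__main__":
    trace = []
    result = reset_dpu(
        "dpu0",
        make_step("pre_shutdown", True, trace),
        make_step("reboot", True, trace),
        make_step("post_startup", True, trace),
        make_step("legacy_reset", True, trace),
    )
    print(result, trace)
```

Passing the steps in as callables keeps the sketch independent of any particular platform tooling and makes the fallback path easy to exercise with fakes.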


@tirupatihemanth tirupatihemanth marked this pull request as draft December 9, 2025 18:57
@tirupatihemanth tirupatihemanth changed the title from "[Mellanox] Change sonic-bfb-installer reboot flow to fix pmon sensor errors" to "[SmartSwitch] Change sonic-bfb-installer reboot flow to fix pmon sensor errors" on Dec 9, 2025
@tirupatihemanth tirupatihemanth marked this pull request as ready for review December 10, 2025 18:38
@tirupatihemanth tirupatihemanth marked this pull request as draft December 10, 2025 18:39
@tirupatihemanth tirupatihemanth marked this pull request as ready for review December 16, 2025 01:48
Copilot AI review requested due to automatic review settings December 16, 2025 01:48
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Signed-off-by: Hemanth Kumar Tirupati <htirupati@nvidia.com>
Copilot AI review requested due to automatic review settings December 16, 2025 02:07

@liat-grozovik
Collaborator

@oleksandrivantsiv can you please help to review as well?

@KrisNey-MSFT

hi @oleksandrivantsiv - would you have time to review pls? TY!

@liat-grozovik liat-grozovik merged commit 064b16b into sonic-net:master Jan 14, 2026
13 checks passed
@liat-grozovik liat-grozovik changed the title from "[SmartSwitch] Change sonic-bfb-installer reboot flow to fix pmon sensor errors" to "[Nvidia-Bluefield] Change sonic-bfb-installer reboot flow to fix pmon sensor errors" on Jan 14, 2026
@mssonicbld
Collaborator

Cherry-pick PR to 202511: #25276

FengPan-Frank pushed a commit to FengPan-Frank/sonic-buildimage that referenced this pull request Mar 6, 2026
… sensor errors (sonic-net#24783)

- Why I did it
Fix transient errors during bfb install on smartswitch platform.

ERR pmon#sensord: Error getting sensor data: mp2975/#16: Kernel interface error

- How I did it
Use pre-shutdown procedures before doing a reboot

- How to verify it
Installation of bfb image on dpu from switch shouldn't cause errors

Signed-off-by: Hemanth Kumar Tirupati <htirupati@nvidia.com>
Signed-off-by: Feng Pan <fenpan@microsoft.com>
dprital pushed a commit that referenced this pull request Mar 19, 2026
… sensor errors (#24783)

- Why I did it
Fix transient errors during bfb install on smartswitch platform.

ERR pmon#sensord: Error getting sensor data: mp2975/#16: Kernel interface error

- How I did it
Use pre-shutdown procedures before doing a reboot

- How to verify it
Installation of bfb image on dpu from switch shouldn't cause errors

Signed-off-by: Hemanth Kumar Tirupati <htirupati@nvidia.com>
Signed-off-by: dprital <drorp@nvidia.com>