
[202511][Mellanox]Add back mst driver start during firmware upgrade#26131

Merged
vmittal-msft merged 1 commit into sonic-net:202511 from gpunathilell:mst_readd
Mar 11, 2026

Conversation

@gpunathilell
Contributor

Why I did it

Firmware upgrades took noticeably longer when the MST (Mellanox Software Tools) service was not already running. Starting MST explicitly before the upgrade (with --with_i2cdev) and stopping it after the upgrade completes gives predictable, faster firmware upgrade times on Mellanox/NVIDIA Spectrum and BlueField platforms.

Work item tracking
  • Microsoft ADO (number only):

How I did it

platform/mellanox/mlnx-fw-upgrade.j2:

  • Start MST with /usr/bin/mst start --with_i2cdev only when a firmware upgrade is actually required (inside UpgradeFW(), before RunFwUpdateCmd).
  • Introduced MST_STARTED variable (default NO_PARAM); set to YES_PARAM only when this script starts MST.
  • In Cleanup() (invoked on script exit via trap Cleanup EXIT), run mst stop only if MST_STARTED is YES_PARAM, so we do not stop MST when the script did not start it (e.g. when FW was already up to date, dry-run, or early exit).
  • Ensures MST is stopped after upgrade completion irrespective of success or failure, while avoiding unnecessary mst stop when MST was never started by the script.
  • Added an option to ignore MST start failure: -m / --ignore-mst-start-failure, backed by the variable IGNORE_MST_START_FAILURE. When set, a failed mst start does not exit the script (mst start ... || true); when not set, RunCmd is used and the script exits on failure. This lets environments where MST or the driver may not be ready (e.g. BlueField driver install) still attempt the firmware upgrade instead of failing the whole flow.
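
For reviewers, the start/stop bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the code in mlnx-fw-upgrade.j2: the values of YES_PARAM and NO_PARAM, the StartMst helper, and the MST_BIN override are assumptions made so the sketch is self-contained, and the choice to leave MST_STARTED unset when an ignored start fails is also an assumption.

```shell
#!/bin/sh
# Sketch only. Variable names follow the PR description; values and
# function bodies are assumptions, not the actual template contents.
YES_PARAM="yes"
NO_PARAM="no"
MST_STARTED="${NO_PARAM}"
IGNORE_MST_START_FAILURE="${NO_PARAM}"
# Hypothetical override so the sketch can be exercised without MST installed.
MST_BIN="${MST_BIN:-/usr/bin/mst}"

# Hypothetical helper: start MST and remember that this script did so.
StartMst() {
    if "${MST_BIN}" start --with_i2cdev; then
        MST_STARTED="${YES_PARAM}"
        return 0
    fi
    if [ "${IGNORE_MST_START_FAILURE}" = "${YES_PARAM}" ]; then
        # -m was given: tolerate the failure and let the upgrade proceed.
        echo "mst start failed, continuing" >&2
        return 0
    fi
    # Without -m the caller is expected to abort the script.
    return 1
}

# Runs on every exit path; stops MST only if this script started it.
Cleanup() {
    if [ "${MST_STARTED}" = "${YES_PARAM}" ]; then
        "${MST_BIN}" stop
        MST_STARTED="${NO_PARAM}"
    fi
}

trap Cleanup EXIT
```

Because Cleanup is registered with trap ... EXIT, the stop runs after both successful and failed upgrades, while paths that never call StartMst (FW already up to date, dry-run, early exit) leave MST untouched.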

platform/nvidia-bluefield/installer/install.sh.j2:

  • Invoke firmware upgrade with -m -v: mlnx-fw-upgrade.sh -m -v (and mlnx-fw-upgrade.sh -m -v -r for config reset). The -m flag ignores MST start failure so that if the driver/MST is not ready during install, the script does not exit and the upgrade can still proceed or fail on actual FW update rather than on driver start.
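
To make the effect of -m concrete, here is a hedged sketch of how the flag could be recognized and mapped to IGNORE_MST_START_FAILURE. The ParseArguments function, the REMAINING_ARGS variable, and the constant values are hypothetical; the real script's option parser may be structured differently, and the other flags (-v, -r, -d) are simply passed through here without interpretation.

```shell
#!/bin/sh
# Sketch only: illustrates mapping -m to IGNORE_MST_START_FAILURE.
YES_PARAM="yes"   # assumed value
NO_PARAM="no"     # assumed value
IGNORE_MST_START_FAILURE="${NO_PARAM}"
REMAINING_ARGS=""  # hypothetical: options this sketch does not interpret

# Hypothetical parser: sets the ignore flag and collects everything else.
ParseArguments() {
    for arg in "$@"; do
        case "${arg}" in
            -m|--ignore-mst-start-failure)
                IGNORE_MST_START_FAILURE="${YES_PARAM}"
                ;;
            *)
                REMAINING_ARGS="${REMAINING_ARGS:+${REMAINING_ARGS} }${arg}"
                ;;
        esac
    done
}
```

With this sketch, calling ParseArguments -m -v sets IGNORE_MST_START_FAILURE to the yes value and leaves -v in REMAINING_ARGS, which matches the way the BlueField installer invokes the script.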

How to verify it

  1. Mellanox Spectrum: Run firmware upgrade (e.g. mlnx-fw-upgrade.sh -v or image upgrade flow). Confirm upgrade time is improved when MST is not pre-started; confirm mst stop runs after upgrade (check logs). Run with FW already up to date and confirm mst stop is not executed.
  2. NVIDIA BlueField: Run BFB install/upgrade that triggers mlnx-fw-upgrade.sh -m -v. Verify firmware upgrade completes in expected time; if MST start fails (e.g. driver not ready), script should continue and not exit until actual FW update is attempted. Confirm MST is started/stopped when start succeeds.
  3. Ignore MST start failure: Run mlnx-fw-upgrade.sh -m -v in an environment where mst start can fail; script should not exit on that failure and should proceed to FW upgrade (or fail on upgrade itself).
  4. Dry-run / no-upgrade paths: Run mlnx-fw-upgrade.sh -d or on QEMU/SimX; confirm script exits without calling mst stop.

Which release branch to backport (provide reason below if selected)

  • 202205
  • 202211
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505

Tested branch (Please provide the tested image version)

Description for the changelog

Start MST before Mellanox/NVIDIA firmware upgrade and stop it after completion to fix upgrade time degradation; add -m to ignore MST start failure (used by BlueField installer).

Link to config_db schema for YANG module changes

N/A – no YANG/config_db changes.

A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: gpunathilell <gpunathilell@nvidia.com>
@gpunathilell gpunathilell requested a review from lguohan as a code owner March 11, 2026 15:16
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@vmittal-msft vmittal-msft merged commit ea17aec into sonic-net:202511 Mar 11, 2026
13 checks passed