
[action] [PR:4113] [SmartSwitch] add graceful shutdown/startup utilities and visibility #4168

Merged
mssonicbld merged 2 commits into sonic-net:202511 from mssonicbld:cherry/202511/4113
Jan 6, 2026

@mssonicbld
Collaborator

HLD: https://github.com/sonic-net/SONiC/blob/master/doc/smart-switch/graceful-shutdown/graceful-shutdown.md
These changes build on the enhancements in #4031.

This PR adds CLI support and visibility for module-level graceful transitions (startup/shutdown/reboot) to align with the SmartSwitch/DPU lifecycle work.

What I did

  • Added CLI visibility into module transition states (startup, shutdown, reboot).
  • Integrated with the STATE_DB CHASSIS_MODULE_TABLE to display transition status, type, and elapsed time.
  • Improved usability with human-readable durations and exit codes suitable for automation.
  • Added unit tests covering transition visibility, parsing, and error handling.
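The elapsed time shown for a transition can be derived from `transition_start_time`, which the sample Redis output suggests is stored as Unix epoch seconds (for example `1763059401`). The sketch below assumes that encoding; the function names `elapsed_seconds` and `format_duration` and the `1h 2m 5s` output style are hypothetical and not taken from sonic-utilities.

```python
import time
from typing import Optional


def elapsed_seconds(start_time: Optional[str], now: Optional[float] = None) -> Optional[int]:
    """Return seconds elapsed since an epoch-seconds string, or None if unparsable.

    Assumption: transition_start_time holds integer Unix epoch seconds.
    """
    try:
        start = int(start_time)
    except (TypeError, ValueError):
        return None  # missing or malformed field
    current = time.time() if now is None else now
    return max(0, int(current) - start)  # clamp clock skew to zero


def format_duration(seconds: int) -> str:
    """Render a duration such as 3725 as '1h 2m 5s' (hypothetical format)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}h {minutes}m {secs}s"
    if minutes:
        return f"{minutes}m {secs}s"
    return f"{secs}s"


print(format_duration(elapsed_seconds("1763059401", now=1763059401 + 95)))  # 1m 35s
```

Returning `None` rather than raising keeps a single bad field from breaking the whole display, in line with the error handling described for missing or malformed entries.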

How I did it

  • Added a helper class to read STATE_DB entries:
    • state_transition_in_progress
    • transition_type
    • transition_start_time
  • Added error handling for missing or malformed STATE_DB entries.
  • Added pytest-based unit tests using a mocked state_db_connector.
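The helper described above can be illustrated as a small parser over the hash stored under a `CHASSIS_MODULE_TABLE|<module>` key. This is a sketch, not the PR's code: the class name `ModuleTransition` is hypothetical, and the hash is passed in as a plain `dict` instead of being read through a real `state_db_connector`. Because this description lists the flag as `state_transition_in_progress` while the sample Redis output shows `transition_in_progress`, the sketch accepts either name as an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: accept either flag name, since this PR's description and its
# sample redis-cli output use different ones.
IN_PROGRESS_KEYS = ("state_transition_in_progress", "transition_in_progress")
VALID_TYPES = {"startup", "shutdown", "reboot"}


@dataclass
class ModuleTransition:
    in_progress: bool
    transition_type: Optional[str]
    start_time: Optional[int]

    @classmethod
    def from_hash(cls, fields: Optional[Mapping[str, str]]) -> "ModuleTransition":
        """Build from a CHASSIS_MODULE_TABLE hash, tolerating missing/malformed fields."""
        fields = fields or {}  # a missing key is treated as an empty hash
        flag = next((fields[k] for k in IN_PROGRESS_KEYS if k in fields), "False")
        in_progress = str(flag).strip().lower() == "true"

        ttype = fields.get("transition_type")
        if ttype not in VALID_TYPES:
            ttype = None  # unknown or absent type is reported as None

        try:
            start = int(fields.get("transition_start_time", ""))
        except (TypeError, ValueError):
            start = None  # absent or non-numeric timestamp
        return cls(in_progress, ttype, start)


sample = {
    "oper_status": "Online",
    "transition_in_progress": "True",
    "transition_type": "shutdown",
    "transition_start_time": "1763059401",
}
print(ModuleTransition.from_hash(sample))
```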

How to verify it

  • Build and install the updated sonic-utilities package on the DUT.
  • Check the Redis entries in STATE_DB (Redis database 6): `redis-cli -n 6 hgetall "CHASSIS_MODULE_TABLE|DPU0"`
  • Run the module startup/shutdown commands.
  • Run the unit tests.
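The unit tests mock the STATE_DB connector rather than talking to Redis. The sketch below shows that pattern with `unittest.mock.MagicMock`. It assumes the connector exposes a `get_all(db_name, key)` method in the style of swsscommon's `SonicV2Connector`; the helper `is_transition_in_progress` and the test names are hypothetical, not the PR's actual tests.

```python
from unittest import mock


def is_transition_in_progress(state_db_connector, module_name: str) -> bool:
    """Hypothetical helper: read the module's STATE_DB hash and check the flag."""
    key = f"CHASSIS_MODULE_TABLE|{module_name}"
    fields = state_db_connector.get_all("STATE_DB", key) or {}
    # Assumption: accept either flag name seen in this PR.
    flag = fields.get("state_transition_in_progress",
                      fields.get("transition_in_progress", "False"))
    return str(flag).lower() == "true"


def test_in_progress_flag_set():
    db = mock.MagicMock()
    db.get_all.return_value = {"transition_in_progress": "True", "transition_type": "shutdown"}
    assert is_transition_in_progress(db, "DPU2") is True
    db.get_all.assert_called_once_with("STATE_DB", "CHASSIS_MODULE_TABLE|DPU2")


def test_missing_entry_is_not_in_progress():
    db = mock.MagicMock()
    db.get_all.return_value = {}
    assert is_transition_in_progress(db, "DPU0") is False


def test_malformed_flag_is_not_in_progress():
    db = mock.MagicMock()
    db.get_all.return_value = {"transition_in_progress": "maybe"}
    assert is_transition_in_progress(db, "DPU1") is False
```

Saved as a `test_*.py` file, these functions are collected and run by `pytest` without any Redis instance.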

Sample output when a transition is already in progress

The following shows the error reported when a reboot of DPU2 is requested while a shutdown of the same module is still in progress.

$ sudo config chassis modules shutdown DPU2;redis-cli -n 6 hgetall 'CHASSIS_MODULE_TABLE|DPU2';sudo reboot -d DPU2;redis-cli -n 6 hgetall 'CHASSIS_MODULE_TABLE|DPU2'
Shutting down chassis module DPU2
 1) "desc"
 2) "NVIDIA XXXXXX DPU"
 3) "slot"
 4) "N/A"
 5) "oper_status"
 6) "Online"
 7) "serial"
 8) "XXXXXXXXXX"
 9) "transition_in_progress"
10) "True"
11) "transition_type"
12) "shutdown"
13) "transition_start_time"
14) "1763059401"
True
2025-11-13 18:43:22 - User requested rebooting device dpu2 ...
2025-11-13 18:43:23 - INFO: DPU dpu2 is in 'Online' state before reboot.
2025-11-13 18:43:23 - ERROR: state_transition_in_progress flag is already set for dpu2
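The error above comes from a guard that refuses to start a new transition while the module's in-progress flag is set. A minimal sketch of such a guard follows, assuming an in-memory dict stands in for STATE_DB and that a non-zero exit code signals the refusal to automation. The functions `begin_transition` and `end_transition`, the exit-code values, and the message wording are hypothetical; the field names follow the sample Redis output above.

```python
import sys
import time
from typing import Dict, Optional

EXIT_OK = 0
EXIT_BUSY = 1  # hypothetical exit code for "transition already in progress"

# Stand-in for STATE_DB: key -> hash of string fields.
state_db: Dict[str, Dict[str, str]] = {}


def begin_transition(module: str, transition_type: str, now: Optional[float] = None) -> int:
    """Mark a transition as started unless one is already in progress.

    Not atomic: concurrent callers could both pass the check in this sketch.
    """
    key = f"CHASSIS_MODULE_TABLE|{module}"
    entry = state_db.setdefault(key, {})
    if entry.get("transition_in_progress", "False").lower() == "true":
        print(f"ERROR: transition already in progress for {module.lower()} "
              f"({entry.get('transition_type', 'unknown')})", file=sys.stderr)
        return EXIT_BUSY
    entry.update({
        "transition_in_progress": "True",
        "transition_type": transition_type,
        "transition_start_time": str(int(time.time() if now is None else now)),
    })
    return EXIT_OK


def end_transition(module: str) -> None:
    """Clear the flag once the transition completes."""
    state_db.get(f"CHASSIS_MODULE_TABLE|{module}", {})["transition_in_progress"] = "False"


# Mirrors the sample: a shutdown is in progress, so a reboot request is refused.
print(begin_transition("DPU2", "shutdown", now=1763059401))  # 0
print(begin_transition("DPU2", "reboot"))                    # 1
```

The check-then-set here is deliberately simple; a real implementation shared across processes would need the check and the update to happen atomically.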

New command output

$ reboot -d DPU1
True
2025-11-17 17:56:10 - User requested rebooting device dpu1 ...
2025-11-17 17:56:11 - INFO: DPU dpu1 is in 'Online' state before reboot.
2025-11-17 17:56:12 - INFO: Rebooting dpu1, ip:1X9.XXX.X00.2 gnmi_port:50XXX
2025-11-17 17:56:53 - INFO: dpu1 halted the services successfully
2025-11-17 17:58:50 - INFO: Rebooting dpu1 with reboot_type:DPU...

@mssonicbld
Collaborator Author

Original PR: #4113

@mssonicbld
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@yijingyan2

/azpw run

@mssonicbld
Collaborator Author

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@vvolam
Contributor

vvolam commented Jan 5, 2026

/azpw run

@mssonicbld
Collaborator Author

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld merged commit 59929ab into sonic-net:202511 on Jan 6, 2026
7 of 8 checks passed
3 participants