[action] [PR:4113] [SmartSwitch] add graceful shutdown/startup utilities and visibility #4168
Merged
mssonicbld merged 2 commits into sonic-net:202511 on Jan 6, 2026
HLD: https://github.com/sonic-net/SONiC/blob/master/doc/smart-switch/graceful-shutdown/graceful-shutdown.md
These changes build upon enhancements in sonic-net#4031
This PR adds CLI support and visibility for module-level graceful transitions (startup/shutdown/reboot) to align with the SmartSwitch/DPU lifecycle work.
#### What I did
- Added support to view module transition states (startup, shutdown, reboot) through CLI.
- Integrated with STATE_DB CHASSIS_MODULE_TABLE to display transition status, type, and elapsed time.
- Enhanced user experience with readable durations and exit codes for automation.
- Implemented comprehensive unit tests for transition visibility, parsing, and error handling.
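
As an illustration of the "readable durations" point above, here is a minimal sketch of how an elapsed time in seconds could be rendered for CLI output. The function name `format_duration` and the exact output format are assumptions made for this example, not taken from the PR.

```python
def format_duration(seconds: int) -> str:
    """Render a non-negative number of seconds as, e.g., '1h 2m 5s'.

    Hypothetical helper: the name and output format are illustrative only.
    """
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    # Show minutes whenever hours are shown, so '1h 0m 5s' stays unambiguous.
    if hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{secs}s")
    return " ".join(parts)
```

The elapsed value itself would typically be the current epoch time minus `transition_start_time`, assuming that field holds epoch seconds as the sample Redis output suggests.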
#### How I did it
- Added a helper class to read STATE_DB entries:
  - `state_transition_in_progress`
  - `transition_type`
  - `transition_start_time`
- Implemented robust error handling for missing or malformed DB entries.
- Added pytest-based unit tests using mocked state_db_connector.
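
The parsing and error handling described above could look roughly like the sketch below. It operates on a plain mapping such as the one `hgetall` returns, so it does not depend on any SONiC library. The names `TransitionInfo` and `parse_transition` are hypothetical. Because this description shows the flag both as `state_transition_in_progress` and, in the sample Redis output, as `transition_in_progress`, the sketch accepts either spelling; that tolerance is an assumption of the example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class TransitionInfo:
    """Parsed view of a module's transition fields (hypothetical type)."""
    in_progress: bool
    transition_type: Optional[str]
    start_time: Optional[int]  # epoch seconds, or None if missing/malformed


def parse_transition(entry: Mapping[str, str]) -> TransitionInfo:
    """Parse transition fields, tolerating missing or malformed values."""
    # Accept both spellings of the flag that appear in this PR description.
    flag = entry.get(
        "state_transition_in_progress",
        entry.get("transition_in_progress", "False"),
    )
    in_progress = str(flag).strip().lower() == "true"

    transition_type = entry.get("transition_type") or None

    raw_start = entry.get("transition_start_time")
    try:
        start_time = int(raw_start) if raw_start not in (None, "") else None
    except (TypeError, ValueError):
        # Malformed timestamp: report it as unknown instead of raising.
        start_time = None

    return TransitionInfo(in_progress, transition_type, start_time)
```

Returning `None` for unusable fields keeps the display path free of exceptions; a caller that needs stricter behavior could raise instead.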
#### How to verify it
- Build and install the updated sonic-utilities package on DUT
- Check Redis entries: `redis-cli -n 6 hgetall "CHASSIS_MODULE_TABLE|DPU0"`
- Run the module startup/shutdown commands
- Run unit tests
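
For the unit-test step, a mocked connector keeps the tests independent of a running Redis instance. The sketch below uses `unittest.mock` from the standard library. The wrapper `read_module_entry` and the `get_all(db_name, key)` call shape on the connector are assumptions made for illustration; the actual tests in the PR may be structured differently.

```python
from unittest import mock


def read_module_entry(state_db, module_name):
    """Fetch a module's hash from STATE_DB (hypothetical wrapper).

    Assumes the connector exposes get_all(db_name, key) and may return
    None when the key does not exist.
    """
    key = f"CHASSIS_MODULE_TABLE|{module_name}"
    return state_db.get_all("STATE_DB", key) or {}


def test_read_module_entry_returns_transition_fields():
    state_db = mock.MagicMock()
    state_db.get_all.return_value = {
        "transition_type": "shutdown",
        "transition_start_time": "1763059401",
    }
    entry = read_module_entry(state_db, "DPU0")
    state_db.get_all.assert_called_once_with(
        "STATE_DB", "CHASSIS_MODULE_TABLE|DPU0"
    )
    assert entry["transition_type"] == "shutdown"


def test_read_module_entry_handles_missing_key():
    state_db = mock.MagicMock()
    state_db.get_all.return_value = None
    assert read_module_entry(state_db, "DPU9") == {}
```

Both functions follow the `test_` naming convention, so pytest would collect them without extra configuration.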
#### Sample output when `state_transition_in_progress` is set
An error is reported when a transition is already in progress for the same module.
$ sudo config chassis modules shutdown DPU2;redis-cli -n 6 hgetall 'CHASSIS_MODULE_TABLE|DPU2';sudo reboot -d DPU2;redis-cli -n 6 hgetall 'CHASSIS_MODULE_TABLE|DPU2'
Shutting down chassis module DPU2
1) "desc"
2) "NVIDIA XXXXXX DPU"
3) "slot"
4) "N/A"
5) "oper_status"
6) "Online"
7) "serial"
8) "XXXXXXXXXX"
9) "transition_in_progress"
10) "True"
11) "transition_type"
12) "shutdown"
13) "transition_start_time"
14) "1763059401"
True
2025-11-13 18:43:22 - User requested rebooting device dpu2 ...
2025-11-13 18:43:23 - INFO: DPU dpu2 is in 'Online' state before reboot.
2025-11-13 18:43:23 - ERROR: state_transition_in_progress flag is already set for dpu2
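
The rejection shown in the log above can be sketched as a small guard that inspects the module's STATE_DB entry before starting a new transition. The function name `check_can_start`, the exit-code values, and the success message are assumptions for illustration; only the error text mirrors the log line above.

```python
from typing import Mapping, Tuple

EXIT_OK = 0    # assumed exit code for "safe to proceed"
EXIT_BUSY = 1  # assumed exit code for "transition already in progress"


def check_can_start(entry: Mapping[str, str], module: str) -> Tuple[int, str]:
    """Return (exit_code, message) for a requested transition.

    Hypothetical guard: accepts both flag spellings that appear in this
    PR description.
    """
    flag = entry.get(
        "state_transition_in_progress",
        entry.get("transition_in_progress", "False"),
    )
    if str(flag).strip().lower() == "true":
        return (
            EXIT_BUSY,
            f"ERROR: state_transition_in_progress flag is already set for {module}",
        )
    # Illustrative success message, not taken from the actual tool.
    return EXIT_OK, f"INFO: no transition in progress for {module}"
```

A caller could print the message and pass the code to `sys.exit`, which gives automation a way to detect the conflict without parsing log text.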
#### Previous command output (if the output of a command-line utility has changed)
#### New command output (if the output of a command-line utility has changed)
$ reboot -d DPU1
True
2025-11-17 17:56:10 - User requested rebooting device dpu1 ...
2025-11-17 17:56:11 - INFO: DPU dpu1 is in 'Online' state before reboot.
2025-11-17 17:56:12 - INFO: Rebooting dpu1, ip:1X9.XXX.X00.2 gnmi_port:50XXX
2025-11-17 17:56:53 - INFO: dpu1 halted the services successfully
2025-11-17 17:58:50 - INFO: Rebooting dpu1 with reboot_type:DPU...
Original PR: #4113