[ssw] clean up DPU_APPL_DB and DPU_STATE_DB for DPU swss restart or DPU reboot #25187

Merged
kperumalbfn merged 5 commits into sonic-net:master from zjswhhh:dpu_restart
Feb 19, 2026
@zjswhhh zjswhhh commented Jan 26, 2026

reference: https://github.com/sonic-net/sonic-buildimage/blob/master/src/sonic-yang-models/yang-models/sonic-device_metadata.yang#L197

Why I did it

For sonic-net/SONiC#2175

sign-off: Jing Zhang [email protected]

Work item tracking
  • Microsoft ADO (number only):

How I did it

  1. Check whether DPU_APPL_DB responds to a PING.
  2. If it does, flush all entries in the DPU databases (DPU_APPL_DB and DPU_STATE_DB) when swss starts.
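
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function name `flush_dpu_dbs_if_reachable` is hypothetical, and it assumes that `sonic-db-cli <DB> PING` prints `PONG` when the database is reachable.

```shell
#!/bin/bash
# Hypothetical sketch: flush DPU databases only when DPU_APPL_DB answers PING.
# SONIC_DB_CLI defaults to sonic-db-cli; override it to try this off-box.
SONIC_DB_CLI="${SONIC_DB_CLI:-sonic-db-cli}"

flush_dpu_dbs_if_reachable() {
    local db

    # Step 1: assumed behaviour, the CLI prints PONG when the DB is reachable.
    if [ "$($SONIC_DB_CLI DPU_APPL_DB PING 2>/dev/null)" != "PONG" ]; then
        echo "DPU_APPL_DB not reachable, skipping flush"
        return 0
    fi

    # Step 2: flush every entry in each DPU database.
    for db in DPU_APPL_DB DPU_STATE_DB; do
        echo "Flushing $db"
        $SONIC_DB_CLI "$db" FLUSHDB >/dev/null
    done
}
```

Skipping the flush when the PING fails keeps swss startup from acting on a database that is not there, which is the case on platforms without DPU databases.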

How to verify it

Tested on ssw testbed.

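  • Entries were removed: the first `keys *` below lists the stale entries, the second returns an empty array, and the MONITOR trace shows each PING followed by FLUSHDB.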
$ redis-cli -p 6381 -n 15 
127.0.0.1:6381[15]> keys *
1) "BFD_SESSION_TABLE:default:default:10.1.0.32"
2) "BFD_SESSION_TABLE:default:default:10.1.2.32"
3) "BFD_SESSION_TABLE:default:default:10.1.1.32"
127.0.0.1:6381[15]> keys *
(empty array)
$ redis-cli  -p 6381 MONITOR
OK

... ...
1769395816.952661 [18 169.254.200.1:34286] "PING"
1769395816.952718 [16 169.254.200.1:34280] "PING"
1769395816.954712 [17 169.254.200.1:34296] "SELECT" "17"
1769395816.954877 [17 169.254.200.1:34296] "PING"
1769395817.183169 [15 169.254.200.1:34310] "SELECT" "15"
1769395817.183386 [15 169.254.200.1:34310] "PING"
1769395817.210425 [15 169.254.200.1:34320] "SELECT" "15"
1769395817.210602 [15 169.254.200.1:34320] "FLUSHDB"
1769395817.229074 [17 169.254.200.1:34322] "SELECT" "17"
1769395817.229309 [17 169.254.200.1:34322] "FLUSHDB"
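
The manual check above could be scripted along these lines. This is a sketch under assumptions: the helper names `db_is_empty` and `check_dbs_empty` are hypothetical, and it relies on `redis-cli` printing a bare `0` for `DBSIZE` when its output is captured rather than shown on a terminal.

```shell
#!/bin/bash
# Hypothetical helpers for confirming that a flush left the databases empty.
# REDIS_CLI defaults to redis-cli; override it to try this without Redis.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Succeeds when the given DB index on the given port holds no keys.
db_is_empty() {
    local port="$1" index="$2"
    [ "$($REDIS_CLI -p "$port" -n "$index" DBSIZE 2>/dev/null)" = "0" ]
}

# Reports each DB index passed after the port; returns 1 if any is non-empty.
check_dbs_empty() {
    local port="$1" index rc=0
    shift
    for index in "$@"; do
        if db_is_empty "$port" "$index"; then
            echo "db $index on port $port: empty"
        else
            echo "db $index on port $port: NOT empty"
            rc=1
        fi
    done
    return "$rc"
}
```

With the port and DB indexes from the trace above, the call would be `check_dbs_empty 6381 15 17`.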

Which release branch to backport (provide reason below if selected)

  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@zjswhhh zjswhhh requested a review from lguohan as a code owner January 26, 2026 02:56
Copilot AI review requested due to automatic review settings January 26, 2026 02:56
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI left a comment

Pull request overview

This pull request adds database cleanup logic for DPU (Data Processing Unit) remote databases during swss service restart. The implementation checks if DPU_APPL_DB is reachable and flushes both DPU_APPL_DB and DPU_STATE_DB when swss starts, ensuring a clean state after DPU swss restart or DPU reboot.

Changes:

  • Added conditional check to detect if DPU_APPL_DB is pingable before flushing
  • Implemented flush logic for DPU_APPL_DB and DPU_STATE_DB within the existing warm boot protection block
  • Added debug logging for DPU database cleanup operations

@prabhataravind
Contributor

@zjswhhh, can we make sure this check runs only for smartswitch DPUs and not for any other platforms, to avoid unnecessary delays in swss startup? Please also confirm that the change flushes only the DBs associated with the DPU that restarted, and that the databases of the other online DPUs are unaffected.
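
One way to gate the check on platform type is sketched below. It is illustrative only and does not show how the PR addressed this request: `is_smartswitch` and `run_if_smartswitch` are hypothetical helpers, and the `DEVICE_METADATA|localhost` key, the `subtype` field, and the `SmartSwitch` value are assumptions about how the device is marked in CONFIG_DB.

```shell
#!/bin/bash
# Hypothetical guard: only run DPU-specific work on smartswitch devices.
# SONIC_DB_CLI defaults to sonic-db-cli; override it to try this off-box.
SONIC_DB_CLI="${SONIC_DB_CLI:-sonic-db-cli}"

# Succeeds when CONFIG_DB marks this device as a SmartSwitch.
# The key, field, and expected value are assumptions for illustration.
is_smartswitch() {
    local subtype
    subtype="$($SONIC_DB_CLI CONFIG_DB HGET 'DEVICE_METADATA|localhost' subtype 2>/dev/null)"
    [ "$subtype" = "SmartSwitch" ]
}

# Runs the given command only when is_smartswitch succeeds.
run_if_smartswitch() {
    if is_smartswitch; then
        "$@"
    else
        echo "Not a smartswitch device, skipping: $*"
    fi
}
```

A single local CONFIG_DB lookup is cheap compared with probing a remote database, so other platforms would pay almost nothing at swss startup.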

Copilot AI review requested due to automatic review settings January 28, 2026 01:46

zjswhhh commented Jan 28, 2026

Hi @prabhataravind - please help review again.

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.


Copilot AI review requested due to automatic review settings January 28, 2026 20:06

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment on lines +419 to +422
$SONIC_DB_CLI DPU_APPL_DB FLUSHDB
$SONIC_DB_CLI DPU_STATE_DB FLUSHDB
$SONIC_DB_CLI DPU_APPL_STATE_DB FLUSHDB
$SONIC_DB_CLI DPU_COUNTERS_DB FLUSHDB

Copilot AI Feb 18, 2026

The DPU DB flushes are performed without checking the return code. Since these can target the remote_redis instance, a transient connectivity issue could leave stale entries behind without any clear indication of which FLUSHDB failed. Consider checking each FLUSHDB command’s exit status and logging failures (or failing fast) to make swss restart behavior deterministic and debuggable.

Suggested change
- $SONIC_DB_CLI DPU_APPL_DB FLUSHDB
- $SONIC_DB_CLI DPU_STATE_DB FLUSHDB
- $SONIC_DB_CLI DPU_APPL_STATE_DB FLUSHDB
- $SONIC_DB_CLI DPU_COUNTERS_DB FLUSHDB
+ if ! $SONIC_DB_CLI DPU_APPL_DB FLUSHDB; then
+     debug "Failed to flush DPU_APPL_DB via FLUSHDB"
+     exit 1
+ fi
+ if ! $SONIC_DB_CLI DPU_STATE_DB FLUSHDB; then
+     debug "Failed to flush DPU_STATE_DB via FLUSHDB"
+     exit 1
+ fi
+ if ! $SONIC_DB_CLI DPU_APPL_STATE_DB FLUSHDB; then
+     debug "Failed to flush DPU_APPL_STATE_DB via FLUSHDB"
+     exit 1
+ fi
+ if ! $SONIC_DB_CLI DPU_COUNTERS_DB FLUSHDB; then
+     debug "Failed to flush DPU_COUNTERS_DB via FLUSHDB"
+     exit 1
+ fi
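
The four near-identical checks in the suggestion above could be folded into a loop. The sketch below is one possible shape, not what was merged: `flush_dpu_dbs` is a hypothetical name, and it uses `echo` to stderr and `return 1` in place of the script's own `debug` helper and `exit 1` so that it stands alone.

```shell
#!/bin/bash
# Hypothetical loop form of the per-database FLUSHDB checks.
# SONIC_DB_CLI defaults to sonic-db-cli; override it to try this off-box.
SONIC_DB_CLI="${SONIC_DB_CLI:-sonic-db-cli}"

# Flush each DPU database in turn; stop at the first failure and name it.
flush_dpu_dbs() {
    local db
    for db in DPU_APPL_DB DPU_STATE_DB DPU_APPL_STATE_DB DPU_COUNTERS_DB; do
        if ! $SONIC_DB_CLI "$db" FLUSHDB >/dev/null; then
            echo "Failed to flush $db via FLUSHDB" >&2
            return 1
        fi
    done
}
```

A caller that wants the fail-fast behavior from the suggestion could use `flush_dpu_dbs || exit 1`.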

zjswhhh commented Feb 18, 2026

/azpw ms_conflict

zjswhhh and others added 5 commits February 19, 2026 07:07
Signed-off-by: Jing Zhang <[email protected]>
Signed-off-by: Jing Zhang <[email protected]>
Signed-off-by: Jing Zhang <[email protected]>
Copilot AI review requested due to automatic review settings February 18, 2026 23:07

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@kperumalbfn kperumalbfn merged commit 0250a88 into sonic-net:master Feb 19, 2026
29 of 30 checks passed
@mssonicbld
Collaborator

Cherry-pick PR to 202511: #25578

FengPan-Frank pushed a commit to FengPan-Frank/sonic-buildimage that referenced this pull request Mar 6, 2026
…PU reboot (sonic-net#25187)

[ssw] clean up DPU_APPL_DB and DPU_STATE_DB for DPU swss restart or DPU reboot  (sonic-net#25187)

Signed-off-by: Feng Pan <[email protected]>
dprital pushed a commit that referenced this pull request Mar 19, 2026
…PU reboot (#25187)

[ssw] clean up DPU_APPL_DB and DPU_STATE_DB for DPU swss restart or DPU reboot  (#25187)

Signed-off-by: dprital <[email protected]>