Modified reboot pre-shutdown script to handle dpu side reboot #26234

yxieca merged 4 commits into sonic-net:master from
Conversation
Signed-off-by: Sahil Chaudhari <sahil.chaudhari@amd.com>

/azp run Azure.sonic-buildimage

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Sahil Chaudhari <sahil.chaudhari@amd.com>

/azp run Azure.sonic-buildimage

Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
This PR updates Pensando DPU shutdown/reboot handling to better support GNOI-triggered reboot flows where the NPU may remove the PCIe/midplane connection after the initial pre-shutdown sequence, requiring the DPU to force a reboot/power-cycle path.
Changes:
- Enable a Pensando firmware reboot behavior via a sysfs knob during DPU (polaris pipeline) startup.
- Extend the platform `pre_reboot_hook` to spawn a detached watchdog loop that pings the host and triggers a CPLD power cycle after consecutive reachability failures.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| files/dsc/dpu.init | Writes panic_reboot sysfs setting during start_polaris() initialization. |
| device/pensando/arm64-elba-asic-flash128-r0/pre_reboot_hook | Adds a detached ping-and-retry loop that triggers cpldapp -pwrcycle on repeated failures. |
files/dsc/dpu.init (Outdated)
```sh
mkdir -p $HOST_DIR_POLARIS/mnt/a/mnt/work
mkdir -p $DPU_DOCKER_INFO_DIR
echo 256 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
echo 1 > /sys/firmware/pensando/reboot/panic_reboot
```
Write to /sys/firmware/pensando/reboot/panic_reboot is unguarded. If this sysfs node is missing or not writable on some images/kernel configs, this will emit errors during boot. Consider checking -w (or -e) before writing and log a clear message when unavailable.
Suggested change:

```sh
if [ -w /sys/firmware/pensando/reboot/panic_reboot ]; then
    echo 1 > /sys/firmware/pensando/reboot/panic_reboot
else
    log_msg "Pensando panic_reboot sysfs node not writable; skipping configuration"
fi
```
```sh
docker exec "$(cat /host/dpu-docker-info/name)" /nic/bin/cpldapp -w 0xd 200

# Spawn fully independent background process to ping host and trigger power cycle if unreachable
# Algorithm: If ping fails, wait 10 seconds and retry. After 3 consecutive failures, trigger power cycle.
# Using setsid + nohup to completely detach from parent process (daemonize)
DPU_CONTAINER_NAME=$(cat /host/dpu-docker-info/name)
setsid nohup bash -c "
    DPU_CONTAINER='$DPU_CONTAINER_NAME'
    HOST_IP='169.254.200.254'
    TIMEOUT=120
    POLL_INTERVAL=5
    ELAPSED=0
    MAX_FAILURES=3
    RETRY_WAIT=10

    while [ \$ELAPSED -lt \$TIMEOUT ]; do
        if ping -c 1 -W 1 \"\$HOST_IP\" > /dev/null 2>&1; then
            echo \"Ping to \$HOST_IP successful, host is still reachable\" | tee /dev/kmsg /dev/console
        else
            echo \"Ping to \$HOST_IP failed, starting failure retry sequence\" | tee /dev/kmsg /dev/console
            FAIL_COUNT=1

            # Retry loop: wait 10 seconds between each retry, up to MAX_FAILURES total attempts
            while [ \$FAIL_COUNT -lt \$MAX_FAILURES ]; do
                echo \"Ping failure \$FAIL_COUNT/\$MAX_FAILURES, waiting \$RETRY_WAIT seconds before retry...\" | tee /dev/kmsg /dev/console
                sleep \$RETRY_WAIT

                if ping -c 1 -W 1 \"\$HOST_IP\" > /dev/null 2>&1; then
                    echo \"Ping to \$HOST_IP recovered after \$FAIL_COUNT failures\" | tee /dev/kmsg /dev/console
                    FAIL_COUNT=0
                    break
                fi
                FAIL_COUNT=\$((FAIL_COUNT + 1))
            done

            # If we exhausted all retries (3 consecutive failures), trigger power cycle
            if [ \$FAIL_COUNT -ge \$MAX_FAILURES ]; then
                echo \"Ping failure \$FAIL_COUNT/\$MAX_FAILURES, all retries exhausted\" | tee /dev/kmsg /dev/console
                echo \"Triggering cpld power cycle after \$MAX_FAILURES consecutive ping failures\" | tee /dev/kmsg /dev/console
                docker exec \"\$DPU_CONTAINER\" /nic/bin/cpldapp -pwrcycle
                exit 0
            fi
        fi
        sleep \$POLL_INTERVAL
        ELAPSED=\$((ELAPSED + POLL_INTERVAL))
    done
    echo \"Ping to \$HOST_IP remained successful for \$TIMEOUT seconds, no power cycle triggered\" | tee /dev/kmsg /dev/console
" </dev/null >/dev/null 2>&1 &
```
DPU_CONTAINER_NAME=$(cat /host/dpu-docker-info/name) assumes the file exists and contains a valid container name. If the file is missing/empty, the background logic will later run docker exec with an empty container name and silently fail. Please add an existence/empty check (and log) before spawning the background process.
Suggested change:

```sh
DPU_NAME_FILE="/host/dpu-docker-info/name"
if [ ! -s "$DPU_NAME_FILE" ]; then
    LOG_MSG "DPU container name file '$DPU_NAME_FILE' is missing or empty; skipping CPLD operations"
else
    DPU_CONTAINER_NAME=$(cat "$DPU_NAME_FILE")
    docker exec "$DPU_CONTAINER_NAME" /nic/bin/cpldapp -w 0xd 200
    # Spawn fully independent background process to ping host and trigger power cycle if unreachable
    # Algorithm: If ping fails, wait 10 seconds and retry. After 3 consecutive failures, trigger power cycle.
    # Using setsid + nohup to completely detach from parent process (daemonize)
    setsid nohup bash -c "
        DPU_CONTAINER='$DPU_CONTAINER_NAME'
        HOST_IP='169.254.200.254'
        TIMEOUT=120
        POLL_INTERVAL=5
        ELAPSED=0
        MAX_FAILURES=3
        RETRY_WAIT=10
        while [ \$ELAPSED -lt \$TIMEOUT ]; do
            if ping -c 1 -W 1 \"\$HOST_IP\" > /dev/null 2>&1; then
                echo \"Ping to \$HOST_IP successful, host is still reachable\" | tee /dev/kmsg /dev/console
            else
                echo \"Ping to \$HOST_IP failed, starting failure retry sequence\" | tee /dev/kmsg /dev/console
                FAIL_COUNT=1
                # Retry loop: wait 10 seconds between each retry, up to MAX_FAILURES total attempts
                while [ \$FAIL_COUNT -lt \$MAX_FAILURES ]; do
                    echo \"Ping failure \$FAIL_COUNT/\$MAX_FAILURES, waiting \$RETRY_WAIT seconds before retry...\" | tee /dev/kmsg /dev/console
                    sleep \$RETRY_WAIT
                    if ping -c 1 -W 1 \"\$HOST_IP\" > /dev/null 2>&1; then
                        echo \"Ping to \$HOST_IP recovered after \$FAIL_COUNT failures\" | tee /dev/kmsg /dev/console
                        FAIL_COUNT=0
                        break
                    fi
                    FAIL_COUNT=\$((FAIL_COUNT + 1))
                done
                # If we exhausted all retries (3 consecutive failures), trigger power cycle
                if [ \$FAIL_COUNT -ge \$MAX_FAILURES ]; then
                    echo \"Ping failure \$FAIL_COUNT/\$MAX_FAILURES, all retries exhausted\" | tee /dev/kmsg /dev/console
                    echo \"Triggering cpld power cycle after \$MAX_FAILURES consecutive ping failures\" | tee /dev/kmsg /dev/console
                    docker exec \"\$DPU_CONTAINER\" /nic/bin/cpldapp -pwrcycle
                    exit 0
                fi
            fi
            sleep \$POLL_INTERVAL
            ELAPSED=\$((ELAPSED + POLL_INTERVAL))
        done
        echo \"Ping to \$HOST_IP remained successful for \$TIMEOUT seconds, no power cycle triggered\" | tee /dev/kmsg /dev/console
    " </dev/null >/dev/null 2>&1 &
fi
```
```sh
while [ \$ELAPSED -lt \$TIMEOUT ]; do
    if ping -c 1 -W 1 \"\$HOST_IP\" > /dev/null 2>&1; then
        echo \"Ping to \$HOST_IP successful, host is still reachable\" | tee /dev/kmsg /dev/console
```
Ping is not bound to the midplane interface, so any alternate route to 169.254.200.254 (or cached neighbor state) could produce a false “reachable” result even after PCIe bridge removal. Consider using ping -I <midplane-iface> (or an equivalent interface-scoped check) so the decision reflects the intended link.
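An interface-scoped check along the lines of this comment could look like the sketch below. This is a minimal illustration, not the PR's code: the interface name `eth0-midplane` and the `host_reachable_via_midplane` helper are assumptions, and `-I` is the standard iputils `ping` option that binds the probe to a device.

```shell
# Minimal sketch of an interface-scoped reachability check.
# ASSUMPTIONS: the midplane interface name ("eth0-midplane") and the
# helper name are illustrative; substitute the DPU's real interface.
MIDPLANE_IFACE="${MIDPLANE_IFACE:-eth0-midplane}"
HOST_IP="${HOST_IP:-169.254.200.254}"

# Succeeds only if the host answers a ping sent strictly via the midplane
# interface; -I binds the probe so an alternate route or cached neighbor
# state cannot produce a false "reachable" result.
host_reachable_via_midplane() {
    ping -I "$MIDPLANE_IFACE" -c 1 -W 1 "$HOST_IP" > /dev/null 2>&1
}

if host_reachable_via_midplane; then
    echo "host reachable via $MIDPLANE_IFACE"
else
    echo "host unreachable via $MIDPLANE_IFACE"
fi
```

With `-I`, a ping attempted after the midplane link is gone fails immediately with "No such device" instead of succeeding over some other path.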
```sh
# If we exhausted all retries (3 consecutive failures), trigger power cycle
if [ \$FAIL_COUNT -ge \$MAX_FAILURES ]; then
    echo \"Ping failure \$FAIL_COUNT/\$MAX_FAILURES, all retries exhausted\" | tee /dev/kmsg /dev/console
    echo \"Triggering cpld power cycle after \$MAX_FAILURES consecutive ping failures\" | tee /dev/kmsg /dev/console
    docker exec \"\$DPU_CONTAINER\" /nic/bin/cpldapp -pwrcycle
    exit 0
```
The power-cycle trigger uses docker exec from a detached background process. During shutdown/reboot, the Docker daemon or container may be unavailable/hung, which can prevent cpldapp -pwrcycle from running. Consider invoking the CPLD utility from the host context if possible, or at least add a bounded timeout/retry around docker exec and log failures explicitly.
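A bounded-retry wrapper in the spirit of this comment could be sketched as below. This is an illustration under stated assumptions: coreutils `timeout` is available on the DPU image, and the attempt count, per-attempt bound, and function name are made up for the example; only the `docker exec ... cpldapp -pwrcycle` invocation comes from the PR.

```shell
# Sketch: bound and log the power-cycle trigger instead of issuing a single
# unchecked docker exec. ASSUMPTIONS: coreutils `timeout` exists; the retry
# count (3) and per-attempt bound (10s) are illustrative.
trigger_power_cycle() {
    container="$1"
    attempt=1
    while [ "$attempt" -le 3 ]; do
        # Bound each docker exec so a hung Docker daemon cannot stall the
        # shutdown path indefinitely.
        if timeout 10 docker exec "$container" /nic/bin/cpldapp -pwrcycle; then
            return 0
        fi
        echo "cpldapp -pwrcycle attempt $attempt/3 failed, retrying" >&2
        attempt=$((attempt + 1))
        sleep 1
    done
    echo "power cycle via docker exec failed after 3 attempts" >&2
    return 1
}
```

The caller can then fall back to a host-side reset mechanism (if one exists on the platform) when `trigger_power_cycle` returns non-zero, rather than failing silently.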
```sh
while [ \$ELAPSED -lt \$TIMEOUT ]; do
    if ping -c 1 -W 1 \"\$HOST_IP\" > /dev/null 2>&1; then
        echo \"Ping to \$HOST_IP successful, host is still reachable\" | tee /dev/kmsg /dev/console
    else
        echo \"Ping to \$HOST_IP failed, starting failure retry sequence\" | tee /dev/kmsg /dev/console
```
This loop logs to kmsg/console every POLL_INTERVAL even when the host remains reachable, which can create noisy logs during shutdown. Consider logging only on state transitions (reachable→unreachable and vice versa) or throttling success messages.
Suggested change:

```sh
LAST_STATE=\"unknown\"
while [ \$ELAPSED -lt \$TIMEOUT ]; do
    if ping -c 1 -W 1 \"\$HOST_IP\" > /dev/null 2>&1; then
        # Log success only when transitioning to reachable state
        if [ \"\$LAST_STATE\" != \"up\" ]; then
            echo \"Ping to \$HOST_IP successful, host is still reachable\" | tee /dev/kmsg /dev/console
            LAST_STATE=\"up\"
        fi
    else
        # Log failure only when transitioning to unreachable state
        if [ \"\$LAST_STATE\" != \"down\" ]; then
            echo \"Ping to \$HOST_IP failed, starting failure retry sequence\" | tee /dev/kmsg /dev/console
            LAST_STATE=\"down\"
        fi
```
rameshraghupathy left a comment

@SahilChaudhari LGTM besides two minor comments, same as Copilot's: 1. Please guard the panic_reboot sysfs write and bind the host ping to the intended midplane interface. 2. Avoid relying only on a detached docker exec for the final cpldapp -pwrcycle path during shutdown/reboot. I'm approving it.
Signed-off-by: Sahil Chaudhari <sahil.chaudhari@amd.com>

/azp run Azure.sonic-buildimage

Azure Pipelines successfully started running 1 pipeline(s).
```sh
DPU_NAME_FILE="/host/dpu-docker-info/name"
if [ ! -s "$DPU_NAME_FILE" ]; then
    LOG_MSG "DPU container name file '$DPU_NAME_FILE' is missing or empty; skipping CPLD operations"
    exit 1
```
The script logs that it is "skipping CPLD operations" when the DPU container name file is missing/empty, but then exits with status 1. That makes this path look like a hard failure rather than a best-effort skip and can break the reboot flow on systems where /host/dpu-docker-info/name is not populated. Consider returning success (exit 0) or continuing without CPLD operations instead of exiting non-zero here.
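One way to make the skip genuinely best-effort, as this comment asks, is to wrap the lookup in a helper that returns success either way. This is a hypothetical sketch, not the hook's code: the `get_dpu_container` name is invented, and `LOG_MSG` is stubbed with `echo` here purely so the example is self-contained.

```shell
# Stub for the hook's logger, for self-containment only.
LOG_MSG() { echo "$@"; }

# Prints the container name when the file exists and is non-empty;
# otherwise logs a skip message and still returns 0, so the reboot flow
# continues without CPLD operations instead of hard-failing.
get_dpu_container() {
    name_file="$1"
    if [ ! -s "$name_file" ]; then
        LOG_MSG "DPU container name file '$name_file' is missing or empty; skipping CPLD operations" >&2
        return 0
    fi
    cat "$name_file"
}
```

The caller can then test whether the output is empty to decide whether to run the CPLD commands, while the hook itself never exits non-zero on this path.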
```sh
# Fetch NTP server IP from CONFIG_DB (typically the host/midplane IP)
NTP_SERVER_KEY=$(sonic-db-cli CONFIG_DB keys 'NTP_SERVER*' | head -1)
if [ -z "$NTP_SERVER_KEY" ]; then
    LOG_MSG "ERROR: NTP_SERVER not found in CONFIG_DB, aborting pre-reboot hook"
    exit 1
fi

HOST_IP=$(echo "$NTP_SERVER_KEY" | cut -d'|' -f2)
if [ -z "$HOST_IP" ]; then
    LOG_MSG "ERROR: Failed to extract host IP from NTP_SERVER key '$NTP_SERVER_KEY', aborting pre-reboot hook"
    exit 1
fi
LOG_MSG "Using host IP from CONFIG_DB: $HOST_IP"
```
HOST_IP is derived from the first NTP_SERVER* key in CONFIG_DB (keys ... | head -1). NTP_SERVER is user-configurable and may contain multiple entries (including pools/FQDNs), so the first key is not guaranteed to be the midplane/host IP you intend to monitor. This can cause false ping failures (and unintended CPLD power-cycles) if the first NTP server is not reachable via the midplane interface. Prefer a deterministic source for the host/midplane IP (e.g., explicitly use 169.254.200.254 for smartswitch DPU, read MID_PLANE_BRIDGE ip_prefix, or derive the midplane gateway from the interface configuration) instead of selecting an arbitrary NTP_SERVER entry.
Suggested change:

```sh
# Use deterministic host/midplane IP instead of inferring from NTP_SERVER in CONFIG_DB
HOST_IP="169.254.200.254"
LOG_MSG "Using fixed host/midplane IP: $HOST_IP"
```
Signed-off-by: Sahil Chaudhari <sahil.chaudhari@amd.com>

/azp run Azure.sonic-buildimage

Azure Pipelines successfully started running 1 pipeline(s).

@yxieca, would you help with the merge?

Cherry-pick PR to 202511: #26545
Why I did it
As part of the GNOI reboot sequence for the DPU, `reboot -p` is invoked, which runs the pre-shutdown sequence on the DPU.
After the GNOI reboot sequence completes, the NPU removes the bridge-midplane (PCIe) connection between the NPU and the DPU; the DPU then needs to trigger its own reboot.
Work item tracking
How I did it
How to verify it
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)