Merged
Changes from 2 commits
6 changes: 0 additions & 6 deletions doc/smart-switch/pmon/images/dpu-shutdown-seq.svg

This file was deleted.

Binary file added doc/smart-switch/pmon/images/dpu-startup-seq.png
6 changes: 0 additions & 6 deletions doc/smart-switch/pmon/images/dpu-startup-seq.svg

This file was deleted.

156 changes: 148 additions & 8 deletions doc/smart-switch/pmon/smartswitch-pmon.md
Expand Up @@ -6,6 +6,7 @@
| 0.2 | 01/08/2024 | Ramesh Raghupathy | Updated API, CPI sections and addressed review comments |
| 0.3 | 02/26/2024 | Ramesh Raghupathy | Addressed review comments |
| 0.4 | 06/06/2024 | Ramesh Raghupathy | Added schema for DPU health-info and called out phase:1 and phase:2 activities for DPU health-info. Added key suffix to module reboot-cause to avoid key conflicts |
| 0.5 | 04/30/2025 | Gagan Punathil Ellath | Added Post Startup and Pre shutdown sections for DPU |

## Definitions / Abbreviations

Expand Down Expand Up @@ -78,14 +79,51 @@ The picture below highlights the PMON vertical and its association with other lo
#### DPU cold startup Sequence
* The chassis is powered up and the host is booting up.
* The switch PMON is registered with the configDB state change handler.
* If the DPU's "admin_status" is "down" in the configDB, the DPU remains powered down. The default setting is "down".
* The switch PMON gets the admin up notification from the configDB
* The switch PMON invokes the platform API to power on the DPU
* DPU boots up and attaches itself to the midplane.
* A PCIe rescan is performed so that the DPU's PCIe devices are reattached, and the relevant bus information is removed from STATE_DB if it exists.
* If a sensord ignore configuration for the DPU exists (`/etc/sensors.d/ignore_{module_name}.conf`), the file is removed and sensord is restarted so that the DPU's sensors are added back.
Contributor

Can we add a line to give some context on "ignore config", which file and sensord? And also some information on why PCIe rescan is performed?

Contributor Author

Done

* Once SONiC is up, the state progression is updated for every state transition on the DPU_STATE table in the chassisStateDB

#### Post-startup Procedures
Contributor

should we rename like "DPU post-startup handling"?

Contributor

Are these operations, `handle_pci_rescan()` and `pci_reattach()`, applicable to the power-on sequence or only to the gNOI-based reboot sequence? Give some context on the two platform-supported models, and talk about the two sets of APIs and how they work. Then give some context on why sensors suddenly come into this block.

Contributor

You need to call out that only platforms that implement these functions will follow this, and that for platforms that don't implement them it is a NO-OP.


Contributor

Please callout in the beginning just like Line: #284 to #290 that this is applicable only to platforms that need a full PCI rescan and should implement module.handle_pci_rescan() and self.handle_sensor_addition() # No action taken if it is not implemented. Can you also say the same in the sequence diagram? Rest all LGTM

Agreed with @rameshraghupathy. Please mention that this should be followed only if it is not handled by a dedicated platform-specific server and needs to be done via SONiC APIs; otherwise it would be confusing, as in our case these are not needed at all. In the future, other platforms might also have their own methods, and it would be confusing and trickier to follow this standard, including for myself.

@vvolam, can you please comment?

Contributor Author

@rameshraghupathy It is mentioned in line 96 ("If pci_reattach() is not implemented in the specific platform, then no operations are performed in this function") and in line 100 ("if such file does not exist, the sensord restart for this module is skipped"). Do you want to have it present separately here?

When a DPU module's admin state is changed from "down" to "up", the following post-startup procedures are executed:

1. **PCI Device Rescan**: The `handle_pci_rescan()` function is called to rescan and reattach PCI devices.

This function first calls the platform-specific `pci_reattach()`, then calls `get_pci_bus_info()` to get all the PCIe devices associated with the specific DPU, and removes the `PCIE_DETACH_INFO` key relevant to each device from STATE_DB. If either `get_pci_bus_info()` or `pci_reattach()` is not implemented for the specific platform, a fallback obtains the `bus_info` from the platform.json file, removes the relevant information from STATE_DB, and performs a platform-independent PCIe rescan (`echo 1 > /sys/bus/pci/rescan`).
2. **Sensor Addition**: The `handle_sensor_addition()` function is called to handle sensor-related setup.

If a sensors ignore configuration exists in the sensord folder (`/etc/sensors.d/ignore_{module_name}.conf`), it is removed and sensord is restarted. If no such file exists, the sensord restart for this module is skipped.
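
The fallback path for the rescan can be sketched as follows. This is a minimal illustration, not the chassisd or platform implementation: the function name `fallback_pci_rescan`, the use of a plain dict as a stand-in for STATE_DB, and the `sysfs_root` parameter (added so the sketch can be exercised against a temporary directory) are all assumptions.

```python
import json
from pathlib import Path


def fallback_pci_rescan(module_name, platform_json, state_db, sysfs_root="/sys"):
    """Hypothetical platform-independent rescan; returns True on success."""
    # Read the DPU's bus_info from platform.json ("DPUS" -> module -> "bus_info").
    platform = json.loads(Path(platform_json).read_text())
    bus_info = platform.get("DPUS", {}).get(module_name, {}).get("bus_info")
    if bus_info is None:
        return False

    # Remove the detach record for this device if it exists.
    # state_db is a plain dict here, standing in for STATE_DB.
    state_db.pop(f"PCIE_DETACH_INFO|{bus_info}", None)

    # Equivalent of: echo 1 > /sys/bus/pci/rescan
    Path(sysfs_root, "bus", "pci", "rescan").write_text("1")
    return True
```
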

##### Function Signatures

```python
def handle_pci_rescan(self):
    """
    Handles PCI device rescan by updating state database and reattaching device.
    If pci_reattach is not implemented, falls back to platform.json based rescan.

    Returns:
        bool: True if operation was successful, False otherwise
    """
```

```python
def handle_sensor_addition(self):
    """
    Handles sensor addition by removing the ignore configuration file from
    sensors.d directory and restarting sensord.

    Returns:
        bool: True if operation was successful, False otherwise
    """
```
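
As an illustration of the sensor addition step, the sketch below removes the ignore file and restarts sensord only when the file was present. The function name `sensor_addition`, the injected `restart_sensord` callable, and the choice to return `True` when there is nothing to do are assumptions made for this example; how sensord is actually restarted is left to the caller.

```python
from pathlib import Path


def sensor_addition(module_name, restart_sensord, sensors_dir="/etc/sensors.d"):
    """Hypothetical sketch: re-enable a DPU's sensors after startup."""
    ignore_file = Path(sensors_dir) / f"ignore_{module_name}.conf"

    # No ignore configuration for this module: skip the sensord restart.
    if not ignore_file.exists():
        return True

    # Remove the ignore configuration, then restart sensord to pick it up.
    ignore_file.unlink()
    restart_sensord()
    return True
```
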


### DPU startup sequence diagram
<p align="center"><img src="./images/dpu-startup-seq.svg"></p>
<p align="center"><img src="./images/dpu-startup-seq.png"></p>
Contributor

In the sequence diagram, show the module.py platform API vertical, and if PCIe is not of significance please remove it.

Contributor

Can you show what triggers the post-startup sequence in the sequence diagram? Also, can you check the order of deleting sensors vs. PCI rescan?

Contributor Author

order during shutdown:
sensors removal(by adding ignore configuration), then pcie device is removed
order during startup:
pcie rescan is performed, then we re-add the sensors (by removing the ignore configuration)

Contributor

@gpunathilell Please show that this is applicable only to platforms that need a full PCI rescan and should implement module.handle_pci_rescan() and self.handle_sensor_addition() # No action taken if it is not implemented. Rest all LGTM


#### 2.1.1 DPUs in dark mode
* A SmartSwitch that is configured to boot with all of its DPUs powered down is said to have its DPUs in dark mode.
Expand Down Expand Up @@ -153,19 +191,119 @@ Key: "CHASSIS_MODULE|DPU0"
#### Use case
* Switch: Maintenance, Critical alarm, RMA
* DPU: Maintenance, Critical alarm, Service migration, RMA

#### Pre-shutdown Procedure

The switch has to prepare for the DPUs being powered off.
* The PCIe devices associated with the DPU are removed
Contributor

Can you move these two points below the line "When a DPU module's admin state is set to "down", the following pre-shutdown procedures are executed:"

Contributor Author

Done

* The sensors which are attached to the DPU (reporting its values to the switch) are no longer functional

Since the DPU-specific PCI devices are removed, the PCIeDaemon running on the switch should not create warning logs pertaining to these PCI IDs, and the sensord daemon should not create new error logs.
During the graceful shutdown procedure, we need to notify pciedaemon that the PCIe devices have been removed, and sensord should ignore the relevant sensors so that we can remove

When a DPU module's admin state is set to "down", the following pre-shutdown procedures are executed:

* **PCI Device Removal**: The `handle_pci_removal()` function is called to properly detach PCI devices from the system.

This function calls `get_pci_bus_info()` to get all the PCIe devices associated with the specific DPU and adds a `PCIE_DETACH_INFO` key to STATE_DB for each device. After all the device information has been added to STATE_DB, the platform-specific `pci_detach()` is called. If either `get_pci_bus_info()` or `pci_detach()` is not implemented for the specific platform, a fallback obtains the `bus_info` from the platform.json file, adds the relevant information to STATE_DB, and performs a platform-independent PCIe detachment (`echo 1 > /sys/bus/pci/devices/{pci_bus}/remove`).
* **Sensor Removal**: The `handle_sensor_removal()` function is called to handle sensor-related cleanup.

If sensors have to be ignored on DPU shutdown, the relevant sensord ignore configuration has to be added to the device folder in sonic-buildimage, `sonic-buildimage/device/<Platform>/<device>/dpu_ignore_conf`. After the build, it is available in the following folder in PMON: `/usr/share/sonic/platform/dpu_ignore_conf`. The ignore configuration for a specific DPU should use the file name format `ignore_<Module_Name>.conf`. If this file exists for a specific DPU module, it is copied to `/etc/sensors.d/ignore_{module_name}.conf` and sensord is restarted. If the file does not exist, further processing in this function is skipped.
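
The fallback removal path described above can be sketched like this. It is only an illustration under stated assumptions: `fallback_pci_removal` is a hypothetical name, a plain dict stands in for STATE_DB, and `sysfs_root` exists so the sketch can run against a temporary directory instead of the real `/sys`.

```python
from pathlib import Path


def fallback_pci_removal(bus_info, state_db, sysfs_root="/sys"):
    """Hypothetical platform-independent detach; returns True on success."""
    # Record the detach in STATE_DB (a plain dict here) before touching the device.
    state_db[f"PCIE_DETACH_INFO|{bus_info}"] = {
        "dpu_state": "detaching",
        "bus_info": bus_info,
    }

    # Equivalent of: echo 1 > /sys/bus/pci/devices/{pci_bus}/remove
    remove_node = Path(sysfs_root, "bus", "pci", "devices", bus_info, "remove")
    if not remove_node.parent.is_dir():
        return False  # device is not present on the bus
    remove_node.write_text("1")
    return True
```
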

##### Function Signatures

```python
def get_pci_bus_info(self):
    """
    Retrieves the bus information.

    Returns:
        list: the PCI bus information as a list of BDF strings,
        each in the format "[DDDD:]BB:SS.F"
    """
```

```python
def handle_pci_removal(self):
    """
    Handles PCI device removal by updating state database and detaching device.
    If pci_detach is not implemented, falls back to platform.json based removal.

    Returns:
        bool: True if operation was successful, False otherwise
    """
```

```python
def handle_sensor_removal(self):
    """
    Handles sensor removal by copying the ignore configuration file from the
    platform folder to the sensors.d directory and restarting sensord, if the
    file exists. Looks for the ignore configuration in:
    /usr/share/sonic/platform/dpu_ignore_conf/ignore_{module_name}.conf

    Returns:
        bool: True if operation was successful, False otherwise
    """
```
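
A possible shape for the sensor removal step is shown below. The name `sensor_removal`, the injected `restart_sensord` callable, and returning `True` when no ignore file is shipped are assumptions for illustration; the default directories are the ones named in the text above.

```python
import shutil
from pathlib import Path


def sensor_removal(module_name, restart_sensord,
                   source_dir="/usr/share/sonic/platform/dpu_ignore_conf",
                   sensors_dir="/etc/sensors.d"):
    """Hypothetical sketch: make sensord ignore a DPU's sensors before shutdown."""
    file_name = f"ignore_{module_name}.conf"
    source = Path(source_dir) / file_name

    # No ignore configuration shipped for this module: nothing further to do.
    if not source.exists():
        return True

    # Copy the ignore configuration into sensors.d and restart sensord.
    shutil.copy(source, Path(sensors_dir) / file_name)
    restart_sensord()
    return True
```
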


#### Implementation Details

```
Sample platform.json configuration

"DPUS": {
    "DPU0": {
        "bus_info": "XXXX:XX:XX.X"
    },
    "DPU1": {
    }
},
```
Contributor

Is this intentional and what does it signify?

Contributor Author

Removed, as this configuration is not used from Platform.json

```
PCIE_DETACH_INFO STATE_DB TABLE

"PCIE_DETACH_INFO|[DDDD:]BB:SS.F": {
"value": {
"dpu_state": "detaching",
"bus_info" : "[DDDD:]BB:SS.F"
}
}
```
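
To make the key layout concrete, the sketch below builds a `PCIE_DETACH_INFO` key and its fields from a BDF string. The helper name `detach_info_entry`, the regular expression used as a shape check, and returning the fields as a flat dict are assumptions for illustration, not the chassisd code.

```python
import re

# Shape check for "[DDDD:]BB:SS.F": optional 4-hex-digit domain, bus, slot, function.
BDF_PATTERN = re.compile(
    r"(?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]"
)


def detach_info_entry(bus_info):
    """Hypothetical helper: return (key, fields) for one detached PCIe device."""
    if not BDF_PATTERN.fullmatch(bus_info):
        raise ValueError(f"not a BDF address: {bus_info!r}")
    key = f"PCIE_DETACH_INFO|{bus_info}"
    fields = {"dpu_state": "detaching", "bus_info": bus_info}
    return key, fields
```
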

These functions are called by chassisd when we perform admin state changes by changing config_db. The platform implementation should call these functions at the appropriate times during the admin state change process.

The implementation in chassisd will follow this sequence:

```python
def set_admin_state(self, up):
    # The handle_* calls are NO-OPs on platforms that do not need them.
    if up:
        module.set_admin_state(up)
        module.handle_pci_rescan()
        self.handle_sensor_addition()
    else:
        self.handle_pci_removal()
        self.handle_sensor_removal()
        module.set_admin_state(up)  # up is False here, i.e. admin state "down"
    return
```
Contributor

You should call out that these functions are NO-OPs for platforms that don't need them.

Contributor Author

Added as comment in the set admin state call

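
The review discussion asks that these calls be NO-OPs on platforms that do not need them. One way that behavior could look is sketched below; `run_optional_hook` and the two example classes are hypothetical, and treating a missing or unimplemented hook as success is an assumption rather than documented chassisd behavior.

```python
def run_optional_hook(module, hook_name):
    """Hypothetical wrapper: call a platform hook if present, else do nothing."""
    hook = getattr(module, hook_name, None)
    if hook is None:
        return True  # platform does not define the hook: NO-OP
    try:
        return bool(hook())
    except NotImplementedError:
        return True  # hook is declared but not implemented: NO-OP


class PlatformWithoutHooks:
    """Example platform that needs no PCI or sensor handling."""


class PlatformWithRescan:
    """Example platform that implements the rescan hook."""

    def __init__(self):
        self.calls = []

    def handle_pci_rescan(self):
        self.calls.append("handle_pci_rescan")
        return True
```

With this shape, the caller can invoke the same sequence on every platform and rely on the wrapper to skip hooks that a platform leaves out.
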
#### DPU shutdown sequence
* There could be two possible sources for DPU shutdown. 1. A configuration change to DPU "admin_status: down" 2. The GNOI logic can trigger it.
* In the first option the "admin_status: down" configDB status change event will trigger chassisd as it is subscribed to the event
* The switch PMON will invoke the module class API "set_admin_state(self, up):" with the state being "down" and the platform in turn will call its API to gracefully shutdown the DPU.
* The GNOI server runs on the DPU even after the DPU is pre-shutdown and listens until the graceful shutdown finishes.
* The host sends a GNOI signal to shutdown the DPU. The DPU does a graceful-shutdown if not already done and sends an ack back to the host.
* Upon receiving the ack or on a timeout the host may trigger the switch PMON vendor API to shutdown the DPU.
* If a vendor specific API is not defined, detachment is done via sysfs (echo 1 > /sys/bus/pci/devices/XXXX:XX:XX.X/remove).
* NPU-DPU (GNOI) soft reboot workflow is captured in [reboot-hld.md](https://github.com/sonic-net/SONiC/blob/26f3f4e282f3d2bd4a5c684608897850354f5c30/doc/smart-switch/reboot/reboot-hld.md)
* In the first option the "admin_status: down" configDB status change event will send a message to the switch PMON.
* The switch PMON will invoke the module class API "set_admin_state(self, up):" with the state being "down" and the platform in turn will call its API to gracefully shutdown the DPU.
* The PCIe device is added to the `PCIE_DETACH_INFO` table and is then removed. If a vendor-specific API is not defined, detachment is done via sysfs (`echo 1 > /sys/bus/pci/devices/XXXX:XX:XX.X/remove`).
* sensord is restarted if some sensors need to be ignored.
* Vendor specific DPU shutdown is initiated
* The DPU upon receiving the shutdown message will do a graceful shutdown and send an ack back. The DPU graceful shutdown is vendor specific. The DPU power will be turned off after the graceful shutdown. In case of timeout the platform will force power down.
* The switch upon receiving the ack or on a timeout will remove the DPU from the bridge and PCIe tree.
* NPU-DPU (GNOI) soft reboot workflow is captured in [reboot-hld.md](https://github.com/sonic-net/SONiC/blob/26f3f4e282f3d2bd4a5c684608897850354f5c30/doc/smart-switch/reboot/reboot-hld.md)
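
The "ack or timeout" step in the sequence above can be illustrated with a small sketch. The function name `wait_for_shutdown_ack`, the use of `threading.Event` to represent the DPU's ack, and the `force_power_down` callable are assumptions for this example; the actual ack transport and the power-down call are vendor specific.

```python
import threading


def wait_for_shutdown_ack(ack_event, timeout_s, force_power_down):
    """Hypothetical sketch: wait for the DPU's ack, force power down on timeout."""
    # Event.wait returns True if the ack arrived before the timeout expired.
    if ack_event.wait(timeout_s):
        return "acked"
    # No ack within the timeout: fall back to a forced power down.
    force_power_down()
    return "forced"
```
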

### DPU shutdown sequence diagram
<p align="center"><img src="./images/dpu-shutdown-seq.svg"></p>
<p align="center"><img src="./images/dpu-shutdown-seq.png"></p>



### Restart
#### Definition
Expand Down Expand Up @@ -1175,5 +1313,7 @@ Note:
Progress of FPD operation and any failures would be displayed on the console with appropriate levels of severity
```



## 4. Test Plan
[Test Plan](https://github.com/nissampa/sonic-mgmt_dpu_test/blob/dpu_test_plan_draft_pr/docs/testplan/Smartswitch-test-plan.md)