[Smartswitch][Pmon]Changes for Post startup and pre shutdown #1980
Changes from 2 commits
f60b6aa
4b8708f
7801cd1
aeaf957
fd1dd78
8dbc66b
8803db2
96dc769
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6,6 +6,7 @@ | |
| | 0.2 | 01/08/2024 | Ramesh Raghupathy | Updated API, CPI sections and addressed review comments | | ||
| | 0.3 | 02/26/2024 | Ramesh Raghupathy | Addressed review comments | | ||
| | 0.4 | 06/06/2024 | Ramesh Raghupathy | Added schema for DPU health-info and called out phase:1 and phase:2 activities for DPU health-info. Added key suffix to module reboot-cause to avoid key conflicts | | ||
| | 0.5 | 04/30/2025 | Gagan Punathil Ellath | Added Post Startup and Pre shutdown sections for DPU | | ||
|
|
||
| ## Definitions / Abbreviations | ||
|
|
||
|
|
@@ -78,14 +79,51 @@ The picture below highlights the PMON vertical and its association with other lo | |
| #### DPU cold startup Sequence | ||
| * The chassis is powered up and the host is booting up. | ||
| * The switch PMON is registered with the configDB state change handler. | ||
| * If the DPU's "admin_status: down" in the configDB, the DPU will remain powered down. The default setting is "down". | ||
| * The switch PMON gets the admin up notification from the configDB | ||
| * The switch PMON invokes the platform API to power on the DPU | ||
| * DPU boots up and attaches itself to the midplane. | ||
| * If a sensord ignore configuration file exists for the DPU, the file is removed and sensord is restarted. | ||
|
||
| * A PCIe rescan is performed, and the relevant bus information is removed from STATE_DB if it exists. | ||
| * Once SONiC is up, the state progression is updated for every state transition on the DPU_STATE table in the chassisStateDB | ||
|
|
||
| #### Post-startup Procedures | ||
|
||
|
|
||
|
Contributor
Please call out at the beginning, just like lines #284 to #290, that this is applicable only to platforms that need a full PCI rescan and should implement module.handle_pci_rescan() and self.handle_sensor_addition() # No action taken if it is not implemented. Can you also say the same in the sequence diagram? Rest all LGTM.

Agreed with @rameshraghupathy. Please mention that if it is not handled by any dedicated platform-specific server and needs to be done via SONiC APIs, then one should follow this; otherwise it would be confusing, as in our case these are not needed at all. In the future, other platforms might also have their own methods, and it would be confusing and tricky to follow this standard, including for me.

@vvolam, can you please comment?
Contributor
Author
@rameshraghupathy It is mentioned in line 96.
||
| When a DPU module's admin state is changed from "down" to "up", the following post-startup procedures are executed: | ||
|
|
||
| 1. **PCI Device Rescan**: The `handle_pci_rescan()` function is called to rescan and reattach PCI devices. | ||
|
|
||
| This function first calls the platform-specific `pci_reattach()`, then calls `get_pci_bus_info()` to get all the PCIe devices associated with the specific DPU, and removes the `PCIE_DETACH_INFO` key for each device from STATE_DB. If either `get_pci_bus_info()` or `pci_reattach()` is not implemented for the platform, a fallback obtains the `bus_info` from the platform.json file, removes the relevant information from STATE_DB, and performs a platform-independent PCIe rescan (`echo 1 > /sys/bus/pci/rescan`). | ||
| 2. **Sensor Addition**: The `handle_sensor_addition()` function is called to handle sensor-related setup. | ||
|
|
||
| If a sensors ignore configuration exists at `/etc/sensors.d/ignore_{module_name}.conf`, the file is removed and sensord is restarted. If no such file exists, the sensord restart for this module is skipped. | ||
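
The PCI rescan step and its fallback can be sketched as follows. This is a minimal illustration, not the chassisd implementation: `pci_rescan_sketch`, `NoPciHooksModule`, the dict standing in for STATE_DB, the `rescan_path` parameter, and the bus address `0000:03:00.0` are all assumptions made so the example runs without real hardware.

```python
import os
import tempfile


class NoPciHooksModule:
    """Hypothetical platform module that implements neither PCI hook."""

    def pci_reattach(self):
        raise NotImplementedError

    def get_pci_bus_info(self):
        raise NotImplementedError


def pci_rescan_sketch(module, name, platform_json, state_db, rescan_path):
    """Clear the DPU's detach records and bring its PCI devices back."""
    try:
        # Platform path: reattach first, then ask for the device list.
        module.pci_reattach()
        bus_info = module.get_pci_bus_info()
        use_fallback = False
    except NotImplementedError:
        # Fallback path: take bus_info from platform.json instead.
        bus_info = [platform_json["DPUS"][name]["bus_info"]]
        use_fallback = True

    # state_db is a plain dict standing in for STATE_DB (assumption).
    for bdf in bus_info:
        state_db.pop(f"PCIE_DETACH_INFO|{bdf}", None)

    if use_fallback:
        # Equivalent of `echo 1 > /sys/bus/pci/rescan`.
        with open(rescan_path, "w") as f:
            f.write("1")
    return True


# Demo against a temporary file instead of the real sysfs node.
demo_dir = tempfile.mkdtemp()
demo_rescan = os.path.join(demo_dir, "rescan")
demo_db = {"PCIE_DETACH_INFO|0000:03:00.0": {"dpu_state": "detaching"}}
demo_cfg = {"DPUS": {"DPU0": {"bus_info": "0000:03:00.0"}}}
pci_rescan_sketch(NoPciHooksModule(), "DPU0", demo_cfg, demo_db, demo_rescan)
```

On a real switch `rescan_path` would be `/sys/bus/pci/rescan`; passing it in keeps the sketch testable.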
|
|
||
| ##### Function Signatures | ||
|
||
|
|
||
| ```python | ||
| def handle_pci_rescan(self): | ||
| """ | ||
| Handles PCI device rescan by updating state database and reattaching device. | ||
| If pci_reattach is not implemented, falls back to platform.json based rescan. | ||
|
||
|
|
||
| Returns: | ||
| bool: True if operation was successful, False otherwise | ||
| """ | ||
| ``` | ||
|
|
||
| ```python | ||
| def handle_sensor_addition(self): | ||
| """ | ||
| Handles sensor addition by removing the ignore configuration file from | ||
| sensors.d directory and restarting sensord. | ||
|
|
||
| Returns: | ||
| bool: True if operation was successful, False otherwise | ||
| """ | ||
| ``` | ||
|
|
||
|
|
||
| ### DPU startup sequence diagram | ||
| <p align="center"><img src="./images/dpu-startup-seq.svg"></p> | ||
| <p align="center"><img src="./images/dpu-startup-seq.png"></p> | ||
|
Contributor
In the sequence diagram, show the module.py platform API vertical, and if PCIe is not of significance, please remove it.
Contributor
Can you show what triggers the post-startup sequence in the sequence diagram? Also, can you check the order of deleting sensors vs. PCI rescan?
Contributor
Author
order during shutdown:
Contributor
@gpunathilell Please show that this is applicable only to platforms that need a full PCI rescan and should implement module.handle_pci_rescan() and self.handle_sensor_addition() # No action taken if it is not implemented. Rest all LGTM
||
|
|
||
| #### 2.1.1 DPUs in dark mode | ||
| * A smartswitch configured so that all of its DPUs remain powered down upon boot up is said to have its DPUs in dark mode. | ||
|
|
@@ -153,19 +191,119 @@ Key: "CHASSIS_MODULE|DPU0" | |
| #### Use case | ||
| * Switch: Maintenance, Critical alarm, RMA | ||
| * DPU: Maintenance, Critical alarm, Service migration, RMA | ||
|
|
||
| #### Pre-shutdown Procedure | ||
|
||
|
|
||
| The switch has to prepare for the DPUs being powered off. | ||
|
||
| * The PCIe devices associated with the DPU are removed | ||
|
||
| * The sensors which are attached to the DPU (reporting its values to the switch) are no longer functional | ||
|
|
||
| Since the DPU-specific PCI devices are removed, the pciedaemon running on the switch should not create warning logs pertaining to these PCI IDs, and the sensord daemon should not create new error logs. | ||
|
||
| During the graceful shutdown procedure, we need to notify pciedaemon that the PCIe devices have been removed, and sensord should ignore the relevant sensors so that we can remove | ||
|
|
||
| When a DPU module's admin state is set to "down", the following pre-shutdown procedures are executed: | ||
|
|
||
| * **PCI Device Removal**: The `handle_pci_removal()` function is called to properly detach PCI devices from the system. | ||
|
|
||
| This function calls `get_pci_bus_info()` to get all the PCIe devices associated with the specific DPU and adds a `PCIE_DETACH_INFO` key to STATE_DB for each device. After all the device information is added to STATE_DB, the platform-specific `pci_detach()` is called. If either `get_pci_bus_info()` or `pci_detach()` is not implemented for the platform, a fallback obtains the `bus_info` from the platform.json file, adds the relevant information to STATE_DB, and performs a platform-independent PCIe detachment (`echo 1 > /sys/bus/pci/devices/{pci_bus}/remove`). | ||
| * **Sensor Removal**: The `handle_sensor_removal()` function is called to handle sensor-related cleanup. | ||
|
|
||
| If sensors have to be ignored on DPU shutdown, the relevant sensord ignore configuration has to be added to the device folder in sonic-buildimage: `sonic-buildimage/device/<Platform>/<device>/dpu_ignore_conf`. After the build, it is available in PMON at `/usr/share/sonic/platform/dpu_ignore_conf`. The ignore configuration for a specific DPU must be named `ignore_<Module_Name>.conf`. If this file exists for a DPU module, it is copied to `/etc/sensors.d/ignore_{module_name}.conf` and sensord is restarted. If the file does not exist, no further processing is done in this function. | ||
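
The sensor removal step can be sketched as a copy-if-present followed by a restart. The function name `sensor_removal_sketch`, the directory parameters, and the `restart_sensord` callback are hypothetical; the return value here only signals whether sensord was restarted, which is an assumption of this sketch rather than the documented contract.

```python
import os
import shutil
import tempfile


def sensor_removal_sketch(module_name, platform_conf_dir, sensors_dir,
                          restart_sensord):
    """Copy the module's ignore file into sensors.d and restart sensord."""
    conf_name = f"ignore_{module_name}.conf"
    src = os.path.join(platform_conf_dir, conf_name)
    if not os.path.exists(src):
        # No ignore configuration for this module: nothing more to do.
        return False
    shutil.copy(src, os.path.join(sensors_dir, conf_name))
    restart_sensord()
    return True


# Demo with temporary directories standing in for
# /usr/share/sonic/platform/dpu_ignore_conf and /etc/sensors.d.
platform_dir = tempfile.mkdtemp()
sensors_dir = tempfile.mkdtemp()
with open(os.path.join(platform_dir, "ignore_DPU0.conf"), "w") as f:
    f.write("# sensors to ignore while DPU0 is down\n")

restarts = []
copied = sensor_removal_sketch(
    "DPU0", platform_dir, sensors_dir, lambda: restarts.append("sensord"))
skipped = sensor_removal_sketch(
    "DPU1", platform_dir, sensors_dir, lambda: restarts.append("sensord"))
```

Sensor addition on startup is the reverse operation: delete the file from sensors.d if it exists, then restart sensord.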
|
|
||
| ##### Function Signatures | ||
|
|
||
| ```python | ||
| def get_pci_bus_info(self): | ||
| """ | ||
| Retrieves the bus information. | ||
|
|
||
| Returns: | ||
| Returns the PCI bus information as a list of BDF-format strings like "[DDDD:]BB:SS.F" | ||
| """ | ||
| ``` | ||
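
Since callers build STATE_DB keys and sysfs paths from these strings, a platform may want to sanity-check what `get_pci_bus_info()` returns. The helper names below are hypothetical, and the pattern assumes the usual PCI limits (device numbers 00 to 1f, function numbers 0 to 7).

```python
import re

# [DDDD:]BB:SS.F with hexadecimal digits.
BDF_PATTERN = re.compile(
    r"(?:[0-9a-fA-F]{4}:)?"   # optional 4-digit domain
    r"[0-9a-fA-F]{2}:"        # bus
    r"[01][0-9a-fA-F]\."      # device (slot), 00 to 1f
    r"[0-7]"                  # function
)


def is_bdf(text):
    """Return True if text is a well-formed BDF string."""
    return BDF_PATTERN.fullmatch(text) is not None


def invalid_bus_info(bus_info):
    """Return the entries of bus_info that are not valid BDF strings."""
    return [entry for entry in bus_info if not is_bdf(entry)]
```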
|
|
||
| ```python | ||
| def handle_pci_removal(self): | ||
| """ | ||
| Handles PCI device removal by updating state database and detaching device. | ||
| If pci_detach is not implemented, falls back to platform.json based removal. | ||
|
||
|
|
||
| Returns: | ||
| bool: True if operation was successful, False otherwise | ||
| """ | ||
| ``` | ||
|
|
||
| ```python | ||
| def handle_sensor_removal(self): | ||
| """ | ||
| Handles sensor removal by copying the ignore configuration file from the platform folder | ||
| /usr/share/sonic/platform/dpu_ignore_conf/ignore_{module_name}.conf | ||
| to the sensors.d directory and restarting sensord, if the file exists. | ||
|
|
||
| Returns: | ||
| bool: True if operation was successful, False otherwise | ||
| """ | ||
| ``` | ||
|
|
||
|
|
||
| #### Implementation Details | ||
|
|
||
| ``` | ||
| Sample platform.json configuration | ||
|
|
||
| "DPUS": { | ||
| "DPU0": { | ||
| "bus_info": "XXXX:XX:XX.X" | ||
| }, | ||
| "DPU1": { | ||
|
||
| } | ||
| }, | ||
| ``` | ||
| ``` | ||
| PCIE_DETACH_INFO STATE_DB TABLE | ||
|
|
||
| "PCIE_DETACH_INFO|[DDDD:]BB:SS.F": { | ||
| "value": { | ||
| "dpu_state": "detaching", | ||
| "bus_info" : "[DDDD:]BB:SS.F" | ||
| } | ||
| } | ||
| ``` | ||
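
A minimal sketch of how the removal path could populate this table before detaching, assuming a plain dict in place of STATE_DB. `pci_removal_sketch`, the optional `pci_detach` callback, the `sysfs_devices` parameter, and the example bus address are hypothetical. The dict stores the fields shown under `"value"` in the sample above.

```python
import os
import tempfile


def pci_removal_sketch(bus_info, state_db, pci_detach=None,
                       sysfs_devices="/sys/bus/pci/devices"):
    """Record detach info for every device, then detach the devices."""
    # Record all devices first, so the entries exist before any device
    # disappears from the PCI tree.
    for bdf in bus_info:
        state_db[f"PCIE_DETACH_INFO|{bdf}"] = {
            "dpu_state": "detaching",
            "bus_info": bdf,
        }

    if pci_detach is not None:
        # Platform-specific detach, when the platform provides one.
        pci_detach()
    else:
        # Equivalent of `echo 1 > /sys/bus/pci/devices/<bdf>/remove`.
        for bdf in bus_info:
            with open(os.path.join(sysfs_devices, bdf, "remove"), "w") as f:
                f.write("1")
    return True


# Demo against a temporary directory instead of the real sysfs tree.
fake_sysfs = tempfile.mkdtemp()
os.makedirs(os.path.join(fake_sysfs, "0000:03:00.0"))
fake_state_db = {}
pci_removal_sketch(["0000:03:00.0"], fake_state_db, sysfs_devices=fake_sysfs)
```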
|
|
||
| These functions are called by chassisd when we perform admin state changes by changing config_db. The platform implementation should call these functions at the appropriate times during the admin state change process. | ||
|
|
||
| The implementation in chassisd will follow this sequence: | ||
|
|
||
| ```python | ||
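| def set_admin_state(self, up): | ||
|     if up: | ||
|         # Admin up: power on the DPU, then rescan PCI and re-enable its sensors | ||
|         module.set_admin_state(up) | ||
|         module.handle_pci_rescan() | ||
|         self.handle_sensor_addition() | ||
|     else: | ||
|         # Admin down: detach PCI and ignore sensors before powering off the DPU | ||
|         self.handle_pci_removal() | ||
|         self.handle_sensor_removal() | ||
|         module.set_admin_state(up)  # up is False here, i.e. admin down | ||
|     return | ||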
| ``` | ||
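
The ordering above can be exercised with a small recording stub. Everything here is illustrative: `RecordingModule`, `run_hook`, and `apply_admin_state` are hypothetical names, all hooks are placed on one object for brevity, and treating `NotImplementedError` as "no action" is an assumption based on the review discussion that platforms without these hooks should be unaffected.

```python
class RecordingModule:
    """Hypothetical module stub that records the order of calls."""

    def __init__(self):
        self.calls = []

    def set_admin_state(self, up):
        self.calls.append("admin_up" if up else "admin_down")
        return True

    def handle_pci_rescan(self):
        self.calls.append("pci_rescan")
        return True

    def handle_sensor_addition(self):
        self.calls.append("sensor_addition")
        return True

    def handle_pci_removal(self):
        self.calls.append("pci_removal")
        return True

    def handle_sensor_removal(self):
        # Simulates a platform that does not implement this hook.
        raise NotImplementedError


def run_hook(hook):
    """Call an optional hook; take no action if it is not implemented."""
    try:
        return hook()
    except NotImplementedError:
        return None


def apply_admin_state(module, up):
    if up:
        # Power on first, then run the post-startup procedures.
        module.set_admin_state(True)
        run_hook(module.handle_pci_rescan)
        run_hook(module.handle_sensor_addition)
    else:
        # Run the pre-shutdown procedures, then power off.
        run_hook(module.handle_pci_removal)
        run_hook(module.handle_sensor_removal)
        module.set_admin_state(False)


up_module = RecordingModule()
apply_admin_state(up_module, True)
down_module = RecordingModule()
apply_admin_state(down_module, False)
```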
| #### DPU shutdown sequence | ||
| * There could be two possible sources for DPU shutdown. 1. A configuration change to DPU "admin_status: down" 2. The GNOI logic can trigger it. | ||
|
||
| * In the first option the "admin_status: down" configDB status change event will trigger chassisd as it is subscribed to the event | ||
| * The switch PMON will invoke the module class API "set_admin_state(self, up):" with the state being "down" and the platform in turn will call its API to gracefully shutdown the DPU. | ||
| * The GNOI server runs on the DPU even after the DPU is pre-shutdown and listens until the graceful shutdown finishes. | ||
| * The host sends a GNOI signal to shutdown the DPU. The DPU does a graceful-shutdown if not already done and sends an ack back to the host. | ||
| * Upon receiving the ack or on a timeout the host may trigger the switch PMON vendor API to shutdown the DPU. | ||
| * If a vendor specific API is not defined, detachment is done via sysfs (echo 1 > /sys/bus/pci/devices/XXXX:XX:XX.X/remove). | ||
| * NPU-DPU (GNOI) soft reboot workflow is captured in [reboot-hld.md](https://github.com/sonic-net/SONiC/blob/26f3f4e282f3d2bd4a5c684608897850354f5c30/doc/smart-switch/reboot/reboot-hld.md) | ||
| * In the first option the "admin_status: down" configDB status change event will send a message to the switch PMON. | ||
| * The switch PMON will invoke the module class API "set_admin_state(self, up):" with the state being "down" and the platform in turn will call its API to gracefully shutdown the DPU. | ||
| * The PCIe device is added to the `PCIE_DETACH_INFO` table and then removed. If a vendor specific API is not defined, detachment is done via sysfs (echo 1 > /sys/bus/pci/devices/XXXX:XX:XX.X/remove). | ||
| * sensord is restarted if some sensors need to be ignored. | ||
| * Vendor specific DPU shutdown is initiated | ||
| * The DPU upon receiving the shutdown message will do a graceful shutdown and send an ack back. The DPU graceful shutdown is vendor specific. The DPU power will be turned off after the graceful shutdown. In case of timeout the platform will force power down. | ||
| * The switch upon receiving the ack or on a timeout will remove the DPU from the bridge and PCIe tree. | ||
| * NPU-DPU (GNOI) soft reboot workflow is captured in [reboot-hld.md](https://github.com/sonic-net/SONiC/blob/26f3f4e282f3d2bd4a5c684608897850354f5c30/doc/smart-switch/reboot/reboot-hld.md) | ||
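
The "ack or timeout" behavior in the sequence above can be sketched with a `threading.Event`. This is only an illustration of the control flow; the real graceful shutdown is vendor specific, and `shutdown_dpu_sketch` and its callbacks are hypothetical names.

```python
import threading


def shutdown_dpu_sketch(send_shutdown, ack, timeout_s, force_power_down):
    """Request a graceful shutdown and force power down on timeout."""
    send_shutdown()
    if ack.wait(timeout_s):
        # The DPU acknowledged within the timeout.
        return "graceful"
    # No ack in time: the platform forces the power down.
    force_power_down()
    return "forced"


forced_log = []

# Case 1: the DPU acks immediately (simulated by setting the event).
acked = threading.Event()
result_ok = shutdown_dpu_sketch(
    acked.set, acked, 1.0, lambda: forced_log.append("force"))

# Case 2: the DPU never acks, so the short timeout expires.
silent = threading.Event()
result_timeout = shutdown_dpu_sketch(
    lambda: None, silent, 0.05, lambda: forced_log.append("force"))
```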
|
|
||
| ### DPU shutdown sequence diagram | ||
| <p align="center"><img src="./images/dpu-shutdown-seq.svg"></p> | ||
| <p align="center"><img src="./images/dpu-shutdown-seq.png"></p> | ||
|
||
|
|
||
|
|
||
|
|
||
| ### Restart | ||
| #### Definition | ||
|
|
@@ -1175,5 +1313,7 @@ Note: | |
| Progress of FPD operation and any failures would be displayed on the console with appropriate levels of severity | ||
| ``` | ||
|
|
||
|
|
||
|
|
||
| ## 4. Test Plan | ||
| [Test Plan](https://github.com/nissampa/sonic-mgmt_dpu_test/blob/dpu_test_plan_draft_pr/docs/testplan/Smartswitch-test-plan.md) | ||