
[SmartSwitch] Extend implementation of the DPU chassis daemon.#563

Merged
prgeor merged 1 commit intosonic-net:masterfrom
oleksandrivantsiv:smart-switch-chassisd-select
Nov 20, 2024

@oleksandrivantsiv
Collaborator

Description

Extend the DPU chassis daemon to subscribe to data plane and control plane state changes.

Motivation and Context

Compared to polling, this reduces CPU usage and removes the delay in processing state changes.
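A minimal sketch of why a subscription beats polling, using only the standard library. The real daemon presumably relies on SONiC's swsscommon subscription primitives; here a `queue.Queue` stands in for the DB subscription, and `ChangeSubscription`, `process_changes`, and the `dataplane`/`controlplane` labels are hypothetical names for illustration only.

```python
import queue
from typing import Dict, Optional, Tuple

# Hypothetical change notification: (plane, dpu_name, new_state)
Change = Tuple[str, str, str]


class ChangeSubscription:
    """Stand-in for a DB table subscription (hypothetical; not the swsscommon API)."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[Change]" = queue.Queue()

    def notify(self, plane: str, dpu: str, state: str) -> None:
        # Producer side: called once per state change instead of being polled.
        self._pending.put((plane, dpu, state))

    def wait(self, timeout: float) -> Optional[Change]:
        # Blocks without spinning until a change arrives or the timeout expires.
        try:
            return self._pending.get(timeout=timeout)
        except queue.Empty:
            return None


def process_changes(sub: ChangeSubscription, max_events: int,
                    timeout: float = 0.1) -> Dict[str, Dict[str, str]]:
    """Apply up to max_events changes, tracking each DPU's per-plane state."""
    dpu_states: Dict[str, Dict[str, str]] = {}
    for _ in range(max_events):
        change = sub.wait(timeout)
        if change is None:
            break  # no change arrived within the timeout
        plane, dpu, state = change
        # Each change is handled as soon as it is delivered, with no poll interval.
        dpu_states.setdefault(dpu, {})[plane] = state
    return dpu_states


if __name__ == "__main__":
    sub = ChangeSubscription()
    sub.notify("dataplane", "DPU0", "up")
    sub.notify("controlplane", "DPU0", "down")
    print(process_changes(sub, max_events=10, timeout=0.01))
```

The consumer sleeps inside `wait()` until something changes, so idle DPUs cost no CPU and a change is seen immediately rather than at the next poll tick.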

How Has This Been Tested?

New unit tests were added to cover new functionality.

Additional Information (Optional)

@oleksandrivantsiv oleksandrivantsiv marked this pull request as draft November 19, 2024 00:53
@oleksandrivantsiv oleksandrivantsiv marked this pull request as ready for review November 19, 2024 02:19
@prgeor prgeor merged commit b276e41 into sonic-net:master Nov 20, 2024
vvolam pushed a commit to vvolam/sonic-platform-daemons that referenced this pull request Jan 3, 2025
prgeor pushed a commit that referenced this pull request Feb 6, 2025
…evice is in detaching mode (#546)

* Skip logging the warning if the device is in detaching mode

* Add detach_info table and unittests

* Fix unit tests

* Increase code coverage

* Remove unused header import

* Fix dict get values

* Increase code coverage

* Increase test coverage
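A rough sketch of the "skip the warning while detaching" idea from the commits above. It assumes a `detach_info` table shaped as a dict of device ID to fields; the `state` field, the `detaching` value, and the function names are hypothetical, since the commit messages do not give the schema.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("detach_sketch")

# Hypothetical contents of a detach_info table: device ID -> fields
DetachInfo = Dict[str, Dict[str, str]]


def is_detaching(device: str, detach_info: DetachInfo) -> bool:
    # Hypothetical field name and value; the real table schema may differ.
    # Chained .get() calls tolerate both a missing device and a missing field.
    return detach_info.get(device, {}).get("state") == "detaching"


def warn_missing_devices(missing: List[str], detach_info: DetachInfo) -> List[str]:
    """Log a warning for each missing device unless it is being detached on purpose."""
    warned = []
    for device in missing:
        if is_detaching(device, detach_info):
            continue  # expected absence: skip the warning
        logger.warning("Device %s is missing", device)
        warned.append(device)
    return warned
```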

* [SmartSwitch] Extend implementation of the DPU chassis daemon. (#563)

* Addition of DPU Chassis for thermalctld (#564)

* [stormond] Added new dynamic field 'last_sync_time' to STATE_DB (#535)

* Added new dynamic field 'last_sync_time' that shows when STORAGE_INFO for a disk was last synced to STATE_DB

* Moved 'start' message to the actual starting point of the daemon

* Added functions for formatted and epoch time for user-friendly time display

* Made changes per prgeor's review comments

* Pivot to SysLogger for all logging

* Increased log level so that messages are seen in syslog

* Code coverage improvement
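For the stormond 'last_sync_time' change, a small sketch of paired epoch and formatted time helpers. The function names, the `%Y-%m-%d %H:%M:%S` format, and the single-field dict standing in for a STATE_DB entry are assumptions; only the `last_sync_time` field name comes from the commit.

```python
import time
from datetime import datetime, timezone, tzinfo
from typing import Dict, Optional


def get_epoch_time() -> int:
    """Current time as whole seconds since the Unix epoch."""
    return int(time.time())


def format_epoch_time(epoch: int, tz: Optional[tzinfo] = None) -> str:
    """Human-readable form of an epoch timestamp (local time when tz is None)."""
    return datetime.fromtimestamp(epoch, tz).strftime("%Y-%m-%d %H:%M:%S")


def storage_info_fields(epoch: int, tz: Optional[tzinfo] = None) -> Dict[str, str]:
    # Hypothetical STATE_DB field set: only 'last_sync_time' comes from the commit.
    return {"last_sync_time": format_epoch_time(epoch, tz)}
```

Keeping the epoch value for arithmetic and formatting only at write time avoids parsing strings back when comparing sync times.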

* [lag_id] Add lagid to free_list when LC absent for 30 minutes (#542)

When a linecard (LC) is absent for 30 minutes, the database cleanup kicks in. When a LagId is released, it must be appended to SYSTEM_LAG_IDS_FREE_LIST.
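The release step can be sketched with plain Python containers in place of the database. The `"<lc_name>|<lag_name>"` key layout and the function name are hypothetical; the 30-minute threshold and the append-to-free-list behavior come from the description above.

```python
from typing import Dict, List

# LC absence threshold from the commit message: 30 minutes.
LC_ABSENCE_CLEANUP_SECS = 30 * 60


def release_lag_ids(lc_name: str, absent_secs: float,
                    lag_id_table: Dict[str, int], free_list: List[int]) -> List[int]:
    """Release the absent LC's LAG IDs and append them to the free list.

    Hypothetical layout: lag_id_table keys are "<lc_name>|<lag_name>".
    """
    if absent_secs < LC_ABSENCE_CLEANUP_SECS:
        return []  # cleanup only starts after 30 minutes of absence
    prefix = lc_name + "|"
    released = []
    # sorted() materializes the keys first, so popping inside the loop is safe.
    for key in sorted(k for k in lag_id_table if k.startswith(prefix)):
        lag_id = lag_id_table.pop(key)
        free_list.append(lag_id)  # make the ID available for reuse
        released.append(lag_id)
    return released
```

Without the append, IDs owned by a removed linecard would stay allocated forever and the pool would eventually run dry.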

This PR works with the following 2 PRs:
sonic-net/sonic-swss#3303
sonic-net/sonic-buildimage#20369

Signed-off-by: mlok <marty.lok@nokia.com>

* Fixed bug in chassisd causing incorrect number of ASICs in CHASSIS_STATE_DB (#560)

Fixed a bug in chassisd that caused an incorrect number of ASICs to be pushed to CHASSIS_STATE_DB.

* thermalctld: Add support for fans on non-CPU modules (#555)

* thermalctld: Add support for fans on non-CPU modules

* Add module fan to unit tests

* Advanced Azure pipeline to Bookworm (#572)

Description
This PR moves the Azure pipeline for sonic_platform_daemons from Bullseye to Bookworm. This fixes the sonic-platform-daemons AZP failures caused by the upgrade to Bookworm. See the Pipelines - Run 20241210.8 logs for details.

* Take non-CMIS xcvrs out of lpmode in SFF Manager (#565)

Description
Fix non-CMIS transceivers stuck in the down state by bringing them out of low-power mode in the SFF Manager task.
This is intended to work together with the change in sonic-net/sonic-buildimage#20886.

Motivation and Context
Non-CMIS transceivers did not function correctly when put into low-power mode, so XCVRD now brings them out of lpmode.

How Has This Been Tested?
Loaded an image containing this change alongside the change from sonic-net/sonic-buildimage#20886 on an Arista chassis containing a Clearwater2 linecard.
Verified that without this image some interfaces were in a down state but with the image all interfaces came up as expected.
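A minimal sketch of the lpmode release described above. `FakeSfp` is a test double whose `get_lpmode()`/`set_lpmode(lpmode)` methods mirror the SfpBase platform API; the `is_cmis` flag and `release_non_cmis_lpmode` are hypothetical, since real code would determine the module type from the transceiver itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeSfp:
    """Test double exposing get_lpmode()/set_lpmode() like SfpBase."""
    port: str
    is_cmis: bool  # hypothetical flag; real code would inspect the module type
    lpmode: bool = True

    def get_lpmode(self) -> bool:
        return self.lpmode

    def set_lpmode(self, lpmode: bool) -> bool:
        self.lpmode = lpmode
        return True


def release_non_cmis_lpmode(sfps: List[FakeSfp]) -> List[str]:
    """Take non-CMIS modules out of low-power mode; return the ports changed."""
    changed = []
    for sfp in sfps:
        if sfp.is_cmis:
            continue  # CMIS modules are left to the CMIS handling in this sketch
        if sfp.get_lpmode() and sfp.set_lpmode(False):
            changed.append(sfp.port)
    return changed
```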

* Added SmartSwitch support in chassisd and enabling chassisd (#467)

Added SmartSwitch support in chassisd and enabling chassisd

* [chassis][psud] Move the PSU parent information generation to the loop run function from the initialization function (#576)

Description
Move the PSU parent information generation to the loop run function from the initialization function
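Why moving the publish step into the loop helps can be sketched as follows. `PsuDaemonSketch`, the dict standing in for STATE_DB, and the `position_in_parent`/`parent_name` field values are hypothetical; the `PHYSICAL_ENTITY_INFO|PSU *` key pattern comes from the testing notes.

```python
from typing import Dict


class PsuDaemonSketch:
    """Hypothetical daemon; a plain dict stands in for STATE_DB."""

    def __init__(self, state_db: Dict[str, Dict[str, str]], num_psus: int) -> None:
        self.state_db = state_db
        self.num_psus = num_psus
        # Publishing parent info only here would lose it if the entries are deleted later.

    def update_psu_entity_info(self) -> None:
        for index in range(1, self.num_psus + 1):
            key = f"PHYSICAL_ENTITY_INFO|PSU {index}"
            # Hypothetical fields; rewritten every pass so deleted entries reappear.
            self.state_db[key] = {"position_in_parent": str(index), "parent_name": "chassis 1"}

    def run(self) -> None:
        # One iteration of the daemon's main loop.
        self.update_psu_entity_info()
```

If another daemon's restart clears these keys, the next `run()` iteration restores them, which matches the re-insertion behavior described in the test notes.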

Motivation and Context
Fixes #575

How Has This Been Tested?
Tested on a Cisco chassis: the PHYSICAL_ENTITY_INFO|PSU * entries are re-inserted after a thermalctld restart.
Also monitored STATE_DB memory usage for hours with no issues.

* [chassisd] Address the chassisd crash issue and add UT for it (#573)

Description
On the Nokia platform, the slot name of the Supervisor is the string "A" rather than a number, so converting it with int() raises an exception and produces a backtrace. The slot value should be used as-is for any checking, without conversion. This fixes sonic-net/sonic-buildimage#21131.

Motivation and Context
Modify _get_module_info so that it does not convert "slot" to a string value, and modify the remaining code so that it does not convert the slot value to an int for any checking; the value returned by get_slot() is used directly. Also add the UT test_moduleupdater_check_slot_string() to validate this.
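A minimal illustration of comparing the raw `get_slot()` value instead of coercing it. `FakeModule` is a test double with a `get_slot()` method like the module platform API; `find_module_by_slot` and the module names are hypothetical.

```python
from typing import List, Optional, Union

Slot = Union[int, str]


class FakeModule:
    """Test double with a get_slot() method like the module platform API."""

    def __init__(self, name: str, slot: Slot) -> None:
        self.name = name
        self._slot = slot

    def get_slot(self) -> Slot:
        return self._slot


def find_module_by_slot(modules: List[FakeModule], slot: Slot) -> Optional[str]:
    # Compare the raw get_slot() value; int(slot) would raise ValueError for "A".
    for module in modules:
        if module.get_slot() == slot:
            return module.name
    return None
```

Plain equality works for both numeric slots and a Supervisor slot named "A", whereas `int("A")` raises `ValueError`.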

How Has This Been Tested?
Tested on the 202405 branch.


Signed-off-by: mlok <marty.lok@nokia.com>

* Fix a comment

---------

Signed-off-by: mlok <marty.lok@nokia.com>
Co-authored-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>
Co-authored-by: Gagan Punathil Ellath <gpunathilell@nvidia.com>
Co-authored-by: Ashwin Srinivasan <93744978+assrinivasan@users.noreply.github.com>
Co-authored-by: Marty Y. Lok <76118573+mlok-nokia@users.noreply.github.com>
Co-authored-by: Vivek Verma <137406113+vivekverma-arista@users.noreply.github.com>
Co-authored-by: Patrick MacArthur <pmacarthur@arista.com>
Co-authored-by: Peter Bailey <peterbailey@arista.com>
Co-authored-by: rameshraghupathy <43161235+rameshraghupathy@users.noreply.github.com>
Co-authored-by: Jianquan Ye <jianquanye@microsoft.com>
lotus-nexthop pushed a commit to lotus-nexthop/sonic-platform-daemons that referenced this pull request Oct 28, 2025
* Refactor cmis eeprom i2c read

Replace multiple EEPROM I2C reads with a single I2C read spanning multiple addresses.
1. Update functions get_tx_power_flag, get_tx_bias_flag and get_rx_power_flag.
2. Add dedicated groups:
TX_POWER_ALARM_FLAGS_FIELD, TX_BIAS_ALARM_FLAGS_FIELD, RX_POWER_ALARM_FLAGS_FIELD.
3. Move the MEDIA_TYPE_FIELD read and prefix lookup out of the loop.
4. Performance test result:
Before:
Running performance tests (3000 iterations each)...
get_tx_power_flag: total 32.120s, avg 10.707ms per call
get_tx_bias_flag: total 26.635s, avg 8.878ms per call
get_rx_power_flag: total 32.306s, avg 10.769ms per call
After:
Running performance tests (3000 iterations each)...
get_tx_power_flag: total 8.213s, avg 2.738ms per call
get_tx_bias_flag: total 7.831s, avg 2.610ms per call
get_rx_power_flag: total 9.632s, avg 3.211ms per call
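The gain above comes from issuing one read per flag group instead of one per byte. A sketch of the difference, using an in-memory EEPROM that counts read calls as a stand-in for I2C transactions; `CountingEeprom`, the `read(offset, num_bytes)` signature, and the offsets are hypothetical and not the sonic-platform-common EEPROM API.

```python
from typing import Callable, List, Optional

ReadFn = Callable[[int, int], Optional[bytes]]


class CountingEeprom:
    """In-memory EEPROM that counts read calls (each stands in for one I2C transaction)."""

    def __init__(self, size: int = 256) -> None:
        self.data = bytearray(size)
        self.reads = 0

    def read(self, offset: int, num_bytes: int) -> Optional[bytes]:
        self.reads += 1
        if offset < 0 or offset + num_bytes > len(self.data):
            return None
        return bytes(self.data[offset:offset + num_bytes])


def read_fields_separately(read: ReadFn, offsets: List[int]) -> Optional[List[int]]:
    # Before: one read per flag byte.
    values = []
    for off in offsets:
        raw = read(off, 1)
        if raw is None:
            return None
        values.append(raw[0])
    return values


def read_fields_in_one_block(read: ReadFn, offsets: List[int]) -> Optional[List[int]]:
    # After: a single read spanning all offsets, then sliced in memory.
    if not offsets:
        return []
    start, end = min(offsets), max(offsets)
    raw = read(start, end - start + 1)
    if raw is None:
        return None
    return [raw[off - start] for off in offsets]
```

Since each transaction carries fixed bus overhead, collapsing N small reads into one larger read is what shrinks the per-call time.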

Signed-off-by: Jianyue Wu <jianyuew@nvidia.com>

* Refactor CMIS alarm flag parsing into generic helper

Add a get_alarm_flags helper that consolidates the TX_POWER, RX_POWER and TX_BIAS flag logic.
Simplify get_tx_power_flag, get_rx_power_flag and get_tx_bias_flag to call the helper.
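A sketch of what a shared flag decoder might look like. It assumes, hypothetically, that each monitor (TX power, TX bias, RX power) has four consecutive flag bytes (high alarm, low alarm, high warning, low warning) with one bit per lane; the offsets and the `read` callable are illustrative, and the wrappers are simplified stand-ins rather than the sonic-platform-common method signatures.

```python
from typing import Callable, Dict, List, Optional

ReadFn = Callable[[int, int], Optional[bytes]]

# Hypothetical layout: four consecutive flag bytes per monitor, one bit per lane.
FLAG_KINDS = ("high_alarm", "low_alarm", "high_warn", "low_warn")


def get_alarm_flags(read: ReadFn, start_offset: int,
                    lanes: int = 8) -> Optional[Dict[str, List[bool]]]:
    """Shared decoder for the TX power, TX bias and RX power flag groups."""
    raw = read(start_offset, len(FLAG_KINDS))  # one read for the whole group
    if raw is None or len(raw) != len(FLAG_KINDS):
        return None
    return {
        kind: [bool((raw[i] >> lane) & 1) for lane in range(lanes)]
        for i, kind in enumerate(FLAG_KINDS)
    }


# Hypothetical start offsets; each wrapper only selects its flag group.
TX_POWER_FLAGS_OFFSET = 0
TX_BIAS_FLAGS_OFFSET = 4
RX_POWER_FLAGS_OFFSET = 8


def tx_power_flags(read: ReadFn) -> Optional[Dict[str, List[bool]]]:
    return get_alarm_flags(read, TX_POWER_FLAGS_OFFSET)


def tx_bias_flags(read: ReadFn) -> Optional[Dict[str, List[bool]]]:
    return get_alarm_flags(read, TX_BIAS_FLAGS_OFFSET)


def rx_power_flags(read: ReadFn) -> Optional[Dict[str, List[bool]]]:
    return get_alarm_flags(read, RX_POWER_FLAGS_OFFSET)
```

With the bit decoding in one place, the three wrappers differ only in which offset they pass.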

Signed-off-by: Jianyue Wu <jianyuew@nvidia.com>

---------

Signed-off-by: Jianyue Wu <jianyuew@nvidia.com>

3 participants