
Take non-CMIS xcvrs out of lpmode in SFF Manager#565

Merged
arlakshm merged 1 commit into sonic-net:master from peterbailey-arista:master-sff-lpmode
Dec 14, 2024

@peterbailey-arista
Contributor

@peterbailey-arista peterbailey-arista commented Nov 21, 2024

Description

Fix non-CMIS transceivers in down state by bringing them out of low power mode in the SFF Manager Task.
This is intended to work together with the change in sonic-net/sonic-buildimage#20886.

Motivation and Context

Non-CMIS transceivers were not functioning correctly while in low power mode, so xcvrd now brings them out of lpmode.

How Has This Been Tested?

Loaded an image containing this change alongside the change from sonic-net/sonic-buildimage#20886 on an Arista chassis containing a Clearwater2 linecard.
Verified that without this image some interfaces were in a down state but with the image all interfaces came up as expected.

Additional Information (Optional)

@linux-foundation-easycla

linux-foundation-easycla bot commented Nov 21, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: peterbailey-arista / name: Peter Bailey (94b96b6)

@kenneth-arista

@arlakshm @wenyiz2021 for awareness

@wenyiz2021 wenyiz2021 left a comment

LGTM

@kenneth-arista

kenneth-arista commented Nov 22, 2024 via email

@prgeor
Collaborator

prgeor commented Dec 1, 2024

@mihirpat1 can you please review

@prgeor prgeor requested a review from mihirpat1 December 1, 2024 14:26
# Skip if these essential routines are not available
continue

sfp.set_lpmode(False)
Collaborator

@peterbailey-arista I would suggest using sfp.set_lpmode() only for SFPs that follow SFF-8472. All other transceivers, such as QSFP+ and QSFP28, can support lpmode via an EEPROM write. The above code expects each platform to implement set_lpmode() even though that is NOT required for QSFP-based modules.

if isinstance(api, Sff8472Api):  # SFP (SFF-8472) module
    sfp.set_lpmode(False)
else:  # QSFP+, QSFP28, etc.: lpmode via EEPROM write
    api.set_lpmode(False)
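For background on the EEPROM-write path: in SFF-8636 (QSFP+/QSFP28) modules, lpmode can be controlled in software through byte 93 of lower page 00h, where bit 0 is Power_override and bit 1 is Power_set. The sketch below only computes the new byte value; the helper name is hypothetical, reading and writing the EEPROM is left to the caller, and this is not the sonic_platform_base implementation.

```python
# SFF-8636 lower page 00h, byte 93 (power control bits).
POWER_OVERRIDE_BIT = 0x01  # bit 0: software setting overrides the LPMode pin
POWER_SET_BIT = 0x02       # bit 1: with override set, 1 = low power mode


def power_control_byte(current: int, lpmode: bool) -> int:
    """Return a new byte 93 value that forces the module into or out of lpmode.

    Hypothetical helper: other bits (e.g. high power class enables) are preserved.
    """
    value = current | POWER_OVERRIDE_BIT
    if lpmode:
        value |= POWER_SET_BIT
    else:
        value &= ~POWER_SET_BIT & 0xFF
    return value


print(hex(power_control_byte(0x00, False)))  # 0x1
```

With Power_override set, the Power_set bit decides the power mode regardless of the LPMode pin, so such modules can be handled without a platform-specific hook.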

Contributor Author

After discussing with @byu343 I ended up wrapping api.set_lpmode in a try except instead. Please let me know if this update works for you. Thanks!


try:
    api = sfp.get_xcvr_api()
    api.set_lpmode(False)
Contributor

@peterbailey-arista Per the implementation below, no exception will be raised for SFF-8472 modules. Can you please handle this accordingly?
https://github.com/sonic-net/sonic-platform-common/blob/0f2e22faccd093a1e5d18235fe119a860be7855e/sonic_platform_base/sonic_xcvr/api/public/sff8472.py#L308
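The concern can be illustrated with stubs: if the SFF-8472 API reports that it cannot set lpmode through its return value rather than by raising, a try/except wrapper treats the call as successful. The stub class and its return value are assumptions for illustration only; the real behavior is at the linked line.

```python
class Sff8472ApiStub:
    """Hypothetical stand-in: signals failure by returning False, never raises."""

    def set_lpmode(self, lpmode):
        return False


def try_except_wrapper(api):
    """Reviewed approach: only an exception is treated as failure."""
    try:
        api.set_lpmode(False)
    except Exception:
        return "fallback to sfp.set_lpmode()"
    return "assumed success"


def return_value_check(api):
    """Alternative: inspect the boolean result instead."""
    return "success" if api.set_lpmode(False) else "failure"


api = Sff8472ApiStub()
print(try_except_wrapper(api))   # assumed success
print(return_value_check(api))   # failure
```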

Contributor Author

I've now updated it to use sfp.set_lpmode() only for SFPs implementing SFF-8472, as originally suggested.

continue

try:
    api = sfp.get_xcvr_api()
Contributor

api has already been obtained by the code above:

api = sfp.get_xcvr_api()

continue

if isinstance(api, Sff8472Api):
    sfp.set_lpmode(False)
Contributor

@peterbailey-arista Can we check the return value in both cases and log an error if it returns False?
Also, can you please help fix the build failure?

Contributor Author

I added the check with the new error log. But I am not sure how to resolve the build failure; it does not seem to be related to my change. Do you have any suggestions?
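The approach settled on in this thread (SFF-8472 modules go through the platform's sfp.set_lpmode(), everything else through api.set_lpmode(), with an error logged on a False return) can be sketched as follows. The stub classes, FakeSfp, disable_lpmode() and the logger name are hypothetical stand-ins, not the merged xcvrd code.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("sff_mgr")  # hypothetical logger name


class Sff8472Api:
    """Stand-in for sonic_platform_base.sonic_xcvr.api.public.sff8472.Sff8472Api."""

    def set_lpmode(self, lpmode):
        return False  # assumption: not usable for SFP modules


class QsfpApi:
    """Hypothetical stand-in for a QSFP-style API that writes lpmode to EEPROM."""

    def set_lpmode(self, lpmode):
        return True


class FakeSfp:
    """Hypothetical platform Sfp object with a platform-level set_lpmode()."""

    def __init__(self, platform_ok=True):
        self.platform_ok = platform_ok

    def set_lpmode(self, lpmode):
        return self.platform_ok


def disable_lpmode(lport, sfp, api):
    """Take a non-CMIS module out of lpmode; log an error on a False result."""
    set_lp_success = (
        sfp.set_lpmode(False)        # SFF-8472 (SFP): delegate to the platform
        if isinstance(api, Sff8472Api)
        else api.set_lpmode(False)   # others (e.g. QSFP): go through the xcvr API
    )
    if not set_lp_success:
        log.error("%s: failed to take module out of lpmode", lport)
    return set_lp_success


print(disable_lpmode("Ethernet0", FakeSfp(), Sff8472Api()))                   # True
print(disable_lpmode("Ethernet4", FakeSfp(platform_ok=False), QsfpApi()))     # True
print(disable_lpmode("Ethernet8", FakeSfp(platform_ok=False), Sff8472Api()))  # False
```

Checking the boolean result in both branches covers the SFF-8472 case raised in review, where a failure does not surface as an exception.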

@kenneth-arista

/azpw run Azure.sonic-platform-daemons

@kenneth-arista

The build failure above is unrelated to the change in this PR. The failure is caused by:

dpkg: error: cannot access archive 'libnl-3-200_*.deb': No such file or directory

@peterbailey-arista
Contributor Author

/azpw run Azure.sonic-platform-daemons

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-platform-daemons

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

# Skip if these essential routines are not available
continue

set_lp_success = (
Contributor

I assume lpmode handling is only needed at module insertion event, right?

Just a minor suggestion:
Adding the condition check below may avoid unnecessary lpmode handling in other cases (e.g. when admin_status/host_tx_ready changes due to config interface shutdown/startup).

if xcvr_inserted:
    <lpmode logic>

Contributor Author

The described scenario unfortunately still requires setting lpmode to False. If you shut down and then start up the interface without bringing it out of lpmode, the interface remains down even if it was up before the shutdown.

Contributor

I thought lpmode can get reset by a module reset (sfputil reset), but an interface shutdown/startup (i.e. NPU/PHY/laser TX on/off) shouldn't affect the module's lpmode unless the user, platform, or vendor explicitly triggers something additional on the module as part of the interface shutdown/startup. Is that not the case here?

Contributor Author

My mistake, I believe you are correct. I'll add that change, thanks
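The agreed change (run the lpmode logic only when xcvr_inserted is set, so that admin_status/host_tx_ready changes alone do not re-trigger it) can be sketched as below. PortEvent and handle_port_event() are hypothetical names for illustration, not xcvrd's actual structures.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PortEvent:
    """Hypothetical summary of what changed for a port since the last pass."""
    xcvr_inserted: bool          # module newly inserted
    admin_status_changed: bool   # e.g. config interface shutdown/startup


def handle_port_event(event: PortEvent,
                      disable_lpmode: Callable[[], bool]) -> Optional[bool]:
    """Run the lpmode logic only on module insertion; return None if skipped."""
    if event.xcvr_inserted:
        return disable_lpmode()
    # admin_status/host_tx_ready changes alone are assumed not to touch module lpmode
    return None


calls = []
handle_port_event(PortEvent(xcvr_inserted=True, admin_status_changed=False),
                  lambda: calls.append("lpmode") or True)
handle_port_event(PortEvent(xcvr_inserted=False, admin_status_changed=True),
                  lambda: calls.append("lpmode") or True)
print(calls)  # ['lpmode']
```

This gate only models insertion; per the discussion above, a module reset (e.g. via sfputil reset) may also reset lpmode and is not covered here.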

@peterbailey-arista
Contributor Author

/azpw run Azure.sonic-platform-daemons

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-platform-daemons

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@arlakshm
Contributor

arlakshm commented Dec 6, 2024

/Azp Azure.sonic-platform-daemons

@azure-pipelines

Command 'Azure.sonic-platform-daemons' is not supported by Azure Pipelines.

Supported commands
  • help:
    • Get descriptions, examples and documentation about supported commands
    • Example: help "command_name"
  • list:
    • List all pipelines for this repository using a comment.
    • Example: "list"
  • run:
    • Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run.
    • Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
  • where:
    • Report back the Azure DevOps orgs that are related to this repository and org
    • Example: "where"

See additional documentation.

Fix non-CMIS transceivers in down state by bringing them out of
lpmode in the SFF Manager Task.
@arlakshm
Contributor

@longhuan-cisco @mihirpat1, can you please approve this change if all the comments have been addressed?

Contributor

@mihirpat1 mihirpat1 left a comment

@peterbailey-arista The changes look good to me. Can you please help resolve the build failure?

@arlakshm
Contributor

/Azp run Azure.sonic-platform-daemons

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@arlakshm arlakshm merged commit 88d0dd7 into sonic-net:master Dec 14, 2024
mssonicbld pushed a commit to mssonicbld/sonic-platform-daemons that referenced this pull request Dec 21, 2024
@mssonicbld
Collaborator

Cherry-pick PR to 202405: #581

mssonicbld pushed a commit that referenced this pull request Dec 21, 2024
vvolam pushed a commit to vvolam/sonic-platform-daemons that referenced this pull request Jan 3, 2025
prgeor pushed a commit that referenced this pull request Feb 6, 2025
…evice is in detaching mode (#546)

* Skip logging the warning, if device is in detaching mode

* Add detach_info table and unittests

* Fix unit tests

* Increase code coverage

* Remove unused header import

* Fix dict get values

* Increase code coverage

* Increase test coverage

* [SmartSwitch] Extend implementation of the DPU chassis daemon. (#563)

* Addition of DPU Chassis for thermalctld (#564)

* [stormond] Added new dynamic field 'last_sync_time' to STATE_DB (#535)

* Added new dynamic field 'last_sync_time' that shows when STORAGE_INFO for disk was last synced to STATE_DB

* Moved 'start' message to actual starting point of the daemon

* Added functions for formatted and epoch time for user friendly time display

* Made changes per prgeor review comments

* Pivot to SysLogger for all logging

* Increased log level so that they are seen in syslogs

* Code coverage improvement

* [lag_id] Add lagid to free_list when LC absent for 30 minutes (#542)

When LC is absent for 30 minutes, the database cleanup kicks in. When LagId is released, it needs to be appended to the SYSTEM_LAG_IDS_FREE_LIST

This PR works with the following 2 PRs:
sonic-net/sonic-swss#3303
sonic-net/sonic-buildimage#20369

Signed-off-by: mlok <[email protected]>

* Fixed bug in chassisd causing incorrect number of ASICs in CHASSIS_STATE_DB (#560)

Fixed the bug in chassisd due to which incorrect number of ASICs were being pushed to CHASSIS_STATE_DB.

* thermalctld: Add support for fans on non-CPU modules (#555)

* thermalctld: Add support for fans on non-CPU modules

* Add module fan to unit tests

* Advanced Azure pipeline to Bookworm (#572)

Description
This PR advances the Azure pipeline for sonic_platform_daemons from bullseye to bookworm. This fixes the issues the sonic-platform-daemons AZP has had since the upgrade to bookworm. See Pipelines - Run 20241210.8 logs for details.

* Take non-CMIS xcvrs out of lpmode in SFF Manager (#565)

* Added SmartSwitch support in chassisd and enabling chassisd  (#467)

Added SmartSwitch support in chassisd and enabling chassisd

* [chassis][psud] Move the PSU parent information generation to the loop run function from the initialization function (#576)

Description
Move the PSU parent information generation to the loop run function from the initialization function

Motivation and Context
Fixes #575

How Has This Been Tested?
Tested on a Cisco chassis: the PHYSICAL_ENTITY_INFO|PSU * entries can be re-inserted after a thermalctld restart.
Also monitored the state DB memory for hours; it works well.

* [chassisd] Address the chassisd crash issue and add UT for it (#573)

Description
On the Nokia platform, the Supervisor's slot name is the string "A" instead of a number. Converting it with int() can cause a backtrace. We should use the slot value for any checking without any conversion. This fixes sonic-net/sonic-buildimage#21131.

Motivation and Context
Modify _get_module_info not to convert "slot" to a string value, and modify the code not to convert the slot value to an int for any checking; just use the value returned by get_slot() directly. Also add the UT test_moduleupdater_check_slot_string() to validate it.

How Has This Been Tested?
Tested on 202405 branch


Signed-off-by: mlok <[email protected]>

* Fix a comment

---------

Signed-off-by: mlok <[email protected]>
Co-authored-by: Oleksandr Ivantsiv <[email protected]>
Co-authored-by: Gagan Punathil Ellath <[email protected]>
Co-authored-by: Ashwin Srinivasan <[email protected]>
Co-authored-by: Marty Y. Lok <[email protected]>
Co-authored-by: Vivek Verma <[email protected]>
Co-authored-by: Patrick MacArthur <[email protected]>
Co-authored-by: Peter Bailey <[email protected]>
Co-authored-by: rameshraghupathy <[email protected]>
Co-authored-by: Jianquan Ye <[email protected]>
@mssonicbld
Collaborator

Cherry-pick PR to msft-202503: Azure/sonic-platform-daemons.msft#18

10 participants