Skip to content

[sfputil] Retrieve error-status for transceiver using sfp API#2953

Merged
prgeor merged 3 commits intosonic-net:masterfrom
mihirpat1:sfputil_error_status
Aug 30, 2023
Merged

[sfputil] Retrieve error-status for transceiver using sfp API#2953
prgeor merged 3 commits intosonic-net:masterfrom
mihirpat1:sfputil_error_status

Conversation

@mihirpat1
Copy link
Copy Markdown
Contributor

What I did

The mismatch in 0 based or 1 based port numbering between sfp.index and physical_port number is causing the below issues while executing the "sfputil show error-status -hw" command

  1. “sfputil show error-status -hw” command doesn’t show error-status for Ethernet0 or the last port for some platforms and the remaining ports show error-status of the next or previous port
  2. “sfputil show error-status -hw -p ” doesn’t display error-status
  3. platform_tests/sfp/test_sfputil.py::test_check_sfputil_error_status sonic-mgmt testcase fails for some platforms

Issue details with an example of sfp.index being 1 based while physical_port being 0 based:
The sfp.index

" errors=['{}:{}'.format(sfp.index, sfp.get_error_description()) for sfp in sfp_list]\n" \
is 1 based whereas the physical port numbering is 0 based (retried from port_config.ini file).
Hence, for Ethernet0 (sfp.index = 1), there is no key-value defined in output_dict (dictionary of <sfp.index>:<error_status>) for physical_port = 0.
output.append([port_name, output_dict.get(physical_port_list[0])])

How I did it

I am now directly calling get_error_description using sfp object to retrieve the error description. This removes the dependency with sfp.index to retrieve error description for a port.

How to verify it

Ran platform_tests/sfp/test_sfputil.py::test_check_sfputil_error_status sonic-mgmt testcase and ensured it passes successfully.

Previous command output (if the output of a command-line utility has changed)

No change in CLI o/p. Also, platforms which didn't work with this CLI earlier, will now show a valid CLI output.

New command output (if the output of a command-line utility has changed)

prgeor
prgeor previously approved these changes Aug 24, 2023
@prgeor prgeor added the Bug label Aug 24, 2023
"""
if port is None:
logical_port_list = natsort.natsorted(platform_sfputil.logical)
# Create a list containing the logical port names of all ports we're interested in
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SFP platform API can be invoked form the pmon docker only but it is invoked from master, which causes the exception you mentioned below.
Regarding the starting index, I think spf.index should start from 1 which is assumed also in xcvrd. if it doesn’t mismatch with physical index on some platforms, I think it should be fixed in vendor’s platform API.

Copy link
Copy Markdown
Contributor

@aravindmani-1 aravindmani-1 Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to use the SFP index as it is internal to vendor platforms. SFP API can be invoked from the host too like it was done for presence and eeprom API in the same file. All the API's implemented in sfputil/main.py are using host files only and implementing this function in same way as other function will fix this issue.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. you are probably correct according to @Junchao-Mellanox 's explaination.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this PR is going to be cherry-picked to 202211/202205, then, this will be an issue.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Junchao-Mellanox I just verified the fix in 202205 branch in Dell platforms and it worked fine. Could you please elaborate more on the issue that you expect?.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 202205/202211, SFP platform API can be invoked from the pmon container only. So, it will be an issue.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If SFP platform API can be invoked from PMON container only, then existing CLI's like presence, eeprom would have failed.
From the earlier implementation, error description uses PMON container for invoking platform API which deviates from other CLI's implemenation.
From this PR, error description API uses the same method as other CLI's like presence, EEPROM where it executes from the host.

def load_platform_chassis():

SFP platform API can be invoked from host too in 202205/202211.
I confirmed the same in Dell platforms.
root@sonic:~# show ver

    SONiC Software Version: SONiC.202205.345706-611449dc8
    SONiC OS Version: 11
    Distribution: Debian 11.7
    Kernel: 5.10.0-18-2-amd64
    Build commit: 611449dc8
    Build date: Thu Aug 24 13:38:00 UTC 2023
    Built by: AzDevOps@vmss-soni001UN2
    
    Platform: x86_64-dell_s6100_c2538-r0
    HwSKU: Force10-S6100
    ASIC: broadcom
    ASIC Count: 1
    ASIC Count: 1
    Serial Number: CFXYZP2
    Model Number: 0F6N2R
    Hardware Revision: A03
    Uptime: 03:07:14 up 2 days, 20:54,  1 user,  load average: 1.25, 1.62, 1.58
    Date: Mon 28 Aug 2023 03:07:14
    root@sonic:~# python3
    Python 3.9.2 (default, Feb 28 2021, 17:03:44)
    [GCC 10.2.1 20210110] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
    >>> from sonic_platform.chassis import Chassis
    >>> c=Chassis()
    >>> c.get_sfp(32).get_presence()
    True
    >>> c.get_sfp(32).get_error_description()
    'OK'
    >>>

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a vendor specific behavior. Not all SFP API, but some of them can only be triggered in PMON container. That's why we have the logic to run "docker exec" there. It is for compatible with all vendors.

Copy link
Copy Markdown
Contributor

@aravindmani-1 aravindmani-1 Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case, i would encourage to not to use SFP index as those attributes are internal to vendor specific and invoke standard API to fetch the data. From my POV, if SFP API like can be invoked in master branch, i expect it to work in other branches too. Without this fix, it causes 5 seconds delay on Dell S6100 platform.
Why is this SFP API deviation between master and 202205/202211 branch?.

Copy link
Copy Markdown
Collaborator

@stephenxs stephenxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As commented

@prgeor
Copy link
Copy Markdown
Contributor

prgeor commented Aug 25, 2023

@Junchao-Mellanox @keboliu @stephenxs please approve if no concern

@prgeor prgeor merged commit 09daeb2 into sonic-net:master Aug 30, 2023
@mihirpat1
Copy link
Copy Markdown
Contributor Author

@StormLiangMS - Can you please help with cherry-pick for this to 202305?
ADO - 24867010

@ZhaohuiS
Copy link
Copy Markdown
Contributor

@prgeor can I ask why you removed Request for 202205 branch? Nightly test is based on 202205 image.

StormLiangMS pushed a commit that referenced this pull request Sep 3, 2023
* [sfputil] Retrieve error-status for transceiver using sfp API

Signed-off-by: Mihir Patel <[email protected]>

* Resolved testcase failures

* Improved code coverage

---------

Signed-off-by: Mihir Patel <[email protected]>
@prgeor
Copy link
Copy Markdown
Contributor

prgeor commented Sep 11, 2023

@prgeor can I ask why you removed Request for 202205 branch? Nightly test is based on 202205 image.

@ZhaohuiS this will break 202205. This issue is fixed differently in 202205

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants