[sfputil] Retrieve error-status for transceiver using sfp API#2953
[sfputil] Retrieve error-status for transceiver using sfp API#2953prgeor merged 3 commits intosonic-net:masterfrom
Conversation
Signed-off-by: Mihir Patel <[email protected]>
| """ | ||
| if port is None: | ||
| logical_port_list = natsort.natsorted(platform_sfputil.logical) | ||
| # Create a list containing the logical port names of all ports we're interested in |
There was a problem hiding this comment.
SFP platform API can be invoked form the pmon docker only but it is invoked from master, which causes the exception you mentioned below.
Regarding the starting index, I think spf.index should start from 1 which is assumed also in xcvrd. if it doesn’t mismatch with physical index on some platforms, I think it should be fixed in vendor’s platform API.
There was a problem hiding this comment.
I don't think we need to use the SFP index as it is internal to vendor platforms. SFP API can be invoked from the host too like it was done for presence and eeprom API in the same file. All the API's implemented in sfputil/main.py are using host files only and implementing this function in same way as other function will fix this issue.
There was a problem hiding this comment.
ok. you are probably correct according to @Junchao-Mellanox 's explaination.
There was a problem hiding this comment.
If this PR is going to be cherry-picked to 202211/202205, then, this will be an issue.
There was a problem hiding this comment.
@Junchao-Mellanox I just verified the fix in 202205 branch in Dell platforms and it worked fine. Could you please elaborate more on the issue that you expect?.
There was a problem hiding this comment.
In 202205/202211, SFP platform API can be invoked from the pmon container only. So, it will be an issue.
There was a problem hiding this comment.
If SFP platform API can be invoked from PMON container only, then existing CLI's like presence, eeprom would have failed.
From the earlier implementation, error description uses PMON container for invoking platform API which deviates from other CLI's implemenation.
From this PR, error description API uses the same method as other CLI's like presence, EEPROM where it executes from the host.
sonic-utilities/sfputil/main.py
Line 518 in d69432d
SFP platform API can be invoked from host too in 202205/202211.
I confirmed the same in Dell platforms.
root@sonic:~# show ver
SONiC Software Version: SONiC.202205.345706-611449dc8
SONiC OS Version: 11
Distribution: Debian 11.7
Kernel: 5.10.0-18-2-amd64
Build commit: 611449dc8
Build date: Thu Aug 24 13:38:00 UTC 2023
Built by: AzDevOps@vmss-soni001UN2
Platform: x86_64-dell_s6100_c2538-r0
HwSKU: Force10-S6100
ASIC: broadcom
ASIC Count: 1
ASIC Count: 1
Serial Number: CFXYZP2
Model Number: 0F6N2R
Hardware Revision: A03
Uptime: 03:07:14 up 2 days, 20:54, 1 user, load average: 1.25, 1.62, 1.58
Date: Mon 28 Aug 2023 03:07:14
root@sonic:~# python3
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> from sonic_platform.chassis import Chassis
>>> c=Chassis()
>>> c.get_sfp(32).get_presence()
True
>>> c.get_sfp(32).get_error_description()
'OK'
>>>
There was a problem hiding this comment.
This is a vendor specific behavior. Not all SFP API, but some of them can only be triggered in PMON container. That's why we have the logic to run "docker exec" there. It is for compatible with all vendors.
There was a problem hiding this comment.
If that's the case, i would encourage to not to use SFP index as those attributes are internal to vendor specific and invoke standard API to fetch the data. From my POV, if SFP API like can be invoked in master branch, i expect it to work in other branches too. Without this fix, it causes 5 seconds delay on Dell S6100 platform.
Why is this SFP API deviation between master and 202205/202211 branch?.
|
@Junchao-Mellanox @keboliu @stephenxs please approve if no concern |
|
@StormLiangMS - Can you please help with cherry-pick for this to 202305? |
|
@prgeor can I ask why you removed Request for 202205 branch? Nightly test is based on 202205 image. |
* [sfputil] Retrieve error-status for transceiver using sfp API Signed-off-by: Mihir Patel <[email protected]> * Resolved testcase failures * Improved code coverage --------- Signed-off-by: Mihir Patel <[email protected]>
What I did
The mismatch in 0 based or 1 based port numbering between sfp.index and physical_port number is causing the below issues while executing the "sfputil show error-status -hw" command
Issue details with an example of sfp.index being 1 based while physical_port being 0 based:
The sfp.index
sonic-utilities/sfputil/main.py
Line 904 in 7435b1c
Hence, for Ethernet0 (sfp.index = 1), there is no key-value defined in output_dict (dictionary of <sfp.index>:<error_status>) for physical_port = 0.
sonic-utilities/sfputil/main.py
Line 940 in 7435b1c
How I did it
I am now directly calling get_error_description using sfp object to retrieve the error description. This removes the dependency with sfp.index to retrieve error description for a port.
How to verify it
Ran platform_tests/sfp/test_sfputil.py::test_check_sfputil_error_status sonic-mgmt testcase and ensured it passes successfully.
Previous command output (if the output of a command-line utility has changed)
No change in CLI o/p. Also, platforms which didn't work with this CLI earlier, will now show a valid CLI output.
New command output (if the output of a command-line utility has changed)