[sfp-refactoring] xcvrd: add initial support for CMIS application initialization#217
[sfp-refactoring] xcvrd: add initial support for CMIS application initialization#217prgeor merged 14 commits intosonic-net:masterfrom
Conversation
|
This pull request introduces 3 alerts when merging f2b0adc into 6e4cf6f - view on LGTM.com new alerts:
|
|
This pull request introduces 1 alert when merging 999e156 into ddb22b9 - view on LGTM.com new alerts:
|
|
This pull request introduces 1 alert when merging a61387c into 1fd3d16 - view on LGTM.com new alerts:
|
|
This pull request introduces 1 alert when merging abe64de into 7b95211 - view on LGTM.com new alerts:
|
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
| vendor_pn_str = transceiver_dict['model'] | ||
| vendor_key = (vendor_name_str.strip() + '#' + vendor_pn_str.strip()).upper() | ||
|
|
||
| for k, v in g_dict[CMIS_MEDIA_SETTINGS_KEY].items(): |
There was a problem hiding this comment.
I could see that this could is optic specific(vendor_name_str + part number)?.
Can it be made it as generic?.
This code skips application code for flat memory/media types other than QSFP_DD, so can we apply the settings for other media types directly?.
There was a problem hiding this comment.
Does all the CMIS vendors also need to specify "CMIS_MEDIA_SETTINGS" in media_settings.json in their platforms?.
There was a problem hiding this comment.
- If the attached xcvr is not specified in CMIS_MEDIA_SETTINGS, or the CMIS_MEDIA_SETTINGS is missing, the CMIS application selection defaults to be enabled
- The CMIS registers of application selection are at page 0x10 and 0x11, hence it's not available on modules with Flat Memory.
- The idea for module-specific controls is for certain modules that have paged memory support while application selection is not necessary
There was a problem hiding this comment.
you may include a condition to return failure for flat memory media. you can make use of the function defined in SFP refactored code. https://github.com/Azure/sonic-platform-common/blob/c8eceec53625b6c021b7c6619b2555d0c7a1b38d/sonic_platform_base/sonic_xcvr/api/public/cmis.py#L291
There was a problem hiding this comment.
"> 3. The idea for module-specific controls is for certain modules that have paged memory support while application selection is not necessary
"
How do you determine whether the specific media needs application selection?.
There was a problem hiding this comment.
- Thanks for feedback, although we already have a checker for flat media in the xcvrd, I agree it's better to have this in the cmis.py as well.
- Actually, the CMIS_MEDIA_SETTINGS is a backup plan for the faulty transceiver if any. And as of now, we don't really need this, please feel free to let me know if we should have it removed, thanks
There was a problem hiding this comment.
Just for a record, the CMIS_MEDIA_SETTINGS is already removed.
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
|
|
||
| has_cmis = False | ||
| for sfp in platform_chassis.get_all_sfps(): | ||
| if sfp.get_port_type() in [sfp.SFP_PORT_TYPE_QSFPDD]: |
There was a problem hiding this comment.
more than just QSFPDD supports CMIS
There was a problem hiding this comment.
As of now, only QSFPDD (type=18h) is verified, we could circle back on OSFP(type=19h) once the sample is available
There was a problem hiding this comment.
Just for a record, the get_port_type() checker is already removed in the latest commit
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
| has_cmis = True | ||
| break | ||
|
|
||
| if not has_cmis: |
There was a problem hiding this comment.
modules may be plugged in or removed dynamically
There was a problem hiding this comment.
The check here is to validate the cage type rather than the media type of the attached transceiver, hence it does not matter what's the module attached to right now.
The main purpose is to reduce the CPU/memory usage when the service is not appliable to the target unit.
e.g. For a box with only SFP and QSFP28 cages, the CMIS Manager will not be activated.
There was a problem hiding this comment.
Just for a record, the get_port_type() checker is already removed in the latest commit
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
| if conf.get('application_selection', True): | ||
| self.log_notice("{}: application update for {}G, {}-lanes".format( | ||
| lport, int(speed/1000), len(lanes))) | ||
| ret = sfp.set_cmis_application(speed, len(lanes)) |
There was a problem hiding this comment.
How do we ensure that this is done after the ASIC SerDes taps are configured?
There was a problem hiding this comment.
If you're referring to the pre-emphasis configuration on the MAC, it's handled by the orchagent when the serdes tap parameters are uploaded to the APPL_DB, hence it's outside the scope of this PR, the serdes tap parameters are independent with this PR, and the framework of media_settings.json needs to be revised to support speed/mode switches between PAM4 and NRZ if necessary.
If you're referring to the pre-emphasis configuration on the PHY, it's not supported as of now, we could circle back on this when the module-specific parameters are available and needs to be applied to the CMIS PHY registers.
There was a problem hiding this comment.
Note: In the case of Broadocm ASIC, we have different value range for PAM4 and NRZ, hence having NRZ/PAM4 pre-emphasise applied to SerDes in PAM4/NRZ mode may or may not lead to checker failure in SAI, and in turn, the orchagent may or may not crash.
There was a problem hiding this comment.
My concern is that (per my understanding) the ASIC SerDes pre/main/post setitngs should be configured before the CMIS ApSel sequence is started. There doesn't seem to be anything that enforces this sequencing here.
There was a problem hiding this comment.
Yes, we need a synchronization mechanism between syncd/SAI and the xcvrd to gracefully switch in between the applications. Prince @ MSFT and I had this discussion yesterday, and he will be working on this, and we could deal with this in another PR
|
This pull request introduces 4 alerts when merging a1d8e10 into e53ff26 - view on LGTM.com new alerts:
|
|
|
||
| def subscribe_port_config_change(db_list=['CONFIG_DB']): | ||
| port_tbl_map = { | ||
| 'APPL_DB': swsscommon.APP_PORT_TABLE_NAME, |
There was a problem hiding this comment.
We'll listen to APPL_DB for the PORT CONFIG changes, and STATE_DB for XCVR INSERTION/REMOVAL
There was a problem hiding this comment.
I think we should listen to PORT_TABLE in STATE_TB instead of PORT_TABLE in APP_DB because, when you get SET event for a port in APP_DB, the orchagent has still not called the sai api to create the port yet. When SET event is received for a port in STATE_DB then its confirmed the PORT create has completed.
There was a problem hiding this comment.
The trick is, not all the fields in APPL_DB will be updated to STATE_DB by the orchagent as of now, it's part of synchronization issue between syncd and orchagent, it's outside the scope of this PR
|
This pull request introduces 4 alerts when merging 2bf4580 into e53ff26 - view on LGTM.com new alerts:
|
| PORT_REMOVE = 1 | ||
| def __init__(self, port_name, port_index, asic_id, event_type): | ||
| PORT_SET = 2 | ||
| PORT_DEL = 3 |
There was a problem hiding this comment.
it hard to differentiate the four actions here from the name, would you mind rephrasing?
There was a problem hiding this comment.
No, I could update the namings, but do you have any suggestions?
There was a problem hiding this comment.
While I could surely update these namings, but I don't know which one is better, could you please help me on this?
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
|
|
||
| self.dbg_print("Stopped") | ||
|
|
||
| def task_run(self, y_cable_presence): |
There was a problem hiding this comment.
what's the purpose of having "y_cable_presence" here?
There was a problem hiding this comment.
Yep, you're right, I'll remove it
|
|
||
|
|
||
| def handle_port_config_change(sel, asic_context, stop_event, port_mapping, logger, port_change_event_handler): | ||
| def handle_port_config_change(sel, asic_context, stop_event, port_mapping, logger, port_change_event_handler, promiscuous=False): |
There was a problem hiding this comment.
please add a separate handler for handling PORT_TABLE update events in APP_DB
There was a problem hiding this comment.
Please check the latest commit, the requested changes are already done
| else: | ||
| return None | ||
|
|
||
| def subscribe_port_config_change(db_list=['CONFIG_DB']): |
There was a problem hiding this comment.
please add another function to subscribe
subscribe_port_table_update
There was a problem hiding this comment.
It's subscribe_port_update_events() in the latest commit, do we need to have it rephrased to subscribe_port_table_update()?
| port_mapping.get_logical_to_physical(key)[0], | ||
| asic_context[port_tbl], | ||
| PortChangeEvent.PORT_REMOVE) | ||
| if promiscuous: |
There was a problem hiding this comment.
what is the promiscuous check for?
There was a problem hiding this comment.
This is a dead code that's already removed, please check the latest commit
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
| has_cmis = False | ||
| for sfp in platform_chassis.get_all_sfps(): | ||
| try: | ||
| ptype = sfp.get_port_type() |
There was a problem hiding this comment.
please remove this check for checking the cage type. Don't want platforms to implement this new API for enabling CMIS datapath initialization. Its OK for one thread running and doing nothing (CMIS state will be in failed state for those platform having only SFP).
There was a problem hiding this comment.
It's already removed in the earlier commit, please check it out
|
|
||
| def subscribe_port_config_change(db_list=['CONFIG_DB']): | ||
| port_tbl_map = { | ||
| 'APPL_DB': swsscommon.APP_PORT_TABLE_NAME, |
There was a problem hiding this comment.
I think we should listen to PORT_TABLE in STATE_TB instead of PORT_TABLE in APP_DB because, when you get SET event for a port in APP_DB, the orchagent has still not called the sai api to create the port yet. When SET event is received for a port in STATE_DB then its confirmed the PORT create has completed.
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
|
|
||
| self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_DP_DEINIT | ||
| elif state == self.CMIS_STATE_DP_DEINIT: | ||
| if api.get_module_state() != 'ModuleReady': |
There was a problem hiding this comment.
we need timeout, if the module does NOT settle for 'ModuleReady' see page 1 byte 167
| done = False | ||
| continue | ||
| if not done: | ||
| continue |
There was a problem hiding this comment.
please bail out if not ConfigSuccess, otherwise we will keep continuing even though the configuration has failed.
| done = False | ||
| continue | ||
| if not done: | ||
| continue |
There was a problem hiding this comment.
please bail out if datapath init failed. There is a timeout in page 1, byte 144
There was a problem hiding this comment.
This information is not available in the mem map as of now, hence we'll use a default expiration time for now, let's do the fine-grained expiration in phase 2 (DPB support)
|
|
||
| # D.2.2 Software Deinitialization | ||
| api.set_datapath_deinit(host_lanes) | ||
| api.set_lpmode(True) |
There was a problem hiding this comment.
please make sure module indeed transitioned to low power.
There was a problem hiding this comment.
Done, but it looks like one of my CMIS4 optics will never be transitioned to ModuleLowPwr, hence I add 'ModuleReady' in the following checker
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
| if not done: | ||
| continue |
There was a problem hiding this comment.
please bail out if the datapath has not transitioned to DataPathActivated . see page 1 , byte 168 for timeout for completing datapath tx turn ON.
commit a1d8e10 Author: Dante Su <[email protected]> Date: Thu Nov 18 12:45:01 2021 +0000 okay Signed-off-by: Dante Su <[email protected]> commit 11b6850 Author: Dante Su <[email protected]> Date: Fri Nov 12 07:39:34 2021 +0000 Add custom media setting for CMIS Signed-off-by: Dante Su <[email protected]> commit 6dd3608 Author: Dante Su <[email protected]> Date: Tue Nov 9 11:34:10 2021 +0000 add missing unittest for CMIS Signed-off-by: Dante Su <[email protected]> commit ceea223 Author: Dante Su <[email protected]> Date: Tue Nov 9 03:34:59 2021 +0000 fix port remove event Signed-off-by: Dante Su <[email protected]> commit 9be5135 Author: Dante Su <[email protected]> Date: Tue Nov 9 01:46:27 2021 +0000 port_mapping: drop the hacks for CMIS Signed-off-by: Dante Su <[email protected]> commit 501134b Author: Dante Su <[email protected]> Date: Fri Nov 5 09:08:59 2021 +0000 [sfp-refactoring] xcvrd: add initial support for CMIS application initialization Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
… type checker Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
Signed-off-by: Dante Su <[email protected]>
|
@prgeor are we planning to to take this SFP refractoring into 202111, Is this needed for 400G in Chassis ? |
|
This change is not needed if the default 400G application is needed. |
…e 1.2 (sonic-net#217) This release goes in sync with the following firmware version of Broadcom Y cable, which is consistent with release 6 { "version_nic_active": "D203.9.D103.1", "version_nic_inactive": "D203.9.D103.1", "version_nic_next": "D203.9.D103.1", "version_peer_active": "D302.8", "version_peer_inactive": "D302.8", "version_peer_next": "D302.8", "version_self_active": "D302.8", "version_self_inactive": "D302.8", "version_self_next": "D302.8" } Signed-off-by: vaibhav-dahiya [email protected] Description Basically a vendor specific implementation of abstract YCableBase class . detailed design discussion can be found https://github.com/Azure/SONiC/pull/757/files Motivation and Context to support the Y-Cable API required to support Broadcom's Y-Cable. Signed-off-by: vaibhav-dahiya <[email protected]>
Signed-off-by: Dante Su [email protected]
Why I did it
Add initial support for CMIS application initialization
How I did it
Add CmisManagerTask to handle the per-port CMIS application initialization in parallel
How to verify it
Which release branch to backport (provide reason below if selected)
Description for the changelog
A picture of a cute animal (not mandatory but encouraged)