Separate periodic v/s fixed EEPROM reads between threads and optimize xcvrd boot-up time#360
Conversation
… xcvrd boot-up time Signed-off-by: Mihir Patel <patelmi@microsoft.com>
… xcvrd boot-up time Signed-off-by: Mihir Patel <patelmi@microsoft.com>
|
@keboliu @Junchao-Mellanox @shyam77git please review |
| post_port_pm_info_to_db(logical_port_name, self.port_mapping, self.xcvr_table_helper.get_pm_tbl(asic_index), self.task_stopping_event, pm_info_cache=pm_info_cache) | ||
| except (KeyError, TypeError): | ||
| #continue to process next port since execption could be raised due to port reset, transceiver removal | ||
| continue |
There was a problem hiding this comment.
should we log something here?
There was a problem hiding this comment.
I have now added a warning message for each of the 3 functions.
|
|
||
| # Do not notify media settings during warm reboot to avoid dataplane traffic impact | ||
| if is_warm_start == False: | ||
| notify_media_setting(logical_port_name, transceiver_dict, xcvr_table_helper.get_app_port_tbl(asic_index), port_mapping) |
There was a problem hiding this comment.
does cmis manager depend on notify_media_setting? Before the change, notify_media_setting is always called before starting cmis manager; after the change, notify_media_setting may start in parallel of cmis manager.
There was a problem hiding this comment.
No, it doesn't seem that CMIS manager depends on notify_media_setting.
|
@keboliu @Junchao-Mellanox do you mind testing this change in your platform. @liat-grozovik vis |
|
@mihirpat1 can u test this on Dell Z9332F and confirm the media settings are getting updated in APPL_DB by Xcvrd |
|
@mihirpat1 can u test during warm-boot on DellZ9332F Xcvrd does not post the media setting to APPL_DB |
Yes, I have now tested on X9332F and have confirmed that the media settings are being updated in APPL_DB. |
Yes, verified that xcvrd doesn't post the media setting to APPL_DB after warm-reboot. After warm-reboot
After cold-reboot
|
Will do |
@Junchao-Mellanox - Did you get a chance to test the changes? |
|
In progress, will update result in a day or two. |
Thank you! |
|
I have finished the test, looks good. |
Thank you! |
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
| return event | ||
|
|
||
| # Update port sfp info and dom threshold in db during xcvrd bootup | ||
| def _post_port_sfp_info_dom_thr_to_db_during_xcvrd_bootup(self, port_mapping, xcvr_table_helper, stop_event=threading.Event()): |
There was a problem hiding this comment.
rename to ? post_port_sfp_info_and_dom_thr_to_db_once()
There was a problem hiding this comment.
I have addressed this now.
| transceiver_dict = {} | ||
| retry_eeprom_set = set() | ||
|
|
||
| warmstart = swsscommon.WarmStart() | ||
| warmstart.initialize("xcvrd", "pmon") | ||
| warmstart.checkWarmStart("xcvrd", "pmon", False) | ||
| is_warm_start = warmstart.isWarmStart() | ||
|
|
||
| # Post all the current interface sfp/dom threshold info to STATE_DB | ||
| logical_port_list = port_mapping.logical_port_list | ||
| for logical_port_name in logical_port_list: | ||
| if stop_event.is_set(): | ||
| break | ||
|
|
||
| # Get the asic to which this port belongs | ||
| asic_index = port_mapping.get_asic_id_for_logical_port(logical_port_name) | ||
| if asic_index is None: | ||
| helper_logger.log_warning("Got invalid asic index for {}, ignored while posting SFP info during boot-up".format(logical_port_name)) | ||
| continue | ||
| rc = post_port_sfp_info_to_db(logical_port_name, port_mapping, xcvr_table_helper.get_intf_tbl(asic_index), transceiver_dict, stop_event) | ||
| if rc != SFP_EEPROM_NOT_READY: | ||
| post_port_dom_threshold_info_to_db(logical_port_name, port_mapping, xcvr_table_helper.get_dom_threshold_tbl(asic_index), stop_event) | ||
|
|
||
| # Do not notify media settings during warm reboot to avoid dataplane traffic impact | ||
| if is_warm_start == False: |
There was a problem hiding this comment.
@mihirpat1 I think we can further break down this fn into two where notif media setting is called and the 2nd part is called after CMIS init is completed. May be a TODO for future?
There was a problem hiding this comment.
Sure, have noted this to address while working on #356.
|
@yxieca - It will be great if you can help in merging this to 202205 ADO is 23051688 |
… xcvrd boot-up time (#360) * Separate periodic v/s fixed EEPROM reads between threads and optimize xcvrd boot-up time Signed-off-by: Mihir Patel <patelmi@microsoft.com>
…pulled while a bulk get method is interrogating said module (sonic-net#360)
Description
It seems that currently, xcvrd consumes a lot of time (in the order of minutes) during the init phase to update the STATE_DB with information from transceiver EEPROM followed by spawning threads.
This initial update to state DB is a blocking call in the xcvrd main thread and hence, it increases the overall time to spawn threads. Hence, we need to ensure that CMIS init executes soon after xcvrd is spawned and parallelly proceed with STATE_DB updates with the relevant thread.
We also need to separate periodic v/s fixed EEPROM reads between threads to ensure that the DomInfoUpdateTask thread polls information from EEPROM which changes periodically and SfpStateUpdateTask updates STATE_DB with information which remains static after transceiver detection.
Motivation and Context
How Has This Been Tested?
Ensured that the o/p of transceiver EEPROM related CLIs (based on using data through redis-db) still displays the o/p without any changes to the previous behavior. This has been verified for 400ZR, 400G, 100G and 10G optics.
The below CLI o/p was captured before and after transceiver removal and insertion to ensure that the o/p is displayed correctly.
show int transceiver info $port_num
show int transceiver eeprom -d $port_num
show int transceiver pm $port_num
redis-cli -n 6 hgetall "TRANSCEIVER_INFO|$port_num
redis-cli -n 6 hgetall "TRANSCEIVER_DOM_SENSOR|$port_num
redis-cli -n 6 hgetall "TRANSCEIVER_DOM_THRESHOLD|$port_num
redis-cli -n 6 hgetall "TRANSCEIVER_PM|$port_num
redis-cli -n 6 hgetall "TRANSCEIVER_STATUS|$port_num
o/p for 10G transceiver removal and insertion
Additional Information (Optional)