Skip to content

[cmis] add read only cache decorator#562

Merged
keboliu merged 15 commits intosonic-net:masterfrom
jianyuewu:cmis_cache
Jun 18, 2025
Merged

[cmis] add read only cache decorator#562
keboliu merged 15 commits intosonic-net:masterfrom
jianyuewu:cmis_cache

Conversation

@jianyuewu
Copy link
Contributor

@jianyuewu jianyuewu commented May 8, 2025

Improve CMIS startup performance by adding a read-only cache decorator.

Description

This patch introduces read only cache decorator, which caches the results of static EEPROM-read methods in CmisApi to avoid redundant I2C transactions. By memoizing these methods, we greatly reduce the number of EEPROM reads during startup. Unit tests have been added to verify correct caching behavior and cache invalidation.
Some problematic cable scenarios:
If user could have issued reset since some fields in EEPROM were not accessible initially (the cable has some problem in this case, first plugin data read is incorrect, after re-plugin, then data will be correct.) In this case, with cache enabled, there is a WA: we can reset the sfp by xcvrd restart, then data will be correct.

Motivation and Context

On systems managing up to 512 CMIS ports, the cumulative cost of repeated EEPROM reads at startup causes noticeable delays. Caching static getter methods eliminates unnecessary I2C calls and speeds up initialization.

How Has This Been Tested?

Test with config reload, speed up around 2 seconds.
Before:

$ time sudo config reload -y
Acquired lock on /etc/sonic/reload.lock
Disabling container and routeCheck monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container and routeCheck monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Released lock on /etc/sonic/reload.lock

real    0m22.324s
user    0m1.388s
sys     0m0.361s

After:

$ time sudo config reload -y -f                                                                                                                                                                                                       
Acquired lock on /etc/sonic/reload.lock                                                                                                                                                                                                                   
Disabling container and routeCheck monitoring ...                                                                                                                                                                                                         
Stopping SONiC target ...                                                                                                                                                                                                                                 
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db                                                                                                                                       
Running command: /usr/local/bin/db_migrator.py -o migrate                                                                                                                                                                                                 
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment                                                                                           
Restarting SONiC target ...                                                                                                                                                                                                                               
Enabling container and routeCheck monitoring ...                                                                                                                                                                                                          
Reloading Monit configuration ...                                                                                            
Reinitializing monit daemon                                                                                                  
Released lock on /etc/sonic/reload.lock                                                                                      
                                                                                                                             
real    0m20.811s                                                                                                            
user    0m1.367s                                                                                                             
sys     0m0.361s

jianyuewu added 4 commits May 8, 2025 09:07
Defined read only cache decorator after logger setup.
Cached return values of static get methods to reduce EEPROM reads.
Add unit tests.

Signed-off-by: Jianyue Wu <[email protected]>
Signed-off-by: Jianyue Wu <[email protected]>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Jianyue Wu <[email protected]>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

keboliu
keboliu previously approved these changes May 9, 2025
@jianyuewu jianyuewu changed the title Add read only cache decorator [cmis] Add read only cache decorator May 9, 2025
@jianyuewu jianyuewu changed the title [cmis] Add read only cache decorator [cmis] add read only cache decorator May 9, 2025
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jianyuewu
Copy link
Contributor Author

Some problematic cable scenarios:
If user could have issued reset since some fields in EEPROM were not accessible initially (the cable has some problem in this case, after re-plugin, then data will be correct.) In this case, with cache enabled, there is a WA: we can reset the sfp by xcvrd restart, then data will be correct.

keboliu
keboliu previously approved these changes Jun 13, 2025
Signed-off-by: Jianyue Wu <[email protected]>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@keboliu
Copy link
Collaborator

keboliu commented Jun 16, 2025

@prgeor @mihirpat1 would you please review and approve?

@keboliu keboliu merged commit 8ed2e16 into sonic-net:master Jun 18, 2025
5 checks passed
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202505: #577

mssonicbld added a commit to mssonicbld/sonic-platform-daemons that referenced this pull request Jun 18, 2025
Depends on: sonic-net/sonic-platform-common#562

When an SFP is physically removed from a port, the SFP API object should be removed from the sfp_obj_dict to prevent stale object references and ensure proper cleanup. This change ensures that when the SFP status is detected as removed, the corresponding SFP API object is properly deleted from the dictionary.

<!-- Provide a general summary of your changes in the Title above -->

#### Description
<!--
     Describe your changes in detail
-->
Add check for SFP_STATUS_REMOVED in SfpStateUpdateTask.
Remove SFP API object from sfp_obj_dict when SFP is removed.
Add unit test to verify SFP object removal functionality.

#### Motivation and Context
<!--
     Why is this change required? What problem does it solve?
     If this pull request closes/resolves an open Issue, make sure you
     include the text "fixes #xxxx", "closes #xxxx" or "resolves #xxxx" here
-->
Make sure SFP API object is removed when SFP removed, so cmis cache can also be re-created.

#### How Has This Been Tested?
<!--
     Please describe in detail how you tested your changes.
     Include details of your testing environment, and the tests you ran to
     see how your change affects other areas of the code, etc.
-->
Tested with config reload.
mssonicbld added a commit to sonic-net/sonic-platform-daemons that referenced this pull request Jun 18, 2025
Depends on: sonic-net/sonic-platform-common#562

When an SFP is physically removed from a port, the SFP API object should be removed from the sfp_obj_dict to prevent stale object references and ensure proper cleanup. This change ensures that when the SFP status is detected as removed, the corresponding SFP API object is properly deleted from the dictionary.

<!-- Provide a general summary of your changes in the Title above -->

#### Description
<!--
 Describe your changes in detail
-->
Add check for SFP_STATUS_REMOVED in SfpStateUpdateTask.
Remove SFP API object from sfp_obj_dict when SFP is removed.
Add unit test to verify SFP object removal functionality.

#### Motivation and Context
<!--
 Why is this change required? What problem does it solve?
 If this pull request closes/resolves an open Issue, make sure you
 include the text "fixes #xxxx", "closes #xxxx" or "resolves #xxxx" here
-->
Make sure SFP API object is removed when SFP removed, so cmis cache can also be re-created.

#### How Has This Been Tested?
<!--
 Please describe in detail how you tested your changes.
 Include details of your testing environment, and the tests you ran to
 see how your change affects other areas of the code, etc.
-->
Tested with config reload.
@r12f
Copy link

r12f commented Jun 18, 2025

hi @jianyuewu , do you mind to help create manual cherry-pick PR to 202412?

mssonicbld added a commit to mssonicbld/sonic-platform-daemons.msft that referenced this pull request Jun 18, 2025
Depends on: sonic-net/sonic-platform-common#562

When an SFP is physically removed from a port, the SFP API object should be removed from the sfp_obj_dict to prevent stale object references and ensure proper cleanup. This change ensures that when the SFP status is detected as removed, the corresponding SFP API object is properly deleted from the dictionary.

<!-- Provide a general summary of your changes in the Title above -->

#### Description
<!--
     Describe your changes in detail
-->
Add check for SFP_STATUS_REMOVED in SfpStateUpdateTask.
Remove SFP API object from sfp_obj_dict when SFP is removed.
Add unit test to verify SFP object removal functionality.

#### Motivation and Context
<!--
     Why is this change required? What problem does it solve?
     If this pull request closes/resolves an open Issue, make sure you
     include the text "fixes #xxxx", "closes #xxxx" or "resolves #xxxx" here
-->
Make sure SFP API object is removed when SFP removed, so cmis cache can also be re-created.

#### How Has This Been Tested?
<!--
     Please describe in detail how you tested your changes.
     Include details of your testing environment, and the tests you ran to
     see how your change affects other areas of the code, etc.
-->
Tested with config reload.
mssonicbld added a commit to Azure/sonic-platform-daemons.msft that referenced this pull request Jun 18, 2025
Depends on: sonic-net/sonic-platform-common#562

When an SFP is physically removed from a port, the SFP API object should be removed from the sfp_obj_dict to prevent stale object references and ensure proper cleanup. This change ensures that when the SFP status is detected as removed, the corresponding SFP API object is properly deleted from the dictionary.

<!-- Provide a general summary of your changes in the Title above -->

#### Description
<!--
 Describe your changes in detail
-->
Add check for SFP_STATUS_REMOVED in SfpStateUpdateTask.
Remove SFP API object from sfp_obj_dict when SFP is removed.
Add unit test to verify SFP object removal functionality.

#### Motivation and Context
<!--
 Why is this change required? What problem does it solve?
 If this pull request closes/resolves an open Issue, make sure you
 include the text "fixes #xxxx", "closes #xxxx" or "resolves #xxxx" here
-->
Make sure SFP API object is removed when SFP removed, so cmis cache can also be re-created.

#### How Has This Been Tested?
<!--
 Please describe in detail how you tested your changes.
 Include details of your testing environment, and the tests you ran to
 see how your change affects other areas of the code, etc.
-->
Tested with config reload.
@jianyuewu
Copy link
Contributor Author

hi @jianyuewu , do you mind to help create manual cherry-pick PR to 202412?

Yes, the PR is Azure/sonic-platform-common.msft#86

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants