[Mellanox] Enable CMIS host management#16846

Merged
liat-grozovik merged 28 commits into sonic-net:master from dbarashinvd:dbarashi_indep_mode_temp on Dec 7, 2023

Conversation

Contributor

@dbarashinvd dbarashinvd commented Oct 11, 2023

Why I did it

Enable CMIS host management for Mellanox devices which are expected to support the feature

Work item tracking
  • Microsoft ADO (number only):

How I did it

Added a new thread, implemented in a new file, and changed the platform logic in chassis.py, which now calls this thread from get_change_event().
The thread runs a state machine per port.
First, static detection runs once the thread is up (during the switch bootup sequence) and continues until it reaches a final decision on whether each module is FW controlled or SW controlled.
After static detection ends, dynamic detection begins: it listens for changes on the per-port sysfs fds,
so that cable plug-in and plug-out events can be detected.
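
As a rough illustration of the dynamic detection step, the sketch below waits on a set of per-port file descriptors with `select.poll`. It is a minimal sketch and not the PR's code: the function name `wait_for_port_events`, the `fd_to_port` mapping, and the default event mask are assumptions made for this example.

```python
import select

# Assumption: sysfs attributes report a change as POLLPRI/POLLERR, not POLLIN.
SYSFS_EVENTS = select.POLLPRI | select.POLLERR


def wait_for_port_events(fd_to_port, timeout_ms=1000, events=SYSFS_EVENTS):
    """Wait up to timeout_ms for activity on any fd; return the affected ports.

    fd_to_port is a hypothetical mapping of open file descriptor -> port index.
    """
    poller = select.poll()
    for fd in fd_to_port:
        poller.register(fd, events)
    # poll() returns a list of (fd, event_mask) pairs, empty on timeout.
    ready = poller.poll(timeout_ms)
    return sorted({fd_to_port[fd] for fd, _mask in ready})
```

With real sysfs attributes, the caller would typically re-read the file from offset 0 after each event before polling again. Each port reported here would then go back through the same per-port state machine used during static detection.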

How to verify it

  • Enhanced unit tests
  • Run sonic mgmt on Nvidia SN4700 with CMIS host management enabled

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305
  • 202311

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

with final decision per port if it is FW or SW control
workaround for FW issue of eeprom access blocked for passive cables
added common code in functions
added power_good sysfs counting and poll dummy read
added chassis thread destructor commented code
@dbarashinvd dbarashinvd requested a review from lguohan as a code owner October 11, 2023 16:00
@dbarashinvd dbarashinvd marked this pull request as draft October 11, 2023 16:00
Contributor

@prgeor prgeor left a comment

@dbarashinvd is there an HLD for this? I am not able to understand the flow.

@dbarashinvd dbarashinvd changed the title from "independent module feature" to "CMIS management feature" Oct 23, 2023
@liat-grozovik liat-grozovik changed the title from "CMIS management feature" to "[Mellanox] Enable CMIS host management" Oct 30, 2023
@liat-grozovik
Collaborator

@prgeor kind reminder to approve and merge this PR if there is no additional feedback.

@dbarashinvd dbarashinvd marked this pull request as ready for review November 20, 2023 07:37

MAX_EEPROM_ERROR_RESET_RETRIES = 4

class ModulesMgmtTask(threading.Thread):
Contributor

@dbarashinvd can you elaborate more on this thread in the PR description, please?

Contributor Author

This is the main thread of this file; it runs the state machine per port.
First, static detection takes place once the thread is up (during the switch bootup sequence).
After it ends, dynamic detection takes place, listening for changes on the per-port sysfs fds.
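
To make the two phases concrete, here is a minimal sketch of a thread that runs static detection once and then loops on dynamic detection. It is not the PR's `ModulesMgmtTask`: the class name `PortDetectionTask`, the `detect` and `wait_for_change` callables, and the `events` queue are all hypothetical names chosen for illustration.

```python
import queue
import threading


class PortDetectionTask(threading.Thread):
    """Hypothetical two-phase detection thread (static, then dynamic)."""

    def __init__(self, ports, detect, wait_for_change):
        super().__init__(daemon=True)
        self.ports = ports
        self.detect = detect                    # port -> final state for that port
        self.wait_for_change = wait_for_change  # () -> changed ports, or None to stop
        self.events = queue.Queue()             # (port, state) results for the consumer

    def run(self):
        # Static detection: walk every port once, right after the thread starts.
        for port in self.ports:
            self.events.put((port, self.detect(port)))
        # Dynamic detection: re-run detection only for ports that reported a change.
        while True:
            changed = self.wait_for_change()
            if changed is None:
                break
            for port in changed:
                self.events.put((port, self.detect(port)))
```

In this sketch a consumer such as get_change_event() would drain `events`; how the actual code hands results back to chassis.py is not shown in this conversation.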

Contributor Author

I also added this info to the PR description.
Please note that I updated the PR to fix some recently found issues.

module_sm_obj.set_final_state(STATE_HW_NOT_PRESENT)
return STATE_HW_NOT_PRESENT

def power_on_module(self, port, module_sm_obj, dynamic=False):
Contributor

@dbarashinvd where is this called?

Contributor Author

All of these state machine functions are dispatched through get_sm_func, which each time takes the next state to run and resolves it to the next function.
You can see the list of functions and how each state resolves to one in get_sm_func.
It is used in both static detection and dynamic detection, since both follow basically the same flow, the one that is run to detect the modules properly:
from a check that the cable is plugged in, to the power-on check, then the power good check, the power cap check, the module type check, and so on,
until the final decision on whether the module is FW controlled or SW controlled.
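
The dispatch pattern described above can be sketched as follows. This is a simplified illustration, not the PR's implementation: only the names `get_sm_func` and `STATE_HW_NOT_PRESENT` come from this conversation, while the other state names, the `PortStateMachine` class, the `module` dict, and the collapsed two-step flow are assumptions.

```python
# STATE_HW_NOT_PRESENT appears in the PR; its value and all other states are assumed.
STATE_HW_NOT_PRESENT = "hw_not_present"
STATE_CHECK_PRESENCE = "check_presence"
STATE_CHECK_POWER = "check_power"
STATE_FW_CONTROL = "fw_control"
STATE_SW_CONTROL = "sw_control"

FINAL_STATES = {STATE_HW_NOT_PRESENT, STATE_FW_CONTROL, STATE_SW_CONTROL}


class PortStateMachine:
    """Hypothetical per-port state machine; each step returns the next state."""

    def __init__(self, module):
        self.module = module  # dict standing in for real sysfs/EEPROM reads

    def check_presence(self):
        if not self.module.get("present"):
            return STATE_HW_NOT_PRESENT
        return STATE_CHECK_POWER

    def check_power(self):
        # The real flow has more steps (power good, power cap, module type).
        if self.module.get("power_good") and self.module.get("cmis"):
            return STATE_SW_CONTROL
        return STATE_FW_CONTROL

    def get_sm_func(self, state):
        # Resolve a state name to the function that handles it.
        table = {
            STATE_CHECK_PRESENCE: self.check_presence,
            STATE_CHECK_POWER: self.check_power,
        }
        return table[state]

    def run(self):
        state = STATE_CHECK_PRESENCE
        while state not in FINAL_STATES:
            state = self.get_sm_func(state)()
        return state
```

Because every step only returns the name of the next state, the same `run` loop can serve both static and dynamic detection; only the trigger differs.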

@dbarashinvd dbarashinvd requested a review from prgeor December 5, 2023 08:39
@liat-grozovik liat-grozovik merged commit 000a2ef into sonic-net:master Dec 7, 2023
keboliu pushed a commit to keboliu/sonic-buildimage that referenced this pull request Dec 19, 2023
@mssonicbld
Collaborator

@dbarashinvd PR conflicts with 202311 branch

Junchao-Mellanox pushed a commit to Junchao-Mellanox/sonic-buildimage that referenced this pull request Jan 5, 2024
yxieca pushed a commit that referenced this pull request Jan 5, 2024
Co-authored-by: dbarashinvd <[email protected]>