[Mellanox] mlnx-sfpd init flow enhancement#3294

Merged
qiluo-msft merged 9 commits into sonic-net:master from keboliu:pr-mlnx-sfpd-fix
Aug 8, 2019

Conversation

@stephenxs
Collaborator

- What I did
Changed the mlnx-sfpd start flow to make it more robust.

- How I did it

  1. Before calling any SDK APIs, make sure the SDK daemon has started. This is done by checking that the "sdk_ready" file exists.
  2. Before calling sx_api_host_ifc_trap_id_register_set, make sure the switch has already been created inside the SDK. This is done by checking that the switch id exists.
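
The two checks above can be sketched roughly as a bounded polling loop. This is a minimal illustration, not the code from this PR: the marker path `/tmp/sdk_ready`, the helper names `wait_for` and `sdk_daemon_ready`, and the default retry values are assumptions made for the example, and no real SDK calls are shown.

```python
import os
import time

# Assumed values for illustration only; the real daemon may differ.
SDK_DAEMON_READY_FILE = "/tmp/sdk_ready"  # hypothetical marker path
RETRY_INTERVAL_SECS = 5
MAX_RETRIES = 30


def wait_for(condition, retries=MAX_RETRIES, interval=RETRY_INTERVAL_SECS,
             sleep=time.sleep):
    """Poll condition() up to `retries` times with a constant interval.

    Returns True as soon as the condition holds, False if it never does.
    """
    for _ in range(retries):
        if condition():
            return True
        sleep(interval)
    return False


def sdk_daemon_ready(path=SDK_DAEMON_READY_FILE):
    """Treat the existence of the marker file as 'SDK daemon started'."""
    return os.path.exists(path)
```

A daemon could call `wait_for(sdk_daemon_ready)` before touching the SDK, then reuse `wait_for` with a second predicate (not shown here) that checks whether the switch id exists before registering traps.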

- How to verify it
Test whether mlnx-sfpd starts successfully in procedures such as config reload and warm reboot.

- Description for the changelog
[Mellanox] mlnx-sfpd init flow enhancement

- A picture of a cute animal (not mandatory but encouraged)

@jleveque
Contributor

jleveque commented Aug 6, 2019

Retest this please

qiluo-msft
qiluo-msft previously approved these changes Aug 7, 2019
@qiluo-msft qiluo-msft self-requested a review August 7, 2019 06:32
@qiluo-msft qiluo-msft dismissed their stale review August 7, 2019 06:32

Ask one more question

@stephenxs
Collaborator Author

stephenxs commented Aug 7, 2019 via email

@qiluo-msft
Collaborator

Yes. After warm-reboot the file exists.

I mean: after warm-reboot and before the SDK starts, does the old file still exist? If so, mlnx-sfpd may think the SDK is ready when it actually is not.

@stephenxs
Collaborator Author

stephenxs commented Aug 7, 2019 via email

@stephenxs
Collaborator Author

retest this please

@qiluo-msft qiluo-msft merged commit d16ece2 into sonic-net:master Aug 8, 2019
@stephenxs stephenxs deleted the pr-mlnx-sfpd-fix branch August 8, 2019 21:57
yxieca pushed a commit that referenced this pull request Aug 14, 2019
* fix sfpd initialize issue
* fix review comments
* rephrase the output log
* fix retry counter
* change the retry time to 10, means set max waiting time 1024s
* fix mlnx-sfpd init flow with new solution
* [mlnx-sfpd] address comments
  1. wait for 5 seconds * 30 times, 150 seconds in total. use constant wait time for each retry.
  2. use try/except structure so that errors can be handled in a graceful way
* [mlnx-sfpd] wait 5 seconds after SDK_DAEMON_READY_FILE exists to make sure SDK is fully up.
* [mlnx-sfpd] simplify initialization by using deinitialize on initializing failure
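
The last commits describe wrapping initialization in try/except and falling back to deinitialize on failure. A rough sketch of that pattern is below. The class `EventListener`, its `steps` and `cleanup` arguments, and the logger name are hypothetical and stand in for whatever SDK handles the real daemon opens; this is not the code merged in this PR.

```python
import logging

logger = logging.getLogger("sfpd-sketch")  # hypothetical logger name


class EventListener:
    """Hypothetical stand-in for the daemon's SDK-facing object."""

    def __init__(self, steps, cleanup):
        self.steps = steps      # ordered callables, e.g. open handle, register trap
        self.cleanup = cleanup  # callable that releases whatever was acquired
        self.initialized = False

    def initialize(self):
        """Run every init step; on any error, clean up and report failure."""
        try:
            for step in self.steps:
                step()
        except Exception as err:
            logger.error("initialization failed: %s", err)
            # Reuse the normal teardown path instead of per-step rollback.
            self.deinitialize()
            return False
        self.initialized = True
        return True

    def deinitialize(self):
        """Release resources; also used as the failure path of initialize()."""
        self.cleanup()
        self.initialized = False
```

Routing the failure path through `deinitialize` keeps a single teardown routine, at the cost of requiring that routine to tolerate partially initialized state.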