
[chassis][midplane] Add notification to Supervisor when LC is graceful reboot #3292

Merged
rlhui merged 2 commits into sonic-net:master from mlok-nokia:midplane-connectivity-log
May 15, 2024

Conversation

@mlok-nokia
Contributor

@mlok-nokia mlok-nokia commented Apr 26, 2024

What I did

Modify the "sudo reboot" script to notify the Supervisor card by inserting a "CHASSIS_MODULE_REBOOT_INFO_TABLE|LINE-CARD#" entry into CHASSIS_STATE_DB when the reboot command is issued on the linecard. This gives the Supervisor enough information to log a proper message, which addresses issue sonic-net/sonic-buildimage#18540.

How I did it

Add a new function linecard_reboot_notity_supervisor() to the reboot script. If the platform is a linecard in a chassis, it calls sonic-db-cli to add a "CHASSIS_MODULE_REBOOT_INFO_TABLE|LINE-CARD#" entry to CHASSIS_STATE_DB. This gives chassisd on the Supervisor card the information it needs to log a proper message.
This PR requires the following PRs to address issue sonic-net/sonic-buildimage#18540:
sonic-net/sonic-buildimage#18805
sonic-net/sonic-platform-daemons#480
sonic-net/sonic-buildimage#18862
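
The flow described above can be sketched as follows. The actual change lives in the bash reboot script; this Python version is only an illustration. The table and database names come from the description, while the field name `reboot`, the value `expected`, and the helper names are hypothetical placeholders that this page does not confirm.

```python
from typing import Callable, List, Sequence

# Table name taken from the PR description. The field name and value are
# hypothetical placeholders, not confirmed by this page.
REBOOT_INFO_TABLE = "CHASSIS_MODULE_REBOOT_INFO_TABLE"
ASSUMED_FIELD = "reboot"
ASSUMED_VALUE = "expected"


def build_notify_command(module_name: str) -> List[str]:
    """Return the sonic-db-cli argv that would record a graceful reboot."""
    key = f"{REBOOT_INFO_TABLE}|{module_name}"
    return ["sonic-db-cli", "CHASSIS_STATE_DB", "hset",
            key, ASSUMED_FIELD, ASSUMED_VALUE]


def notify_supervisor(is_linecard: bool, module_name: str,
                      run: Callable[[Sequence[str]], int]) -> bool:
    """Write the entry only on a linecard; return True if the write succeeded."""
    if not is_linecard:
        return False  # nothing to announce on a Supervisor or non-chassis system
    rc = run(build_notify_command(module_name))
    if rc != 0:
        # The PR's second commit adds a log message for this failure case.
        print(f"Failed to create {REBOOT_INFO_TABLE}|{module_name} in CHASSIS_STATE_DB")
    return rc == 0
```

On a real linecard the `run` argument could be `subprocess.call`; passing it in keeps the sketch testable without a SONiC device.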

This PR is needed by branch 202205

How to verify it

  1. Test the expected log. Use the CLI command "sudo reboot" to reboot a linecard, then check the syslog on the Supervisor. The following message is logged:
Apr 25 19:44:40.818378 ixre-cpm-chassis7 WARNING pmon#chassisd: Expected: Module LINE-CARD0 lost midplane connectivity
  2. Test the unexpected log. Use "sudo /sbin/reboot", or reboot a linecard by any crash method, then check the syslog on the Supervisor. The following message is logged:
Apr 25 19:50:22.549416 ixre-cpm-chassis7 WARNING pmon#chassisd: Unexpected: Module LINE-CARD0 lost midplane connectivity
  3. Test the expected reboot with the timeout case. Use the CLI command "sudo reboot" on the linecard and keep it down for more than 4 minutes. The following messages are logged:
Apr 25 01:25:53.877143 ixre-cpm-chassis7 WARNING sr_device_mgr: Unable to reach slot 1 (Linecard) via Midplane
Apr 25 01:25:58.402511 ixre-cpm-chassis7 WARNING pmon#chassisd: Module LINE-CARD0 went off-line!
Apr 25 01:26:01.658959 ixre-cpm-chassis7 WARNING pmon#chassisd: Expected: Module LINE-CARD0 lost midplane connectivity.
(3 minutes after the first log)
Apr 25 01:29:10.259527 ixre-cpm-chassis7 WARNING pmon#chassisd: Unexpected: Module LINE-CARD0 midplane connectivity is not restored in 180 seconds
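
The three cases above imply a simple decision on the Supervisor side, sketched here in Python. This is not the chassisd code from sonic-net/sonic-platform-daemons#480, which this page does not show. The function names are hypothetical, the message strings follow the sample logs, and the 180-second default is assumed from the last log line.

```python
from typing import Optional

# Assumed default, taken from the "180 seconds" in the sample log above.
ASSUMED_TIMEOUT_SECS = 180


def classify_midplane_loss(module: str, reboot_announced: bool) -> str:
    """Message for the moment the Supervisor loses midplane connectivity.

    reboot_announced is True when the linecard wrote its reboot-info
    entry to CHASSIS_STATE_DB before going down.
    """
    prefix = "Expected" if reboot_announced else "Unexpected"
    return f"{prefix}: Module {module} lost midplane connectivity"


def check_reboot_timeout(module: str, down_since: float, now: float,
                         timeout: int = ASSUMED_TIMEOUT_SECS) -> Optional[str]:
    """Message for an announced reboot that has not come back in time.

    Returns None while the linecard is still within the allowed window.
    """
    if now - down_since >= timeout:
        return (f"Unexpected: Module {module} midplane connectivity "
                f"is not restored in {timeout} seconds")
    return None
```

For example, `classify_midplane_loss("LINE-CARD0", False)` yields the "Unexpected" message from the second test case.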

Previous command output (if the output of a command-line utility has changed)

NA

New command output (if the output of a command-line utility has changed)

NA

…l reboot

Signed-off-by: mlok <marty.lok@nokia.com>
@mlok-nokia
Contributor Author

@deepak-singhal0408 @judyjoseph This PR addresses the logging of lost midplane connectivity. There are 3 PRs in total. Please review them. Thanks

@abdosi
Contributor

abdosi commented Apr 30, 2024

@bmridul please help review.

@abdosi abdosi requested a review from bmridul April 30, 2024 16:22
Comment thread scripts/reboot
…ntry in CHASSIS_STATE_DB

Signed-off-by: mlok <marty.lok@nokia.com>
@gechiang
Contributor

gechiang commented May 3, 2024

@mlok-nokia ,
What is the dependency of this PR with "sonic-net/sonic-platform-daemons#480"?
Let's say we backport this to the .msft repo 202205 branch but not the platform-daemons PR (480): will there be any build or functionality issue? I am asking because I don't think "sonic-net/sonic-platform-daemons#480" will be allowed into the 202205 branch, and since we don't have a .msft 202205 repo for the platform-daemons submodule, the bug fix will be incomplete for the community building with 202205... But we should be able to make an internal build with a patch. I just want to make sure there is no negative impact on the rest of the community.
please confirm.
Thanks!

@mlok-nokia
Contributor Author


There should not be any functionality impact. PR #3292 only creates an entry in CHASSIS_STATE_DB for the platform-daemons PR sonic-net/sonic-platform-daemons#480 to use. If the platform-daemons change is not in the branch, the data in the DB is simply not used.

Contributor

@judyjoseph judyjoseph left a comment

LGTM

@judyjoseph
Contributor

@kenneth-arista could you review as well

@rlhui rlhui merged commit 547d5ee into sonic-net:master May 15, 2024
@rlhui rlhui added the p0 label May 15, 2024
@gechiang
Contributor

gechiang commented May 15, 2024

MSFT ADO: 28074312
@StormLiangMS , @yxieca , Please help review/approve this BUG FIX for chassis for 202305 and 202311 branches.
Thanks!

mssonicbld pushed a commit to mssonicbld/sonic-utilities that referenced this pull request May 15, 2024
…l reboot (sonic-net#3292)

* [chassis][midplane] Add notification to Supervisor when LC is graceful reboot

* Address review comment by adding log message when failed to create entry in CHASSIS_STATE_DB

Signed-off-by: mlok <marty.lok@nokia.com>
@mssonicbld
Collaborator

Cherry-pick PR to 202311: #3324

mssonicbld pushed a commit that referenced this pull request May 15, 2024
…l reboot (#3292)

* [chassis][midplane] Add notification to Supervisor when LC is graceful reboot

* Address review comment by adding log message when failed to create entry in CHASSIS_STATE_DB

Signed-off-by: mlok <marty.lok@nokia.com>
rlhui pushed a commit to sonic-net/sonic-buildimage that referenced this pull request May 31, 2024
… for Nokia-IXR7250E platform (#18862)

This PR adds the platform-specific linecard_reboot_timeout value to platform_env.conf. It works with PR sonic-net/sonic-platform-daemons#480 and sonic-net/sonic-utilities#3292 to address issue #18540

Signed-off-by: mlok <marty.lok@nokia.com>
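
As a rough illustration of how a daemon might consume that setting, the sketch below reads linecard_reboot_timeout from the text of a platform_env.conf file. It assumes simple `key=value` lines and `#` comments, and falls back to 180 seconds (the value seen in the sample log). The function name and the fallback behaviour are assumptions, not the code from the referenced PRs.

```python
def read_linecard_reboot_timeout(conf_text: str, default: int = 180) -> int:
    """Return linecard_reboot_timeout from key=value text, or the default.

    The file format (one key=value per line, '#' comments) and the
    default of 180 seconds are assumptions made for this sketch.
    """
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        if key.strip() == "linecard_reboot_timeout":
            try:
                return int(value.strip())
            except ValueError:
                return default  # non-numeric value: fall back
    return default
```

A platform that needs a longer window would then only have to set a larger value in its own platform_env.conf.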
@gechiang gechiang added the included in chassis for 202205 branch indicate that this PR got merged into the "chassis for 202205 branch" label Jun 13, 2024
@gechiang
Contributor

@StormLiangMS , are no more backports allowed for 202305? Can you help review/approve/deny the backport request to 202305?
Thanks!

arfeigin pushed a commit to arfeigin/sonic-utilities that referenced this pull request Jun 16, 2024
…l reboot (sonic-net#3292)

* [chassis][midplane] Add notification to Supervisor when LC is graceful reboot

* Address review comment by adding log message when failed to create entry in CHASSIS_STATE_DB

Signed-off-by: mlok <marty.lok@nokia.com>
henrymao-zz pushed a commit to canonical/sonic-buildimage that referenced this pull request Jun 23, 2024
… for Nokia-IXR7250E platform (#18862)

This PR adds the platform-specific linecard_reboot_timeout value to platform_env.conf. It works with PR sonic-net/sonic-platform-daemons#480 and sonic-net/sonic-utilities#3292 to address issue #18540

Signed-off-by: mlok <marty.lok@nokia.com>
nmoray pushed a commit to nmoray/sonic-utilities that referenced this pull request Jun 25, 2025
…l reboot (sonic-net#3292)

* [chassis][midplane] Add notification to Supervisor when LC is graceful reboot

* Address review comment by adding log message when failed to create entry in CHASSIS_STATE_DB

Signed-off-by: mlok <marty.lok@nokia.com>

9 participants