[supervisor][sfm] Fix the issue of swss.sh shows backtrace when shutdown a SFM #18393

Merged
lguohan merged 1 commit into sonic-net:master from mlok-nokia:fix-sfm-shutdown-backtrace
Mar 30, 2024

Conversation

@mlok-nokia
Contributor

Why I did it

On the Supervisor card of a VOQ chassis, when a Fabric card (SFM) is removed or shut down, swss.sh logs a stack trace in syslog for every empty SFM slot. This PR fixes #18384.

Work item tracking
  • Microsoft ADO (number only):

How I did it

In asic_status.py, the swss.sh instance for each empty SFM slot sits in a wait state, waiting for the SFM presence event (a "SET" operation). The subscriber event handler also receives the "DEL" operation that is generated when an SFM is shut down or removed. When an SFM is shut down, the swss.sh instances of all empty slots receive that "DEL" event too, although it is not meant for them. In the "DEL" branch, the current implementation does not check whether the operation targets its own slot, so those instances exit the wait state and proceed to docker-wait-any with the wrong operation for the wrong slot. docker-wait-any then raises the backtrace.
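The missing check described above can be illustrated with a minimal, self-contained sketch. This is not the code in asic_status.py: the event shape `(key, operation)`, the key format with a trailing numeric id, and the names `trailing_id` and `wait_for_own_event` are all assumptions made for illustration. It only shows the idea of ignoring both "SET" and "DEL" events that belong to another slot.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical event shape: (key, operation), e.g. ("asic5", "DEL").
Event = Tuple[str, str]


def trailing_id(key: str) -> Optional[str]:
    """Return the trailing digits of a key such as 'asic5', or None."""
    match = re.search(r"\d+$", key)
    return match.group(0) if match else None


def wait_for_own_event(own_id: str, events: Iterable[Event]) -> Optional[str]:
    """Consume events until one addressed to own_id arrives.

    Returns "online" for a matching SET, "offline" for a matching DEL,
    and None if the event stream ends without a matching event.
    """
    for key, op in events:
        if trailing_id(key) != own_id:
            # Event is for another slot: stay in the wait state,
            # regardless of whether the operation is SET or DEL.
            continue
        if op == "SET":
            return "online"
        if op == "DEL":
            return "offline"
    return None


if __name__ == "__main__":
    stream = [("asic3", "DEL"), ("asic5", "SET")]
    # A waiter for id 5 ignores the DEL for id 3 and reacts to its own SET.
    print(wait_for_own_event("5", stream))
```

With this filter, a waiter for an empty slot (for example id 7 in the stream above) never leaves the wait state because of another slot's "DEL".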

How to verify it

  1. In a chassis that has some empty SFM slots, remove or shut down an SFM. No related stack trace should appear in syslog.
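The check above could be scripted roughly as follows. This is a sketch under assumptions: it assumes the backtrace is a Python traceback (so the standard `Traceback (most recent call last)` header appears), that the service name shows up on the same syslog line, and that syslog is at `/var/log/syslog`. The helper name `find_tracebacks` is hypothetical.

```python
from typing import Iterable, List

# Standard header that Python prints at the start of a traceback.
TRACEBACK_MARKER = "Traceback (most recent call last)"


def find_tracebacks(lines: Iterable[str], service_hint: str = "swss") -> List[str]:
    """Return log lines that contain a traceback header and the service hint."""
    return [
        line.rstrip("\n")
        for line in lines
        if TRACEBACK_MARKER in line and service_hint in line
    ]


if __name__ == "__main__":
    # Assumed log location; adjust to the system under test.
    with open("/var/log/syslog", errors="replace") as log:
        hits = find_tracebacks(log)
    print("no related traceback found" if not hits else "\n".join(hits))
```

Run it after shutting down or removing an SFM; an empty result is consistent with the fix working, though it does not prove it if the log format differs from the assumptions.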

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

…wn a SFM.

Signed-off-by: mlok <marty.lok@nokia.com>
@mlok-nokia mlok-nokia requested a review from lguohan as a code owner March 19, 2024 17:19
@mlok-nokia
Contributor Author

@arlakshm This PR fixes the backtrace that we saw in the syslog file when testing the shutdown of an SFM. Please review it.

@mlok-nokia
Contributor Author

@judyjoseph @rlhui Please take a look at this PR and see whether it should be in the next 202205 image build. Thanks.

Contributor

@arlakshm arlakshm left a comment

lgtm.

@lguohan lguohan added the Chassis for 202205 branch PRs needed for 202205 branch in msft repo label Mar 30, 2024
@lguohan lguohan merged commit a56cf79 into sonic-net:master Mar 30, 2024
mlok-nokia added a commit to mlok-nokia/sonic-buildimage that referenced this pull request Jun 5, 2024
…wn a SFM. (sonic-net#18393)
@mlok-nokia mlok-nokia deleted the fix-sfm-shutdown-backtrace branch September 27, 2024 15:05
Labels

Chassis for 202205 branch PRs needed for 202205 branch in msft repo

Development

Successfully merging this pull request may close these issues.

[chassis][sfm][swss] swss.sh shows backtrace for all empty SFM slots when shutdown or remove a SFM on Supervisor