Add support to make determine/process reboot-cause services restartable #17220

Closed
anamehra wants to merge 2 commits into sonic-net:master from anamehra:anamehra/reboot_cause_restart

Contributor

@anamehra anamehra commented Nov 18, 2023

Why I did it

Fixes #16990

Requires: sonic-net/sonic-host-services#86

  1. The determine-reboot-cause and process-reboot-cause services do not start if the database service fails to start on its first attempt. Even if the database service succeeds on a later attempt, these reboot-cause services are not started.

  2. The process-reboot-cause service also does not restart when the docker or database service restarts, which leads to an empty reboot-cause history.

  3. deploy-mg from sonic-mgmt also triggers a docker service restart, which causes the issue described in 2 above. The docker restart also triggers a restart of determine-reboot-cause, which creates an additional reboot-cause file in the history and modifies the last reboot-cause.

This PR, along with sonic-host-services PR 82, fixes these issues by making both services start once their dependency is met after an initial dependency failure, making both services restart when the database service restarts, and preventing duplicate processing of the last reboot reason.

Work item tracking
  • Microsoft ADO 25892856

How I did it

  1. Modified systemd unit files to make determine-reboot-cause and process-reboot-cause services restartable when the database service restarts.
  2. On a restart, the determine-reboot-cause service should not create a new reboot-cause entry in the database. Added a check that distinguishes the first start from a restart and skips the entry in the restart case.
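
The first-start versus restart check in step 2 could look roughly like the sketch below. This is not the code from this PR: it assumes a marker file on a tmpfs path that is cleared on reboot, and the names `DEFAULT_MARKER`, `is_first_start`, and `run` are hypothetical.

```python
import os

# Hypothetical marker location (an assumption, not taken from this PR).
# A tmpfs path such as /run is cleared on reboot, so the marker survives
# service restarts but not reboots.
DEFAULT_MARKER = "/run/reboot-cause-determined"


def is_first_start(marker_path: str = DEFAULT_MARKER) -> bool:
    """Return True on the first call since the marker was cleared, else False."""
    try:
        # O_EXCL makes creation atomic: it fails if the marker already exists.
        fd = os.open(marker_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def run(marker_path, record_entry):
    """Call record_entry() only on the first start; skip it on a restart."""
    if is_first_start(marker_path):
        record_entry()
        return "recorded"
    return "skipped"
```

With this shape, a restart triggered by the database or docker service would reach the "skipped" branch and leave the existing reboot-cause entry untouched.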

How to verify it

On single ASIC pizza box:

  1. Install the image and check the reboot-cause history.
  2. Restart the database service and verify that the determine-reboot-cause and process-reboot-cause services also restart. Verify that reboot-cause shows correct data and that no new entry is created for the restart.

On Chassis:

  1. Install the image and check the reboot-cause history.
  2. Restart the database service and verify that the determine-reboot-cause and process-reboot-cause services also restart. Verify that reboot-cause shows correct data and that no new entry is created for the restart.
  3. Reboot the LC. On the Supervisor, stop the database-chassis service.
    Let the database service on the LC fail the first time. determine-reboot-cause and process-reboot-cause will fail to start due to the dependency failure.
    Start database-chassis on the Supervisor. The database service on the LC should now start successfully.
    Verify that determine-reboot-cause and process-reboot-cause also start.
    Verify the show reboot-cause history output.
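
The "no new entry is created for the restart" check in step 2 of both procedures can be scripted. The sketch below is only an illustration: it assumes the database systemd unit is named `database`, and the helper names are hypothetical. The dependent services restart asynchronously, so a real run would need to wait for them to settle before the second read.

```python
import subprocess


def read_history():
    """Return the raw text printed by `show reboot-cause history`."""
    result = subprocess.run(
        ["show", "reboot-cause", "history"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def restart_database():
    """Restart the database service (the unit name `database` is an assumption)."""
    subprocess.run(["sudo", "systemctl", "restart", "database"], check=True)


def history_unchanged_after(restart, read=read_history):
    """Return True if the history text is identical before and after restart()."""
    before = read()
    restart()
    # In practice, wait here until the reboot-cause services have finished.
    after = read()
    return before == after
```

On a device this would be called as `history_unchanged_after(restart_database)`; the `read` parameter exists so the comparison can be exercised without a SONiC system.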

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Add support to make determine/process reboot-cause services restartable

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@anamehra anamehra requested a review from lguohan as a code owner November 18, 2023 00:58
@anamehra
Contributor Author

Hi @abdosi , @gechiang , please review. Thanks

@gechiang
Collaborator

MSFT ADO: 25892856

@gechiang
Collaborator

@anamehra, the PR tests are failing. Can you please take a look and address the failures?

Contributor

@prgeor prgeor left a comment

@anamehra we probably don't need this change provided we follow same approach of sonic-net/sonic-host-services#86 in the unit service file of src/sonic-host-services-data/debian/sonic-host-services-data.determine-reboot-cause.service

@prgeor
Contributor

prgeor commented Nov 22, 2023

@anamehra @gechiang let's discuss a better solution, as suggested above, before merging.

@anamehra
Contributor Author

> @anamehra we probably don't need this change provided we follow same approach of sonic-net/sonic-host-services#86 in the unit service file of src/sonic-host-services-data/debian/sonic-host-services-data.determine-reboot-cause.service

Hi @prgeor, the process-reboot-cause service is a timer-based simple service, which is why these changes are required to make it restartable. I was testing an approach that removes the timer logic and makes the service work the same way as determine-reboot-cause, but systemd does not start the service again after it fails due to a dependency failure when the database service fails.

@prgeor
Contributor

prgeor commented Nov 22, 2023

@anamehra you can use this unit file, which is a cleaner approach than modifying the reboot-cause script file:

image

Here is the systemd log showing that additional runs of the determine-reboot-cause service are skipped:

image

@anamehra anamehra closed this Nov 22, 2023
@anamehra
Contributor Author

The files have moved to the host-services submodule. I will open a new PR in the sonic-host-services repo.

Development

Successfully merging this pull request may close these issues.

[chassis]: determine/process-reboot-cause services fail to start if database service fails in first run during boot