
Orchagent send heartbeat during warm-reboot to prevent Orchagent stuck alert. #2923

Merged
prsunny merged 2 commits into sonic-net:master from liuh-80:dev/liuh/improve-restart-freeze on Nov 8, 2023

@liuh-80
Contributor

@liuh-80 liuh-80 commented Oct 11, 2023

Orchagent sends heartbeats during warm-reboot to prevent a false Orchagent stuck alert.

Why I did it

Orchagent freezes during warm-reboot and stops sending heartbeats while frozen, so supervisor-proc-exit-listener generates a false stuck alert during warm reboot:
sonic-net/sonic-buildimage#16686
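To see why a deliberate freeze looks like a hang, here is a minimal Python sketch of timeout-based stall detection. It is a hypothetical model, not the actual supervisor-proc-exit-listener code: the `HeartbeatMonitor` class, its methods, and the 60-second timeout are assumptions chosen only for illustration.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Hypothetical stall detector: flags processes whose last heartbeat is too old."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_beat: Dict[str, float] = {}

    def record(self, process: str, now: float) -> None:
        # Called whenever a heartbeat line from `process` is observed.
        self.last_beat[process] = now

    def stalled(self, now: float) -> List[str]:
        # Any process silent for longer than the timeout is reported as stuck,
        # even if it is only frozen on purpose (e.g. during warm reboot).
        return sorted(
            name for name, last in self.last_beat.items()
            if now - last > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=60.0)
monitor.record("orchagent", now=0.0)
print(monitor.stalled(now=30.0))       # []
print(monitor.stalled(now=90.0))       # ['orchagent']
monitor.record("orchagent", now=85.0)  # a heartbeat sent while frozen
print(monitor.stalled(now=90.0))       # []
```

In this model, the only way for a frozen process to avoid being flagged is to keep refreshing its timestamp, which is the idea behind this PR.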

Work item tracking
  • Microsoft ADO: 25295846

How I did it

Send heartbeats while Orchagent is frozen during warm-reboot.
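The approach can be modeled as a freeze loop that keeps emitting heartbeats while it waits. This is a language-neutral sketch in Python, not the actual orchagent change (orchagent is written in C++); `wait_while_frozen`, its parameters, the literal `heartbeat` line, and the intervals are hypothetical. The clock and sleep functions are injectable so the example runs deterministically.

```python
import io
import time
from typing import Callable, TextIO


def wait_while_frozen(
    is_frozen: Callable[[], bool],
    poll_s: float,
    heartbeat_s: float,
    out: TextIO,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block while frozen, emitting a heartbeat at most every `heartbeat_s` seconds.

    Returns the number of heartbeats written. All names are hypothetical.
    """
    sent = 0
    last_beat = None
    while is_frozen():
        now = clock()
        if last_beat is None or now - last_beat >= heartbeat_s:
            out.write("heartbeat\n")
            out.flush()  # flush so a log consumer is not starved by buffering
            last_beat = now
            sent += 1
        sleep(poll_s)
    return sent


class FakeTime:
    """Deterministic clock for the usage example: sleep() just advances time."""

    def __init__(self) -> None:
        self.t = 0.0

    def clock(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


fake = FakeTime()
polls = iter(range(10))  # stay frozen for 10 polls, then "unfreeze"


def frozen() -> bool:
    return next(polls, None) is not None


buf = io.StringIO()
n = wait_while_frozen(frozen, poll_s=1.0, heartbeat_s=3.0, out=buf,
                      clock=fake.clock, sleep=fake.sleep)
print(n)  # 4 (at t = 0, 3, 6, 9)
```

Keeping the heartbeat check inside the existing wait loop means the freeze behavior itself is unchanged; only the "silence" that a stall detector would see goes away.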

How to verify it

Passed all UTs.
Manually verified the issue is fixed by checking syslog.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

  • SONiC.master-17060.400216-0d0a0dba4
  • SONiC.202305-17081.401641-ec2aed854

Description for the changelog

Orchagent sends heartbeats during warm-reboot to prevent a false Orchagent stuck alert.

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@vaibhavhd
Contributor

I think this qualifies for a new UT, to check if heartbeats are still sent after Orchagent is paused.

@liuh-80
Contributor Author

liuh-80 commented Oct 12, 2023

I think this qualifies for a new UT, to check if heartbeats are still sent after Orchagent is paused.

Fixed: added a heartbeat message check to the warm reboot UT.

@liuh-80
Copy link
Copy Markdown
Contributor Author

liuh-80 commented Oct 17, 2023

Triggering the build again.

@liuh-80 liuh-80 closed this Oct 17, 2023
@liuh-80 liuh-80 reopened this Oct 17, 2023
@qiluo-msft previously approved these changes Oct 18, 2023
@liuh-80

This comment was marked as resolved.

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-swss

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@dgsudharsan
Collaborator

@liuh-80 Can you please address the pipeline failures?

@liuh-80
Contributor Author

liuh-80 commented Oct 30, 2023

@liuh-80 Can you please address the pipeline failures?

Sure, I'm still working on this PR.

@liuh-80
Contributor Author

liuh-80 commented Nov 1, 2023

The UT failed because supervisord does not forward process stdout to syslog immediately; it is always delayed by a few minutes, so it is very difficult for a test case in this repo to check the heartbeat signal.

I have spent too much time writing the UT in this repo. I will add a UT in the sonic-mgmt repo to test the heartbeat during warm reboot, which will be much easier.
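Because the heartbeat only reaches syslog after a delay, any test that checks for it has to poll with a generous deadline rather than assert immediately. Below is a minimal Python sketch of such a polling helper; `wait_for_log`, the simulated log source, the 120-second delay, and the timeouts are all hypothetical and are not taken from sonic-mgmt or from measured results.

```python
import re
import time
from typing import Callable, Iterable


def wait_for_log(
    read_lines: Callable[[], Iterable[str]],
    pattern: str,
    timeout_s: float,
    poll_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a log source until `pattern` appears or `timeout_s` elapses."""
    regex = re.compile(pattern)
    deadline = clock() + timeout_s
    while True:
        if any(regex.search(line) for line in read_lines()):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


class FakeTime:
    """Deterministic clock for the example: sleep() just advances time."""

    def __init__(self) -> None:
        self.t = 0.0

    def clock(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


fake = FakeTime()


def read_syslog():
    # Simulated (hypothetical) log source: the line appears only after 120 s.
    return ["orchagent heartbeat"] if fake.t >= 120 else []


print(wait_for_log(read_syslog, r"heartbeat", timeout_s=60,
                   clock=fake.clock, sleep=fake.sleep))   # False
fake.t = 0.0
print(wait_for_log(read_syslog, r"heartbeat", timeout_s=300,
                   clock=fake.clock, sleep=fake.sleep))   # True
```

A short timeout fails even though the heartbeat was emitted, which illustrates why a test of this kind is easier to make reliable on a full device testbed than in a quick unit test.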

@liuh-80 liuh-80 force-pushed the dev/liuh/improve-restart-freeze branch from 05ae172 to 6e19948 November 1, 2023 08:43
@prsunny
Collaborator

prsunny commented Nov 2, 2023

@qiluo-msft, @vaibhavhd to sign off

@prsunny prsunny changed the title Orchangent send heartbeat during warm-reboot to prevent Orchagent stuck alert. Orchagent send heartbeat during warm-reboot to prevent Orchagent stuck alert. Nov 2, 2023
@dgsudharsan
Collaborator

@liuh-80 This PR conflicts when I cherry-pick to 202305 branch. Can you please create a new PR for fixing it in 202305?

@dgsudharsan
Collaborator

@prsunny @qiluo-msft @vaibhavhd Can you please sign off this PR?

@prsunny
Collaborator

prsunny commented Nov 8, 2023

I think this qualifies for a new UT, to check if heartbeats are still sent after Orchagent is paused.

Fixed: added a heartbeat message check to the warm reboot UT.

@liuh-80, I don't see the UT changes. Can you check?

@liuh-80
Contributor Author

liuh-80 commented Nov 8, 2023

I think this qualifies for a new UT, to check if heartbeats are still sent after Orchagent is paused.

Fixed: added a heartbeat message check to the warm reboot UT.

@liuh-80, I don't see the UT changes. Can you check?

@prsunny, I tried to add a UT but found it is difficult to create one in this repo, because the heartbeat message is sent to systemd and systemd takes a few minutes to write it to syslog.

So my plan is to create a test case in the sonic-mgmt repo; I will create a PR in sonic-mgmt later.

For now, I have manually verified that the change in this PR works.

@prsunny prsunny merged commit 2b02c24 into sonic-net:master Nov 8, 2023
@liuh-80
Contributor Author

liuh-80 commented Nov 8, 2023

UT created in sonic-mgmt repo: sonic-net/sonic-mgmt#10676
I will publish the UT for review after the sonic-swss submodule is updated.

@StormLiangMS
Contributor

@liuh-80 could you help to update the test result with 202305?

saksarav-nokia pushed a commit to saksarav-nokia/sonic-swss that referenced this pull request Nov 8, 2023
*Orchagent send heartbeat during warm-reboot to prevent Orchagent stuck alert.
@liuh-80
Contributor Author

liuh-80 commented Nov 9, 2023

@liuh-80 could you help to update the test result with 202305?

Description updated, tested on SONiC.202305-17081.401641-ec2aed854

@StormLiangMS
Contributor

@liuh-80 cherry pick conflict, could you file separate PR to 202305?

liuh-80 added a commit to liuh-80/sonic-swss that referenced this pull request Nov 9, 2023
*Orchagent send heartbeat during warm-reboot to prevent Orchagent stuck alert.
@liuh-80
Contributor Author

liuh-80 commented Nov 9, 2023

@liuh-80 cherry pick conflict, could you file separate PR to 202305?

202305 branch PR created: #2956

StormLiangMS pushed a commit that referenced this pull request Nov 11, 2023
Orchagent sends heartbeats during warm-reboot to prevent a false Orchagent stuck alert.

Why I did it
Orchagent freezes during warm-reboot and stops sending heartbeats while frozen, so supervisor-proc-exit-listener generates a false stuck alert during warm reboot:
sonic-net/sonic-buildimage#16686

Work item tracking
Microsoft ADO: 25295846
How I did it
Send heartbeat during warm-reboot freeze.

How to verify it
Passed all UTs.
Manually verified the issue is fixed by checking syslog.
Janetxxx pushed a commit to Janetxxx/sonic-swss that referenced this pull request Nov 10, 2025
*Orchagent send heartbeat during warm-reboot to prevent Orchagent stuck alert.


7 participants