
LLDP: Fix issue with restart_orchagent as VOQ does not support warm restarting of orchagent #16765

Merged
arlakshm merged 2 commits into sonic-net:master from wumiaont:lldp2 on Apr 18, 2025

wumiaont (Contributor) commented Feb 3, 2025

Description of PR

test_lldp_neighbor_post_orchagent_reboot uses the restart_orchagent() fixture to warm restart orchagent. This warm restart of orchagent (killing the orchagent process and restarting it) is not supported on chassis with switch type VOQ.

Add code to handle the VOQ chassis case by restarting the swss@0(1) service (swss@0 or swss@1, depending on the ASIC) instead. This causes the ports to be removed from the LLDP table and added back, so we can still test the issue found in #6560.
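The branching described above can be sketched as follows. This is a minimal, hypothetical illustration and not the code merged in this PR. The helper name `plan_orchagent_bounce`, the `"voq"` switch-type string, and the `asic_ids` parameter are all assumptions. The non-VOQ branch returns no commands because the existing restart_orchagent() fixture keeps handling that case, and its exact commands are not reproduced here.

```python
from typing import List, Optional, Tuple


def plan_orchagent_bounce(switch_type: str,
                          asic_ids: Optional[List[int]]) -> Tuple[str, List[str]]:
    """Hypothetical helper: pick how a test should bounce orchagent.

    Returns (strategy_name, commands). Assumes the DUT reports its
    switch type as the string "voq" on VOQ chassis.
    """
    if switch_type == "voq":
        # VOQ chassis: warm restart of orchagent is unsupported, so restart
        # the whole swss service instead. Multi-ASIC devices run one swss
        # instance per ASIC (swss@0, swss@1, ...).
        if asic_ids:
            return ("swss_service_restart",
                    [f"systemctl restart swss@{asic}" for asic in asic_ids])
        # Single-ASIC device: the service has no instance suffix.
        return ("swss_service_restart", ["systemctl restart swss"])
    # Every other switch type keeps the existing restart_orchagent() path;
    # this sketch does not reproduce that fixture's commands.
    return ("orchagent_warm_restart", [])
```

For example, `plan_orchagent_bounce("voq", [0])` yields the single command `systemctl restart swss@0`, while any non-VOQ switch type leaves behavior unchanged. Keeping the decision in one small function keeps non-VOQ platforms untouched, which is the property the reviewers asked about below.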

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405
  • 202411

Approach

What is the motivation for this PR?

Currently, test_lldp_neighbor_post_orchagent_reboot fails on VOQ chassis because warm restart of orchagent is not supported on VOQ chassis.

How did you do it?

For VOQ chassis, use "systemctl restart swss@0(1)" instead.

How did you verify/test it?

After the fix, the test passed consistently on VOQ chassis.

mssonicbld (Collaborator): /azp run

azure-pipelines: Azure Pipelines successfully started running 1 pipeline(s).

arlakshm (Contributor) commented Apr 2, 2025

@abdosi, can you help sign off on this change?

wumiaont (Contributor, Author) commented Apr 9, 2025

After killing and restarting orchagent, we found that orchagent ends up stopped and never comes back up. The error logs are as follows:
2025 Apr 9 15:29:00.074651 ixre-egl-board29 WARNING swss0#supervisor-proc-exit-listener: Process 'orchagent' is stuck in namespace 'asic0' (1.0 minutes).
2025 Apr 9 15:30:00.545024 ixre-egl-board29 WARNING swss0#supervisor-proc-exit-listener: message repeated 60 times: [ Process 'orchagent' is stuck in namespace 'asic0' (1.0 minutes).]
2025 Apr 9 15:30:00.545024 ixre-egl-board29 WARNING swss0#supervisor-proc-exit-listener: Process 'orchagent' is stuck in namespace 'asic0' (2.0 minutes).
2025 Apr 9 15:30:21.565735 ixre-egl-board29 WARNING swss0#supervisor-proc-exit-listener: message repeated 20 times: [ Process 'orchagent' is stuck in namespace 'asic0' (2.0 minutes).]
2025 Apr 9 15:30:21.565735 ixre-egl-board29 WARNING swss0#supervisor-proc-exit-listener: Process 'orchagent' is stuck in namespace 'asic0' (2.0 minutes).
2025 Apr 9 15:31:00.601893 ixre-egl-board29 WARNING swss0#supervisor-proc-exit-listener: message repeated 38 times: [ Process 'orchagent' is stuck in namespace 'asic0' (2.0 minutes).]
2025 Apr 9 15:31:00.601893 ixre-egl-board29 WARNING swss0#supervisor-proc-exit-listener: Process 'orchagent' is stuck in namespace 'asic0' (3.0 minutes).
2025 Apr 9 15:31:21.621244 ixre-egl-board29 WARNING swss0#supervisor-proc-exit-listener: message repeated 20 times: [ Process 'orchagent' is stuck in namespace 'asic0' (3.0 minutes).]
2025 Apr 9 15:31:21.621268 ixre-egl-board29 WARNING swss0#supervisor-proc-exit-listener: Process 'orchagent' is stuck in namespace 'asic0' (3.0 minutes).
2025 Apr 9 15:32:00.657657 ixre-egl-board29 WARNING swss0#supervisor-proc-exit-listener: message repeated 38 times: [ Process 'orchagent' is stuck in namespace 'asic0' (3.0 minutes).]
2025 Apr 9 15:32:00.657657 ixre-egl-board29 WARNING swss0#supervisor-proc-exit-listener: Process 'orchagent' is stuck in namespace 'asic0' (4.0 minutes).
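A test could detect this failure mode by scanning syslog for the listener's "stuck" warnings. The sketch below is hypothetical. The function name is illustrative, and the regex is derived only from the log lines above, so it assumes that message format stays stable. It reports the longest "stuck" duration seen for each (process, namespace) pair and also matches the "message repeated N times: [...]" variants.

```python
import re
from typing import Dict, Iterable, Tuple

# Pattern taken from the supervisor-proc-exit-listener warnings above.
STUCK_RE = re.compile(
    r"Process '(?P<proc>[^']+)' is stuck in namespace '(?P<ns>[^']+)' "
    r"\((?P<mins>[\d.]+) minutes\)"
)


def max_stuck_minutes(lines: Iterable[str]) -> Dict[Tuple[str, str], float]:
    """Map (process, namespace) to the longest 'stuck' duration reported."""
    worst: Dict[Tuple[str, str], float] = {}
    for line in lines:
        m = STUCK_RE.search(line)
        if m:
            key = (m.group("proc"), m.group("ns"))
            worst[key] = max(worst.get(key, 0.0), float(m.group("mins")))
    return worst
```

Applied to the excerpt above, this would report orchagent in asic0 as stuck for 4.0 minutes. A test could fail fast once that value crosses a chosen threshold instead of waiting for the LLDP check to time out.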

wumiaont (Contributor, Author) commented Apr 9, 2025

@ZhaohuiS Can we discuss #6560? I'd like to understand how that test works. We found that killing and restarting orchagent does not work on our Nokia chassis, which uses VOQ. I also want to check whether restarting the swss service instead would still be acceptable for the issue you are trying to test.

szhmery commented Apr 10, 2025

@wumiaont I think it's fine. With your change, swss is restarted for VOQ, but for other types of DUT the behavior is the same as before, right?

wumiaont (Contributor, Author), replying to szhmery:

Yes, that's correct. I only changed the VOQ case to restart swss instead. Other platforms still use the existing method to restart orchagent.

ZhaohuiS (Contributor) left a comment:

Your change will not impact other platforms.

arlakshm merged commit a5af471 into sonic-net:master on Apr 18, 2025 (15 checks passed).
arlakshm (Contributor):

@wumiaont, can you please create a PR for the 202503 branch in the msft repo?

wumiaont (Contributor, Author), replying to arlakshm:

OK. Will do.

Azure/sonic-mgmt.msft#218

auspham pushed a commit to auspham/sonic-mgmt that referenced this pull request May 30, 2025
…estarting of orchagent (sonic-net#16765) (sonic-net#218)

opcoder0 pushed a commit to opcoder0/sonic-mgmt that referenced this pull request Dec 8, 2025
…estarting of orchagent (sonic-net#16765)

Signed-off-by: opcoder0 <110003254+opcoder0@users.noreply.github.com>
AharonMalkin pushed a commit to AharonMalkin/sonic-mgmt that referenced this pull request Dec 16, 2025
…estarting of orchagent (sonic-net#16765)

Signed-off-by: Aharon Malkin <amalkin@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 21, 2025
…estarting of orchagent (sonic-net#16765)

Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Jan 26, 2026
…estarting of orchagent (sonic-net#16765)

Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
Projects

Status: Done

6 participants