Skip to content

[Fastboot] Delay LLDP service for better fastboot performance#10568

Merged
liat-grozovik merged 3 commits intosonic-net:masterfrom
shlomibitton:shlomi_delay_lldp_service
Apr 28, 2022
Merged

[Fastboot] Delay LLDP service for better fastboot performance#10568
liat-grozovik merged 3 commits intosonic-net:masterfrom
shlomibitton:shlomi_delay_lldp_service

Conversation

@shlomibitton
Copy link
Contributor

@shlomibitton shlomibitton commented Apr 13, 2022

Signed-off-by: Shlomi Bitton shlomibi@nvidia.com

Why I did it

Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.

bootchart_no_optimizations

How I did it

  • Add a timer for LLDP service.
  • Copy the timer file to the host bin image.

How to verify it

Run fast-reboot on MLNX platform and observe faster create_switch execution time.
This PR is dependent on PR: #10567

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
@shlomibitton shlomibitton requested a review from lguohan as a code owner April 13, 2022 15:36
@liat-grozovik liat-grozovik requested review from stepanblyschak and vaibhavhd and removed request for lguohan April 13, 2022 15:48
@liat-grozovik liat-grozovik added the Request for 202111 Branch For PRs being requested for 202111 branch label Apr 13, 2022
yxieca
yxieca previously approved these changes Apr 13, 2022
[Timer]
OnUnitActiveSec=0 sec
OnBootSec=1min 30 sec
Unit=lldp.service
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lldp.service is a per-namespace container. Will that work for multi-asic? Should we have several timers?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@vaibhavhd
Copy link
Contributor

I think this change will impact all kinds of reboot - warm/fast/cold/etc. Do you agree?

If yes, is that what we want to do? Can we do this only for fast and warmboot case?

@shlomibitton
Copy link
Contributor Author

I think this change will impact all kinds of reboot - warm/fast/cold/etc. Do you agree?

If yes, is that what we want to do? Can we do this only for fast and warmboot case?

@vaibhavhd I agree, I added a condition for it on syncd.sh like we do for pmon today.

/bin/systemctl start pmon
debug "Started pmon service"
fi
if [[ x"$BOOT_TYPE" != x"cold" ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: This is correct, but it may be better to match against whitelist BOOT_TYPEs (fast, warm, fastfast).
This is so that this condition does not evaluate to true for any other boottype that gets added in future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -0,0 +1,12 @@
[Unit]
# This delay is for fast-reboot performance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction - fast-reboot and warm-reboot performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

vaibhavhd
vaibhavhd previously approved these changes Apr 18, 2022
Copy link
Contributor

@vaibhavhd vaibhavhd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments pending, LGTM otherwise. Please address Stepan's concerns before merging.

… for multi asic service timer.

Change the timer file to timer template file.
Allign init_cfg.json.j2 to delay LLDP.
Fix review comments.
@shlomibitton
Copy link
Contributor Author

@vaibhavhd @stepanblyschak I fixed your comments, can you please review and approve?
Thanks

@liat-grozovik
Copy link
Collaborator

@vaibhavhd @stepanblyschak kindly reminder to review

@liat-grozovik liat-grozovik merged commit 1d84e0d into sonic-net:master Apr 28, 2022
judyjoseph pushed a commit that referenced this pull request May 2, 2022
- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.

- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
This PR is dependent on PR: #10567
@qiluo-msft
Copy link
Collaborator

This commit could not be cleanly cherry-picked to 202012. Please submit another PR.

@shlomibitton
Copy link
Contributor Author

@qiluo-msft I created a separate PR for 202012:
#10744

liat-grozovik pushed a commit that referenced this pull request May 15, 2022
#10568) (#10744)

This PR is to backport a fix #10568
This PR is dependent on PR: #10745

- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.

- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
@shlomibitton shlomibitton deleted the shlomi_delay_lldp_service branch May 26, 2022 12:25
liushilongbuaa pushed a commit to liushilongbuaa/sonic-buildimage that referenced this pull request Jun 20, 2022
…net#10568)

- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.

- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
This PR is dependent on PR: sonic-net#10567
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Included in 202111 Branch Request for 202111 Branch For PRs being requested for 202111 branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants