Skip to content

Set loglevel for crash kernel to reduce verbosity and improve overall router recovery time#18285

Merged
StormLiangMS merged 1 commit intosonic-net:masterfrom
amulyan7:amunagar/kdump-log
Mar 13, 2024
Merged

Set loglevel for crash kernel to reduce verbosity and improve overall router recovery time#18285
StormLiangMS merged 1 commit intosonic-net:masterfrom
amulyan7:amunagar/kdump-log

Conversation

@amulyan7
Copy link
Contributor

@amulyan7 amulyan7 commented Mar 6, 2024

Why I did it

On certain routers with baud rate 9600, crash kernel is taking a long time , close to ~5mins, to complete kernel dump and reload the box. On contrast to routers with baud rate 115200, crash kernel dump process is observed to be completed under 35s-60s (depending on the platform). Currently, all debug and informational messages are printed on the console which also factors in for the delay seen. Unless the router is monitored on console in real time, these messages are not very useful. Setting the loglevel to warning will help reduce the verbosity of logs on console, in turn allow crash kernel dump process to be completed in a reasonable time which will also help in overall router recovery time.

How I did it

Setting loglevel attribute in crashkernel cmdline

How to verify it

Install SONiC image with crashkernel cmdline with loglevel set to warning and initiate an induced a crash (sysrq-trigger)
crashkernel boot and dump process will be completed in 20s-30s depending on the platform

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Description for the changelog

Set loglevel attribute for crash kernel

@amulyan7 amulyan7 requested a review from lguohan as a code owner March 6, 2024 20:36
@amulyan7
Copy link
Contributor Author

amulyan7 commented Mar 6, 2024

@wsycqyz Please review

Copy link
Contributor

@wsycqyz wsycqyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@amulyan7
Copy link
Contributor Author

amulyan7 commented Mar 8, 2024

@wsycqyz Shiyan, can you tag appropriate code owners to review and take this PR forward?

@bmridul
Copy link
Contributor

bmridul commented Mar 11, 2024

@rlhui @abdosi , Pls take a look.

Copy link
Contributor

@wsycqyz wsycqyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@StormLiangMS StormLiangMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@StormLiangMS StormLiangMS merged commit 8044838 into sonic-net:master Mar 13, 2024
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Mar 13, 2024
… router recovery time (sonic-net#18285)

Why I did it
On certain routers with baud rate 9600, crash kernel is taking a long time , close to ~5mins, to complete kernel dump and reload the box. On contrast to routers with baud rate 115200, crash kernel dump process is observed to be completed under 35s-60s (depending on the platform). Currently, all debug and informational messages are printed on the console which also factors in for the delay seen. Unless the router is monitored on console in real time, these messages are not very useful. Setting the loglevel to warning will help reduce the verbosity of logs on console, in turn allow crash kernel dump process to be completed in a reasonable time which will also help in overall router recovery time.

How I did it
Setting loglevel attribute in crashkernel cmdline

How to verify it
Install SONiC image with crashkernel cmdline with loglevel set to warning and initiate an induced a crash (sysrq-trigger)
crashkernel boot and dump process will be completed in 20s-30s depending on the platform
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202311: #18351

mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Mar 13, 2024
… router recovery time (sonic-net#18285)

Why I did it
On certain routers with baud rate 9600, crash kernel is taking a long time , close to ~5mins, to complete kernel dump and reload the box. On contrast to routers with baud rate 115200, crash kernel dump process is observed to be completed under 35s-60s (depending on the platform). Currently, all debug and informational messages are printed on the console which also factors in for the delay seen. Unless the router is monitored on console in real time, these messages are not very useful. Setting the loglevel to warning will help reduce the verbosity of logs on console, in turn allow crash kernel dump process to be completed in a reasonable time which will also help in overall router recovery time.

How I did it
Setting loglevel attribute in crashkernel cmdline

How to verify it
Install SONiC image with crashkernel cmdline with loglevel set to warning and initiate an induced a crash (sysrq-trigger)
crashkernel boot and dump process will be completed in 20s-30s depending on the platform
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202305: #18352

mssonicbld pushed a commit that referenced this pull request Mar 13, 2024
… router recovery time (#18285)

Why I did it
On certain routers with baud rate 9600, crash kernel is taking a long time , close to ~5mins, to complete kernel dump and reload the box. On contrast to routers with baud rate 115200, crash kernel dump process is observed to be completed under 35s-60s (depending on the platform). Currently, all debug and informational messages are printed on the console which also factors in for the delay seen. Unless the router is monitored on console in real time, these messages are not very useful. Setting the loglevel to warning will help reduce the verbosity of logs on console, in turn allow crash kernel dump process to be completed in a reasonable time which will also help in overall router recovery time.

How I did it
Setting loglevel attribute in crashkernel cmdline

How to verify it
Install SONiC image with crashkernel cmdline with loglevel set to warning and initiate an induced a crash (sysrq-trigger)
crashkernel boot and dump process will be completed in 20s-30s depending on the platform
mssonicbld pushed a commit that referenced this pull request Mar 15, 2024
… router recovery time (#18285)

Why I did it
On certain routers with baud rate 9600, crash kernel is taking a long time , close to ~5mins, to complete kernel dump and reload the box. On contrast to routers with baud rate 115200, crash kernel dump process is observed to be completed under 35s-60s (depending on the platform). Currently, all debug and informational messages are printed on the console which also factors in for the delay seen. Unless the router is monitored on console in real time, these messages are not very useful. Setting the loglevel to warning will help reduce the verbosity of logs on console, in turn allow crash kernel dump process to be completed in a reasonable time which will also help in overall router recovery time.

How I did it
Setting loglevel attribute in crashkernel cmdline

How to verify it
Install SONiC image with crashkernel cmdline with loglevel set to warning and initiate an induced a crash (sysrq-trigger)
crashkernel boot and dump process will be completed in 20s-30s depending on the platform
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants