[action] [PR:18285] Set loglevel for crash kernel to reduce verbosity and improve overall router recovery time (#18285) #18351

Merged
mssonicbld merged 1 commit into sonic-net:202311 from mssonicbld:cherry/202311/18285
Mar 15, 2024

@mssonicbld
Collaborator

Why I did it
On certain routers with a 9600 baud console, the crash kernel takes close to 5 minutes to complete the kernel dump and reload the box. In contrast, on routers with a 115200 baud console, the crash kernel dump process completes in 35s-60s (depending on the platform). Currently, all debug and informational messages are printed to the console, which contributes to the delay. Unless the router is being monitored on the console in real time, these messages are not very useful. Setting the loglevel to warning reduces the verbosity of console logs, which lets the crash kernel dump complete in a reasonable time and improves overall router recovery time.
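
To make the baud-rate effect concrete, here is a rough back-of-the-envelope sketch. It assumes 8N1 serial framing (10 bits on the wire per character) and that console output is the only bottleneck. The byte count is a hypothetical figure chosen to match the ~5 minute observation, not a measurement from this PR.

```python
def console_seconds(num_bytes: int, baud: int, bits_per_char: int = 10) -> float:
    """Estimate the time needed to push num_bytes through a serial console.

    Assumes 8N1 framing: 1 start bit + 8 data bits + 1 stop bit = 10 bits/char.
    """
    return num_bytes * bits_per_char / baud


# Hypothetical volume of kernel messages emitted during a crash dump.
LOG_BYTES = 288_000

slow = console_seconds(LOG_BYTES, 9600)     # 300.0 seconds (~5 minutes)
fast = console_seconds(LOG_BYTES, 115200)   # 25.0 seconds
print(f"9600 baud: {slow:.0f}s, 115200 baud: {fast:.0f}s")
```

Under these assumptions the same log volume costs 12x more time at 9600 baud, so cutting the number of printed messages should help most on the slow consoles.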

How I did it
Set the loglevel attribute in the crashkernel cmdline.
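
As an illustration only, the change amounts to making sure the crash kernel's command line carries a `loglevel=` argument. The sketch below assumes "warning" corresponds to the numeric kernel value 4 (KERN_WARNING). The helper name `set_loglevel` and the sample cmdline strings are hypothetical, and the actual file and value changed by this PR are not shown in this conversation.

```python
def set_loglevel(cmdline: str, level: int = 4) -> str:
    """Return cmdline with exactly one loglevel=<level> argument.

    Any existing loglevel= token is dropped so the new value is unambiguous.
    level=4 is assumed here to stand for "warning" (KERN_WARNING).
    """
    tokens = [t for t in cmdline.split() if not t.startswith("loglevel=")]
    tokens.append(f"loglevel={level}")
    return " ".join(tokens)


# Hypothetical crash kernel arguments, for illustration.
print(set_loglevel("irqpoll nr_cpus=1 reset_devices"))
print(set_loglevel("console=ttyS0,9600n8 loglevel=7 quiet"))
```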

How to verify it
Install a SONiC image whose crashkernel cmdline has loglevel set to warning, then induce a crash (sysrq-trigger).
The crashkernel boot and dump process completes in 20s-30s, depending on the platform.
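
A small check along these lines could confirm that the crash kernel was actually booted with the expected setting. This is a sketch, not the test used for this PR: `parse_loglevel` is a hypothetical helper, the sample strings are made up, and on a real box the input would typically be the contents of `/proc/cmdline` read inside the crash kernel.

```python
from typing import Optional


def parse_loglevel(cmdline: str) -> Optional[int]:
    """Return the numeric loglevel= value found in cmdline, or None if absent.

    If the argument appears more than once, this sketch takes the last one.
    """
    level = None
    for token in cmdline.split():
        if token.startswith("loglevel="):
            value = token.split("=", 1)[1]
            if value.isdigit():
                level = int(value)
    return level


# Hypothetical command lines, for illustration.
print(parse_loglevel("irqpoll nr_cpus=1 reset_devices loglevel=4"))  # 4
print(parse_loglevel("irqpoll nr_cpus=1 reset_devices"))             # None
```

The crash itself is commonly induced by writing `c` to `/proc/sysrq-trigger` as root, which panics the running kernel immediately, so it belongs on a test device only.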

@mssonicbld
Collaborator Author

Original PR: #18285

@mssonicbld
Collaborator Author

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator Author

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld mssonicbld merged commit 5b91eeb into sonic-net:202311 Mar 15, 2024