Change the default number of Kernel dumps to 3 #20647

Closed
bmridul wants to merge 1 commit into sonic-net:master from bmridul:num_kdumps

bmridul commented Oct 30, 2024

Why I did it

Currently there is no limit on the number of kernel dumps captured on the system. This leads to excessive disk usage if the system encounters many kernel crashes (e.g., during sonic-mgmt test suite runs).

According to the HLD, the default number of kdumps should be 3. However, the code does not apply this default.

https://github.com/sonic-net/SONiC/blob/master/doc/kdump/SONiC-kdump.md#config-kdump-num_dumps-number

This PR provides the fix.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Set the number of kernel dumps to 3 in /etc/default/kdump-tools
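
As a rough illustration of this kind of change, the sketch below rewrites the text of a kdump-tools-style defaults file so that KDUMP_NUM_DUMPS is 3. Only the variable name and the default of 3 come from this discussion; the helper `set_num_dumps` and the sample file contents are hypothetical and are not the code in this PR.

```python
import re

DEFAULT_NUM_DUMPS = 3  # default stated in the kdump HLD


def set_num_dumps(text: str, num_dumps: int = DEFAULT_NUM_DUMPS) -> str:
    """Return `text` with KDUMP_NUM_DUMPS set to `num_dumps`.

    Replaces the first existing assignment (commented out or not) and
    appends one if the variable is absent. Hypothetical helper.
    """
    line = f"KDUMP_NUM_DUMPS={num_dumps}"
    pattern = re.compile(r"^#?[ \t]*KDUMP_NUM_DUMPS=.*$", re.MULTILINE)
    if pattern.search(text):
        return pattern.sub(line, text, count=1)
    # Make sure the appended assignment starts on its own line.
    sep = "" if text.endswith("\n") or not text else "\n"
    return f"{text}{sep}{line}\n"


# Hypothetical file contents, not copied from a real device.
sample = "USE_KDUMP=1\n#KDUMP_NUM_DUMPS=\n"
print(set_num_dumps(sample))
```

The function works on a string rather than on /etc/default/kdump-tools directly, so it can be tried without touching a device.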

How to verify it

UT Log:
root@sonic:/home/cisco# show reboot h
Name                 Cause         Time                             User   Comment
2024_08_26_19_35_54  Kernel Panic  Mon Aug 26 07:32:18 PM UTC 2024  N/A    N/A
2024_08_26_19_29_47  Kernel Panic  Mon Aug 26 07:26:16 PM UTC 2024  N/A    N/A
2024_08_26_19_04_03  Kernel Panic  Mon Aug 26 07:00:36 PM UTC 2024  N/A    N/A
2024_08_26_18_54_39  Kernel Panic  Mon Aug 26 06:51:35 PM UTC 2024  N/A    N/A
2024_08_26_18_43_13  reboot        Mon Aug 26 06:36:53 PM UTC 2024  cisco  N/A
...
root@sonic:/home/cisco# show kdump files
Kernel core dump files                      Kernel dmesg files
/var/crash/202408261932/kdump.202408261932  /var/crash/202408261932/dmesg.202408261932
/var/crash/202408261926/kdump.202408261926  /var/crash/202408261926/dmesg.202408261926
/var/crash/202408261900/kdump.202408261900  /var/crash/202408261900/dmesg.202408261900
root@sonic:/home/cisco# ls /var/crash/
202408261900  202408261926  202408261932  kdump_lock  kexec_cmd
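
In the log above, the reboot history shows four kernel panics while only the three most recent dumps remain under /var/crash. A check like this could be scripted roughly as follows. This is a minimal sketch that assumes each dump directory is named with a 12-digit timestamp, as in the listing; the helper names are hypothetical.

```python
import re

# Assumption: each dump lives in a directory named YYYYMMDDHHMM,
# matching the /var/crash listing in the UT log.
DUMP_DIR = re.compile(r"\d{12}")


def dump_dirs(entries):
    """Return the timestamp-named entries, oldest first."""
    return sorted(e for e in entries if DUMP_DIR.fullmatch(e))


def within_limit(entries, num_dumps=3):
    """True if no more than `num_dumps` dump directories are present."""
    return len(dump_dirs(entries)) <= num_dumps


# Entries taken from the `ls /var/crash/` output in the UT log.
listing = ["202408261900", "202408261926", "202408261932", "kdump_lock", "kexec_cmd"]
print(dump_dirs(listing))
print(within_limit(listing))
```

On a device, the entries could come from `os.listdir("/var/crash")` instead of the hard-coded list.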

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

  • 202405

Description for the changelog

Set the number of kernel dumps to 3 in /etc/default/kdump-tools

Link to config_db schema for YANG module changes

N/A

A picture of a cute animal (not mandatory but encouraged)

bmridul marked this pull request as ready for review October 30, 2024 07:31
bmridul requested a review from lguohan as a code owner October 30, 2024 07:31

bmridul commented Oct 30, 2024

@prgeor, please review.

bmridul commented Oct 30, 2024

@abdosi, please check.

abdosi self-requested a review November 1, 2024 00:18

abdosi commented Nov 1, 2024

@saiarcot895: can you please review this?

saiarcot895 left a comment

This doesn't appear to be needed. On a device where kdump is disabled, running sudo config kdump enable modifies /etc/default/kdump-tools and sets KDUMP_NUM_DUMPS to 3.

bmridul commented Nov 4, 2024

> This doesn't appear to be needed. On a device where kdump is disabled, running sudo config kdump enable modifies /etc/default/kdump-tools and sets KDUMP_NUM_DUMPS to 3.

The above assumes that kdump is explicitly enabled through the CLI. We enabled kdump by default for Cisco platforms in #16224. With that change, KDUMP_NUM_DUMPS is not set by default in /etc/default/kdump-tools.

bmridul commented Nov 13, 2024

@saiarcot895, please check the response above.

@saiarcot895

@bmridul That's because hostcfgd is not applying the changes from the default state to the runtime configuration. Please modify hostcfgd to add proper support for this.
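
One possible reading of this request, sketched under assumptions: when the kdump configuration carries no explicit dump count, the daemon would still resolve the HLD default of 3 and write it to KDUMP_NUM_DUMPS. The code below is not hostcfgd's code; the function name and the "num_dumps" key are hypothetical stand-ins.

```python
DEFAULT_NUM_DUMPS = 3  # default from the kdump HLD


def effective_num_dumps(kdump_cfg: dict) -> int:
    """Pick the value to write to KDUMP_NUM_DUMPS.

    `kdump_cfg` stands in for a kdump configuration entry; the key
    name "num_dumps" is an assumption made for this sketch.
    """
    raw = kdump_cfg.get("num_dumps")
    if raw in (None, ""):
        # Nothing configured explicitly: fall back to the default.
        return DEFAULT_NUM_DUMPS
    value = int(raw)
    if value < 1:
        raise ValueError(f"num_dumps must be positive, got {value}")
    return value


print(effective_num_dumps({"enabled": "true"}))
print(effective_num_dumps({"enabled": "true", "num_dumps": "5"}))
```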

bmridul commented Nov 14, 2024

> @bmridul That's because hostcfgd is not applying the changes from the default state to the runtime configuration. Please modify hostcfgd to add proper support for this.

Ack. I will check.

bmridul commented Jan 10, 2025

Closing this PR; opened sonic-net/sonic-host-services#202 instead.

bmridul closed this Jan 10, 2025
bmridul commented Jan 15, 2025

@saiarcot895, please review sonic-net/sonic-host-services#202.
