add kernel coredump and analysis on sonic kernel#3276
sun-siyuan wants to merge 10 commits into sonic-net:master
|
can you describe the test you have done? |
|
Yes, 256M is correct. I cherry-picked from dev-master; let me check why this happened.
------------------------------------------------------------------
From:Jipan Yang <[email protected]>
Sent At:2019 Aug. 2 (Fri.) 10:44
To:Azure/sonic-buildimage <[email protected]>
Cc:Siyuan <[email protected]>; Author <[email protected]>
Subject:Re: [Azure/sonic-buildimage] apply kdump supported package and config on fsroot (#3276)
@jipanyang commented on this pull request.
In files/image_config/platform/rc.local:
@@ -318,6 +318,11 @@ if [ -f $FIRST_BOOT_FILE ]; then
# Initialize the SONiC's grub config
mv /host/grub.cfg /host/grub/grub.cfg
fi
+ sed -i 's/[^M] quiet/ crashkernel=768M quiet/' /host/grub/grub.cfg
Please check whether 768M is absolutely necessary, 256M could be more reasonable?
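As a side note on the quoted `sed` line: the `[^M]` in its pattern is part of the match but is not restored in the replacement, so the character just before ` quiet` appears to be dropped. A minimal sketch of an idempotent alternative follows. It swaps in a different guard (skip any line that already contains `crashkernel=`), and `add_crashkernel` is a hypothetical helper name, not part of the PR.

```shell
#!/bin/sh
# Hypothetical helper: insert crashkernel=<size> before " quiet" in a grub config.
#   $1 = path to grub.cfg, $2 = reservation size (e.g. 256M)
# Lines that already contain crashkernel= are left untouched, so the
# function can safely run more than once (GNU sed is assumed for -i).
add_crashkernel() {
    sed -i '/crashkernel=/!s/ quiet/ crashkernel='"$2"' quiet/' "$1"
}
```

Called as `add_crashkernel /host/grub/grub.cfg 256M`, this leaves the preceding argument intact and does nothing on a second run.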
|
sun-siyuan
left a comment
can you describe the test you have done?
just add test section to commit comments
|
Can you please share the console output after you issued the commands? Thanks |
sure, please see below: [1568872.292681] sysrq: SysRq : Trigger a crash |
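For readers reproducing this kind of test, a minimal sketch of a pre-check is shown here: it confirms that the running kernel was booted with a `crashkernel=` reservation before a crash is triggered. The function name `crashkernel_size` is hypothetical; the SysRq commands in the comments are the standard kernel interface, not something taken from this PR.

```shell
#!/bin/sh
# Hypothetical helper: print the crashkernel= value found in a kernel
# command line string ($1), or print nothing if the parameter is absent.
crashkernel_size() {
    for arg in $1; do
        case "$arg" in
            crashkernel=*) printf '%s\n' "${arg#crashkernel=}" ;;
        esac
    done
}

# On a live system the command line can be read from /proc/cmdline:
#   crashkernel_size "$(cat /proc/cmdline)"
# Once memory is reserved, a test crash is commonly triggered as root with:
#   echo 1 > /proc/sys/kernel/sysrq
#   echo c > /proc/sysrq-trigger     # crashes the kernel immediately
```

The crash commands are left as comments on purpose, since running them takes the switch down.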
|
@sun-siyuan Thanks |
|
Yes, reboot is the default action after a crash. If we need to keep the data flow running temporarily without a reboot, that could be done as well.
------------------------------------------------------------------
From:pavel-shirshov <[email protected]>
Sent At:2019 Aug. 5 (Mon.) 18:06
To:Azure/sonic-buildimage <[email protected]>
Cc:Siyuan <[email protected]>; Mention <[email protected]>
Subject:Re: [Azure/sonic-buildimage] apply kdump supported package and config on fsroot (#3276)
@sun-siyuan
Thank you for the output.
But after the crash the system rebooted itself?
Thanks
|
pavel-shirshov
left a comment
Thank you for the patch.
This patch looks good to me except for two things:
- 256M of memory would be used for kdump.
- Reboot would take longer in case of a kernel crash.
We probably need to make this feature an option in sonic-buildimage.
But if @lguohan is OK with it, we can keep it as it is. I'm ready to approve it.
Thanks
|
how do we decide to reserve 256M? I also agree with pavel, we should make it optional. |
pavel-shirshov
left a comment
Can you please make this feature optional with default no?
See comments from @lguohan
We were using 768M according to the Red Hat recommendation, but considering that it could be a waste of running memory, we tested with 256M, which works fine. I just updated the PR to make it optional. |
Please check the new diff, which makes this optional and defaults to NO. |
|
@pavel-shirshov please let me know whether you are ok with the new changes |
|
retest this please |
|
Besides the build option, I think we should have a command line to enable/disable this kernel crash dump feature, like config kdump enable --size=265M |
c831d57 to 8d723c5
|
is there any plan to update the PR based on the feedback? |
|
This is addressed as well; it will be updated once I get all the tests done.
------------------------------------------------------------------
From:lguohan <[email protected]>
Sent At:2019 Sep. 3 (Tue.) 12:31
To:Azure/sonic-buildimage <[email protected]>
Cc:Siyuan <[email protected]>; Mention <[email protected]>
Subject:Re: [Azure/sonic-buildimage] apply kdump supported package and config on fsroot (#3276)
@lguohan commented on this pull request.
In files/build_templates/sonic_debian_extension.j2:
@@ -331,6 +331,17 @@ sudo LANG=C DEBIAN_FRONTEND=noninteractive chroot $FILESYSTEM_ROOT apt-get purge
sudo LANG=C DEBIAN_FRONTEND=noninteractive chroot $FILESYSTEM_ROOT apt-get clean -y
sudo LANG=C DEBIAN_FRONTEND=noninteractive chroot $FILESYSTEM_ROOT apt-get autoremove -y
+# install kernel dump utility
+{%- if sonic_kdump_enable == "y" %}
+{% set coredump = [ "crash", "makedumpfile" ] -%}
+{% for pkg in coredump -%}
+sudo LANG=C DEBIAN_FRONTEND=noninteractive chroot $FILESYSTEM_ROOT apt-get install -y {{ pkg }}
+{% endfor -%}
+sudo LANG=C DEBIAN_FRONTEND=noninteractive chroot $FILESYSTEM_ROOT apt-get install -y kdump-tools || true
+sudo sed -i 's/\/MODULES=dep\//\/MODULES=most\//' $FILESYSTEM_ROOT/etc/kernel/postinst.d/kdump-tools
can you address this question?
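For reference, the escaped `sed` expression in the hunk above rewrites the literal text `/MODULES=dep/` to `/MODULES=most/` inside the kdump-tools hook script. A minimal sketch of the same substitution with `|` as the delimiter, which avoids the backslash escapes, is given here; `use_most_modules` is a hypothetical helper name, and the assumption is that the hook contains a `MODULES=dep` substitution of its own.

```shell
#!/bin/sh
# Hypothetical helper: switch the kdump-tools hook from MODULES=dep to
# MODULES=most. $1 = path to the hook script.
# Using | as the s-command delimiter means the slashes in the pattern
# need no escaping; the behaviour matches the escaped form in the diff.
use_most_modules() {
    sed -i 's|/MODULES=dep/|/MODULES=most/|' "$1"
}
```

In the build template this would be called as `use_most_modules "$FILESYSTEM_ROOT/etc/kernel/postinst.d/kdump-tools"`, run with the same privileges as the original `sudo sed`.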
|
|
vsimage failed due to no space error, please fix and trigger retest WARNING: No swap limit support Step 1/18 : FROM debian:stretch |
Summary: add kdump package to j2 template and config after
Test Plan: test with new image, no issue observed
Reviewers: P604087
Subscribers: P604087
Differential Revision: https://aone.alibaba-inc.com/code/D891921
correct crashkernel size to 256M, error introduced by cherry-pick
fix SONIC_ENABLE_KDUMP
…d build option to en/disable it
|
test failed with below error, need retest Setting status of 451664d to FAILURE with url https://sonic-jenkins.westus2.cloudapp.azure.com/job/broadcom/job/buildimage-brcm-all-pr/1278/ and message: 'Build finished. No test results found.' |
|
retest please |
|
please retest |
|
retest please |
|
retest all please |
|
retest this please |
|
core dump added by broadcom |
Summary: add kdump package to j2 template and config after
This adds kdump to capture the kernel crash core for further analysis with the crash tool.
This PR contains two parts:
1, install the kdump tool chain in the host environment
2, configure the kdump tool both at boot (via grub.cfg) and at the system level
Tests done:
Test the build process: build sonic-broadcom.bin and sonic-aboot-broadcom.swi
-rwxr-xr-x 1 sun sun 562086493 Jul 28 21:28 sonic-aboot-broadcom.swi
-rw-r--r-- 1 sun sun 215882 Jul 28 21:28 sonic-aboot-broadcom.swi.log
-rwxr-xr-x 1 sun sun 569867542 Jul 28 00:52 sonic-broadcom.bin
-rw-r--r-- 1 sun sun 285585 Jul 28 00:52 sonic-broadcom.bin.log
Test the image: load sonic-broadcom.bin onto the switch
Signed-off-by: siyuan sun [email protected]