[pull] master from Azure:master#1364

Merged
pull[bot] merged 1 commit into mudsut4ke:master from sonic-net:master on Feb 2, 2021

@pull pull bot commented Feb 2, 2021

See Commits and Changes for more details.


Created by pull[bot]

A few issues were discovered with crashkernel on Arista platforms.

1) Platforms using `docker_inram=on` would run out of memory (OOM) in the kdump environment.
This happens because SONiC and the crashkernel use the same initramfs.
With `docker_inram=on`, `dockerfs.tar.gz` is extracted into a `tmpfs` created for the occasion.
Since `dockerfs.tar.gz` weighs more than 1.5G, it does not fit into the kdump environment, which then goes OOM.
This OOM event can in turn trigger a panic.

2) Arista platforms with `secureboot` enabled would fail to load the crashkernel because the kernel parameter would be discarded on boot.
This happens because the `boot0` in secureboot mode is strict about kernel parameter injection.

3) The secureboot path allowlist would remove kernel crash reports.

4) The kdump service would fail on Arista products since `/boot/` is empty in `secureboot` mode.

**- How I did it**

1) To prevent an OOM event in the crashkernel, the fix avoids the code paths in `union-mount` that create tmpfs mounts and populate them. A few more code paths specific to Arista devices are also skipped to make the kdump process faster.
This relies on detecting that the initramfs is starting in a kdump environment and skipping some initialization.
The `/usr/sbin/kdump-config` tool appends a few kernel cmdline arguments when loading the crashkernel.
The most distinctive one is `systemd.unit=kdump-tools.service`, which a few initramfs hooks check in order to set `in_kdump`.
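
The detection described above can be sketched as follows. This is a minimal illustration, not the code from the actual initramfs hooks: the helper name `cmdline_has_kdump_unit` and the `y`/`n` values for `in_kdump` are assumptions made for the example. Only the marker argument `systemd.unit=kdump-tools.service` comes from the description.

```shell
#!/bin/sh
# Hypothetical helper (not from the SONiC sources): succeeds when the kernel
# cmdline passed as $1 contains the argument that kdump-config appends.
cmdline_has_kdump_unit() {
    # Pad with spaces so that only a whole argument matches.
    case " $1 " in
        *" systemd.unit=kdump-tools.service "*) return 0 ;;
    esac
    return 1
}

# Assumed convention for this sketch: in_kdump is "y" or "n".
in_kdump=n
if [ -r /proc/cmdline ] && cmdline_has_kdump_unit "$(cat /proc/cmdline)"; then
    in_kdump=y
fi

if [ "$in_kdump" = y ]; then
    echo "kdump environment detected: skipping tmpfs creation and population"
else
    echo "regular boot: running the full initialization"
fi
```

Matching on a whole argument rather than a substring keeps a longer unit name that merely starts with the same text from being mistaken for the kdump marker.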

2) To allow `kdump` to work in a `secureboot` environment, the cmdline generation in `boot0` was slightly modified.
The code path that loads kernel parameters changed by SONiC now also runs when booting in secure mode.
It was altered to prevent an append-only behavior that would grow `kernel-cmdline` at every reboot.
This ever-growing cmdline would eventually cause `kexec` to fail to load the kernel because the cmdline is too long.
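
One way to avoid append-only growth is to make the merge idempotent: drop any existing argument whose key is about to be re-added, then append the new arguments. The sketch below illustrates that idea only. `merge_cmdline` is a hypothetical helper, not the function used in `boot0`, and the parameter values are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper (not from boot0): print the cmdline in $1 with every
# argument whose key also appears in $2 removed, followed by the arguments in $2.
merge_cmdline() {
    base=$1
    extra=$2
    out=
    set -f  # disable globbing while the strings are split on whitespace
    for arg in $base; do
        key=${arg%%=*}
        keep=y
        for new in $extra; do
            [ "$key" = "${new%%=*}" ] && keep=n
        done
        [ "$keep" = y ] && out="${out:+$out }$arg"
    done
    for new in $extra; do
        out="${out:+$out }$new"
    done
    set +f
    printf '%s\n' "$out"
}

# Simulate two consecutive boots that inject the same parameters.
injected="crashkernel=256M docker_inram=on"
first=$(merge_cmdline "console=ttyS0 quiet" "$injected")
second=$(merge_cmdline "$first" "$injected")
echo "after boot 1: $first"
echo "after boot 2: $second"
```

Because arguments with a matching key are replaced rather than duplicated, the second call yields the same string as the first, so the cmdline length stays bounded across reboots.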

3) To keep kernel crash reports under `/var/crash`, this path has to be added to `allowlist_paths`.

4) The `/host/image-XXX/boot` folder is now populated in `secureboot` mode but not used.

**- How to verify it**

Regular boot:
 - enable kdump
 - enable `docker_inram=on` via kernel-params
 - reboot
 - generate a crash with `echo c > /proc/sysrq-trigger`
 - before: OOM events are visible on the console
 - after: the crash kernel works and the crash report is available under `/var/crash`

Secure boot:
 - enable kdump
 - reboot
 - generate a crash with `echo c > /proc/sysrq-trigger`
 - before: no kdump is produced
 - after: the crash kernel works and the crash report is available under `/var/crash`
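
The final "after" check in both procedures can be scripted. The sketch below assumes that each crash is stored in its own subdirectory of `/var/crash`, which is the usual kdump-tools layout but is not stated in this description. `count_crash_entries` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print how many subdirectories exist under the
# directory given as $1 (0 when the directory is missing or empty).
count_crash_entries() {
    dir=$1
    count=0
    if [ ! -d "$dir" ]; then
        echo 0
        return 0
    fi
    for entry in "$dir"/*/; do
        # An unmatched glob stays literal and fails the -d test.
        [ -d "$entry" ] && count=$((count + 1))
    done
    echo "$count"
}

echo "crash directories under /var/crash: $(count_crash_entries /var/crash)"
```

Running it before triggering the crash and again after the reboot should show the count increasing by one when kdump worked.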


Co-authored-by: Boyang Yu <byu@arista.com>
@pull pull bot added the ⤵️ pull label Feb 2, 2021
@pull pull bot merged commit 0c4d4ac into mudsut4ke:master Feb 2, 2021
pull bot pushed a commit that referenced this pull request Feb 4, 2021
* 28d358f 2021-02-01 | [show] Run fwutil with sudo (#1364) (HEAD) [Volodymyr Boiko]
* a50b7a2 2021-01-29 | [ecnconfig] Allow ecn unit test to run without sudo (#1390) [Neetha John]
* 8a1109e 2021-01-29 | [sonic-installer] Add information to syslog (#1369) [Dmytro]
* c7c01e4 2021-01-27 | [show] fix "show interfaces breakout" command (#1198) [Dmytro Shevchuk]
* 7a8024a 2021-01-27 | Prevent user from adding more then a single untagged VLAN to an interface (#1382) [Eran Dahan]
* 41e62c6 2021-01-26 | [pcieutil] Add 'pcie-aer' sub-command to display AER stats (#1169) [Arun Saravanan Balachandran]
* 47f412b 2021-01-25 | Improve robustness of consutil plugin loading (#1353) [Samuel Angebault]
* 64aa1b8 2021-01-26 | [show] Fix warnings, related to gearbox, while show commands execution (#1343) [maksymbelei95]
* ff226d0 2021-01-25 | Prevent configuring IP interface on a port which is a member of VLAN (#1374) [Eran Dahan]
* f1522b9 2021-01-21 | [config_mgmt.py]: Set leaf-list to empty list while port breakout. (#1268) [Praveen Chaudhary]
* 99c05d5 2021-01-21 | add vlan_intf_object only if there are ipv4 or ipv6 mappings (#1377) [Sumukha Tumkur Vani]
* b082684 2021-01-21 | [ecn] Add tests for ecnconfig command (#1372) [Neetha John]
* 23e0920 2021-01-21 | [sfpshow] Enhance QSFP-DD DOM information (#1207) [shlomibitton]
* f4edba1 2021-01-20 | [ecnconfig] handle backend port names when extracting port I/F ID from the port name (#1361) [Mahesh Maddikayala]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
mudsut4ke pushed a commit that referenced this pull request Jul 19, 2021
…sonic-net#6701)

sonic-swss:
- [Mux] Route handling based on mux status, kernel tunnel support (#1615)
- Reduce noise during frequent route update (#1624)
- Changed Error log to Notice log during FDB flush notification after VLAN delete (#1618)
- [PortsOrch] Add reference counting to ports for ACL bindings (#1614)
- [crm]: Ignore unsupported/non-implemented switch attributes (#1613)
- [Mux] Fix repeating logs in case of tunnel creation fail (#1610)

sonic-utilities:
- [config reload]: Restart mux container (#1401)
- [storyteller] Enhance the storyteller utility (#1400)
- [show] Fix int status when portchannel is in the system (#1376)
- [config][show] cli support for retrieving ber, eye-info and configuring prbs, loopback on Y-cable  (#1386)
- Skip route check for tun0 interfaces (#1399)
- do not parse stderr to get correct routing stack (#1398)
- [storyteller] allow storyteller to work on downloaded logs (#1388)
- [show] Run fwutil with sudo (#1364)

Signed-off-by: Danny Allen <daall@microsoft.com>