Member

@pothos pothos commented Sep 3, 2025

The growth of binaries over time and the inclusion of new features
filled the available boot partition space, so that the kernel+initrd
almost couldn't fit twice anymore as required for updates. We employed
workarounds such as wrapper scripts for ignition, afterburn and other
binaries so that they are loaded from /usr. However, this was still not
enough and we would have to do the same for (network) kernel modules and
firmware. To avoid making this ever more complex we can use a dedicated
initrd focused on loading the full initrd from /usr and then this full
initrd can use dracut as before and even drop all the workarounds we
accumulated.

Generate a minimal initrd to use instead of the full bootengine initrd.
The bootengine initrd gets stored as squashfs on /usr. The minimal
initrd still includes the early_cpio for amd64 microcode updates.
We have a fixed list of modules or module directories to include, focused
only on loading /usr and on emergency console interaction. This also
requires checking for module dependencies to copy over.
The busybox, veritysetup, and kmod binaries are needed and get their
required libraries resolved and copied over. They are not static and
use shared libraries, which should be ok for now. The resulting vmlinuz
file is 27 MB for amd64, down from ~60 MB. This leaves enough room to
include more kernel modules and so on for the next years. Meanwhile we
also grow the boot partition and wait for users to redeploy, until we
can rely on a larger boot partition and eventually drop the minimal
initrd again.
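
The module dependency check mentioned above can be pictured as a transitive closure over `modules.dep`-style entries. This is only a minimal sketch and not the code used in this PR: the function names and the sample dependency lines are hypothetical, and the sketch assumes the usual `module: dep1 dep2` line format.

```python
def parse_modules_dep(text):
    """Parse modules.dep-style lines of the form 'mod.ko: dep1.ko dep2.ko'."""
    deps = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        module, _, rest = line.partition(":")
        deps[module.strip()] = rest.split()
    return deps


def resolve_closure(wanted, deps):
    """Return the wanted modules plus all transitive dependencies, sorted."""
    seen = set()
    stack = list(wanted)
    while stack:
        module = stack.pop()
        if module in seen:
            continue
        seen.add(module)
        stack.extend(deps.get(module, []))
    return sorted(seen)


# Hypothetical sample entries, for illustration only.
SAMPLE = """\
kernel/drivers/md/dm-verity.ko: kernel/drivers/md/dm-bufio.ko kernel/drivers/md/dm-mod.ko
kernel/drivers/md/dm-bufio.ko: kernel/drivers/md/dm-mod.ko
kernel/drivers/md/dm-mod.ko:
kernel/fs/squashfs/squashfs.ko:
"""

if __name__ == "__main__":
    deps = parse_modules_dep(SAMPLE)
    for path in resolve_closure(["kernel/drivers/md/dm-verity.ko"], deps):
        print(path)
```

Starting from a fixed list and copying the closure keeps the minimal initrd small while still avoiding modules that fail to load because a dependency was left out.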

Pulls in flatcar/bootengine#110 for the
minimal initrd script and flatcar/seismograph#12
for making the device mapper discovery for the "rootdev" command more
reliable.

This also required a backport of a kernel patch from 2017 that exposes
the PARTUUID in the /sys uevent file.
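
For illustration, reading the PARTUUID from such a uevent file comes down to parsing `KEY=VALUE` lines. The sketch below is an assumption about how a consumer could do this, not the code from the PR; the function names and the sample content (device name and UUID) are made up.

```python
def parse_uevent(text):
    """Parse KEY=VALUE lines as found in a sysfs uevent file."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            result[key.strip()] = value.strip()
    return result


def partuuid_from_uevent(text):
    """Return the PARTUUID value, or None if the kernel does not expose it."""
    return parse_uevent(text).get("PARTUUID")


# Hypothetical uevent content for a partition, for illustration only.
SAMPLE_UEVENT = """\
MAJOR=8
MINOR=4
DEVNAME=sda4
DEVTYPE=partition
PARTN=4
PARTUUID=00000000-0000-0000-0000-000000000004
"""

if __name__ == "__main__":
    print(partuuid_from_uevent(SAMPLE_UEVENT))
```

Without the backported patch the `PARTUUID` key would be absent, which this sketch reports as `None`.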

How to use

Depends on flatcar/bootengine#110 and flatcar/seismograph#12

And flatcar/flatcar-build-scripts#174 for the image size report (but that only works when this is included in the first nightly)

Testing done

Tested on all clouds (Equinix Metal arm64 was tested manually). That build got garbage-collected; a more limited new run is here

The bootengine.img initrd size/content reporting only works after the first nightly is built.

  • Changelog entries added in the respective changelog/ directory (user-facing change, bug fix, security fix, update)
  • Inspected CI output for image differences: /boot and /usr size, packages, list files for any missing binaries, kernel modules, config files, etc.

Member Author

pothos commented Sep 3, 2025

@chewi My idea is to load the "normal" initrd as a loopback mount from /usr and switch to it for Ignition, network drivers and so on. If you want, we could try to plug your busybox experiments in on this branch. To keep things simple and avoid risking breakage, I would assume that the minimal initrd still ships the CPU microcode for the kernel to load, because loading it from userland involves more things to be aware of that aren't well supported, as far as I remember. Also, the /usr verity mount should be done by the minimal initrd, and we would bind mount it into the "normal" initrd mount to reuse it (or maybe dm-verity also has no problems doing the work twice? For performance reasons we still might want to do it only once). In the end we could even drop the wrappers if we start to include afterburn and ignition (and the other wrapped binaries) in the initrd again and remove them from /usr, where they aren't actually needed.


github-actions bot commented Sep 3, 2025

Contributor

@chewi chewi left a comment


Although I shared the same concern about losing functionality that we would have to reimplement, I hadn't yet identified any such functionality, so I'm not quite ready to throw out my proposal to go straight from tiny initrd to real /usr. I'd really like to know what your specific concerns are.

This is an interesting approach in any case. My own alternative would have been to mount /usr as an overlay with the initrd, deleting all the duplicate files from the initrd, but I hadn't fully thought it through.

Regarding verity, I think it only needs to be set up once. I didn't enable verity in my own experiment, but /sysroot/usr was simply a bind mount of /usr. I think that would still work with verity applied.

Member Author

pothos commented Sep 3, 2025

> my proposal to go straight from tiny initrd to real /usr. I'd really like to know what your specific concerns are.

My intention was to keep most things untouched so that we can focus on the bare task of jumping into the regular initrd and avoid any risk of reimplementing all needed initrd logic. Things that should run from the initrd are: Ignition stages, hostname setup with afterburn (and basic network setup for them while they prepare the final network setup for the real system), setup of the /etc overlay, A/B sysext setup for OEM and extra sysexts (incl. fallback download), encrypted rootfs unlocking, generation of host-specific /etc files, disk uuid init, first-boot detection, and probably other stuff I don't remember. I think we should rely on the current code for almost all of it to avoid breaking things.

Contributor

chewi commented Sep 3, 2025

Okay, but I wasn't proposing rewriting all that. Dracut puts those scripts into an initrd. I was just going to put them in /usr instead. It's more or less the same thing. It's the scripts that Dracut itself provides through its own modules that I was concerned about.

Member Author

pothos commented Sep 4, 2025

The question is how these things are started, because they run in a context with dependencies. Having only one set of systemd units for both the initrd and the final system doesn't work if we want to make use of systemd in the initrd: it would run all enabled units under /usr (unless we inject the initrd check into most of them). Creating a separate environment manually without dracut would mean more work and risk than moving the current initrd as a whole. When we get things working and have time left, we can still try this out as an optimization (preparing an /etc for the initrd stage with /etc/initrd-release, pulling in the ignition systemd units and so on, and masking or customizing all unnecessary units through drop-ins).

Contributor

@chewi chewi left a comment


Looking good! I'm tentatively approving this, just a couple of things to consider.

You can drop the sudo calls. RESTRICT="userpriv" means we're already running as root because Dracut needs it. If we didn't have that, sudo wouldn't work anyway.

I'm now somewhat confused about the compression. The kernel documentation says that you're only supposed to pass a single cpio to CONFIG_INITRAMFS_SOURCE. It also says the early cpio must not be compressed. We have been telling Dracut not to compress, so what we've been providing has been totally uncompressed. Copilot says that the kernel build isn't smart enough to only compress the main part, but it also says that the uncompressed early cpio rule only applies when you're passing the initramfs separately at boot time, not when it's built in. I suppose that must be true, since it appears that we've been compressing the whole thing via the kernel build. What you've proposed doesn't change that, but I thought it would be a good opportunity to write this down and check that we're all on the same page.
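
To look at this from the image side, a small sketch like the following reports what the first bytes of an extracted initramfs blob look like. It only compares against well-known magic numbers (cpio newc, gzip, xz, zstd, bzip2). The helper name and the assumption that the blob has already been extracted are mine, and the result does not settle how the kernel build treats the early cpio.

```python
# Well-known leading magic numbers; the list is deliberately incomplete.
MAGICS = [
    (b"070701", "cpio-newc"),
    (b"070702", "cpio-newc-crc"),
    (b"\x1f\x8b", "gzip"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"BZh", "bzip2"),
]


def sniff_format(data):
    """Return a label for the format suggested by the first bytes of data."""
    for magic, name in MAGICS:
        if data.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python sniff.py <extracted-initramfs-blob>
    with open(sys.argv[1], "rb") as blob:
        print(sniff_format(blob.read(16)))
```

An uncompressed early cpio would show up as `cpio-newc`, while a blob that was compressed as a whole would report the compressor instead.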

Member Author

pothos commented Oct 8, 2025

Ok, dropped the sudo calls.

Yes, good question - I assume that the kernel build system knows whether the first cpio can be compressed or not. We could check with real hardware if we get the microcode update applied or not (with any Flatcar release as we didn't change this).

Contributor

chewi commented Oct 8, 2025

Yes, probably best to check that the microcode actually works, not merely whether we've changed anything. The microcode was actually missing entirely until I fixed that a few months back! See #2837.

Member Author

pothos commented Oct 9, 2025

Yes, I think I tested it but that was with the truncation - I don't remember the details (Edit: Tested now and it doesn't seem to work either). After the changes with the new lsinitrd extraction I didn't test it again and just see now that it doesn't seem to work.

Member Author

pothos commented Oct 9, 2025

Confusing, it doesn't work on Alpha either but on Stable I've seen it applied.

Member Author

pothos commented Oct 9, 2025

Microcode updating is also not working in Beta.

Sounds like the behavior changed in the PR you linked.
Stable doesn't have your changes and prints this at boot:

[    2.835111] microcode: Current revision: 0x00000100
[    2.840016] microcode: Updated early from: 0x000000f4

But Beta has your changes and it prints:

[    5.932758] microcode: Current revision: 0x000000f4
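
A check like this could be scripted against the boot log. The following is a rough sketch that only knows the two message formats quoted above; the function name and return structure are hypothetical, and other kernels may word these messages differently.

```python
import re

# Matches the two kernel log messages quoted in this comment.
LINE_RE = re.compile(
    r"microcode: (Current revision|Updated early from): (0x[0-9a-fA-F]+)"
)


def microcode_status(log_text):
    """Summarize whether an early microcode update shows up in the log."""
    current = None
    updated_from = None
    for match in LINE_RE.finditer(log_text):
        value = int(match.group(2), 16)
        if match.group(1) == "Current revision":
            current = value
        else:
            updated_from = value
    return {
        "current": current,
        "updated_from": updated_from,
        "early_update_applied": updated_from is not None,
    }


STABLE_LOG = (
    "[    2.835111] microcode: Current revision: 0x00000100\n"
    "[    2.840016] microcode: Updated early from: 0x000000f4\n"
)
BETA_LOG = "[    5.932758] microcode: Current revision: 0x000000f4\n"

if __name__ == "__main__":
    print(microcode_status(STABLE_LOG))
    print(microcode_status(BETA_LOG))
```

For the Stable log this reports an applied early update from 0xf4 to 0x100, and for the Beta log it reports none.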

Member Author

pothos commented Oct 9, 2025

So I guess the current way of passing it in does indeed not work and we need to change this. But not in this PR.

Member Author

pothos commented Oct 9, 2025

I created a bug report for it: flatcar/Flatcar#1909

pothos added a commit to flatcar/bootengine that referenced this pull request Oct 9, 2025
The growth of binaries over time and the inclusion of new features
filled the available boot partition space, so that the kernel+initrd
almost couldn't fit twice anymore as required for updates. We employed
workarounds such as wrapper scripts for ignition, afterburn and other
binaries so that they are loaded from /usr. However, this was still not
enough and we would have to do the same for (network) kernel modules and
firmware. To avoid making this ever more complex we can use a dedicated
initrd focused on loading the full initrd from /usr and then this full
initrd can use dracut as before and even drop all the workarounds we
accumulated.

Introduce a busybox init script that prepares a minimal environment,
has debug toggles and an emergency shell, and only loads the real initrd
from /usr to switch over to it. Because mdev is not a proper udev
replacement, some additional scripting is needed. Busybox's modprobe
can't handle dependencies well, so we need the real kmod for that
(which also guarantees that the same modprobe options are set).
Also, some other busybox commands lack conveniences such as loading a
kernel module automatically, so this has to be done explicitly. We
still set up dm-verity for /usr so that we have the same security
properties (the code comes from the bootengine systemd generators we
have and also covers PXE boot with a squashfs /usr passed from an
additional cpio). The real initrd then reuses the mount point for /usr
and loads any kernel modules and firmware that weren't loaded already.
We also have to make the dependencies for parse-ip-for-networkd.service
a bit more explicit because the removal of the /sysusr mount in the full
initrd exposed a race condition.

## How to use

With flatcar/scripts#3241

## Testing done

See above
The growth of binaries over time and the inclusion of new features
filled the available boot partition space, so that the kernel+initrd
almost couldn't fit twice anymore as required for updates. We employed
workarounds such as wrapper scripts for ignition, afterburn and other
binaries so that they are loaded from /usr. However, this was still not
enough and we would have to do the same for (network) kernel modules and
firmware. To avoid making this ever more complex we can use a dedicated
initrd focused on loading the full initrd from /usr and then this full
initrd can use dracut as before and even drop all the workarounds we
accumulated.

Generate a minimal initrd to use instead of the full bootengine initrd.
The bootengine initrd gets stored as squashfs on /usr. The minimal
initrd still includes the early_cpio for amd64 microcode updates.
We have a fixed list of modules or module directories to include, focused
only on loading /usr and on emergency console interaction. This also
requires checking for module dependencies to copy over.
The busybox, veritysetup, and kmod binaries are needed and get their
required libraries resolved and copied over. They are not static and
use shared libraries, which should be ok for now. The resulting vmlinuz
file is 27 MB for amd64, down from ~60 MB. This leaves enough room to
include more kernel modules and so on for the next years. Meanwhile we
also grow the boot partition and wait for users to redeploy, until we
can rely on a larger boot partition and eventually drop the minimal
initrd again.

Pulls in flatcar/bootengine#110 for the
minimal initrd script and flatcar/seismograph#12
for making the device mapper discovery for the "rootdev" command more
reliable.

This also required a backport of a kernel patch from 2017 that exposes
the PARTUUID in the /sys uevent file.

Co-authored-by: James Le Cuirot <[email protected]>
Signed-off-by: Kai Lueke <[email protected]>
