Use a minimal initrd to switch to the full initrd stored in /usr #3241
@chewi My idea is to load the "normal" initrd as loopback mount from
Build action triggered: https://github.com/flatcar/scripts/actions/runs/18366905408
chewi left a comment
Although I shared the same concern about losing functionality that we would have to reimplement, I hadn't yet identified any such functionality, so I'm not quite ready to throw out my proposal to go straight from tiny initrd to real /usr. I'd really like to know what your specific concerns are.
This is an interesting approach in any case. My own alternative would have been to mount /usr as an overlay with the initrd, deleting all the duplicate files from the initrd, but I hadn't fully thought it through.
Regarding verity, I think it only needs to be set up once. I didn't enable verity in my own experiment, but /sysroot/usr was simply a bind mount of /usr. I think that would still work with verity applied.
...ntainer/src/third_party/coreos-overlay/sys-kernel/coreos-kernel/coreos-kernel-6.12.44.ebuild
My intention was to keep most things untouched so that we can focus on the bare task of jumping into the regular initrd and avoid any risk of reimplementing all needed initrd logic. Things that should run from the initrd are: Ignition stages, hostname setup with afterburn (and basic network setup for them while they prepare the final network setup for the real system), setup of the
Okay, but I wasn't proposing rewriting all that. Dracut puts those scripts into an initrd. I was just going to put them in /usr instead. It's more or less the same thing. It's the scripts that Dracut itself provides through its own modules that I was concerned about.
The question is on how these things are started because they run in a context with dependencies. Having only one set of systemd units for both the initrd and the final system doesn't work if we want to make use of systemd in the initrd - it would run all enabled units under
4306d75 to 0bfc20a
b250dfa to 647190c
647190c to 3561af4
777af55 to 55490bb
55490bb to d1f0555
d1f0555 to 360aa17
360aa17 to 2323e9c
chewi left a comment
Looking good! I'm tentatively approving this, just a couple of things to consider.
You can drop the sudo calls. RESTRICT="userpriv" means we're already running as root because Dracut needs it. If we didn't have that, sudo wouldn't work anyway.
I'm now somewhat confused about the compression. The kernel documentation says that you're only supposed to pass a single cpio to CONFIG_INITRAMFS_SOURCE. It also says the early cpio must not be compressed. We have been telling Dracut not to compress, so what we've been providing has been totally uncompressed. Copilot says that the kernel build isn't smart enough to only compress the main part, but it also says that the uncompressed early cpio rule only applies when you're passing the initramfs separately at boot time, not when it's built in. I suppose that must be true, since it appears that we've been compressing the whole thing via the kernel build. What you've proposed doesn't change that, but I thought it would be a good opportunity to write this down and check that we're all on the same page.
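To make the compressed-versus-uncompressed question easier to check on a standalone initramfs or cpio file, the leading magic bytes can be inspected. The following is only a sketch: `initramfs_kind` is a hypothetical helper, not something from the kernel build or the Flatcar scripts, and it only looks at the start of a file, so it says nothing about an initramfs that is already embedded in a kernel image.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of this PR):
# classify a file by its first bytes to see whether it starts with
# an uncompressed newc cpio or with a compressed stream.
initramfs_kind() {
    # Hex dump of the first 6 bytes, without offsets or whitespace.
    magic=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        # ASCII "070701" (newc) or "070702" (newc with checksum)
        303730373031|303730373032) echo "cpio-newc" ;;
        1f8b*)        echo "gzip" ;;
        fd377a585a00) echo "xz" ;;
        28b52ffd*)    echo "zstd" ;;
        *)            echo "unknown" ;;
    esac
}
```

Running it on the early cpio and on the combined image before they are handed to the kernel build would show which parts are uncompressed at that point.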
2323e9c to f7eaf89
Ok, dropped the sudo calls. Yes, good question - I assume that the kernel build system knows whether the first cpio can be compressed or not. We could check with real hardware if we get the microcode update applied or not (with any Flatcar release as we didn't change this).
Yes, probably best to check that the microcode actually works, not merely whether we've changed anything. The microcode was actually missing entirely until I fixed that a few months back! See #2837.
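One possible way to compare machines or releases on real hardware is to read the microcode revision the kernel reports per CPU. This is a sketch under the assumption of an x86 system, where `/proc/cpuinfo` carries a `microcode` field; `microcode_revisions` is a hypothetical helper name, and which revision counts as "updated" still has to be judged against the shipped microcode package.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of this PR):
# print the distinct microcode revisions found in a cpuinfo-style file.
# Defaults to /proc/cpuinfo when no file is given.
microcode_revisions() {
    awk -F': *' '/^microcode[ \t]*:/ { print $2 }' "${1:-/proc/cpuinfo}" | sort -u
}
```

Comparing the output of a boot with and without the early cpio would show whether the early update had any effect.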
Yes, I think I tested it but that was with the truncation - I don't remember the details (Edit: Tested now and it doesn't seem to work either). After the changes with the new lsinitrd extraction I didn't test it again and just see now that it doesn't seem to work.
Confusing, it doesn't work on Alpha either but on Stable I've seen it applied.
Microcode updating is also not working in Beta. Sounds like the behavior changed in the PR you linked. But Beta has your changes and it prints:
So I guess the current way of passing it in does indeed not work and we need to change this. But not in this PR.
I created a bugreport for it: flatcar/Flatcar#1909
The growth of binaries over time and the inclusion of new features
filled the available boot partition space, so that the kernel+initrd
almost couldn't fit twice anymore as required for updates. We employed
workarounds such as wrapper scripts for ignition, afterburn and other
binaries so that they are loaded from /usr. However, this was still not
enough and we would have to do the same for (network) kernel modules and
firmware. To avoid making this ever more complex we can use a dedicated
initrd focused on loading the full initrd from /usr and then this full
initrd can use dracut as before and even drop all the workarounds we
accumulated.
Introduce a busybox init script that prepares a minimal environment,
has debug toggles and an emergency shell, and only loads the real initrd
from /usr to switch over to it. Because mdev is not a proper udev
replacement, some additional scripting is needed. Busybox's modprobe
can't work with dependencies well and we need the real kmod for that
(which is also good to guarantee that the same modprobe options are set).
Also, some other busybox commands are often lacking things such as
loading a kernel module automatically and this has to be done
explicitly. We still set up dm-verity for /usr so that we have the same
security properties (The code comes from the bootengine systemd
generators we have and also covers the PXE boot with a squashfs /usr
passed from an additional cpio). The real initrd then reuses the mount
point for /usr, and loads any kernel modules and firmware that weren't
loaded already.
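An init script like the one described has to read its settings (where /usr lives, debug toggles) from the kernel command line. The following is a minimal sketch of that parsing step in plain POSIX shell as busybox `sh` would run it. The helper names `cmdline_get` and `cmdline_has` and the `minimal_initrd.debug` toggle are hypothetical, and `mount.usr` appears only as an illustrative parameter; none of this is taken from the actual script.

```shell
#!/bin/sh
# Hypothetical helpers (assumptions, not the real bootengine script).

# Print the value of the first "key=value" argument on a command line.
# Uses /proc/cmdline when no command line string is passed.
cmdline_get() {
    key=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    for arg in $cmdline; do
        case "$arg" in
            "$key"=*) echo "${arg#"$key"=}"; return 0 ;;
        esac
    done
    return 1
}

# Succeed if a bare flag (no "=value") is present on the command line.
cmdline_has() {
    for arg in ${2:-$(cat /proc/cmdline)}; do
        [ "$arg" = "$1" ] && return 0
    done
    return 1
}
```

For example, `cmdline_get mount.usr "console=ttyS0 mount.usr=PARTUUID=1234"` prints `PARTUUID=1234`. Note that this returns the first match, whereas the kernel usually lets the last occurrence of a parameter win.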
We also have to make the dependencies for parse-ip-for-networkd.service
a bit more explicit because the removal of the /sysusr mount in the full
initrd exposed a race condition.
## How to use
With flatcar/scripts#3241
## Testing done
See above
The growth of binaries over time and the inclusion of new features filled the available boot partition space, so that the kernel+initrd almost couldn't fit twice anymore as required for updates. We employed workarounds such as wrapper scripts for ignition, afterburn and other binaries so that they are loaded from /usr. However, this was still not enough and we would have to do the same for (network) kernel modules and firmware. To avoid making this ever more complex we can use a dedicated initrd focused on loading the full initrd from /usr and then this full initrd can use dracut as before and even drop all the workarounds we accumulated.
Generate a minimal initrd to use instead of the full bootengine initrd. The bootengine initrd gets stored as squashfs on /usr. The minimal initrd still includes the early_cpio for amd64 microcode updates.
We have a fixed list of modules or module directories to include, only focused on loading /usr and any emergency console interaction. This requires also checking for module dependencies to copy over.
The busybox, veritysetup, and kmod binaries are needed and get their required libraries resolved and copied over. They are not static and use shared libraries which should be ok for now. The resulting vmlinuz file is 27 MB for amd64, down from ~60 MB, so we have enough room to include more kernel modules and so on for the next years while we also grow the boot partition and wait for users to redeploy until we can rely on a larger boot partition and eventually drop the minimal initrd again.
Pulls in flatcar/bootengine#110 for the minimal initrd script and flatcar/seismograph#12 for making the device mapper discovery for the "rootdev" command more reliable.
This also required a backport of a kernel patch from 2017 that exposes the PARTUUID in the /sys uevent file.
Co-authored-by: James Le Cuirot <[email protected]>
Signed-off-by: Kai Lueke <[email protected]>
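The module dependency check mentioned above could look roughly like the following. This is a sketch, not the code from the ebuild: `module_with_deps` is a hypothetical helper, and it assumes the `modules.dep` file written by depmod, where each line has the form `module: dep1 dep2 ...` with paths relative to the module directory.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not the ebuild's implementation):
# print a module path followed by the dependencies that modules.dep
# lists for it, one path per line. Prints nothing for unknown modules.
module_with_deps() {
    depfile=$1
    module=$2
    awk -v m="$module" -F': *' '
        $1 == m {
            print $1
            n = split($2, deps, " ")
            for (i = 1; i <= n; i++) print deps[i]
        }
    ' "$depfile"
}
```

Each printed path could then be copied into the minimal initrd tree. Whether one lookup is enough or a recursive walk is needed depends on how complete the dependency list on each line is, so that is worth verifying against the real file.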
f7eaf89 to 5f1944b
How to use
Depends on flatcar/bootengine#110 and flatcar/seismograph#12
And flatcar/flatcar-build-scripts#174 for the image size report (but that only works when this is included in the first nightly)
Testing done
On all clouds (Equinix Metal arm64 was manually tested) - The build got gc'ed, a more limited new run is here
The bootengine.img initrd size/content reporting only works after the first nightly is built.
changelog/ directory (user-facing change, bug fix, security fix, update)
/boot and /usr size, packages, list files for any missing binaries, kernel modules, config files, etc.