Set cgroupdriver to systemd on GPU nodes#3007
Merged
Callisto13 merged 2 commits intoeksctl-io:masterfrom Jan 4, 2021
Merged
Conversation
aclevername
approved these changes
Jan 4, 2021
Contributor
aclevername
left a comment
There was a problem hiding this comment.
LGTM (I haven't tested the sed command myself)
An integration test would be a great way to ensure we don't break this again in the future! 😄
Contributor
Author
|
👍 I will do the integration test separately bc I want to get a release out today |
michaelbeaumont
approved these changes
Jan 4, 2021
Less neat to read maybe, but removes an unnecessary outer loop when it comes to processing the list. Also, only write docker daemon.json when not using GPU instance type. The bootstrap script in those AMIs will remove the daemon file, so there is no reason to write it.
0c7ffbe to
31b371a
Compare
cPu1
approved these changes
Jan 4, 2021
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Fixes #3005. Maybe.
#2962 set the cgroupdriver of both the docker daemon and the kubelet to systemd. Unfortunately, the GPU AMIs remove the
/etc/docker/daemon.jsonfile* and set things via flags. This meant that0.35.0broke things for people who want to create new node groups with GPU instances: the kubelet was set to usesystemd, while the docker daemon (that file removed) defaulted tocgroupfs.* annoying.
There are 2 workarounds (in the form of
preBootstrapCommands) for users. This PR is a quick hack to get things working again, while we hopefully figure out something better OR wait for AWS to default tosystemdin those accelerated AMIs (the standard ones are about to do this).Alternatives to simply sedding the flag into that
OPTIONSstring:/etc/sysconfig/dockerfile/etc/systemd/system/docker.service.d/override.confsed🤷♀️The first 2 run the risk of getting out of step with the originals. The 3rd may be more trouble than it is worth.
I have asked AWS to ensure those accelerated AMIs set the cgroup driver to
systemdby default, so we could just leave the hack in while we wait. Don't know how long that will be tho... (The standard AMIs are about to dosystemdby default, but a different builder is used for GPU ones.)I am open to any and all suggestions.
More information can be found here #3005 (comment)
I am gonna add an integration test as part of regression testing.
(I also simplified some other stuff around writing config files, but can take or leave that.)
Checklist
README.md, or theuserdocsdirectory)area/nodegroup), target version (e.g.version/0.12.0) and kind (e.g.kind/improvement)