
Use "systemd" cgroup driver as default instead of Docker's "cgroupfs" #490

@jjjms

What would you like to be added:
The EKS AMI should use the "systemd" cgroup driver by default, for both kubelet and Docker.

Why is this needed:
AL2 boots with systemd, which acts as the host's cgroup manager. When kubelet and Docker use the cgroupfs driver instead, systemd is unaware of the resources those cgroups allocate, and under resource pressure this can make the node unstable or crash.

https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers
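The mismatch is easy to confirm on a running node; a minimal sketch (assumes the docker CLI is on PATH, as on the stock AMI):

```shell
# Which cgroup driver is Docker using? The stock EKS AMI reports "cgroupfs".
if command -v docker >/dev/null 2>&1; then
  docker info --format '{{.CgroupDriver}}'
fi
# PID 1 on AL2 is systemd, i.e. systemd already owns the cgroup hierarchy.
cat /proc/1/comm
```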

I tested this by making the config changes below and joining the node back to the cluster. The node was marked Ready and I was able to create pods on it.

### Drained the worker node:
```
k drain ip-192-168-0-171.us-west-2.compute.internal --ignore-daemonsets
node/ip-192-168-0-171.us-west-2.compute.internal already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/aws-node-jfnzw, kube-system/kube-proxy-sdx2q
node/ip-192-168-0-171.us-west-2.compute.internal drained
```


### Deleted the node object from EKS so that the node rejoins as a new entity:
```
k delete no ip-192-168-0-171.us-west-2.compute.internal
```


### Stopped kubelet and Docker:
```
[ec2-user@ip-192-168-0-171 ~]$ sudo systemctl stop kubelet docker
```

### Edited the Docker and kubelet config files to set systemd as the cgroup driver:
```
[ec2-user@ip-192-168-0-171 ~]$ cat /etc/docker/daemon.json
{
  "bridge": "none",
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "10"
  },
  "live-restore": true,
  "max-concurrent-downloads": 10
}
```
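For anyone scripting this change rather than hand-editing: the `exec-opts` key can be merged into an existing daemon.json without disturbing the rest of the file. A sketch using python3's json module, since jq isn't installed by default on AL2 (the /tmp path and starting file are placeholders for illustration):

```shell
cfg=/tmp/daemon.json   # stand-in for /etc/docker/daemon.json on a real node
cat > "$cfg" <<'EOF'
{"bridge": "none", "log-driver": "json-file"}
EOF
# Merge the exec-opts key, leaving existing settings untouched.
python3 - "$cfg" <<'PY'
import json, sys
path = sys.argv[1]
with open(path) as f:
    data = json.load(f)
data["exec-opts"] = ["native.cgroupdriver=systemd"]
with open(path, "w") as f:
    json.dump(data, f, indent=2)
PY
grep -q 'native.cgroupdriver=systemd' "$cfg" && echo patched
```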

### Since I'm using eksctl to create my cluster and nodegroup, I modified the following file:
```
[root@ip-192-168-66-171 ec2-user]# cat /etc/eksctl/kubelet.yaml
address: 0.0.0.0
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /etc/eksctl/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
cgroupDriver: systemd
clusterDNS:
- 10.100.0.10
clusterDomain: cluster.local
featureGates:
  RotateKubeletServerCertificate: true
kind: KubeletConfiguration
kubeReserved:
  cpu: 70m
  ephemeral-storage: 1Gi
  memory: 200Mi
systemReserved:
  cpu: 1000m
  ephemeral-storage: 1Gi
  memory: 2Gi
serverTLSBootstrap: true
```

### Ran bootstrap.sh to join the node back to the cluster:
```
sudo /etc/eks/bootstrap.sh myclustername
```

### The new node came up healthy and I was able to run some nginx test pods on it:
```
k get no
NAME                                          STATUS   ROLES    AGE   VERSION
ip-192-168-0-171.us-west-2.compute.internal   Ready    <none>   10m   v1.15.11-eks-af3caf
ip-192-168-35-63.us-west-2.compute.internal   Ready    <none>   99m   v1.15.11-eks-af3caf

k get po -owide | grep 171 -c
8
```

Can we move to the "systemd" driver for the EKS-optimized AMIs?

Note: I found the following GitHub issue where kube-reserved/system-reserved memory was not taken into account when calculating the kubepods.slice MemoryLimit; the node's total memory was used as the value instead:
kubernetes/kubernetes#88197
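To illustrate what that bug means in practice: the kubepods.slice memory limit should be roughly node capacity minus kubeReserved and systemReserved, not the full node memory. A back-of-the-envelope sketch using the reservations from the kubelet.yaml above (the 16 GiB node size is an assumed example value):

```shell
node_mem_mib=16384         # example 16 GiB node (assumption)
kube_reserved_mib=200      # kubeReserved.memory: 200Mi from the kubelet.yaml above
system_reserved_mib=2048   # systemReserved.memory: 2Gi
# Expected kubepods.slice limit; the linked bug used node_mem_mib itself instead.
echo "$(( node_mem_mib - kube_reserved_mib - system_reserved_mib )) MiB"
```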
