bug(runc): AmazonLinux2 runc 1.3.2 amazon-eks-node-1.32-v20251103 containers fail to start with "failed to write cpu.cfs_quota_us: invalid argument"

**What happened**:

After upgrading nodes to `amazon-eks-node-1.32-v20251103`, containers whose resource limits are set to, for example,

```yaml
        resources:
          limits:
            cpu: 125m
            memory: 32Mi
          requests:
            cpu: 125m
            memory: 32Mi
```
fail to start with the following error:

```
Error: failed to create containerd task: failed to create shim task:
OCI runtime create failed: runc create failed: unable to start container process:
error during container init: error setting cgroup config for procHooks process:
failed to write "13000": write /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-podaf0df499_2151_4b3d_b80d_161533ca5b8e.slice/cri-containerd-xxx-xxx-frontend.scope/cpu.cfs_quota_us:
invalid argument: unknown
```
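For reference, the value in the error (`13000`) is not the quota one would normally expect for `cpu: 125m`. A minimal sketch of the arithmetic, assuming the default CFS period of 100000 µs: the expected quota is 12500 µs. The helper `round_up_to_10ms_per_second` is hypothetical and only illustrates one rounding rule that would turn 12500 into 13000; it is an assumption consistent with the log, not a confirmed description of what `runc 1.3.2` does.

```python
# Sketch only: relates a Kubernetes CPU limit to the cgroup v1 cpu.cfs_quota_us value.
# Assumption: the CFS period is the default 100000 us (100 ms).
DEFAULT_CFS_PERIOD_US = 100_000


def milli_cpu_to_quota(milli_cpu: int, period_us: int = DEFAULT_CFS_PERIOD_US) -> int:
    """Return the CFS quota (us per period) for a CPU limit given in millicores."""
    return milli_cpu * period_us // 1000


def round_up_to_10ms_per_second(quota_us: int, period_us: int = DEFAULT_CFS_PERIOD_US) -> int:
    """Hypothetical rounding: make the quota-per-second a multiple of 10 ms.

    This is an assumed rule that happens to reproduce the 13000 seen in the
    error message; it is not taken from the runc source in this report.
    """
    per_second_us = quota_us * 1_000_000 // period_us
    if per_second_us % 10_000 != 0:
        per_second_us = (per_second_us // 10_000 + 1) * 10_000
    return per_second_us * period_us // 1_000_000


if __name__ == "__main__":
    for milli in (100, 125, 250):
        expected = milli_cpu_to_quota(milli)
        rounded = round_up_to_10ms_per_second(expected)
        print(f"{milli}m: expected quota {expected}, after assumed rounding {rounded}")
```

If this assumption holds, a container quota of 13000 would exceed a pod-level quota of 12500, which would be one possible reason for the kernel to reject the write; comparing `cpu.cfs_quota_us` in the pod slice and in the container scope on an affected node would confirm or refute it.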

The same workload runs correctly on the previous AMI `amazon-eks-node-1.32-v20251023`.

**What you expected to happen**:

Pods with fractional CPU limits (e.g. `cpu: 120m` or `cpu: 125m`) should start normally as they did on earlier AMIs.

**How to reproduce it (as minimally and precisely as possible)**:

1. Launch a node with AMI `amazon-eks-node-1.32-v20251103`.
2. Deploy any Pod with fractional CPU limits:
   ```yaml
   resources:
     limits:
       cpu: 125m
       memory: 32Mi
     requests:
       cpu: 125m
       memory: 32Mi
   ```
3. Observe the Pod fail to start with the `cpu.cfs_quota_us: invalid argument` error.
4. Run the same manifest on a node using AMI `amazon-eks-node-1.32-v20251023` - Pod starts successfully.

**Environment**:
- AWS Region: eu-central-1
- Instance Type(s): t3a.xlarge
- Cluster Kubernetes version: 1.32
- Node Kubernetes version: 1.32
- AMI Version:
  -  **Broken:** `amazon-eks-node-1.32-v20251103`
    ```
    Kernel: 5.10.245-241.976.amzn2
    containerd: 1.7.27
    runc: 1.3.2
    cgroup fs: tmpfs (v1)
    ```
  -  **Working:** `amazon-eks-node-1.32-v20251023`
    ```
    Kernel: 5.10.244-240.970.amzn2
    containerd: 1.7.27
    runc: 1.3.1
    cgroup fs: tmpfs (v1)
    ```

**Additional context**:

This appears to be caused by a regression in `runc 1.3.2`, which introduces stricter validation of cgroup v1 CPU quotas (AL2 uses cgroup v1): https://github.com/opencontainers/runc/releases/tag/v1.3.2

**Workarounds**:
- Downgrade `runc` to `1.3.1`
- Round CPU limits to cleaner values (`100m`, `250m`, etc.)
- Use the previous AMI `amazon-eks-node-1.32-v20251023`
