Sporadic runc exec failures #1884

@liron-l

Description

As of runc 1.0.0-rc5+dev, we've started noticing an increased rate of sporadic errors during k8s liveness probes.
When the test fails, it produces the following error:

false Error:  OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "process_linux.go:138: adding pid 3101 to cgroups caused \"failed to write 3101 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/1d7b29b96dfd7bc97c6a2d6cbff82b00509cdcc4dbf2ac72ef5dd2bef9db7067/cgroup.procs: invalid argument\"": unknown

This might be related to #1326 and moby/moby#31230, but without a root cause or a resolution we can't be sure.
While experimenting with the runc code, we found that the error disappears if we retry cgroups.EnterPid on failure, which suggests a transient or race condition.

We've found the issue easier to reproduce when running perf alongside a heavy docker exec load (we tried this on Ubuntu Xenial and other OSs).

  1. Run perf:
perf trace --no-syscalls --event 'sched:*'
  2. Run the following code (similar to how k8s invokes the liveness tests):
package main

import (
	"fmt"
	"strings"
	"time"

	docker "github.com/fsouza/go-dockerclient"
)

func main() {
	client, err := docker.NewClient("unix:///var/run/docker.sock")
	if err != nil {
		panic(err)
	}
	go func() {
		for {
			fmt.Println("Running", time.Now().String())
			time.Sleep(time.Hour)
		}
	}()
	const container = "test-container"
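	// Remove any leftover container from a previous run; the error is
	// ignored because the container may not exist yet.
	client.RemoveContainer(docker.RemoveContainerOptions{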
		ID:    container,
		Force: true,
	})
	_, err = client.CreateContainer(docker.CreateContainerOptions{
		Name: container,
		Config: &docker.Config{
			Image: "busybox",
			Cmd:   []string{"sleep", "10000"},
			Tty:   true,
		},
	})
	if err != nil {
		panic(err)
	}
	if err := client.StartContainer(container, nil); err != nil {
		panic(err)
	}
	var tty bool

	for {
		tty = !tty
		exec, err := client.CreateExec(docker.CreateExecOptions{AttachStdout: true, AttachStderr: true, Tty: tty, Container: container, Cmd: []string{"echo", "A"}})
		if err != nil {
			panic(err)
		}
		m := logger{}
		err = client.StartExec(exec.ID, docker.StartExecOptions{Tty: tty, OutputStream: m, ErrorStream: m})
		if err != nil && strings.Contains(err.Error(), "invalid") {
			fmt.Println(err.Error())
			panic(err)
		} else if err != nil {
			fmt.Println("err", err, time.Now())
		}
	}
}

// logger is an io.Writer that prints only exec output mentioning "cgroup".
type logger struct{}

func (m logger) Write(p []byte) (n int, err error) {
	if strings.Contains(string(p), "cgroup") {
		fmt.Println(time.Now().String(), "Error:", string(p))
	}
	return len(p), nil
}

Runc version

runc version 1.0.0-rc5+dev
commit: 69663f0bd4b60df09991c08812a60108003fa340-dirty
spec: 1.0.0
