
Tight container limits may cause "read init-p: connection reset by peer" #1914

@danail-branekov


Steps to reproduce:

  1. Create a container via runc create <id>
  2. Set the pid limit of the container to 1 via echo 1 > /sys/fs/cgroup/pids/.../<id>/pids.max
  3. Run a process: runc exec <id> /bin/echo hi. The following error occurs:
runtime/cgo: pthread_create failed: Resource temporarily unavailable
runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x7f4cadc230bb m=3 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: unknown pc 0x7f4cadc230bb
stack: frame={sp:0x7f4cad3e9830, fp:0x0} stack=[0x7f4cacbea2a0,0x7f4cad3e9ea0)
00007f4cad3e9730:  0000000000000000  0000000000000000
00007f4cad3e9740:  0000000000000000  0000000000000000
00007f4cad3e9750:  0000000000000000  0000000000000000
00007f4cad3e9760:  0000000000000000  0000000000000000
00007f4cad3e9770:  0000000000000000  0000000000000000
00007f4cad3e9780:  0000000000000000  0000000000000000
00007f4cad3e9790:  0000000000000000  0000000000000000
00007f4cad3e97a0:  0000000000000000  0000000000000000
00007f4cad3e97b0:  0000000000000000  0000000000000000
00007f4cad3e97c0:  0000000000000000  0000000000000000
00007f4cad3e97d0:  0000000000000000  0000000000000000
00007f4cad3e97e0:  0000000000000000  0000000000000000
00007f4cad3e97f0:  0000000000000000  0000000000000000
00007f4cad3e9800:  0000000000000000  0000000000000000
00007f4cad3e9810:  0000000000000000  0000000000000000
00007f4cad3e9820:  0000000000000000  0000000000000000
00007f4cad3e9830: <0000000000000000  0000000000000000
00007f4cad3e9840:  0000000000000000  0000000000000000
00007f4cad3e9850:  0000000000000000  0000000000000000
00007f4cad3e9860:  0000000000000000  0000000000000000
00007f4cad3e9870:  0000000000000000  0000000000000000
00007f4cad3e9880:  0000000000000000  0000000000000000
00007f4cad3e9890:  0000000000000000  0000000000000000
00007f4cad3e98a0:  0000000000000000  0000000000000000
00007f4cad3e98b0:  fffffffe7fffffff  ffffffffffffffff
00007f4cad3e98c0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e98d0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e98e0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e98f0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e9900:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e9910:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e9920:  ffffffffffffffff  ffffffffffffffff
runtime: unknown pc 0x7f4cadc230bb
stack: frame={sp:0x7f4cad3e9830, fp:0x0} stack=[0x7f4cacbea2a0,0x7f4cad3e9ea0)
00007f4cad3e9730:  0000000000000000  0000000000000000
00007f4cad3e9740:  0000000000000000  0000000000000000
00007f4cad3e9750:  0000000000000000  0000000000000000
00007f4cad3e9760:  0000000000000000  0000000000000000
00007f4cad3e9770:  0000000000000000  0000000000000000
00007f4cad3e9780:  0000000000000000  0000000000000000
00007f4cad3e9790:  0000000000000000  0000000000000000
00007f4cad3e97a0:  0000000000000000  0000000000000000
00007f4cad3e97b0:  0000000000000000  0000000000000000
00007f4cad3e97c0:  0000000000000000  0000000000000000
00007f4cad3e97d0:  0000000000000000  0000000000000000
00007f4cad3e97e0:  0000000000000000  0000000000000000
00007f4cad3e97f0:  0000000000000000  0000000000000000
00007f4cad3e9800:  0000000000000000  0000000000000000
00007f4cad3e9810:  0000000000000000  0000000000000000
00007f4cad3e9820:  0000000000000000  0000000000000000
00007f4cad3e9830: <0000000000000000  0000000000000000
00007f4cad3e9840:  0000000000000000  0000000000000000
00007f4cad3e9850:  0000000000000000  0000000000000000
00007f4cad3e9860:  0000000000000000  0000000000000000
00007f4cad3e9870:  0000000000000000  0000000000000000
00007f4cad3e9880:  0000000000000000  0000000000000000
00007f4cad3e9890:  0000000000000000  0000000000000000
00007f4cad3e98a0:  0000000000000000  0000000000000000
00007f4cad3e98b0:  fffffffe7fffffff  ffffffffffffffff
00007f4cad3e98c0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e98d0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e98e0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e98f0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e9900:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e9910:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e9920:  ffffffffffffffff  ffffffffffffffff

goroutine 1 [runnable, locked to thread]:
runtime.chanrecv(0xc42005a000, 0x0, 0x1, 0x5653f5a1b1fa)
        /usr/local/go/src/runtime/chan.go:415 +0x6a0
runtime.chanrecv1(0xc42005a000, 0x0)
        /usr/local/go/src/runtime/chan.go:400 +0x2b
runtime.gcenable()
        /usr/local/go/src/runtime/mgc.go:217 +0x71
runtime.main()
        /usr/local/go/src/runtime/proc.go:161 +0x126
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:2361 +0x1

rax    0x0
rbx    0x7f4cadfc7800
rcx    0x7f4cadc230bb
rdx    0x0
rdi    0x2
rsi    0x7f4cad3e9830
rbp    0x5653f5d15152
rsp    0x7f4cad3e9830
r8     0x0
r9     0x7f4cad3e9830
r10    0x8
r11    0x246
r12    0x5653f76c8480
r13    0xf1
r14    0x11
r15    0x0
rip    0x7f4cadc230bb
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
exec failed: container_linux.go:336: starting container process caused "read init-p: connection reset by peer"

After some debugging we found out what causes this error:

  1. runc starts the new process without waiting for it
  2. In parallel, runc moves the process into the limited cgroup
  3. Together these create a race: the process can join the restricted cgroup too early, while the Go runtime is still initializing and creating its internal threads

To confirm this, we added a 100 ms sleep before the process joins the cgroup, which significantly reduced the failure rate. Removing the code that joins the cgroup "fixed" it entirely.

We realise that such a tight limit has little practical use, but we wanted to share the finding with the community. We believe that this error may also occur when exceeding any container cgroup limit (e.g. memory, cpu, pids).

Cheers, CF Garden Team
