Yes, it reproduces with both go1.11.2 and go1.13beta1.
We are running Docker tests on a physical arm64/aarch64 machine with kernel 4.19.36. The containers are configured with health-cmd, so dockerd periodically runs exec commands in them. The test framework also executes docker run/stop commands against the containers. The core dumps come from containerd-shim and runc (1-5 core dumps per day).
containerd-shim core dump with go1.13beta1:
Core was generated by `containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.c'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000000554cc in runtime.sigtrampgo (sig=<optimized out>, info=0x0, ctx=0x0) at /usr/local/go/src/runtime/signal_unix.go:308
308 if sp < g.m.gsignal.stack.lo || sp >= g.m.gsignal.stack.hi {
Dump of assembler code for function runtime.sigtrampgo:
0x0000000000055480 <+0>: str x30, [sp, #-192]!
0x0000000000055484 <+4>: stur x29, [sp, #-8]
0x0000000000055488 <+8>: sub x29, sp, #0x8
0x000000000005548c <+12>: ldr w0, [sp, #200]
0x0000000000055490 <+16>: str w0, [sp, #8]
0x0000000000055494 <+20>: ldr x0, [sp, #208]
0x0000000000055498 <+24>: str x0, [sp, #16]
0x000000000005549c <+28>: ldr x1, [sp, #216]
0x00000000000554a0 <+32>: str x1, [sp, #24]
0x00000000000554a4 <+36>: bl 0x55f20 <runtime.sigfwdgo>
0x00000000000554a8 <+40>: ldrb w0, [sp, #32]
0x00000000000554ac <+44>: cbnz x0, 0x55758 <runtime.sigtrampgo+728>
0x00000000000554b0 <+48>: mov x0, x28
0x00000000000554b4 <+52>: cbz x0, 0x556e0 <runtime.sigtrampgo+608>
0x00000000000554b8 <+56>: str x0, [sp, #96]
0x00000000000554bc <+60>: stp xzr, xzr, [sp, #56]
0x00000000000554c0 <+64>: stp xzr, xzr, [sp, #72]
0x00000000000554c4 <+68>: str xzr, [sp, #88]
0x00000000000554c8 <+72>: ldr x1, [x0, #48]
=> 0x00000000000554cc <+76>: ldr x2, [x1, #80]
0x00000000000554d0 <+80>: add x3, sp, #0xc8
0x00000000000554d4 <+84>: ldr x4, [x2]
0x00000000000554d8 <+88>: cmp x3, x4
---
i reg
x0 0x4000000480 274877908096
x1 0xd 13
x2 0xb 11
x3 0x1 1
x4 0x69b0a0 6926496
x5 0x989680 10000000
x6 0xa000000 167772160
x7 0x18 24
x8 0x65 101
x9 0x3938700000000 1006632960000000
x10 0x500ad04848b2 88007374293170
x11 0xffffffffa2f24bde -1561179170
x12 0x43b20 277280
x13 0x178 376
x14 0xb 11
x15 0x8 8
x16 0xffffcd0cbf68 281474121908072
x17 0xffffcd0cbf48 281474121908040
x18 0x0 0
x19 0x0 0
x20 0x4000043ee0 274878185184
x21 0xd 13
x22 0x0 0
x23 0x0 0
x24 0x0 0
x25 0x0 0
x26 0x4000043dd0 274878184912
x27 0x69a952 6924626
x28 0x4000000480 274877908096
x29 0x400003cc08 274878155784
x30 0x554a8 349352
sp 0x400003cc10 0x400003cc10
pc 0x554cc 0x554cc <runtime.sigtrampgo+76>
cpsr 0x80000000 [ EL=0 N ]
fpsr 0x10 16
fpcr 0x0 0
---
(gdb) p /x $x0
$1 = 0x4000000480
(gdb) p g
$2 = (runtime.g *) 0x4000000480
(gdb) p /x g.m.gsignal.stack
$3 = {lo = 0x4000036000, hi = 0x400003e000}
(gdb) p *g
$4 = {stack = {lo = 274878177280, hi = 274878185472}, stackguard0 = 274878178160, stackguard1 = 274878178160, _panic = 0x0, _defer = 0x0, m = 0x4000034000, sched = {sp = 274878185408,
pc = 277388, g = 274877908096, ctxt = 0x0, ret = 0, lr = 0, bp = 0}, syscallsp = 0, syscallpc = 0, stktopsp = 0, param = 0x0, atomicstatus = 0, stackLock = 0, goid = 0, schedlink = 0,
waitsince = 0, waitreason = 0 '\000', preempt = false, paniconfault = false, preemptscan = false, gcscandone = false, gcscanvalid = false, throwsplit = false, raceignore = 0 '\000',
sysblocktraced = false, sysexitticks = 0, traceseq = 0, tracelastp = 0, lockedm = 0, sig = 0, writebuf = {array = 0x0, len = 0, cap = 0}, sigcode0 = 0, sigcode1 = 0, sigpc = 0, gopc = 0,
ancestors = 0x0, startpc = 0, racectx = 0, waiting = 0x0, cgoCtxt = {array = 0x0, len = 0, cap = 0}, labels = 0x0, timer = 0x0, selectDone = 0, gcAssistBytes = 0}
(gdb) p *g.m
$5 = {g0 = 0x4000000480, morebuf = {sp = 0, pc = 0, g = 0, ctxt = 0x0, ret = 0, lr = 0, bp = 0}, divmod = 0, procid = 32875, gsignal = 0x4000000300, goSigStack = {stack = {lo = 0, hi = 0},
stackguard0 = 0, stackguard1 = 0, stktopsp = 0}, sigmask = {0, 0}, tls = {0, 0, 0, 0, 0, 0}, mstartfn = {void (void)} 0x4000034000, curg = 0x0, caughtsig = 0, p = 0, nextp = 0,
oldp = 0, id = 1, mallocing = 0, throwing = 0, preemptoff = 0x0 "", locks = 0, dying = 0, profilehz = 0, spinning = false, blocked = false, newSigstack = true, printlock = 0 '\000',
incgo = false, freeWait = 0, fastrand = {1597334677, 4294407959}, needextram = false, traceback = 0 '\000', ncgocall = 0, ncgo = 0, cgoCallersUse = 0, cgoCallers = 0x0, park = {key = 0},
alllink = 0x67ffe0 <runtime.m0>, schedlink = 0, mcache = 0x0, lockedg = 0, createstack = {0 <repeats 32 times>}, lockedExt = 0, lockedInt = 0, nextwaitm = 0,
waitunlockf = {void (runtime.g *, void *, bool *)} 0x4000034000, waitlock = 0x0, waittraceev = 0 '\000', waittraceskip = 0, startingtrace = false, syscalltick = 0, thread = 0,
freelink = 0x0, libcall = {fn = 0, n = 0, args = 0, r1 = 0, r2 = 0, err = 0}, libcallpc = 0, libcallsp = 0, libcallg = 0, syscall = {fn = 0, n = 0, args = 0, r1 = 0, r2 = 0, err = 0},
vdsoSP = 0, vdsoPC = 307440, dlogPerM = {<No data fields>}, mOS = {<No data fields>}}
--- Threads blocked in futexsleep are omitted; the two useful stacks are kept.
Thread 8 (LWP 32881):
#0 syscall.Syscall6 () at /usr/local/go/src/syscall/asm_linux_arm64.s:44
#1 0x00000000002dc864 in github.com/containerd/containerd/vendor/golang.org/x/sys/unix.EpollWait (epfd=9, events=..., msec=-1, n=0, err=...)
at /root/containerd-1.2.0/.gopath/src/github.com/containerd/containerd/vendor/golang.org/x/sys/unix/zsyscall_linux_arm64.go:1499
#2 0x00000000002ddae4 in github.com/containerd/containerd/vendor/github.com/containerd/console.(*Epoller).Wait (e=0x400007e080, ~r0=...)
at /root/containerd-1.2.0/.gopath/src/github.com/containerd/containerd/vendor/github.com/containerd/console/console_linux.go:110
#3 0x000000000006dda4 in runtime.goexit () at /usr/local/go/src/runtime/asm_arm64.s:1128
Thread 7 (LWP 32882):
#0 runtime.epollwait () at /usr/local/go/src/runtime/sys_linux_arm64.s:596
#1 0x000000000003ca1c in runtime.netpoll (block=true, ~r1=...) at /usr/local/go/src/runtime/netpoll_epoll.go:71
#2 0x000000000004617c in runtime.findrunnable (gp=<optimized out>, inheritTime=<optimized out>) at /usr/local/go/src/runtime/proc.go:2372
#3 0x0000000000046e88 in runtime.schedule () at /usr/local/go/src/runtime/proc.go:2524
#4 0x0000000000043c40 in runtime.mstart1 () at /usr/local/go/src/runtime/proc.go:1208
#5 0x0000000000043b8c in runtime.mstart () at /usr/local/go/src/runtime/proc.go:1167
#6 0x000000000006ec80 in runtime.clone () at /usr/local/go/src/runtime/sys_linux_arm64.s:525
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
It looks like X0 holds a valid g struct, and g.m.gsignal.stack in memory is fine, but X1, which is loaded from g.m, is 0xd, which is a bad value.
runc crashes at the same position, but with X1 = 0. The runc core dump was produced with go1.11.2.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes, it reproduces with both go1.11.2 and go1.13beta1.
What operating system and processor architecture are you using (go env)?
What did you do?
We are running Docker tests on a physical arm64/aarch64 machine with kernel 4.19.36. The containers are configured with health-cmd, so dockerd periodically runs exec commands in them. The test framework also executes docker run/stop commands against the containers. The core dumps come from containerd-shim and runc (1-5 core dumps per day).
What did you expect to see?
No crash in the runtime.
What did you see instead?
Core dumps in containerd-shim (go1.13beta1) and runc (go1.11.2), as detailed above. Both crash at the same position in runtime.sigtrampgo: X0 holds a valid g struct and g.m.gsignal.stack in memory is fine, but X1, which is loaded from g.m, is 0xd in containerd-shim and 0 in runc.