runtime: linux/arm64 crash in runtime.sigtrampgo #32912

@jing-rui

What version of Go are you using (go version)?

$ go version
go version go1.13beta1 linux/arm64

Does this issue reproduce with the latest release?

Yes, we reproduced it with both go1.11.2 and go1.13beta1.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_arm64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build139440885=/tmp/go-build -gno-record-gcc-switches"

What did you do?

We are running Docker tests on a physical arm64/aarch64 machine with kernel 4.19.36. The Docker containers are configured with health-cmd, so dockerd runs the exec command periodically. The test framework also executes docker run/stop commands against the containers. The core dumps occur in containerd-shim and runc (1-5 core dumps per day).

What did you expect to see?

No crash in the runtime.

What did you see instead?

containerd-shim core dump with go1.13beta1:

Core was generated by `containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.c'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000000554cc in runtime.sigtrampgo (sig=<optimized out>, info=0x0, ctx=0x0) at /usr/local/go/src/runtime/signal_unix.go:308
308             if sp < g.m.gsignal.stack.lo || sp >= g.m.gsignal.stack.hi {
Dump of assembler code for function runtime.sigtrampgo:
   0x0000000000055480 <+0>:     str     x30, [sp, #-192]!
   0x0000000000055484 <+4>:     stur    x29, [sp, #-8]
   0x0000000000055488 <+8>:     sub     x29, sp, #0x8
   0x000000000005548c <+12>:    ldr     w0, [sp, #200]
   0x0000000000055490 <+16>:    str     w0, [sp, #8]
   0x0000000000055494 <+20>:    ldr     x0, [sp, #208]
   0x0000000000055498 <+24>:    str     x0, [sp, #16]
   0x000000000005549c <+28>:    ldr     x1, [sp, #216]
   0x00000000000554a0 <+32>:    str     x1, [sp, #24]
   0x00000000000554a4 <+36>:    bl      0x55f20 <runtime.sigfwdgo>
   0x00000000000554a8 <+40>:    ldrb    w0, [sp, #32]
   0x00000000000554ac <+44>:    cbnz    x0, 0x55758 <runtime.sigtrampgo+728>
   0x00000000000554b0 <+48>:    mov     x0, x28
   0x00000000000554b4 <+52>:    cbz     x0, 0x556e0 <runtime.sigtrampgo+608>
   0x00000000000554b8 <+56>:    str     x0, [sp, #96]
   0x00000000000554bc <+60>:    stp     xzr, xzr, [sp, #56]
   0x00000000000554c0 <+64>:    stp     xzr, xzr, [sp, #72]
   0x00000000000554c4 <+68>:    str     xzr, [sp, #88]
   0x00000000000554c8 <+72>:    ldr     x1, [x0, #48]
=> 0x00000000000554cc <+76>:    ldr     x2, [x1, #80]
   0x00000000000554d0 <+80>:    add     x3, sp, #0xc8
   0x00000000000554d4 <+84>:    ldr     x4, [x2]
   0x00000000000554d8 <+88>:    cmp     x3, x4
---
(gdb) i reg
x0             0x4000000480        274877908096
x1             0xd                 13
x2             0xb                 11
x3             0x1                 1
x4             0x69b0a0            6926496
x5             0x989680            10000000
x6             0xa000000           167772160
x7             0x18                24
x8             0x65                101
x9             0x3938700000000     1006632960000000
x10            0x500ad04848b2      88007374293170
x11            0xffffffffa2f24bde  -1561179170
x12            0x43b20             277280
x13            0x178               376
x14            0xb                 11
x15            0x8                 8
x16            0xffffcd0cbf68      281474121908072
x17            0xffffcd0cbf48      281474121908040
x18            0x0                 0
x19            0x0                 0
x20            0x4000043ee0        274878185184
x21            0xd                 13
x22            0x0                 0
x23            0x0                 0
x24            0x0                 0
x25            0x0                 0
x26            0x4000043dd0        274878184912
x27            0x69a952            6924626
x28            0x4000000480        274877908096
x29            0x400003cc08        274878155784
x30            0x554a8             349352
sp             0x400003cc10        0x400003cc10
pc             0x554cc             0x554cc <runtime.sigtrampgo+76>
cpsr           0x80000000          [ EL=0 N ]
fpsr           0x10                16
fpcr           0x0                 0
---
(gdb) p /x $x0
$1 = 0x4000000480
(gdb) p g
$2 = (runtime.g *) 0x4000000480
(gdb) p /x g.m.gsignal.stack
$3 = {lo = 0x4000036000, hi = 0x400003e000}


(gdb) p *g
$4 = {stack = {lo = 274878177280, hi = 274878185472}, stackguard0 = 274878178160, stackguard1 = 274878178160, _panic = 0x0, _defer = 0x0, m = 0x4000034000, sched = {sp = 274878185408,
    pc = 277388, g = 274877908096, ctxt = 0x0, ret = 0, lr = 0, bp = 0}, syscallsp = 0, syscallpc = 0, stktopsp = 0, param = 0x0, atomicstatus = 0, stackLock = 0, goid = 0, schedlink = 0,
  waitsince = 0, waitreason = 0 '\000', preempt = false, paniconfault = false, preemptscan = false, gcscandone = false, gcscanvalid = false, throwsplit = false, raceignore = 0 '\000',
  sysblocktraced = false, sysexitticks = 0, traceseq = 0, tracelastp = 0, lockedm = 0, sig = 0, writebuf = {array = 0x0, len = 0, cap = 0}, sigcode0 = 0, sigcode1 = 0, sigpc = 0, gopc = 0,
  ancestors = 0x0, startpc = 0, racectx = 0, waiting = 0x0, cgoCtxt = {array = 0x0, len = 0, cap = 0}, labels = 0x0, timer = 0x0, selectDone = 0, gcAssistBytes = 0}
(gdb) p *g.m
$5 = {g0 = 0x4000000480, morebuf = {sp = 0, pc = 0, g = 0, ctxt = 0x0, ret = 0, lr = 0, bp = 0}, divmod = 0, procid = 32875, gsignal = 0x4000000300, goSigStack = {stack = {lo = 0, hi = 0},
    stackguard0 = 0, stackguard1 = 0, stktopsp = 0}, sigmask = {0, 0}, tls = {0, 0, 0, 0, 0, 0}, mstartfn = {void (void)} 0x4000034000, curg = 0x0, caughtsig = 0, p = 0, nextp = 0,
  oldp = 0, id = 1, mallocing = 0, throwing = 0, preemptoff = 0x0 "", locks = 0, dying = 0, profilehz = 0, spinning = false, blocked = false, newSigstack = true, printlock = 0 '\000',
  incgo = false, freeWait = 0, fastrand = {1597334677, 4294407959}, needextram = false, traceback = 0 '\000', ncgocall = 0, ncgo = 0, cgoCallersUse = 0, cgoCallers = 0x0, park = {key = 0},
  alllink = 0x67ffe0 <runtime.m0>, schedlink = 0, mcache = 0x0, lockedg = 0, createstack = {0 <repeats 32 times>}, lockedExt = 0, lockedInt = 0, nextwaitm = 0,
  waitunlockf = {void (runtime.g *, void *, bool *)} 0x4000034000, waitlock = 0x0, waittraceev = 0 '\000', waittraceskip = 0, startingtrace = false, syscalltick = 0, thread = 0,
  freelink = 0x0, libcall = {fn = 0, n = 0, args = 0, r1 = 0, r2 = 0, err = 0}, libcallpc = 0, libcallsp = 0, libcallg = 0, syscall = {fn = 0, n = 0, args = 0, r1 = 0, r2 = 0, err = 0},
  vdsoSP = 0, vdsoPC = 307440, dlogPerM = {<No data fields>}, mOS = {<No data fields>}}

--- Threads blocked in futexsleep are omitted; the two useful stacks are kept.
Thread 8 (LWP 32881):
#0  syscall.Syscall6 () at /usr/local/go/src/syscall/asm_linux_arm64.s:44
#1  0x00000000002dc864 in github.com/containerd/containerd/vendor/golang.org/x/sys/unix.EpollWait (epfd=9, events=..., msec=-1, n=0, err=...)
    at /root/containerd-1.2.0/.gopath/src/github.com/containerd/containerd/vendor/golang.org/x/sys/unix/zsyscall_linux_arm64.go:1499
#2  0x00000000002ddae4 in github.com/containerd/containerd/vendor/github.com/containerd/console.(*Epoller).Wait (e=0x400007e080, ~r0=...)
    at /root/containerd-1.2.0/.gopath/src/github.com/containerd/containerd/vendor/github.com/containerd/console/console_linux.go:110
#3  0x000000000006dda4 in runtime.goexit () at /usr/local/go/src/runtime/asm_arm64.s:1128

Thread 7 (LWP 32882):
#0  runtime.epollwait () at /usr/local/go/src/runtime/sys_linux_arm64.s:596
#1  0x000000000003ca1c in runtime.netpoll (block=true, ~r1=...) at /usr/local/go/src/runtime/netpoll_epoll.go:71
#2  0x000000000004617c in runtime.findrunnable (gp=<optimized out>, inheritTime=<optimized out>) at /usr/local/go/src/runtime/proc.go:2372
#3  0x0000000000046e88 in runtime.schedule () at /usr/local/go/src/runtime/proc.go:2524
#4  0x0000000000043c40 in runtime.mstart1 () at /usr/local/go/src/runtime/proc.go:1208
#5  0x0000000000043b8c in runtime.mstart () at /usr/local/go/src/runtime/proc.go:1167
#6  0x000000000006ec80 in runtime.clone () at /usr/local/go/src/runtime/sys_linux_arm64.s:525
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

It looks like X0 points to a valid g struct, and g.m.gsignal.stack in memory is fine, but X1, which was loaded from g.m, holds the bad value 0xd.
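
To illustrate why the in-memory data looks fine, here is a minimal sketch that replays the bounds check from signal_unix.go:308 with the values gdb printed above. The `stack` type and the `contains` helper are hypothetical names made up for this example; only the numeric values come from the core dump.

```go
package main

import "fmt"

// stack is a hypothetical mirror of the lo/hi pair that gdb printed for
// g.m.gsignal.stack. uint64 keeps the constants valid on any platform.
type stack struct {
	lo, hi uint64
}

// contains reports whether sp lies in [lo, hi). It is the negation of the
// condition `sp < stack.lo || sp >= stack.hi` shown at signal_unix.go:308.
func (s stack) contains(sp uint64) bool {
	return sp >= s.lo && sp < s.hi
}

// checkedSP is the value the check would compare: `add x3, sp, #0xc8`
// applied to the sp register from the dump.
func checkedSP() uint64 {
	return 0x400003cc10 + 0xc8
}

func main() {
	// Values taken from `p /x g.m.gsignal.stack` in the containerd-shim core.
	gsignal := stack{lo: 0x4000036000, hi: 0x400003e000}
	sp := checkedSP()
	fmt.Printf("sp=%#x on gsignal stack: %v\n", sp, gsignal.contains(sp))
}
```

With the values from memory the check would find sp inside the gsignal stack, so the fault comes from the register contents rather than from the structures that gdb reads back.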

The runc core dump shows the same crash:

Core was generated by `runc --root /var/run/docker/runtime-runc/moby --log /run/docker/containerd/daem'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000043bfe0 in runtime.sigtrampgo ()
(gdb) thread apply all bt (threads in runtime.usleep omitted)

Thread 6 (Thread 0xffffa8b85070 (LWP 7652)):
#0  0x00000000004721c4 in syscall.Syscall ()
#1  0x000000000046f5d4 in syscall.read ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
   0x000000000043bfd8 <+72>:    str     xzr, [sp, #96]
   0x000000000043bfdc <+76>:    ldr     x1, [x0, #48]
=> 0x000000000043bfe0 <+80>:    ldr     x2, [x1, #80]      the same position.
(gdb) i reg
x0             0x4000000d80        274877910400
x1             0x0                 0
x2             0x0                 0
x3             0x1                 1
x4             0xad4900            11356416
x5             0x0                 0
x6             0x7                 7
x7             0x2                 2
x8             0x62                98
x9             0x2                 2
x10            0x0                 0
x11            0x0                 0
x12            0x0                 0
x13            0x0                 0
x14            0x0                 0
x15            0x1                 1
x16            0x4000056d70        274878262640
x17            0x6adc90            7003280
x18            0x270f              9999
x19            0x42c5e0            4376032
x20            0x451d60            4529504
x21            0xffffec93f44e      281474650862670
x22            0xffffec93f44f      281474650862671
x23            0xad1fb0            11345840
x24            0xffffec93f530      281474650862896
x25            0x1000              4096
x26            0x7c2780            8136576
x27            0xad417a            11354490
x28            0x4000000d80        274877910400
x29            0x4000057ff0        274878267376
x30            0x43bfbc            4439996
sp             0x4000056cd0        0x4000056cd0
pc             0x43bfe0            0x43bfe0 <runtime.sigtrampgo+80>
cpsr           0x80000000          [ EL=0 N ]
fpsr           0x10                16
fpcr           0x0                 0

runc crashes at the same position, but with X1=0. The runc core was generated with go1.11.2.
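
Both cores fault on the same instruction pair, `ldr x1, [x0, #48]` followed by `ldr x2, [x1, #80]`. The sketch below is one way to check that these offsets correspond to g.m and m.gsignal. `gPrefix` and `mPrefix` are hypothetical, trimmed mirrors of the runtime structs, with field order copied from the `p *g` and `p *g.m` output above; the field widths (word-sized fields, `divmod` as uint32, `procid` as uint64) are assumptions for a 64-bit platform, not the runtime's actual declarations.

```go
package main

import (
	"fmt"
	"unsafe"
)

// gPrefix mirrors the leading fields of runtime.g in the order gdb printed
// them. All fields are assumed to be word-sized.
type gPrefix struct {
	stackLo, stackHi         uintptr
	stackguard0, stackguard1 uintptr
	_panic, _defer           uintptr
	m                        uintptr
}

// mPrefix mirrors the leading fields of runtime.m. morebuf stands in for
// the seven word-sized gobuf fields (sp, pc, g, ctxt, ret, lr, bp).
type mPrefix struct {
	g0      uintptr
	morebuf [7]uintptr
	divmod  uint32 // assumed width
	procid  uint64 // assumed width
	gsignal uintptr
}

// offsetOfM returns the byte offset of g.m in the mirror struct.
func offsetOfM() uintptr { return unsafe.Offsetof(gPrefix{}.m) }

// offsetOfGsignal returns the byte offset of m.gsignal in the mirror struct.
func offsetOfGsignal() uintptr { return unsafe.Offsetof(mPrefix{}.gsignal) }

// faultAddr is the address the second load dereferences for a given x1.
func faultAddr(x1 uintptr) uintptr { return x1 + offsetOfGsignal() }

func main() {
	fmt.Println("offset of g.m:      ", offsetOfM())
	fmt.Println("offset of m.gsignal:", offsetOfGsignal())
	// x1 values from the two register dumps: 0xd (containerd-shim), 0x0 (runc).
	fmt.Printf("containerd-shim load address: %#x\n", faultAddr(0xd))
	fmt.Printf("runc load address:            %#x\n", faultAddr(0x0))
}
```

On a 64-bit platform the two offsets should come out as 48 and 80, matching the immediates in the disassembly. Under those assumptions, the second load dereferences a near-null address in both cores, which is consistent with the SIGSEGV.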

Labels: FrozenDueToAge, NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one).