The SudoG recycling code is very sloppy about zeroing fields that are no longer needed.
We have already fixed elem and selectdone, but next and waitlink are basically never
cleared either. In real programs this leads to an arbitrarily large number of SudoGs
leaking.
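For reference, the fields involved look roughly like this (a simplified sketch of the
runtime's sudog struct; the exact field set varies by version):

type sudog struct {
	g          *g
	selectdone *uint32        // CAS target used by select
	next       *sudog         // cache free list / wait queue link
	prev       *sudog
	elem       unsafe.Pointer // data element for the send/recv
	waitlink   *sudog         // g.waiting list link, set by select
}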
Specifically, the waitlink field is set during select but not cleared until the SudoG is
next reused for a channel send/recv or select. Uses of a SudoG for sync.Cond don't set
or use waitlink at all. Similarly, the next field is left intact on return from
releaseSudog, and worse, it is left intact when the cache head is nil'ed out.
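A minimal sketch of the kind of zeroing that would plug this particular hole, assuming
the Go 1.4 layout where releaseSudog pushes onto the per-m cache via gomcache (the real
fix also needs select to clear waitlink when it unlinks its SudoGs):

// Hypothetical hardening of releaseSudog: zero the stale links before
// the SudoG goes back on the cache, so a cached entry can never retain
// whatever it pointed at in its previous life.
func releaseSudog(s *sudog) {
	if s.elem != nil {
		gothrow("runtime: sudog with non-nil elem")
	}
	if s.selectdone != nil {
		gothrow("runtime: sudog with non-nil selectdone")
	}
	s.waitlink = nil // drop the stale select-era link
	c := gomcache()
	s.next = c.sudogcache // overwrite, don't preserve, the old next
	c.sudogcache = s
}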
The effect of this is that after a select, the entries added to the SudoG cache have a
.waitlink list pointing in the opposite direction of the .next list in the cache.
Operations by sync.Cond can reorder entries on the cache .next list without affecting
the .waitlink list, creating the possibility of a .waitlink pointing at basically the
rest of the list. Then if that entry is being used by a sync.Cond.Wait when a GC happens
and c.sudogcache is set to nil, nothing is collected, because the entry blocked in
sync.Cond.Wait has a .waitlink pointing at the whole list that was just
"discarded". Even worse, when that sync.Cond.Wait finishes and puts that SudoG
into c.sudogcache, the .waitlink will still be dangling back to the earlier list,
pinning the whole thing in memory.
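The retention mechanism here is nothing SudoG-specific: one stale pointer into the
middle of any singly linked list pins everything downstream even after the list head is
dropped. A standalone demonstration with an ordinary list (no runtime internals):

package main

import (
	"fmt"
	"runtime"
)

type node struct {
	next    *node
	payload [1024]byte // ballast so retention shows up in heap numbers
}

func length(n *node) int {
	i := 0
	for ; n != nil; n = n.next {
		i++
	}
	return i
}

func main() {
	// Build a 1000-node list, then keep one stale pointer into its
	// middle -- the analogue of a dangling .waitlink into the cache.
	var head *node
	for i := 0; i < 1000; i++ {
		head = &node{next: head}
	}
	dangling := head.next.next

	head = nil   // "discard" the list, like setting c.sudogcache = nil
	runtime.GC() // collects only the first two nodes

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("still reachable: %d nodes, HeapAlloc=%d\n",
		length(dangling), ms.HeapAlloc)
}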
Programs with the right mix of non-trivial selects and sync.Cond operations end up
with allocated SudoGs that are heavily cross-linked by all these dangling pointers.
Setting c.sudogcache = nil does make sure that the SudoGs stop being used, but chances
are very good that there is a dangling .waitlink (probably many) pointing into the list
that keeps a large fraction of it from being collected. And then on the next collection,
chances are very good that the SudoG keeping the previous generation from being
collected itself finds a way not to get collected, causing the leaks to pile up
arbitrarily.
Below is a simple program that leaks basically all of its SudoGs in each iteration of
the for {} loop.
Attached is a picture of the SudoGs for the first iteration, showing the next and
waitlink links and how nothing gets collected.
This is an abstracted version of something I observed in a Google production server
testing Go 1.4.
package main

import (
	"runtime"
	"runtime/debug"
	"sync"
	"time"
)

func main() {
	debug.SetGCPercent(1000000) // only GC when we ask for GC
	release := func() {}
	for {
		c := make(chan int)
		for i := 0; i < 1000; i++ {
			go func() {
				select {
				case <-c:
				case <-c:
				case <-c:
				}
			}()
		}
		time.Sleep(10 * time.Millisecond)
		release()
		close(c) // let select put its sudogs into the cache
		time.Sleep(10 * time.Millisecond)

		// pick up top sudog
		var cond1 sync.Cond
		var mu1 sync.Mutex
		cond1.L = &mu1
		go func() {
			mu1.Lock()
			cond1.Wait()
			mu1.Unlock()
		}()
		time.Sleep(1 * time.Millisecond)

		// pick up next sudog
		var cond2 sync.Cond
		var mu2 sync.Mutex
		cond2.L = &mu2
		go func() {
			mu2.Lock()
			cond2.Wait()
			mu2.Unlock()
		}()
		time.Sleep(1 * time.Millisecond)

		// put top sudog back
		cond1.Broadcast()
		time.Sleep(1 * time.Millisecond)

		// drop cache on floor
		runtime.GC()

		// release cond2 after select has gotten to run
		release = func() {
			cond2.Broadcast()
			time.Sleep(1 * time.Millisecond)
		}
	}
}
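The growth is visible without a debugger: save the program as leak.go (any name works)
and run it with GC tracing enabled; the heap reported at each forced collection keeps
climbing even though the live data per iteration is essentially constant:

	$ GODEBUG=gctrace=1 go run leak.go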
Attachments:
- IMG_1907.JPG (377479 bytes)