fix: reduce the number of memory allocations and the latency overhead. #983
Conversation
/gcbrun
enocom
left a comment
Thanks for this PR.
I'm surprised by the arguments to RecordBytesReceived escaping the stack. Do you understand why this is happening?
In any case, you're right that we don't need to schedule goroutines with every read and write. So this is a big improvement. To avoid a breaking change in the metrics, I think we need to make a small adjustment to the code here. See below for my suggestion.
Thanks again for sending this.
dialer.go
Outdated
bytesRead, err := i.Conn.Read(b)
if err == nil {
	go trace.RecordBytesReceived(context.Background(), int64(bytesRead), i.connName, i.dialerID)
	atomic.AddInt64(&i.bytesRead, int64(bytesRead))
Shall we use https://pkg.go.dev/sync/atomic#Int64.Add instead here?
Ditto below.
That does not make a difference, except that a plain atomic.AddInt64() does not rely on the inliner.
There's apparently a bug around AddInt64 for 32-bit platforms, but we can address this in a separate PR.
The lifetime of a goroutine is not limited to the lifetime of the stack frame where it was created, so all arguments of a goroutine have to be heap-allocated. The compiler allocates one object to hold all parameters, but it still has to allocate. Another problem is the use …
enocom
left a comment
Really appreciate this PR. This is a big win for anyone using the Go Connector.
@hessjcg Would you like to take a pass on this as well, now that I'm technically not a code owner anymore?
I will have time to fix the tests like …
@hessjcg looks like the test just needs to properly initialize an instrumentedConn based on the changes here.
Struct instrumentedConn would trace every call to Read() and Write(). This incurs significant overhead because the variadic arguments of trace.RecordBytesReceived() escape and need to be heap-allocated. Also, every trace.RecordBytesReceived() would be called in a new goroutine, which makes the call yet more expensive.

It makes no sense to update the performance counters this often. The default scraping interval in Prometheus is 1 minute. Google Cloud Monitoring had the same interval before it added high-resolution counters that scrape services every 10 seconds.

Only update integer counters in the hot path, and update OpenCensus' counters once every 5 seconds.

I have a test that just loops through `SELECT * FROM t WHERE id = $1` and has response sizes ranging from several dozen bytes to several kilobytes. At 64 connections to Postgres and 10k requests per connection, the test makes approx. 25.5M allocations before this patch and 7.2M after it. The CPU time goes down accordingly because OpenCensus and the garbage collector have less work to do.
…ons. Creating a goroutine introduces latency of its own. Moreover, a Dial() that makes a TLS connection will not benefit from any possible microsecond-level savings.
Fixed
Nice work. I'll review the code more carefully today. I noticed this unusual error in the unit test results, only for the i386 architecture. Any idea what to do about it?
See GoogleCloudPlatform/alloydb-go-connector#686 where we ported this change and used …
Unlike int64 struct fields, atomic.Int64 fields are guaranteed to be aligned to 8 bytes on 32-bit platforms. This alignment is required for atomic ops to work. Also, replace atomic int32 vars with atomic.Int32 for consistency.
Ping. I've replaced plain …
@hessjcg to review. LGTM. Thanks again for this.
Hey @hessjcg! Is anyone out there to read PRs?
hessjcg
left a comment
Thank you for this improvement. This looks great!
@hessjcg please make a release so that I can pull in this fix without referencing some random git commit.
Hi @dominiquelefevre, thanks for this great contribution! We have a monthly release scheduled, and we will release a new version with this PR included in the next 1-2 weeks.