Skip to content

Conversation

@dominiquelefevre
Copy link
Contributor

@dominiquelefevre dominiquelefevre commented May 17, 2025

Struct instrumentedConn would trace every call to Read() and Write(). This incurs a significant overhead because the variadic arguments of trace.RecordBytesReceived() escape and need to be heap-allocated. Also, every trace.RecordBytesReceived() would be called in a new goroutine. That makes the call yet more expensive.

It makes no sense to update the performance counter this often. The default scraping interval in Prometheus is 1 minute. Google Cloud Monitoring had the same interval before they added high-resolution counters that scrape services every 10 seconds.

Only update integer counters in the hot path, and update OpenCensus' counters once in 5 seconds.

I have a test that just loops through SELECT * FROM t WHERE id = $1 and has response sizes that range from several dozen bytes to several kilobytes. At 64 connections to Postgres and 10k requests per connection, the test makes approx. 25.5M allocations before this patch, and 7.2M allocations after the patch. The CPU time goes down accordingly because OpenCensus and the garbage collector have less work to do.

@dominiquelefevre dominiquelefevre requested a review from a team as a code owner May 17, 2025 15:29
@hessjcg
Copy link
Collaborator

hessjcg commented May 19, 2025

/gcbrun

Copy link
Member

@enocom enocom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR.

I'm surprised by the arguments to RecordBytesReceived escaping the stack. Do you understand why this is happening?

In any case, you're right that we don't need to schedule goroutines with every read and write. So this is a big improvement. To avoid a breaking change in the metrics, I think we need to make a small adjustment to the code here. See below for my suggestion.

Thanks again for sending this.

dialer.go Outdated
bytesRead, err := i.Conn.Read(b)
if err == nil {
go trace.RecordBytesReceived(context.Background(), int64(bytesRead), i.connName, i.dialerID)
atomic.AddInt64(&i.bytesRead, int64(bytesRead))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we use https://pkg.go.dev/sync/atomic#Int64.Add instead here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does not make a difference, except that a plain atomic.AddInt64() does not rely on the inliner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's apparently a bug around AddInt64 for 32-bit platforms, but we can address this in a separate PR.

@dominiquelefevre
Copy link
Contributor Author

dominiquelefevre commented May 20, 2025

I'm surprised by the arguments to RecordBytesReceived escaping the stack. Do you understand why this is happening?

The lifetime of a goroutine is not limited to the lifetime of a stack frame where it was created, so all arguments of a goroutine have to be heap-allocated. The compiler allocates one object to hold all parameters, but it still has to allocate.

Another problem is the use context.WithValue(), pointers-to-functions and interface methods.

func RecordBytesReceived(ctx context.Context, num int64, instance, dialerID string) {
    // tag.New() assumes that output parameters escape so ctx is always heap-allocated.
    // context.WithValue() puts a reference to tags into ctx, so tags escape as well.
    ctx, _ = tag.New(ctx, tag.Upsert(keyInstance, instance), tag.Upsert(keyDialerID, dialerID))

    // stats.Record() forwards its arguments to internal.MeasurementRecorder.(measurementRecorder).
    // It is not a function, but a pointer to a function. The compiler assumes that all arguments escape,
    // in particular, mBytesReceived.M(num). The same is true for all interface methods.
    stats.Record(ctx, mBytesReceived.M(num))
}

Copy link
Member

@enocom enocom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really appreciate this PR. This is a big win for anyone using the Go Connector.

@enocom
Copy link
Member

enocom commented May 20, 2025

@hessjcg Would you like to take a pass on this as well now that I'm technically not a code owner anymore?

@dominiquelefevre
Copy link
Contributor Author

I will have time to fix the tests like TestMonitoredCache_Close next week. If you help me with those and merge the PR earlier, I will be happy.

@enocom
Copy link
Member

enocom commented May 22, 2025

@hessjcg looks like the test just needs to properly initialize an instrumentedConn based on the changes here.

Struct instrumentedConn would trace every call to Read() and Write().
This incurs a significant overhead because the variadic arguments of
trace.RecordBytesReceived() escape and need to be heap-allocated.
Also, every trace.RecordBytesReceived() would be called in a new
goroutine. That makes the call yet more expensive.

It makes no sense to update the performance counter this often. The
default scraping interval in Prometheus is 1 minute. Google Cloud
Monitoring had the same interval before they added high-resolution
counters that scrape services every 10 seconds.

Only update integer counters in the hot path, and update OpenCensus'
counters once in 5 seconds.

I have a test that just loops through `SELECT * FROM t WHERE id = $1`
and has response sizes that range from several dozen bytes to several
kilobytes. At 64 connections to Postgres and 10k requests per connection,
the test makes approx. 25.5M allocations before this patch, and 7.2M
allocations after the patch. The CPU time goes down accordingly
because OpenCensus and the garbage collector have less work to do.
…ons.

Creating a goroutine introduces a latency of its own. Moreover, a Dial()
that makes a TLS connection will not benefit from any possiblemicrosend-
level savings.
@dominiquelefevre
Copy link
Contributor Author

Fixed TestDialerWithMetrics and TestMonitoredCache_Close.

@hessjcg
Copy link
Collaborator

hessjcg commented May 27, 2025

Nice work. I'll review the code more carefully today. I noticed this unusual error in the unit tests results, only for the i386 architecture. Any idea what to do about it?

--- FAIL: TestDialerCanConnectToInstance (6.13s)
panic: unaligned 64-bit atomic operation
	panic: unaligned 64-bit atomic operation [recovered]
	panic: unaligned 64-bit atomic operation

goroutine 20 [running]:
testing.tRunner.func1.2({0x8b690c0, 0x8e4ec70})
	/opt/hostedtoolcache/go/1.23.9/x64/src/testing/testing.go:1632 +0x283
testing.tRunner.func1()
	/opt/hostedtoolcache/go/1.23.9/x64/src/testing/testing.go:1635 +0x3fd
panic({0x8b690c0, 0x8e4ec70})
	/opt/hostedtoolcache/go/1.23.9/x64/src/runtime/panic.go:791 +0x103
internal/runtime/atomic.panicUnaligned()
	/opt/hostedtoolcache/go/1.23.9/x64/src/internal/runtime/atomic/unaligned.go:8 +0x2d
internal/runtime/atomic.Xchg64(0xa0ec45c, 0x0)
	/opt/hostedtoolcache/go/1.23.9/x64/src/internal/runtime/atomic/atomic_386.s:178 +0x11
cloud.google.com/go/cloudsqlconn.(*instrumentedConn).reportCounters(0xa0ec420)
	/home/runner/work/cloud-sql-go-connector/cloud-sql-go-connector/dialer.go:648 +0x64
cloud.google.com/go/cloudsqlconn.(*instrumentedConn).Close(0xa0ec420)
	/home/runner/work/cloud-sql-go-connector/cloud-sql-go-connector/dialer.go:642 +0xb0
cloud.google.com/go/cloudsqlconn.testSucessfulDialWithInstanceName.func1()
	/home/runner/work/cloud-sql-go-connector/cloud-sql-go-connector/dialer_test.go:56 +0x23
panic({0x8b690c0, 0x8e4ec70})
	/opt/hostedtoolcache/go/1.23.9/x64/src/runtime/panic.go:791 +0x103
internal/runtime/atomic.panicUnaligned()
	/opt/hostedtoolcache/go/1.23.9/x64/src/internal/runtime/atomic/unaligned.go:8 +0x2d
internal/runtime/atomic.Xadd64(0xa0ec45c, 0xb)
	/opt/hostedtoolcache/go/1.23.9/x64/src/internal/runtime/atomic/atomic_386.s:125 +0x11
cloud.google.com/go/cloudsqlconn.(*instrumentedConn).Read(0xa0ec420, {0xa52aa00, 0x200, 0x200})
	/home/runner/work/cloud-sql-go-connector/cloud-sql-go-connector/dialer.go:609 +0xcc
io.ReadAll({0xf22e5750, 0xa0ec420})
	/opt/hostedtoolcache/go/1.23.9/x64/src/io/io.go:712 +0x83
cloud.google.com/go/cloudsqlconn.testSucessfulDialWithInstanceName({0x8e58310, 0x95f9300}, 0xa1f2208, 0xa0e2300, {0xa0de180, 0x20}, {0x8cde388, 0xb}, {0x0, 0x0, ...})
	/home/runner/work/cloud-sql-go-connector/cloud-sql-go-connector/dialer_test.go:58 +0x139
cloud.google.com/go/cloudsqlconn.testSuccessfulDial(...)
	/home/runner/work/cloud-sql-go-connector/cloud-sql-go-connector/dialer_test.go:44
cloud.google.com/go/cloudsqlconn.TestDialerCanConnectToInstance(0xa1f2208)
	/home/runner/work/cloud-sql-go-connector/cloud-sql-go-connector/dialer_test.go:130 +0x30e
testing.tRunner(0xa1f2208, 0x8d2946c)
	/opt/hostedtoolcache/go/1.23.9/x64/src/testing/testing.go:1690 +0x119
created by testing.(*T).Run in goroutine 1
	/opt/hostedtoolcache/go/1.23.9/x64/src/testing/testing.go:1743 +0x3d1
FAIL	cloud.google.com/go/cloudsqlconn	6.[14](https://github.com/GoogleCloudPlatform/cloud-sql-go-connector/actions/runs/15235949578/job/42972748474?pr=983#step:6:15)4s

@enocom
Copy link
Member

enocom commented May 27, 2025

See GoogleCloudPlatform/alloydb-go-connector#686 where we ported this change and used (*atomic.Int64) Add to avoid alignment problems in 32 bit architectures.

Unlike int64 struct fields, atomic.Int64 fields are guaranteed to be
aligned to 8 bytes on 32-bit platoform. This alignement is required
for atomic ops to work.

Also, replace atomic int32 vars with atomic.Int32 for consistency.
@dominiquelefevre
Copy link
Contributor Author

Ping.

I've replaced plain int64s with atomic.Int64s to have them correctly aligned.

@enocom
Copy link
Member

enocom commented May 30, 2025

@hessjcg to review. LGTM. Thanks again for this.

@dominiquelefevre
Copy link
Contributor Author

Hey @hessjcg! Is anyone out here to read PRs?

Copy link
Collaborator

@hessjcg hessjcg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this improvement. This looks great!

@hessjcg hessjcg merged commit cb641f2 into GoogleCloudPlatform:main Jun 3, 2025
12 of 13 checks passed
@dominiquelefevre
Copy link
Contributor Author

@hessjcg please make a release so that I can pull in this fix without referencing some random git commit.

@chuan-z
Copy link

chuan-z commented Jun 4, 2025

hi @dominiquelefevre , thanks for this great contribution! We have monthly release scheduled. And we will release latest with this PR included in next 1 ~ 2 weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants