@panavenue panavenue commented Nov 4, 2025

Description: This pull request adds client-side metrics collection and export to the Cloud SQL Go Connector, giving visibility into the connector's performance and behavior. The key changes are the integration of OpenTelemetry for metrics collection and the introduction of a dedicated telemetry package.

  1. Set up the metric meter and recorders in tel, with these new metrics:
  • connect_latencies: A histogram that records the duration of dial operations, giving insight into connection latency.
  • open_connections: An up-down counter that tracks the number of currently open connections to a Cloud SQL instance.
  • closed_connection_count: A counter that records the total number of closed connections, which helps in understanding connection churn.
  2. Dialer integration
  • A MetricRecorder is now lazily initialized for each Cloud SQL instance when a connection to it is first established.
  • The Dial method now records metrics at several stages of a connection's lifecycle. For example, on a successful dial it records the connection latency and increments the open connections counter.
  • When a connection is closed, the open connections counter is decremented and the closed connection count is incremented.
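
The lifecycle accounting described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this PR: the `recorder` and `registry` types, their method names, and the instance name are hypothetical stand-ins, and plain in-memory values take the place of the OpenTelemetry histogram and counters.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// recorder is a hypothetical stand-in for the PR's MetricRecorder.
// It stores plain values instead of OpenTelemetry instruments.
type recorder struct {
	mu        sync.Mutex
	latencies []time.Duration // stands in for the connect_latencies histogram
	open      atomic.Int64    // stands in for open_connections (up-down counter)
	closed    atomic.Int64    // stands in for closed_connection_count (counter)
}

// recordDial is called after a successful dial: it stores the latency
// and increments the open connections value.
func (r *recorder) recordDial(latency time.Duration) {
	r.mu.Lock()
	r.latencies = append(r.latencies, latency)
	r.mu.Unlock()
	r.open.Add(1)
}

// recordClose is called when a connection is closed: it decrements the
// open connections value and increments the closed count.
func (r *recorder) recordClose() {
	r.open.Add(-1)
	r.closed.Add(1)
}

// registry lazily creates one recorder per instance connection name.
type registry struct {
	recorders sync.Map // instance name (string) -> *recorder
}

// forInstance returns the recorder for name, creating it on first use.
// LoadOrStore guarantees that concurrent callers end up sharing one recorder.
func (g *registry) forInstance(name string) *recorder {
	if r, ok := g.recorders.Load(name); ok {
		return r.(*recorder)
	}
	r, _ := g.recorders.LoadOrStore(name, &recorder{})
	return r.(*recorder)
}

func main() {
	var reg registry
	r := reg.forInstance("my-project:my-region:my-instance") // hypothetical name

	r.recordDial(12 * time.Millisecond)
	r.recordDial(8 * time.Millisecond)
	r.recordClose()

	// Two dials and one close: 1 open, 1 closed, 2 latency samples.
	fmt.Println(r.open.Load(), r.closed.Load(), len(r.latencies)) // prints "1 1 2"
}
```

Keeping the decrement and the closed-count increment in a single method is one way to make sure the two values cannot drift apart when a connection is closed.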

@panavenue panavenue requested a review from a team as a code owner November 4, 2025 23:18
@panavenue panavenue force-pushed the system_metrics_setup branch from 8747506 to 1f835d8 Compare November 4, 2025 23:46
@panavenue panavenue force-pushed the system_metrics_setup branch from 1f835d8 to ad4a6e6 Compare November 4, 2025 23:55
@panavenue panavenue requested a review from enocom November 14, 2025 19:02

@enocom enocom left a comment

Glad to see the AlloyDB implementation was helpful. Left a couple of comments.

n := c.openConnsCount.Add(1)
trace.RecordOpenConnections(ctx, int64(n), d.dialerID, cn.String())
trace.RecordDialLatency(ctx, icn, d.dialerID, latency)
mr.RecordOpenConnection(ctx, attrs)

There's a bug in the AlloyDB implementation around open connections, but wow I can't find it yet. Beware!

@panavenue panavenue Nov 14, 2025

Ah, I think I might know what the potential bug is: the GAUGE metric always shows more "open" than "close", because we send the metric ONLY every 60s, and when the application shuts down, we never record the "-1" on the open_conn GAUGE metric.

Is this the potential bug? (I was testing ways to work around it, but it's going to be tough to mitigate.)
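
To make the suspected failure mode concrete, here is a minimal standard-library sketch of a periodically exported gauge with a final flush on shutdown. It is an illustration under assumptions, not the connector's or AlloyDB's actual code: `gaugeExporter` and its methods are hypothetical, the "backend" is just a slice, and it only covers the case where the decrement was recorded but not yet exported. If the application exits without closing its connections at all, no "-1" exists to flush.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// gaugeExporter is a hypothetical periodic exporter for an open-connections
// gauge. Exported values are appended to a slice instead of being sent
// to a real metrics backend.
type gaugeExporter struct {
	open     atomic.Int64
	mu       sync.Mutex
	exported []int64
	stop     chan struct{}
	done     chan struct{}
}

func newGaugeExporter(interval time.Duration) *gaugeExporter {
	e := &gaugeExporter{stop: make(chan struct{}), done: make(chan struct{})}
	go e.loop(interval)
	return e
}

// loop exports the current gauge value on every tick until stop is closed.
func (e *gaugeExporter) loop(interval time.Duration) {
	defer close(e.done)
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			e.flush()
		case <-e.stop:
			return
		}
	}
}

// flush "sends" the current gauge value.
func (e *gaugeExporter) flush() {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.exported = append(e.exported, e.open.Load())
}

// Close stops the periodic loop, waits for it to exit, and then performs one
// final flush so that changes made since the last tick are still exported.
func (e *gaugeExporter) Close() {
	close(e.stop)
	<-e.done
	e.flush()
}

func main() {
	// A long interval, so no tick fires during this short run.
	e := newGaugeExporter(time.Hour)

	e.open.Add(1)  // connection opened
	e.flush()      // simulate a periodic export while the connection is open
	e.open.Add(-1) // connection closed shortly before shutdown

	e.Close() // without the final flush, the last exported value would stay 1

	fmt.Println(e.exported) // prints "[1 0]"
}
```

With the OpenTelemetry Go SDK, the analogous step would likely be calling the meter provider's shutdown or force-flush during dialer close, though whether that fully resolves the behavior discussed here is an open question in this thread.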
