Keep up to date with original project#4

Closed
Multimo wants to merge 25 commits into grafana:main from databricks:main

Conversation

@Multimo (Collaborator) commented Nov 24, 2025

No description provided.

madhav-db and others added 14 commits October 30, 2025 05:17
This introduces a flexible TokenProvider interface that allows custom
authentication implementations:

- TokenProvider interface with static, external function support
- Token struct with expiration handling
- Authenticator wrapper for integration with existing auth system
- Connector functions: WithTokenProvider, WithExternalToken, WithStaticToken

This foundation enables custom token management strategies without
requiring changes to the core driver.
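
A minimal sketch of how such a provider abstraction could look. The method name `GetToken`, the struct fields, and the helper `fetchAccessToken` are assumptions made for illustration; they are not taken from the driver's source.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// Token is a hypothetical access token with an optional expiry time.
type Token struct {
	AccessToken string
	ExpiresAt   time.Time // zero value means the token never expires
}

// TokenProvider is an assumed shape for the interface named in the commit.
type TokenProvider interface {
	GetToken(ctx context.Context) (Token, error)
}

// staticTokenProvider always returns the same caller-supplied token.
type staticTokenProvider struct{ token string }

func (p staticTokenProvider) GetToken(ctx context.Context) (Token, error) {
	if p.token == "" {
		return Token{}, errors.New("static token is empty")
	}
	return Token{AccessToken: p.token}, nil
}

// externalTokenProvider delegates to a caller-supplied function.
type externalTokenProvider struct {
	tokenSource func() (string, error)
}

func (p externalTokenProvider) GetToken(ctx context.Context) (Token, error) {
	// Respect cancellation before doing any work.
	if err := ctx.Err(); err != nil {
		return Token{}, err
	}
	s, err := p.tokenSource()
	if err != nil {
		return Token{}, err
	}
	return Token{AccessToken: s}, nil
}

// fetchAccessToken shows how calling code might consume any provider.
func fetchAccessToken(p TokenProvider) (string, error) {
	tok, err := p.GetToken(context.Background())
	if err != nil {
		return "", err
	}
	return tok.AccessToken, nil
}
```

Because callers depend only on the interface, a static token and an externally managed token can be swapped without touching the code that consumes them.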

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This design document provides a comprehensive telemetry system design
adapted specifically for the databricks-sql-go driver following Go
best practices and idiomatic patterns.

Key Go-specific adaptations:
- Replaced C# Activity/ActivitySource with context.Context and interceptors
- Used goroutines and channels for async operations
- Applied sync.RWMutex and sync.Once for thread-safe singletons
- Implemented circuit breaker pattern with Go idioms
- Used defer/recover for error handling
- Followed Go naming conventions (unexported types, camelCase)
- Designed around standard library patterns (http.Client, context)
- Included Go-specific testing patterns (unit, integration, benchmarks)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Implement Phase 1 of telemetry infrastructure for the Go driver:

- Add Config struct with all telemetry configuration fields
- Implement DefaultConfig() with telemetry disabled by default
- Add ParseTelemetryConfig() for DSN parameter parsing
- Define tag constants for connection, statement, and error metrics
- Implement tag export scope filtering (local vs Databricks)
- Add comprehensive unit tests for config and tag filtering

Note: Telemetry is disabled by default and will be enabled after
full testing and validation is complete.
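
As a rough illustration of the configuration flow described above, the sketch below overlays DSN parameters on defaults. Only `Config`, `DefaultConfig`, `ParseTelemetryConfig`, and the disabled-by-default behavior come from the commit; the `BatchSize` and `FlushInterval` fields, their default values, and the parameter names are hypothetical.

```go
package main

import (
	"strconv"
	"time"
)

// Config holds telemetry settings. Fields other than Enabled are assumed.
type Config struct {
	Enabled       bool
	BatchSize     int
	FlushInterval time.Duration
}

// DefaultConfig returns the defaults, with telemetry disabled.
func DefaultConfig() Config {
	return Config{Enabled: false, BatchSize: 100, FlushInterval: 5 * time.Second}
}

// ParseTelemetryConfig overlays DSN parameters on the defaults.
// Unknown or malformed values are ignored so that a bad DSN never
// enables telemetry by accident. Parameter names are illustrative.
func ParseTelemetryConfig(params map[string]string) Config {
	cfg := DefaultConfig()
	if v, ok := params["telemetry"]; ok {
		if b, err := strconv.ParseBool(v); err == nil {
			cfg.Enabled = b
		}
	}
	if v, ok := params["telemetry_batch_size"]; ok {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			cfg.BatchSize = n
		}
	}
	if v, ok := params["telemetry_flush_interval"]; ok {
		if d, err := time.ParseDuration(v); err == nil && d > 0 {
			cfg.FlushInterval = d
		}
	}
	return cfg
}
```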

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
## Summary

This PR adds a comprehensive telemetry design document specifically
adapted for the `databricks-sql-go` driver. The design was transformed
from a C#/.NET ADBC driver design to follow Go best practices and
idiomatic patterns.

## Go-Specific Adaptations

This design document has been completely rewritten to align with Go
conventions and the existing codebase patterns:

### 1. **Replaced C#/.NET Concepts with Go Equivalents**

| C#/.NET Pattern | Go Pattern |
|-----------------|------------|
| `Activity`/`ActivitySource` | `context.Context` + middleware interceptors |
| `ActivityListener` | Custom telemetry interceptor pattern |
| `async`/`await` | Goroutines and channels |
| `ConcurrentDictionary` | `map` with `sync.RWMutex` |
| `IDisposable` | `Close()` methods |
| C# namespaces | Go packages |

### 2. **Applied Go Naming Conventions**

- **Unexported types**: `featureFlagCache`, `clientManager`,
`metricsAggregator` (lowercase for internal types)
- **Exported functions**: Following Go conventions for public APIs
- **Idiomatic names**: `mu` for mutex, `cfg` for config, `ctx` for
context
- **Package naming**: Single lowercase word (`telemetry`)

### 3. **Idiomatic Go Code Patterns**

#### Concurrency & Thread Safety
```go
// Singleton with sync.Once
var (
    managerOnce     sync.Once
    managerInstance *clientManager
)

func getClientManager() *clientManager {
    managerOnce.Do(func() {
        managerInstance = &clientManager{
            clients: make(map[string]*clientHolder),
        }
    })
    return managerInstance
}

// Thread-safe operations with RWMutex
func (m *clientManager) getOrCreateClient(host string, ...) *telemetryClient {
    m.mu.Lock()
    defer m.mu.Unlock()
    // ...
}
```

#### Context Propagation
```go
// Context-based metric collection
func (i *interceptor) beforeExecute(ctx context.Context, statementID string) context.Context {
    mc := &metricContext{
        statementID: statementID,
        startTime:   time.Now(),
        tags:        make(map[string]interface{}),
    }
    return withMetricContext(ctx, mc)
}
```

#### Error Handling
```go
// Defer/recover pattern for error swallowing
func recoverAndLog(operation string) {
    if r := recover(); r != nil {
        // Log at trace level only
    }
}

func (i *interceptor) afterExecute(ctx context.Context, err error) {
    defer recoverAndLog("afterExecute")
    // Telemetry logic
}
```

### 4. **Async Patterns with Goroutines**

```go
// Background flush loop
func (agg *metricsAggregator) flushLoop() {
    ticker := time.NewTicker(agg.flushInterval)
    defer ticker.Stop()

    for {
        select {
        case <-ticker.C:
            agg.flush(context.Background())
        case <-agg.stopCh:
            return
        }
    }
}

// Async export
go func() {
    defer recoverAndLog("export")
    agg.exporter.export(ctx, metrics)
}()
```

### 5. **Standard Library Integration**

- **`net/http`**: HTTP client for telemetry export
- **`context.Context`**: Cancellation and deadline propagation
- **`time`**: Timers, tickers, and duration handling
- **`sync`**: Mutexes, WaitGroups, and Once
- **`encoding/json`**: Metric serialization

### 6. **Driver Integration Points**

#### In `connector.go`
```go
func (c *connector) Connect(ctx context.Context) (driver.Conn, error) {
    // ... existing code ...

    if c.cfg.telemetryEnabled {
        conn.telemetry = newTelemetryInterceptor(conn.id, c.cfg)
        conn.telemetry.recordConnection(ctx, tags)
    }

    return conn, nil
}
```

#### In `statement.go`
```go
func (s *stmt) QueryContext(ctx context.Context, args []driver.NamedValue) (rows driver.Rows, err error) {
    if s.conn.telemetry != nil {
        ctx = s.conn.telemetry.beforeExecute(ctx, statementID)
        // err is a named result so the deferred call observes the final error.
        defer func() {
            s.conn.telemetry.afterExecute(ctx, err)
        }()
    }
    // ... existing implementation ...
}
```

### 7. **Testing Strategy**

- **Unit tests**: Standard `*testing.T` patterns
- **Integration tests**: Using `testing.Short()` for skip flags
- **Benchmarks**: `BenchmarkXxx` functions to measure overhead
- **Table-driven tests**: Go idiomatic test patterns

```go
func BenchmarkInterceptor_Overhead(b *testing.B) {
    // ... setup ...
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ctx = interceptor.beforeExecute(ctx, "stmt-123")
        interceptor.afterExecute(ctx, nil)
    }
}
```

## Key Design Features

### Per-Host Resource Management
- **Feature Flag Cache**: Singleton per host with reference counting
(15min TTL)
- **Telemetry Client**: One shared client per host to prevent rate
limiting
- **Circuit Breaker**: Per-host protection against failing endpoints
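
To make the per-host circuit breaker idea concrete, here is a minimal sketch under stated assumptions: the breaker opens after a number of consecutive failures and permits a trial call once a cooldown has elapsed. The type names, the thresholds, and the `breakerRegistry` helper are hypothetical and do not reflect the design document's actual state machine.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// errExportFailed is a placeholder error used to exercise the breaker.
var errExportFailed = errors.New("export failed")

// circuitBreaker opens after maxFailures consecutive failures.
type circuitBreaker struct {
	mu          sync.Mutex
	maxFailures int
	cooldown    time.Duration
	failures    int
	openedAt    time.Time // zero while the breaker is closed
}

func newCircuitBreaker(maxFailures int, cooldown time.Duration) *circuitBreaker {
	return &circuitBreaker{maxFailures: maxFailures, cooldown: cooldown}
}

// allow reports whether a call may proceed. While open, it permits a
// trial call only after the cooldown has elapsed.
func (cb *circuitBreaker) allow() bool {
	cb.mu.Lock()
	defer cb.mu.Unlock()
	if cb.openedAt.IsZero() {
		return true
	}
	return time.Since(cb.openedAt) >= cb.cooldown
}

// record updates the breaker with the outcome of a call.
func (cb *circuitBreaker) record(err error) {
	cb.mu.Lock()
	defer cb.mu.Unlock()
	if err == nil {
		cb.failures = 0
		cb.openedAt = time.Time{} // success closes the breaker
		return
	}
	cb.failures++
	if cb.failures >= cb.maxFailures {
		cb.openedAt = time.Now()
	}
}

// breakerRegistry hands out one breaker per host.
type breakerRegistry struct {
	mu     sync.Mutex
	byHost map[string]*circuitBreaker
}

func (r *breakerRegistry) forHost(host string) *circuitBreaker {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.byHost == nil {
		r.byHost = make(map[string]*circuitBreaker)
	}
	cb, ok := r.byHost[host]
	if !ok {
		cb = newCircuitBreaker(5, 30*time.Second) // illustrative defaults
		r.byHost[host] = cb
	}
	return cb
}
```

Keeping one breaker per host means a failing telemetry endpoint on one workspace does not suppress exports to another.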

### Privacy & Security
- ✅ No PII collected (no SQL queries, user data, or credentials)
- ✅ Tag filtering ensures only approved metrics exported
- ✅ All sensitive info excluded from Databricks export

### Reliability
- ✅ All telemetry errors swallowed (never impacts driver)
- ✅ Circuit breaker prevents cascade failures
- ✅ Graceful shutdown with proper resource cleanup
- ✅ Terminal vs retryable error classification
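
One way to picture the terminal versus retryable split is by HTTP status code. The mapping below is an assumption chosen for illustration (throttling and transient gateway errors are retried, everything else is treated as terminal); the design document may classify errors differently, and `isRetryableStatus` is a hypothetical name.

```go
package main

import "net/http"

// isRetryableStatus reports whether a failed export with the given HTTP
// status is worth retrying. The chosen set of codes is an assumption.
func isRetryableStatus(code int) bool {
	switch code {
	case http.StatusRequestTimeout, // 408
		http.StatusTooManyRequests,    // 429
		http.StatusBadGateway,         // 502
		http.StatusServiceUnavailable, // 503
		http.StatusGatewayTimeout:     // 504
		return true
	default:
		// Authentication, authorization, and malformed-request errors
		// will not succeed on retry, so they are treated as terminal.
		return false
	}
}
```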

## File Structure

```
telemetry/
├── DESIGN.md              # This comprehensive design document
├── config.go              # Configuration types
├── tags.go                # Tag definitions and filtering
├── featureflag.go         # Per-host feature flag caching
├── manager.go             # Per-host client management
├── circuitbreaker.go      # Circuit breaker implementation
├── interceptor.go         # Telemetry interceptor
├── aggregator.go          # Metrics aggregation
├── exporter.go            # Export to Databricks
├── client.go              # Telemetry client
├── errors.go              # Error classification
└── *_test.go              # Test files
```

## Alignment with Existing Codebase

This design follows patterns observed in:
- `connection.go`: Connection lifecycle management
- `connector.go`: Factory patterns and options
- `internal/config/config.go`: Configuration structures
- `internal/client/client.go`: HTTP client patterns

## Next Steps

This is a **design document only**. Implementation will be tracked in
separate PRs following the implementation checklist in the design.

## Related Work

- Based on JDBC driver telemetry implementation patterns
- Adapted from C#/.NET ADBC driver design
- Follows Go best practices and standard library patterns

---

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Samikshya Chand <samikshya.chand@databricks.com>
Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
#297)

## Summary
Implement Phase 1 (Core Infrastructure) of telemetry for the Go driver:
configuration management and tag definitions with export scope
filtering.

## Changes
- Add `Config` struct with all telemetry configuration fields
- Implement `DefaultConfig()` with telemetry **disabled by default**
- Add `ParseTelemetryConfig()` for DSN parameter parsing
- Define tag constants for connection, statement, and error metrics
- Implement tag export scope filtering (local vs Databricks)
- Add comprehensive unit tests (22 tests, 100% pass rate)

## Important Note
**Telemetry is disabled by default** (`Config.Enabled = false`) and will
be enabled only after full testing and validation is complete.

## Testing
- ✅ All 22 unit tests passing
- ✅ Configuration parsing from DSN parameters validated
- ✅ Tag export filtering verified (e.g., `server.address` is local-only)
- ✅ Build successful with no errors
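
The export scope filtering can be sketched as a bitmask per tag. Only the fact that `server.address` is local-only comes from this PR; the other tag names, the `exportScope` type, and `filterTags` are hypothetical.

```go
package main

// exportScope is a bitmask of destinations a tag may be sent to.
type exportScope uint8

const (
	exportLocal      exportScope = 1 << iota // local logging or diagnostics
	exportDatabricks                         // export to the Databricks service
)

// tagScopes lists where each known tag may go. Unknown tags are
// never exported. Entries other than "server.address" are illustrative.
var tagScopes = map[string]exportScope{
	"server.address": exportLocal,
	"driver.version": exportLocal | exportDatabricks,
	"statement.id":   exportLocal | exportDatabricks,
}

// filterTags returns only the tags approved for the given scope.
func filterTags(tags map[string]string, scope exportScope) map[string]string {
	out := make(map[string]string, len(tags))
	for k, v := range tags {
		if tagScopes[k]&scope != 0 {
			out[k] = v
		}
	}
	return out
}
```

An allowlist with a zero default keeps the failure mode safe: a tag that nobody registered is dropped rather than exported.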

## JIRA
Closes
[PECOBLR-1145](https://databricks.atlassian.net/browse/PECOBLR-1145)
Part of
[PECOBLR-1143](https://databricks.atlassian.net/browse/PECOBLR-1143)

## Design Document
See `telemetry/DESIGN.md` for complete technical design.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

- Reduce token expiry buffer from 5 minutes to 30 seconds (matches SDK standard)
- Add detailed documentation to TokenProviderAuthenticator explaining flow
- Add ctx.Err() check in ExternalTokenProvider for cancellation support
- Rename tokenFunc to tokenSource for better clarity
- Remove duplicate empty token validation from ExternalTokenProvider
- Update tests to reflect changes
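
The effect of the expiry buffer can be shown with a small sketch: a token is treated as needing refresh once it is within 30 seconds of expiring. The `token` type and the helper names are hypothetical; only the 30-second figure comes from the commit.

```go
package main

import "time"

// expiryBuffer mirrors the 30-second buffer described in the commit.
const expiryBuffer = 30 * time.Second

// token is a hypothetical cached credential.
type token struct {
	accessToken string
	expiresAt   time.Time // zero value means no expiry
}

// needsRefresh reports whether the token expires within expiryBuffer of now.
func (t token) needsRefresh(now time.Time) bool {
	if t.expiresAt.IsZero() {
		return false
	}
	return !now.Add(expiryBuffer).Before(t.expiresAt)
}

// needsRefreshIn is a convenience for experimenting: it reports whether a
// token with secondsLeft seconds of validity would be refreshed.
func needsRefreshIn(secondsLeft int) bool {
	now := time.Now()
	t := token{expiresAt: now.Add(time.Duration(secondsLeft) * time.Second)}
	return t.needsRefresh(now)
}
```

With a 5-minute buffer, a token with four minutes of validity left would already be refreshed; with 30 seconds it is reused until shortly before it expires.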

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Implemented per-host feature flag caching system with the following capabilities:
- Singleton pattern for global feature flag cache management
- Per-host caching with 15-minute TTL to prevent rate limiting
- Reference counting tied to connection lifecycle
- Thread-safe operations using sync.RWMutex for concurrent access
- Graceful error handling with cached value fallback
- HTTP integration to fetch feature flags from Databricks API

Key Features:
- featureFlagCache: Manages per-host feature flag contexts
- featureFlagContext: Holds cached state, timestamp, and ref count
- getOrCreateContext: Creates context and increments reference count
- releaseContext: Decrements ref count and cleans up when zero
- isTelemetryEnabled: Returns cached value or fetches fresh
- fetchFeatureFlag: HTTP call to Databricks feature flag API
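
The pieces listed above could fit together roughly as follows. This is a simplified sketch, not the code from `telemetry/featureflag.go`: the real `isTelemetryEnabled` takes a context and an HTTP client, whereas here a hypothetical `fetch` callback stands in for the HTTP call, and a single mutex is held for the whole operation to keep the example short.

```go
package main

import (
	"sync"
	"time"
)

// cacheTTL matches the 15-minute TTL described above.
const cacheTTL = 15 * time.Minute

// featureFlagContext holds the cached state for one host.
type featureFlagContext struct {
	enabled     bool
	lastFetched time.Time
	refCount    int
}

// featureFlagCache maps host to its cached context.
type featureFlagCache struct {
	mu       sync.Mutex
	contexts map[string]*featureFlagContext
}

func newFeatureFlagCache() *featureFlagCache {
	return &featureFlagCache{contexts: make(map[string]*featureFlagContext)}
}

// getOrCreateContext registers one more connection for host.
func (c *featureFlagCache) getOrCreateContext(host string) *featureFlagContext {
	c.mu.Lock()
	defer c.mu.Unlock()
	fc, ok := c.contexts[host]
	if !ok {
		fc = &featureFlagContext{}
		c.contexts[host] = fc
	}
	fc.refCount++
	return fc
}

// releaseContext drops one reference and removes the entry at zero.
func (c *featureFlagCache) releaseContext(host string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	fc, ok := c.contexts[host]
	if !ok {
		return
	}
	fc.refCount--
	if fc.refCount <= 0 {
		delete(c.contexts, host)
	}
}

// isTelemetryEnabled returns the cached value while it is fresh and
// otherwise calls fetch, falling back to the cached value on error.
func (c *featureFlagCache) isTelemetryEnabled(host string, fetch func() (bool, error)) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	fc, ok := c.contexts[host]
	if !ok {
		return false
	}
	if !fc.lastFetched.IsZero() && time.Since(fc.lastFetched) < cacheTTL {
		return fc.enabled
	}
	enabled, err := fetch()
	if err != nil {
		return fc.enabled
	}
	fc.enabled = enabled
	fc.lastFetched = time.Now()
	return enabled
}
```

Tying the reference count to connection open and close means the cache entry for a host disappears once its last connection is gone.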

Testing:
- Comprehensive unit tests with 100% code coverage
- Tests for singleton pattern, reference counting, caching behavior
- Thread-safety tests with concurrent access
- Mock HTTP server tests for API integration
- Error handling and fallback scenarios

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The log is quite repetitive, and as we use this sql driver in our
customer-facing product where customers see logs, the extra log seems
like spam to customers. I think it would be nice to have it as debug log
instead of info.
CLAassistant commented Nov 24, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 6 committers have signed the CLA.

❌ caldempsey
❌ madhav-db
❌ samikshya-db
❌ mdibaiee
❌ vikrantpuppala
❌ gopalldb

gopalldb and others added 11 commits November 25, 2025 16:37
MST design doc in Go SQL
Fixes #293 and
provides some context where the message is coming from.

The OAuth2 M2M authenticator currently logs token fetch operations at
the Info level in auth/oauth/m2m/m2m.go at line 60. When running
applications with Info-level logging enabled, this generates log entries
every time a token is fetched or refreshed, which pollutes application
logs with operational noise. It's also messing with my Ginkgo tests in
CI: onsi/ginkgo#1614 (comment).
Libraries should generally avoid logging at Info level during normal
operations unless there's actionable information for the application
operator. I think this is a pretty standard practice.
Addressed PR review comments from #304:

1. Fixed race condition when reading flagCtx fields
   - Added proper locking with flagCtx.mu for enabled, lastFetched, fetching
   - Previously accessed without correct lock causing data races

2. Fixed concurrent fetch issue
   - Implemented fetching flag to prevent simultaneous HTTP requests
   - First goroutine sets fetching=true, others use cached value
   - Prevents rate limiting from concurrent fetches when cache expires

3. Added HTTP request timeout
   - Added featureFlagHTTPTimeout = 10s constant
   - Wraps context with timeout if none exists
   - Prevents indefinite hangs (Go's default has no timeout)
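
Fixes 2 and 3 can be sketched as below. The constant `featureFlagHTTPTimeout` and the `fetching` flag come from the commit message; the `flagState` type, `refresh`, and `withDefaultTimeout` are hypothetical names, and the actual implementation may be structured differently.

```go
package main

import (
	"context"
	"sync"
	"time"
)

const featureFlagHTTPTimeout = 10 * time.Second

// withDefaultTimeout adds a timeout only when the caller has not
// already set a deadline on ctx.
func withDefaultTimeout(ctx context.Context) (context.Context, context.CancelFunc) {
	if _, ok := ctx.Deadline(); ok {
		return ctx, func() {}
	}
	return context.WithTimeout(ctx, featureFlagHTTPTimeout)
}

// flagState guards the cached value and the in-flight marker.
type flagState struct {
	mu       sync.Mutex
	enabled  bool
	fetching bool
}

// refresh runs fetch unless another caller is already fetching, in
// which case it returns the cached value immediately.
func (s *flagState) refresh(fetch func() (bool, error)) bool {
	s.mu.Lock()
	if s.fetching {
		cached := s.enabled
		s.mu.Unlock()
		return cached
	}
	s.fetching = true
	s.mu.Unlock()

	// The lock is not held during the slow call.
	enabled, err := fetch()

	s.mu.Lock()
	defer s.mu.Unlock()
	s.fetching = false
	if err == nil {
		s.enabled = enabled
	}
	return s.enabled
}
```

Releasing the mutex around the fetch is what lets other goroutines fall through to the cached value instead of queueing behind the HTTP request.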

All tests pass. Thread-safe concurrent access verified.
The linter requires explicit error handling. Since we're in an error
path and only draining the response body for connection reuse, we
explicitly ignore the error with blank identifiers.
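
The pattern described here might look like the following sketch. The helper name `drainAndClose` is hypothetical; `io.Copy`, `io.Discard`, and `resp.Body.Close` are standard library calls.

```go
package main

import (
	"io"
	"net/http"
)

// drainAndClose discards any unread response body so the underlying
// connection can be reused, then closes it. Errors are deliberately
// ignored because the caller is already on an error path.
func drainAndClose(resp *http.Response) {
	if resp == nil || resp.Body == nil {
		return
	}
	_, _ = io.Copy(io.Discard, resp.Body)
	_ = resp.Body.Close()
}
```

Assigning to blank identifiers makes the decision to ignore the errors explicit, which is what linters such as errcheck look for.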
Implements token provider support for the go driver
- We can live with owner approval, as in our other repos:
https://github.com/databricks/databricks-jdbc/tree/main/.github
- Note: we already require approval from owners in the GH ruleset.
…304)

## Summary
Implements per-host feature flag caching system with reference counting
as part of the telemetry infrastructure (parent ticket PECOBLR-1143).
This is the first component of Phase 2: Per-Host Management.

## What Changed
- **New File**: `telemetry/featureflag.go` - Feature flag cache
implementation
- **New File**: `telemetry/featureflag_test.go` - Comprehensive unit
tests
- **Updated**: `telemetry/DESIGN.md` - Updated implementation checklist

## Implementation Details

### Core Components
1. **featureFlagCache** - Singleton managing per-host feature flag
contexts
   - Thread-safe using `sync.RWMutex`
   - Maps host → featureFlagContext

2. **featureFlagContext** - Per-host state holder
   - Cached feature flag value with 15-minute TTL
   - Reference counting for connection lifecycle management
   - Automatic cleanup when ref count reaches zero

### Key Features
- ✅ Per-host caching to prevent rate limiting
- ✅ 15-minute TTL with automatic cache expiration
- ✅ Reference counting tied to connection lifecycle
- ✅ Thread-safe for concurrent access
- ✅ Graceful error handling with cached value fallback
- ✅ HTTP integration with Databricks feature flag API

### Methods Implemented
- `getFeatureFlagCache()` - Singleton accessor
- `getOrCreateContext(host)` - Creates context and increments ref count
- `releaseContext(host)` - Decrements ref count and cleans up
- `isTelemetryEnabled(ctx, host, httpClient)` - Returns cached or
fetches fresh
- `fetchFeatureFlag(ctx, host, httpClient)` - HTTP call to Databricks
API

## Test Coverage
- ✅ Singleton pattern verification
- ✅ Reference counting (increment/decrement/cleanup)
- ✅ Cache expiration and refresh logic
- ✅ Thread-safety under concurrent access (100 goroutines)
- ✅ HTTP fetching with mock server
- ✅ Error handling and fallback scenarios
- ✅ Context cancellation
- ✅ All tests passing with 100% code coverage

## Test Results
```
=== RUN   TestGetFeatureFlagCache_Singleton
--- PASS: TestGetFeatureFlagCache_Singleton (0.00s)
... (all 17 tests passing)
PASS
ok  	github.com/databricks/databricks-sql-go/telemetry	0.008s
```

## Design Alignment
Implementation follows the design document (telemetry/DESIGN.md, section
3.1) exactly. The only addition is flexible URL construction in
`fetchFeatureFlag` to support both production (hostname without
protocol) and testing (httptest with protocol) scenarios.

## Testing Instructions
```bash
go test -v ./telemetry -run TestFeatureFlag
go test -v ./telemetry  # Run all telemetry tests
go build ./telemetry     # Verify build
```

## Related Links
- Parent Ticket:
[PECOBLR-1143](https://databricks.atlassian.net/browse/PECOBLR-1143)
- This Ticket:
[PECOBLR-1146](https://databricks.atlassian.net/browse/PECOBLR-1146)
- Design Doc: `telemetry/DESIGN.md`

## Next Steps
After this PR:
- PECOBLR-1147: Client Manager for Per-Host Clients
- PECOBLR-1148: Circuit Breaker Implementation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## 🥞 Stacked PR
Use this [link](https://github.com/databricks/databricks-sql-go/pull/305/files?w=1) to review incremental changes.
- [#304 - Feature Flag Cache (PECOBLR-1146)](#304) [[Files changed](https://github.com/databricks/databricks-sql-go/pull/304/files)]
- [**#305 - Client Manager (PECOBLR-1147)**](#305) [[Files changed](https://github.com/databricks/databricks-sql-go/pull/305/files)] ← This PR

---------

## Summary
Implements per-host client management system with reference counting as
part of the telemetry infrastructure (parent ticket PECOBLR-1143). This
is the second component of Phase 2: Per-Host Management.

## What Changed
- **New File**: `telemetry/client.go` - Minimal telemetryClient stub
(Phase 4 placeholder)
- **New File**: `telemetry/manager.go` - Client manager implementation
- **New File**: `telemetry/manager_test.go` - Comprehensive unit tests
- **Updated**: `telemetry/DESIGN.md` - Updated implementation checklist

## Implementation Details

### Core Components
1. **clientManager** - Singleton managing per-host telemetry clients
   - Thread-safe using `sync.RWMutex`
   - Maps host → clientHolder

2. **clientHolder** - Per-host state holder
   - Holds telemetry client reference
   - Reference count for active connections
   - Automatic cleanup when ref count reaches zero

3. **telemetryClient** (stub) - Minimal implementation
   - Placeholder for Phase 4 (Export)
   - Provides `start()` and `close()` methods
   - Will be fully implemented later

### Key Features
- ✅ Singleton pattern for global client management
- ✅ One client per host to prevent rate limiting
- ✅ Reference counting tied to connection lifecycle
- ✅ Thread-safe for concurrent access
- ✅ Automatic client cleanup when last connection closes
- ✅ Client start() called on creation
- ✅ Client close() called on removal

### Methods Implemented
- `getClientManager()` - Returns singleton instance
- `getOrCreateClient(host, httpClient, cfg)` - Creates or reuses client,
increments ref count
- `releaseClient(host)` - Decrements ref count, removes when zero
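
A reduced sketch of the reference-counted lifecycle described above. The method names follow the PR text, but `getOrCreateClient` is simplified here to take only the host (the PR's version also takes `httpClient` and `cfg`), and the `started` and `closed` fields on the stub client are hypothetical, added only to make the lifecycle observable.

```go
package main

import "sync"

// telemetryClient is a stub that records lifecycle calls.
type telemetryClient struct {
	started bool
	closed  bool
}

func (c *telemetryClient) start() { c.started = true }
func (c *telemetryClient) close() { c.closed = true }

// clientHolder pairs a client with the number of connections using it.
type clientHolder struct {
	client   *telemetryClient
	refCount int
}

// clientManager maps host to its shared client.
type clientManager struct {
	mu      sync.Mutex
	clients map[string]*clientHolder
}

func newClientManager() *clientManager {
	return &clientManager{clients: make(map[string]*clientHolder)}
}

// getOrCreateClient returns the shared client for host, creating and
// starting it on first use, and increments the reference count.
func (m *clientManager) getOrCreateClient(host string) *telemetryClient {
	m.mu.Lock()
	defer m.mu.Unlock()
	h, ok := m.clients[host]
	if !ok {
		h = &clientHolder{client: &telemetryClient{}}
		h.client.start()
		m.clients[host] = h
	}
	h.refCount++
	return h.client
}

// releaseClient decrements the count and closes the client at zero.
// Releasing an unknown host is a no-op.
func (m *clientManager) releaseClient(host string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	h, ok := m.clients[host]
	if !ok {
		return
	}
	h.refCount--
	if h.refCount <= 0 {
		delete(m.clients, host)
		h.client.close()
	}
}
```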

## Test Coverage
- ✅ Singleton pattern verification
- ✅ Reference counting (increment/decrement/cleanup)
- ✅ Multiple hosts management
- ✅ Partial releases
- ✅ Thread-safety under concurrent access (100+ goroutines)
- ✅ Client lifecycle (start/close) verification
- ✅ Non-existent host handling
- ✅ All tests passing with 100% code coverage

## Test Results
```
=== RUN   TestGetClientManager_Singleton
--- PASS: TestGetClientManager_Singleton (0.00s)
... (all 11 tests passing)
PASS
ok  	github.com/databricks/databricks-sql-go/telemetry	0.005s
```

## Design Alignment
Implementation follows the design document (telemetry/DESIGN.md, section
3.2) exactly. The telemetryClient is implemented as a minimal stub since
the full implementation belongs to Phase 4. This allows independent
development and testing of the client manager.

## Testing Instructions
```bash
go test -v ./telemetry -run "TestGetClientManager|TestClientManager"
go test -v ./telemetry  # Run all telemetry tests
go build ./telemetry     # Verify build
```

## Related Links
- Parent Ticket:
[PECOBLR-1143](https://databricks.atlassian.net/browse/PECOBLR-1143)
- This Ticket:
[PECOBLR-1147](https://databricks.atlassian.net/browse/PECOBLR-1147)
- Previous:
[PECOBLR-1146](https://databricks.atlassian.net/browse/PECOBLR-1146) -
Feature Flag Cache (#304)
- Design Doc: `telemetry/DESIGN.md`

## Next Steps
After this PR:
- PECOBLR-1148: Circuit Breaker Implementation


@Multimo Multimo closed this Dec 16, 2025
8 participants