AGENTS.md/CLAUDE.md - AI Agent Contribution Guide

This document provides comprehensive information for AI agents to understand and contribute to the Bruin project effectively.

Before You Finish: Mandatory Checks

You MUST run these commands before completing any task that modifies code:

Format the code: Run make format in the project root, then check git diff. If there are formatting changes, stage and include them in your work.
Run the tests: Run make test in the project root. If any tests fail, fix the issues before finishing.

Do not consider your task complete until both checks pass. Use the /format-fix and /test commands if needed.
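
The sequence above can be sketched as a small Go helper. This is an illustration only, not part of the Bruin repository: the runner type and the execRunner and finishChecks functions are hypothetical names, and the sketch assumes make and git are on the PATH and that it is run from the project root. The command runner is injected so the ordering can be exercised without actually invoking make.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runner executes a command and returns its combined output.
// It is a parameter so the sequence can be tested with a fake.
type runner func(name string, args ...string) (string, error)

// execRunner runs the command for real via os/exec.
func execRunner(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	return string(out), err
}

// finishChecks formats the code, collects what git reports as changed,
// and then runs the tests. It returns the `git diff --stat` output so the
// caller can review and stage any formatting changes.
func finishChecks(run runner) (string, error) {
	if out, err := run("make", "format"); err != nil {
		return "", fmt.Errorf("make format failed: %w\n%s", err, out)
	}
	diff, err := run("git", "diff", "--stat")
	if err != nil {
		return "", fmt.Errorf("git diff failed: %w\n%s", err, diff)
	}
	if out, err := run("make", "test"); err != nil {
		return diff, fmt.Errorf("make test failed: %w\n%s", err, out)
	}
	return diff, nil
}
```

Calling finishChecks(execRunner) runs the real commands. A non-nil error means the task is not finished, and a non-empty diff means there are changes to review and stage.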

Project Overview
Architecture & Core Concepts
Development Environment
Build System
CLI Commands & Structure
Codebase Organization
Testing Strategy
Contributing Guidelines
Common Development Tasks

Project Overview

Bruin is an end-to-end data framework that combines data ingestion, transformation, and quality into a single tool. Think of it as "if dbt, Airbyte, and Great Expectations had a lovechild."

Core Features

Data Ingestion: Using ingestr and Python scripts
Transformations: SQL & Python on multiple platforms (BigQuery, Snowflake, DuckDB, etc.)
Data Quality: Built-in quality checks and validations
Materializations: Table/view materializations, incremental tables
Python Isolation: Using uv for isolated Python environments
Templating: Jinja templating for reusable code
Lineage: Dependency visualization and tracking
Multi-platform: Runs locally, on EC2, or GitHub Actions
Secrets Management: Environment variable injection
VS Code Extension: Enhanced developer experience

Design Principles

Version-controllable text: Everything configured via text files, no UI/database configs
Multi-technology support: SQL and Python natively, with pre-built binaries for complex use cases
Multi-source/destination: Support diverse sources and destinations
Mix-and-match: Single pipelines can combine different technologies, sources, and destinations
Avoid lock-in: Open-source Apache-licensed, runs anywhere

Architecture & Core Concepts

Assets

An asset is anything that carries value derived from data:

Tables/views in databases
Files in S3/GCS
Machine learning models
Documents (Excel, Google Sheets, Notion, etc.)

Assets consist of:

Definition: Metadata enabling Bruin to understand the asset
Content: The actual query/logic that creates the asset

Pipelines

A pipeline is a group of assets executed together in dependency order. A typical layout:

my-pipeline/
├─ pipeline.yml
└─ assets/
   ├─ asset1.sql
   └─ asset2.py

Pipeline Runs

A pipeline run is an execution instance of a pipeline, containing one or more asset instances with a specific configuration and timing.

Development Environment

Prerequisites

Go: Version 1.23.0+ (see go.mod)
Python: For Python asset development and formatting
CGO: Required for DuckDB support
Git: For version control and repository detection

Dependencies

The project uses extensive Go dependencies including:

CLI framework: github.com/urfave/cli/v2
Database drivers: BigQuery, Snowflake, PostgreSQL, MySQL, DuckDB, etc.
Cloud SDKs: AWS, GCP
Templating: Jinja via github.com/nikolalohinski/gonja/v2
Testing: github.com/stretchr/testify

Build System

The Makefile provides comprehensive build and development targets:

Core Targets

Build Targets

make build           # Build with DuckDB support (CGO_ENABLED=1)
make build-no-duckdb # Build without DuckDB (CGO_ENABLED=0)

Development Targets

make deps           # Install dependencies and tools
make clean          # Remove build artifacts
make format         # Format Go and Python code
make tools          # Install development tools (gci, gofumpt, golangci-lint)
make tools-update   # Update development tools

Testing Targets

make test                      # Run unit tests
make test-unit                 # Run unit tests specifically
make integration-test          # Full integration tests with ingestr
make integration-test-light    # Light integration tests without ingestr
make integration-test-cloud    # Cloud-specific integration tests

Development Utilities

make lint-python                     # Format and lint Python code
make refresh-integration-expectations # Update integration test expectations

Build Configuration

Version: Set via main.Version variable, defaults to dev-$(git describe --tags --abbrev=0)
Telemetry: Controlled via TELEMETRY_KEY and TELEMETRY_OPTOUT environment variables
Tags: Uses no_duckdb_arrow for standard builds, bruin_no_duckdb for no-DuckDB builds
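
Setting a package-level string variable at link time is a standard Go mechanism (the -X linker flag). The sketch below shows the general pattern only. The versionString helper and the plain "dev" default are assumptions for illustration, and the exact flags used in Bruin's Makefile are not reproduced here.

```go
package main

// Version is meant to be overridden at link time, for example:
//
//	go build -ldflags "-X main.Version=v1.2.3" .
//
// Without an override it keeps this default.
var Version = "dev"

// versionString formats the version for display (hypothetical helper).
func versionString() string {
	return "version " + Version
}
```

The -X flag only works on string variables, which is why Version is a plain string rather than a struct or constant.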

CLI Commands & Structure

Main Application Structure (`main.go`)

The CLI is built using github.com/urfave/cli/v2 with these core commands:

Commands: []*cli.Command{
    cmd.Lint(&isDebug),           // Lint pipelines and assets
    cmd.Run(&isDebug),            // Run pipelines/assets
    cmd.Render(),                 // Render Jinja templates
    cmd.Lineage(),                // Generate lineage graphs
    cmd.CleanCmd(),               // Clean up resources
    cmd.Format(&isDebug),         // Format code
    cmd.Docs(),                   // Open documentation
    cmd.Init(),                   // Initialize new projects
    cmd.Internal(),               // Internal/debugging commands
    cmd.Environments(&isDebug),   // Manage environments
    cmd.Connections(),            // Manage connections
    cmd.Query(),                  // Execute queries
    cmd.Patch(),                  // Patch assets
    cmd.DataDiffCmd(),            // Compare data between connections
    cmd.Import(),                 // Import database tables as assets
    versionCommand,               // Version information
}

Key Command Categories

Primary Commands

run: Execute pipelines with flags for workers, dates, environments, full-refresh
lint: Validate pipeline syntax and configuration
init: Bootstrap new Bruin projects
lineage: Generate dependency graphs

Management Commands

connections: List, add, delete, ping database connections
environments: Manage deployment environments
import: Import existing database tables as Bruin assets

Development Commands

format: Code formatting
render: Template rendering for debugging
docs: Open documentation (with --open flag for browser)

Internal Commands (Hidden)

internal parse-asset: Parse individual assets
internal parse-pipeline: Parse entire pipelines
internal connections: Connection schema operations

Codebase Organization

Package Structure (`pkg/`)

The codebase is organized into focused packages:

Core Packages

pipeline/: Pipeline parsing, execution, and management
config/: Configuration file handling (.bruin.yml)
connection/: Database connection management
executor/: Asset execution engine
lineage/: Dependency tracking and visualization
query/: Query execution and management

Data Platform Packages

Each supported platform has its own package:

Database platforms: bigquery/, snowflake/, postgres/, mysql/, duckdb/, clickhouse/, athena/, mssql/, databricks/, oracle/, sqlite/, trino/, synapse/, hana/, spanner/
Cloud storage: s3/, gcs/
Ingestion sources: 50+ packages for different data sources (e.g., shopify/, hubspot/, salesforce/, stripe/, etc.)

Utility Packages

jinja/: Template processing
python/: Python asset execution
lint/: Code linting and validation
diff/: Data comparison functionality
path/: File system utilities
git/: Git repository operations
telemetry/: Usage analytics
secrets/: Secret management
logger/: Logging utilities

Command Implementation (`cmd/`)

Each CLI command is implemented in its own file:

Command structure definition
Flag parsing and validation
Business logic delegation to appropriate packages
Error handling and output formatting
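
The four responsibilities above can be illustrated with the standard library alone. This sketch deliberately swaps in Go's flag package for github.com/urfave/cli/v2 so that it has no external dependencies. The greet and runGreet functions are hypothetical and do not correspond to any Bruin command.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// greet stands in for business logic; in a real command it would live in a pkg/ package.
func greet(name string) (string, error) {
	if name == "" {
		return "", errors.New("name must not be empty")
	}
	return "hello, " + name, nil
}

// runGreet is the command layer: it parses and validates flags,
// delegates to greet, and wraps errors with context for the user.
func runGreet(args []string) (string, error) {
	fs := flag.NewFlagSet("greet", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep flag-parsing messages out of the output
	name := fs.String("name", "", "who to greet")
	if err := fs.Parse(args); err != nil {
		return "", fmt.Errorf("invalid flags: %w", err)
	}
	msg, err := greet(*name)
	if err != nil {
		return "", fmt.Errorf("greet: %w", err)
	}
	return msg, nil
}
```

Keeping greet free of flag handling means it can be unit-tested directly, while runGreet only needs tests for parsing and error formatting.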

Testing Strategy

Test Types

Unit Tests

Location: Throughout pkg/ packages with *_test.go files
Execution: make test-unit
Settings: Race detection enabled, 10-minute timeout
Scope: Excludes cloud integration tests

Integration Tests

Light Integration: make integration-test-light (excludes ingestr)
Full Integration: make integration-test (includes ingestr)
Cloud Integration: make integration-test-cloud (cloud platforms)

Test Data

Location: integration-tests/test-pipelines/
Coverage: Parse tests, lineage tests, execution tests
Expectations: JSON files with expected outputs
Refresh: make refresh-integration-expectations updates expectations

Test Patterns

Mock databases using github.com/DATA-DOG/go-sqlmock
PostgreSQL mocking with github.com/pashagolub/pgxmock/v3
Concurrent testing with github.com/sourcegraph/conc
File system abstraction with github.com/spf13/afero

Contributing Guidelines

Code Style & Formatting

Go Code

Tools automatically installed and run via make format:

gci: Import organization
gofumpt: Stricter Go formatting
golangci-lint: Comprehensive linting (10m timeout)
go vet: Static analysis

Python Code

Tools run via make lint-python:

ruff format: Code formatting
ruff check --fix: Linting with auto-fixes

Development Workflow

Setup: make deps to install tools and dependencies
Development: Edit code with VS Code extension for enhanced experience
Formatting: make format before committing
Testing: make test for unit tests, integration tests as appropriate
Building: make build to verify compilation

Adding New Data Platforms

Create package: pkg/newplatform/
Implement interfaces: Connection, query execution, schema introspection
Add CLI command: Register in main command list
Add tests: Unit and integration tests
Update documentation: Add to supported platforms list

Adding New CLI Commands

Create command file: cmd/newcommand.go
Implement command structure: Using cli.Command pattern
Add business logic: In appropriate pkg/ package
Register command: In main.go commands slice
Add tests: Command and business logic tests

Common Development Tasks

Running Locally

# Basic build and run
make build
./bin/bruin --help

# Development mode with debug
make build
./bin/bruin --debug [command]

Adding New Asset Types

Define asset type in pkg/pipeline/asset.go
Implement execution logic in pkg/executor/
Add parsing logic if needed
Update lineage detection if applicable
Add tests and integration tests

Debugging Integration Tests

# Run specific test pipeline
cd integration-tests
../bin/bruin run test-pipelines/your-test

# Refresh expectations after changes
make refresh-integration-expectations

Working with Templates

# Test template rendering
./bin/bruin render path/to/template.sql

# Test complete pipeline parsing
./bin/bruin internal parse-pipeline path/to/pipeline

Database Connection Testing

# List connections
./bin/bruin connections list

# Test connection
./bin/bruin connections test --name connection-name

# Add new connection
./bin/bruin connections add

This guide provides the foundational knowledge needed to contribute effectively to the Bruin project. For specific implementation details, refer to the extensive documentation in the docs/ directory and examine existing patterns in the codebase.
