This document provides comprehensive information for AI agents to understand and contribute to the Bruin project effectively.
You MUST run these commands before completing any task that modifies code:
- Format the code: Run `make format` in the project root. Check `git diff` afterward — if there are formatting changes, stage and include them in your work.
- Run the tests: Run `make test` in the project root. If any tests fail, fix the issues before finishing.
Do not consider your task complete until both checks pass. Use the /format-fix and /test commands if needed.
- Project Overview
- Architecture & Core Concepts
- Development Environment
- Build System
- CLI Commands & Structure
- Codebase Organization
- Testing Strategy
- Contributing Guidelines
- Common Development Tasks
Bruin is an end-to-end data framework that combines data ingestion, transformation, and quality into a single tool. Think of it as "if dbt, Airbyte, and Great Expectations had a lovechild."
- Data Ingestion: Using `ingestr` and Python scripts
- Transformations: SQL & Python on multiple platforms (BigQuery, Snowflake, DuckDB, etc.)
- Data Quality: Built-in quality checks and validations
- Materializations: Table/view materializations, incremental tables
- Python Isolation: Using `uv` for isolated Python environments
- Templating: Jinja templating for reusable code
- Lineage: Dependency visualization and tracking
- Multi-platform: Runs locally, on EC2, or GitHub Actions
- Secrets Management: Environment variable injection
- VS Code Extension: Enhanced developer experience
- Version-controllable text: Everything configured via text files, no UI/database configs
- Multi-technology support: SQL and Python natively, with pre-built binaries for complex use cases
- Multi-source/destination: Support diverse sources and destinations
- Mix-and-match: Single pipelines can combine different technologies, sources, and destinations
- Avoid lock-in: Open-source Apache-licensed, runs anywhere
Anything that carries value derived from data:
- Tables/views in databases
- Files in S3/GCS
- Machine learning models
- Documents (Excel, Google Sheets, Notion, etc.)
Assets consist of:
- Definition: Metadata enabling Bruin to understand the asset
- Content: The actual query/logic that creates the asset
Groups of assets executed together in dependency order. Structure:
my-pipeline/
├─ pipeline.yml
└─ assets/
├─ asset1.sql
└─ asset2.py
Execution instances containing one or more asset instances with specific configuration and timing.
- Go: Version 1.23.0+ (see `go.mod`)
- Python: For Python asset development and formatting
- CGO: Required for DuckDB support
- Git: For version control and repository detection
The project uses extensive Go dependencies including:
- CLI framework: `github.com/urfave/cli/v2`
- Database drivers: BigQuery, Snowflake, PostgreSQL, MySQL, DuckDB, etc.
- Cloud SDKs: AWS, GCP
- Templating: Jinja via `github.com/nikolalohinski/gonja/v2`
- Testing: `github.com/stretchr/testify`
The Makefile provides comprehensive build and development targets:
make build                             # Build with DuckDB support (CGO_ENABLED=1)
make build-no-duckdb                   # Build without DuckDB (CGO_ENABLED=0)
make deps                              # Install dependencies and tools
make clean                             # Remove build artifacts
make format                            # Format Go and Python code
make tools                             # Install development tools (gci, gofumpt, golangci-lint)
make tools-update                      # Update development tools
make test                              # Run unit tests
make test-unit                         # Run unit tests specifically
make integration-test                  # Full integration tests with ingestr
make integration-test-light            # Light integration tests without ingestr
make integration-test-cloud            # Cloud-specific integration tests
make lint-python                       # Format and lint Python code
make refresh-integration-expectations  # Update integration test expectations

- Version: Set via the `main.Version` variable, defaults to `dev-$(git describe --tags --abbrev=0)`
- Telemetry: Controlled via the `TELEMETRY_KEY` and `TELEMETRY_OPTOUT` environment variables
- Tags: Uses `no_duckdb_arrow` for standard builds, `bruin_no_duckdb` for no-DuckDB builds
The CLI is built using github.com/urfave/cli/v2 with these core commands:
Commands: []*cli.Command{
cmd.Lint(&isDebug), // Lint pipelines and assets
cmd.Run(&isDebug), // Run pipelines/assets
cmd.Render(), // Render Jinja templates
cmd.Lineage(), // Generate lineage graphs
cmd.CleanCmd(), // Clean up resources
cmd.Format(&isDebug), // Format code
cmd.Docs(), // Open documentation
cmd.Init(), // Initialize new projects
cmd.Internal(), // Internal/debugging commands
cmd.Environments(&isDebug), // Manage environments
cmd.Connections(), // Manage connections
cmd.Query(), // Execute queries
cmd.Patch(), // Patch assets
cmd.DataDiffCmd(), // Compare data between connections
cmd.Import(), // Import database tables as assets
versionCommand, // Version information
}

- `run`: Execute pipelines with flags for workers, dates, environments, full-refresh
- `lint`: Validate pipeline syntax and configuration
- `init`: Bootstrap new Bruin projects
- `lineage`: Generate dependency graphs
- `connections`: List, add, delete, ping database connections
- `environments`: Manage deployment environments
- `import`: Import existing database tables as Bruin assets
- `format`: Code formatting
- `render`: Template rendering for debugging
- `docs`: Open documentation (with `--open` flag for browser)
Internal Commands (Hidden)
- `internal parse-asset`: Parse individual assets
- `internal parse-pipeline`: Parse entire pipelines
- `internal connections`: Connection schema operations
The codebase is organized into focused packages:
- `pipeline/`: Pipeline parsing, execution, and management
- `config/`: Configuration file handling (`.bruin.yml`)
- `connection/`: Database connection management
- `executor/`: Asset execution engine
- `lineage/`: Dependency tracking and visualization
- `query/`: Query execution and management
Each supported platform has its own package:
- Database platforms: `bigquery/`, `snowflake/`, `postgres/`, `mysql/`, `duckdb/`, `clickhouse/`, `athena/`, `mssql/`, `databricks/`, `oracle/`, `sqlite/`, `trino/`, `synapse/`, `hana/`, `spanner/`
- Cloud storage: `s3/`, `gcs/`
- Ingestion sources: 50+ packages for different data sources (e.g., `shopify/`, `hubspot/`, `salesforce/`, `stripe/`, etc.)
- `jinja/`: Template processing
- `python/`: Python asset execution
- `lint/`: Code linting and validation
- `diff/`: Data comparison functionality
- `path/`: File system utilities
- `git/`: Git repository operations
- `telemetry/`: Usage analytics
- `secrets/`: Secret management
- `logger/`: Logging utilities
Each CLI command is implemented in its own file:
- Command structure definition
- Flag parsing and validation
- Business logic delegation to appropriate packages
- Error handling and output formatting
- Location: Throughout `pkg/` packages, in `*_test.go` files
- Execution: `make test-unit`
- Coverage: Race detection enabled, 10-minute timeout
- Scope: Excludes cloud integration tests
- Light Integration: `make integration-test-light` (excludes ingestr)
- Full Integration: `make integration-test` (includes ingestr)
- Cloud Integration: `make integration-test-cloud` (cloud platforms)
- Location: `integration-tests/test-pipelines/`
- Coverage: Parse tests, lineage tests, execution tests
- Expectations: JSON files with expected outputs
- Refresh: `make refresh-integration-expectations` updates expectations
- Mock databases using `github.com/DATA-DOG/go-sqlmock`
- PostgreSQL mocking with `github.com/pashagolub/pgxmock/v3`
- Concurrent testing with `github.com/sourcegraph/conc`
- File system abstraction with `github.com/spf13/afero`
Tools automatically installed and run via `make format`:

- `gci`: Import organization
- `gofumpt`: Stricter Go formatting
- `golangci-lint`: Comprehensive linting (10m timeout)
- `go vet`: Static analysis
Tools run via `make lint-python`:

- `ruff format`: Code formatting
- `ruff check --fix`: Linting with auto-fixes
- Setup: `make deps` to install tools and dependencies
- Development: Edit code with the VS Code extension for an enhanced experience
- Formatting: `make format` before committing
- Testing: `make test` for unit tests, integration tests as appropriate
- Building: `make build` to verify compilation
- Create package: `pkg/newplatform/`
- Implement interfaces: Connection, query execution, schema introspection
- Add CLI command: Register in main command list
- Add tests: Unit and integration tests
- Update documentation: Add to supported platforms list
- Create command file: `cmd/newcommand.go`
- Implement command structure: Using the `cli.Command` pattern
- Add business logic: In the appropriate `pkg/` package
- Register command: In the `main.go` commands slice
- Add tests: Command and business logic tests
# Basic build and run
make build
./bin/bruin --help
# Development mode with debug
make build
./bin/bruin --debug [command]

- Define asset type in `pkg/pipeline/asset.go`
- Implement execution logic in `pkg/executor/`
- Add parsing logic if needed
- Update lineage detection if applicable
- Add tests and integration tests
# Run specific test pipeline
cd integration-tests
../bin/bruin run test-pipelines/your-test
# Refresh expectations after changes
make refresh-integration-expectations

# Test template rendering
./bin/bruin render path/to/template.sql
# Test complete pipeline parsing
./bin/bruin internal parse-pipeline path/to/pipeline

# List connections
./bin/bruin connections list
# Test connection
./bin/bruin connections test --name connection-name
# Add new connection
./bin/bruin connections add

This guide provides the foundational knowledge needed to contribute effectively to the Bruin project. For specific implementation details, refer to the extensive documentation in the docs/ directory and examine existing patterns in the codebase.