Skip to content

hipposys-ltd/dbt-kg

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

⚠️ REPOSITORY MOVED - NO LONGER MAINTAINED HERE

This repository has been transferred to new ownership and is no longer actively maintained in this location.

🔄 Migration Notice

This repository and all associated open-source packages have been moved to a new GitHub organization.

New Location: https://github.com/ponderedw

📍 What This Means

  • Active development continues at the new location
  • Latest updates and releases are published there
  • Issues and pull requests should be submitted to the new repository
  • ⚠️ This repository will no longer receive updates

🔗 Find the Updated Repository

Please visit https://github.com/ponderedw to:

  • Access the latest version of this package
  • Report issues or contribute
  • View updated documentation
  • Get support from the maintainers

Thank you for your understanding during this transition.

dbt-kg: Graph-Powered AI Chat for dbt

Transform your dbt metadata into an intelligent, queryable knowledge graph using FalkorDB or Neo4j, and interact with it through AI-powered natural language chat.

Overview

dbt-kg bridges the gap between complex dbt project structures and intuitive querying by leveraging graph databases. Upload your dbt metadata (manifest.json and catalog.json) and start asking sophisticated questions about dependencies, materializations, and project-wide relationships in plain English.

Features

  • 🚀 Graph Database Integration: Works with both FalkorDB and Neo4j
  • 🤖 AI-Powered Chat: Natural language querying of your dbt project
  • 📊 Complex Dependency Analysis: Recursive dependency tracking and analysis
  • 🔍 Smart Job Grouping: Intelligent separation strategies for orchestration
  • 🌐 Web Interface: Easy-to-use Streamlit-based UI

Quick Start

Prerequisites

  • Docker and Docker Compose
  • Just command runner (optional but recommended)

Installation

  1. Clone the repository

    git clone https://github.com/hipposys-ltd/dbt-kg.git
    cd dbt-kg
  2. Set up environment configuration

    cp template.env .env
  3. Configure your environment variables

    Edit .env file with your preferred settings:

    LLM Provider Configuration:

    # Choose your LLM provider (bedrock, anthropic, or openai)
    LLM_MODEL_ID='anthropic:claude-3-sonnet-20240229'
    # LLM_MODEL_ID='openai:gpt-4'
    # LLM_MODEL_ID='bedrock:anthropic.claude-3-sonnet-20240229-v1:0'

    LLM API Keys:

    # Anthropic
    ANTHROPIC_API_KEY=your_anthropic_api_key_here
    
    # OpenAI
    # OPENAI_API_KEY=your_openai_api_key_here
    
    # AWS Bedrock
    # AWS_ACCESS_KEY_ID=your_aws_access_key_id
    # AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key
    # AWS_DEFAULT_REGION=us-east-1

    Graph Database Configuration:

    # Choose your graph database (falkordb or neo4j)
    GRAPH_DB=falkordb
    
    # Database authentication (uncomment if needed)
    # GRAPH_USER=neo4j
    # GRAPH_PASSWORD=password
  4. Start the application

    just all

    Or without Just:

    docker-compose up -d
  5. Access the application

Usage

Loading Your dbt Project

  1. Navigate to http://localhost:8501
  2. Go to the "Load DBT Manifest" page from the sidebar
  3. Upload your manifest.json and catalog.json files
    • Use your own project files, or
    • Use the sample files in DbtExampleProject/target/
  4. Wait for the confirmation message

Exploring Your Graph

  1. Open the graph database UI at http://localhost:3000
  2. Run this query to visualize your dbt knowledge graph:
    MATCH (a)-[b]-(c)
    RETURN a, b, c

AI-Powered Querying

Return to the main page (http://localhost:8501) and start asking questions:

Example queries:

  • "Can you find the source tables that are used by the most models, including indirect dependencies?"
  • "Please identify the model with the highest number of dependent models, including transitive dependencies"
  • "Can you analyze our dbt dependency graph and create a job separation strategy?"
  • "List all models that depend on the raw.orders source"

Configuration Options

Environment Variables

Variable Description Required Default
LLM_MODEL_ID LLM model identifier with provider prefix Yes -
ANTHROPIC_API_KEY Anthropic API key If using Anthropic -
OPENAI_API_KEY OpenAI API key If using OpenAI -
AWS_ACCESS_KEY_ID AWS access key for Bedrock If using Bedrock -
AWS_SECRET_ACCESS_KEY AWS secret key for Bedrock If using Bedrock -
AWS_DEFAULT_REGION AWS region for Bedrock If using Bedrock us-east-1
GRAPH_DB Graph database type Yes falkordb
GRAPH_USER Graph database username If auth required -
GRAPH_PASSWORD Graph database password If auth required -

Supported LLM Providers

  • Anthropic: anthropic:claude-3-sonnet-20240229
  • OpenAI: openai:gpt-4 or openai:gpt-3.5-turbo
  • AWS Bedrock: bedrock:anthropic.claude-3-sonnet-20240229-v1:0

Supported Graph Databases

  • FalkorDB: Lightweight, Redis-based graph database
  • Neo4j: Full-featured graph database platform

Use Cases

Dependency Analysis

  • Find circular dependencies
  • Identify most reused sources
  • Trace dependency chains recursively

Project Optimization

  • Identify bottleneck models
  • Optimize materialization strategies
  • Plan incremental improvements

Orchestration Planning

  • Break monolithic jobs into manageable chunks
  • Define clear inter-job dependencies
  • Eliminate circular dependencies in DAG design

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages