dlt-hub/boring-semantic-layer-demo

BSL: A Boring Semantic Layer for the Sakila Dataset

This repository showcases a complete, end-to-end semantic layer built on top of the Sakila database—the classic sample movie-rental dataset—using dlt and the Boring Semantic Layer libraries. The goal is to provide a simple, easy-to-follow example of how to build a semantic layer and reuse it across multiple downstream applications.

With a semantic layer in place, every downstream consumer—APIs, Streamlit apps, chatbots, ER-diagram generators, materializers, and more—can access the same unified data model using intuitive concepts like dimensions, measures, and filters.

This project demonstrates how to:

  • Load and normalize source data using dlt
  • Enrich tables and columns with semantic metadata
  • Transform and prepare data for analysis
  • Auto-generate a semantic model using an LLM
  • Build a fully operational semantic layer (dimensions, measures, joins)
  • Reuse that layer across multiple downstream applications, including:
    • A Streamlit data explorer
    • A FastAPI query interface
    • A chatbot
    • An ER-diagram generator
    • A materializer for creating derived tables in your warehouse

📦 Project Structure

bsl/
│
├── pipeline.py                # Full ETL + semantic inference pipeline
├── transformation.py          # Custom transformations (remove PII, time columns)
├── semantics/                 # Semantic modeling components
│   ├── llm.py                 # Generates semantic_model.json with OpenAI
│   ├── model_llm.py           # Builds BSL semantic model using LLM metadata
│   ├── model.py               # Alternative semantic model (non‑LLM) <-- do not use
│   ├── graph_generator.py     # Renders ERD from semantic model
│   ├── query_builder.py       # Turns API requests → ibis queries
│   └── table_references.py    # Manual table relationships
│
├── sakila-mod/                # Necessary files to build db in docker container
│
├── sources/                   # Data extraction from Sakila DB
│   ├── sources.py
│   └── rental.py
│
├── downstream_apps/
│   ├── kpi_explorer.py        # Streamlit semantic explorer UI
│   ├── materializer.py        # Materializes semantic queries into warehouse tables
│   ├── chatbot.py             # MCP-based semantic chatbot server
│   └── api/
│       ├── models.py          # Pydantic models for API requests/responses
│       ├── server.py          # FastAPI semantic query server
│       └── test_client.py     # Simple client script to test the API endpoints
│
└── open_image.py              # Helper to open PNG diagrams safely

🚀 Quick Start

1. Install dependencies

uv sync

You will also need Docker; the Sakila source database runs in a container (see compose.yaml).


2. Configuration

cp .dlt/secrets.example.toml .dlt/secrets.toml

Fill in your OpenAI API key. You don't need to configure any database credentials; those are already set by compose.yaml and the default values.

The pipeline name, dataset name, and destination type are defined in constants.py. I have tested this with the duckdb, clickhouse, and snowflake destinations.
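As a hypothetical sketch of what the filled-in file might look like (the exact key path is an assumption; .dlt/secrets.example.toml shows the real layout):

```toml
[openai]
api_key = "sk-..."
```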


💾 Running the pipeline

The entrypoint is:

uv run python pipeline.py

Available flags

| Flag | Meaning |
|------|---------|
| `--nested-rental` | Loads a deeply nested version of rentals (do not use) |
| `--transform` / `-t` | Runs custom transformations (PII removal, date handling) |
| `--infer-schema` / `-i` | Uses OpenAI to infer a semantic model + generate an ERD |

Example with full flow:

uv run python pipeline.py -t -i 

This will:

  1. Load all data into DuckDB
  2. Apply transformations
  3. Infer semantic model with LLM
  4. Generate semantic_model.json
  5. Render an ER diagram (diagram.png)
  6. Let you choose to apply the model

🧠 How the semantic model works

There are two options for building the semantic layer:

LLM‑Generated Semantic Model (recommended; the default unless you change the code)

Files involved:

  • semantics/llm.py
  • semantics/model_llm.py
  • semantics/semantic_model.json

This mode:

  1. Reads the loaded dlt schema, which contains database metadata and column hints (when present)
  2. Sends metadata to OpenAI
  3. Receives back structured semantic metadata
  4. Builds a full semantic model compatible with Boring Semantic Layer
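Step 1 and 2 can be sketched as follows. This is a minimal, self-contained illustration of flattening a dlt schema dict into a prompt payload; the function name and exact shape are assumptions, and the real logic lives in semantics/llm.py:

```python
import json


def build_llm_payload(schema: dict) -> str:
    """Flatten a dlt schema dict into table/column metadata for the LLM prompt."""
    tables = []
    for name, table in schema.get("tables", {}).items():
        # skip dlt's internal bookkeeping tables (_dlt_loads, _dlt_version, ...)
        if name.startswith("_dlt"):
            continue
        columns = []
        for col_name, spec in table.get("columns", {}).items():
            columns.append({
                "name": col_name,
                "type": spec.get("data_type"),
                # carry through any x-annotation-... hints added upstream
                "hints": {k: v for k, v in spec.items() if k.startswith("x-annotation")},
            })
        tables.append({"table": name, "columns": columns})
    return json.dumps({"tables": tables}, indent=2)
```

The resulting JSON string is what gets embedded in the prompt sent to OpenAI.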

📝 Manual / Schema‑Annotation Semantic Model (do not use)

Files involved:

  • semantics/model.py
  • semantics/columns.py
  • manual x-annotation-... hints

This mode derives semantics purely from schema annotations.


🛠 Transformations

transformation.py contains reusable transformations:

  • PII removal using x-annotation-pii
  • Time intelligence columns (year, rental_date, return_date)
  • Filtering of internal customers

These transformations run when --transform is passed.

They can also be run standalone:

uv run python transformation.py
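The PII-removal idea can be sketched in plain Python. This is an illustration of the annotation-driven mechanism, not the actual implementation in transformation.py (which may work at the SQL/ibis level instead):

```python
def drop_pii_columns(rows, columns_meta):
    """Drop any column whose schema entry carries the x-annotation-pii hint."""
    pii = {name for name, spec in columns_meta.items() if spec.get("x-annotation-pii")}
    for row in rows:
        yield {k: v for k, v in row.items() if k not in pii}
```

Columns flagged as PII in the dlt schema never reach the output rows, so downstream apps can't accidentally expose them.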

📊 Streamlit Semantic Explorer

Launch the UI:

uv run python -m streamlit run downstream_apps/kpi_explorer.py

You can:

  • Select dimensions & measures
  • Add filters
  • Run queries through the semantic layer
  • Materialize results back into the dlt destination

🌐 Semantic API (FastAPI)

Start the API server:

uv run uvicorn downstream_apps.api.server:app --reload

Endpoints:

  • GET /dimensions
  • GET /measures
  • POST /query (returns JSON or parquet)

Interactive docs are available at http://localhost:8000/docs (adjust host and port to your setup).
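A hypothetical POST /query request might be built like this; the field names are assumptions, so check GET /dimensions and GET /measures for the real identifiers:

```python
import json
import urllib.request

# Hypothetical payload shape: dimensions, measures, and filters by name
payload = {
    "dimensions": ["film_rating"],
    "measures": ["rental_count"],
    "filters": [],
}

req = urllib.request.Request(
    "http://localhost:8000/query",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would execute the query once the server is running
```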


🤖 Chatbot (MCP Semantic Server)

uv run python -m downstream_apps.chatbot

This exposes the semantic model via the MCP protocol so it can be used in LLM chat environments.


📈 ER Diagram Generation

When running with --infer-schema, the pipeline:

  • Generates semantic_model.json
  • Builds semantic_model.er
  • Renders a PNG ER diagram at diagram.png

To generate manually:

uv run python -m semantics.graph_generator
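For orientation, an .er file in the style consumed by erd-like renderers looks roughly like this (the exact notation here is an assumption; semantics/graph_generator.py defines what is actually emitted):

```
[customer]
  *customer_id
  store_id

[rental]
  *rental_id
  customer_id

rental *--1 customer
```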

📦 Materializing Semantic Queries

Any semantic query can be written back into the destination database. This CLI interface is rudimentary; in practice, materialization should generally happen from the Streamlit report.

Example:

uv run python bsl/downstream_apps/materializer.py -t my_table

This executes:

def materialize(pipeline, semantic_model, table_name):
    # Wrap the query result in a dlt resource that replaces the target table
    @dlt.resource(name=table_name, write_disposition="replace")
    def create():
        yield semantic_model.execute()

    pipeline.run(create())

🧪 Testing the API

Use the test client:

uv run python -m downstream_apps.api.test_client

It will:

  1. Fetch dimensions
  2. Fetch measures
  3. Build a random query
  4. Execute it
  5. Print sample output

🧹 Utility Scripts

Open PNG with fallback:

open_image.safe_open_image(path) opens PNG diagrams with the OS default viewer across macOS, Windows, and Linux, falling back gracefully when no viewer is available.
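One way such a helper can be implemented (a sketch of the idea, not necessarily the code in open_image.py):

```python
import os
import platform
import subprocess
import sys


def safe_open_image(path: str) -> None:
    """Open an image with the OS default viewer; never raise if that fails."""
    system = platform.system()
    try:
        if system == "Darwin":
            subprocess.run(["open", path], check=True)
        elif system == "Windows":
            os.startfile(path)  # type: ignore[attr-defined]
        else:
            subprocess.run(["xdg-open", path], check=True)
    except Exception as exc:
        # Headless or misconfigured systems: report instead of crashing
        print(f"Could not open {path} automatically: {exc}", file=sys.stderr)
```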

