dlt-hub/boring-semantic-layer-demo

BSL: A Boring Semantic Layer for the Sakila Dataset

This repository showcases a complete, end-to-end semantic layer built on top of the Sakila database—the classic sample movie-rental dataset—using dlt and the Boring Semantic Layer libraries. The goal is to provide a simple, easy-to-follow example of how to build a semantic layer and reuse it across multiple downstream applications.

With a semantic layer in place, every downstream consumer—APIs, Streamlit apps, chatbots, ER-diagram generators, materializers, and more—can access the same unified data model using intuitive concepts like dimensions, measures, and filters.

This project demonstrates how to:

  • Load and normalize source data using dlt
  • Enrich tables and columns with semantic metadata
  • Transform and prepare data for analysis
  • Auto-generate a semantic model using an LLM
  • Build a fully operational semantic layer (dimensions, measures, joins)
  • Reuse that layer across multiple downstream applications, including:
    • A Streamlit data explorer
    • A FastAPI query interface
    • A chatbot
    • An ER-diagram generator
    • A materializer for creating derived tables in your warehouse

📦 Project Structure

bsl/
│
├── pipeline.py                # Full ETL + semantic inference pipeline
├── transformation.py          # Custom transformations (remove PII, time columns)
├── semantics/                 # Semantic modeling components
│   ├── llm.py                 # Generates semantic_model.json with OpenAI
│   ├── model_llm.py           # Builds BSL semantic model using LLM metadata
│   ├── model.py               # Alternative semantic model (non‑LLM) <-- do not use
│   ├── graph_generator.py     # Renders ERD from semantic model
│   ├── query_builder.py       # Turns API requests → ibis queries
│   └── table_references.py    # Manual table relationships
│
├── sakila-mod/                # Necessary files to build db in docker container
│
├── sources/                   # Data extraction from Sakila DB
│   ├── sources.py
│   └── rental.py
│
├── downstream_apps/
│   ├── kpi_explorer.py        # Streamlit semantic explorer UI
│   ├── materializer.py        # Materializes semantic queries into warehouse tables
│   ├── chatbot.py             # MCP-based semantic chatbot server
│   └── api/
│       ├── models.py          # Pydantic models for API requests/responses
│       ├── server.py          # FastAPI semantic query server
│       └── test_client.py     # Simple client script to test the API endpoints
│
└── open_image.py              # Helper to open PNG diagrams safely

🚀 Quick Start

1. Install dependencies

uv sync

You will also need Docker; the Sakila source database runs in a container (see compose.yaml).


2. Configuration

cp .dlt/secrets.example.toml .dlt/secrets.toml

Fill in your OpenAI API key. You don't need to configure any database credentials; those are already set by compose.yaml and the default values.

The pipeline name, dataset name, and destination type are defined in constants.py. I have tested this with the duckdb, clickhouse, and snowflake destinations.
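As a hypothetical sketch of what the filled-in file might look like (the exact key path is an assumption; .dlt/secrets.example.toml shows the real layout):

```toml
[openai]
api_key = "sk-..."
```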


💾 Running the pipeline

The entrypoint is:

uv run python pipeline.py

Available flags

| Flag | Meaning |
|------|---------|
| `--nested-rental` | Loads a deeply nested version of rentals (do not use) |
| `--transform` / `-t` | Runs custom transformations (PII removal, date handling) |
| `--infer-schema` / `-i` | Uses OpenAI to infer a semantic model + generate an ERD |

Example with full flow:

uv run python pipeline.py -t -i 

This will:

  1. Load all data into DuckDB
  2. Apply transformations
  3. Infer semantic model with LLM
  4. Generate semantic_model.json
  5. Render an ER diagram (diagram.png)
  6. Let you choose to apply the model

🧠 How the semantic model works

There are two options for building the semantic layer:

LLM‑Generated Semantic Model (recommended; the default unless you change the code)

Files involved:

  • semantics/llm.py
  • semantics/model_llm.py
  • semantics/semantic_model.json

This mode:

  1. Reads the loaded dlt schema, which contains database metadata and column hints (when present)
  2. Sends metadata to OpenAI
  3. Receives back structured semantic metadata
  4. Builds a full semantic model compatible with Boring Semantic Layer
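Step 1 and 2 can be sketched as follows. This is a minimal, self-contained illustration of flattening a dlt schema dict into a prompt payload; the function name and exact shape are assumptions, and the real logic lives in semantics/llm.py:

```python
import json


def build_llm_payload(schema: dict) -> str:
    """Flatten a dlt schema dict into table/column metadata for the LLM prompt."""
    tables = []
    for name, table in schema.get("tables", {}).items():
        # skip dlt's internal bookkeeping tables (_dlt_loads, _dlt_version, ...)
        if name.startswith("_dlt"):
            continue
        columns = []
        for col_name, spec in table.get("columns", {}).items():
            columns.append({
                "name": col_name,
                "type": spec.get("data_type"),
                # carry through any x-annotation-... hints added upstream
                "hints": {k: v for k, v in spec.items() if k.startswith("x-annotation")},
            })
        tables.append({"table": name, "columns": columns})
    return json.dumps({"tables": tables}, indent=2)
```

The resulting JSON string is what gets embedded in the prompt sent to OpenAI.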

📝 Manual / Schema‑Annotation Semantic Model (do not use)

Files involved:

  • semantics/model.py
  • semantics/columns.py
  • manual x-annotation-... hints

This mode derives semantics purely from schema annotations.


🛠 Transformations

transformation.py contains reusable transformations:

  • PII removal using x-annotation-pii
  • Time intelligence columns (year, rental_date, return_date)
  • Filtering of internal customers

These transformations run when --transform is passed.

They can also be run standalone:

uv run python transformation.py
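The PII-removal idea can be sketched in plain Python. This is an illustration of the annotation-driven mechanism, not the actual implementation in transformation.py (which may work at the SQL/ibis level instead):

```python
def drop_pii_columns(rows, columns_meta):
    """Drop any column whose schema entry carries the x-annotation-pii hint."""
    pii = {name for name, spec in columns_meta.items() if spec.get("x-annotation-pii")}
    for row in rows:
        yield {k: v for k, v in row.items() if k not in pii}
```

Columns flagged as PII in the dlt schema never reach the output rows, so downstream apps can't accidentally expose them.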

📊 Streamlit Semantic Explorer

Launch the UI:

uv run python -m streamlit run downstream_apps/kpi_explorer.py

You can:

  • Select dimensions & measures
  • Add filters
  • Run queries through the semantic layer
  • Materialize results back into the dlt destination

🌐 Semantic API (FastAPI)

Start the API server:

uv run uvicorn downstream_apps.api.server:app --reload

Endpoints:

  • GET /dimensions
  • GET /measures
  • POST /query (returns JSON or parquet)

Interactive docs are available at http://localhost:8000/docs (adjust host and port to your setup).
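A hypothetical POST /query request might be built like this; the field names are assumptions, so check GET /dimensions and GET /measures for the real identifiers:

```python
import json
import urllib.request

# Hypothetical payload shape: dimensions, measures, and filters by name
payload = {
    "dimensions": ["film_rating"],
    "measures": ["rental_count"],
    "filters": [],
}

req = urllib.request.Request(
    "http://localhost:8000/query",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would execute the query once the server is running
```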


🤖 Chatbot (MCP Semantic Server)

uv run python -m downstream_apps.chatbot

This exposes the semantic model via the MCP protocol so it can be used in LLM chat environments.


📈 ER Diagram Generation

When running with --infer-schema, the pipeline:

  • Generates semantic_model.json
  • Builds semantic_model.er
  • Renders a PNG ER diagram at diagram.png

To generate manually:

uv run python -m semantics.graph_generator
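For orientation, an .er file in the style consumed by erd-like renderers looks roughly like this (the exact notation here is an assumption; semantics/graph_generator.py defines what is actually emitted):

```
[customer]
  *customer_id
  store_id

[rental]
  *rental_id
  customer_id

rental *--1 customer
```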

📦 Materializing Semantic Queries

Any semantic query can be written back into the destination database. This CLI interface is rudimentary; in practice, materialization should generally happen from the Streamlit report.

Example:

uv run python bsl/downstream_apps/materializer.py -t my_table

This executes:

def materialize(pipeline, semantic_model, table_name):
    # Wrap the query result in a dlt resource that replaces the target table
    @dlt.resource(name=table_name, write_disposition="replace")
    def create():
        yield semantic_model.execute()

    pipeline.run(create())

🧪 Testing the API

Use the test client:

uv run python -m downstream_apps.api.test_client

It will:

  1. Fetch dimensions
  2. Fetch measures
  3. Build a random query
  4. Execute it
  5. Print sample output

🧹 Utility Scripts

Open PNG with fallback:

open_image.safe_open_image(path) opens PNG diagrams with the OS default viewer across macOS, Windows, and Linux, falling back gracefully when no viewer is available.
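One way such a helper can be implemented (a sketch of the idea, not necessarily the code in open_image.py):

```python
import os
import platform
import subprocess
import sys


def safe_open_image(path: str) -> None:
    """Open an image with the OS default viewer; never raise if that fails."""
    system = platform.system()
    try:
        if system == "Darwin":
            subprocess.run(["open", path], check=True)
        elif system == "Windows":
            os.startfile(path)  # type: ignore[attr-defined]
        else:
            subprocess.run(["xdg-open", path], check=True)
    except Exception as exc:
        # Headless or misconfigured systems: report instead of crashing
        print(f"Could not open {path} automatically: {exc}", file=sys.stderr)
```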

