This repository showcases a complete, end-to-end semantic layer built on top of the Sakila database—the classic sample movie-rental dataset—using dlt and the Boring Semantic Layer libraries. The goal is to provide a simple, easy-to-follow example of how to build a semantic layer and reuse it across multiple downstream applications.
With a semantic layer in place, every downstream consumer—APIs, Streamlit apps, chatbots, ER-diagram generators, materializers, and more—can access the same unified data model using intuitive concepts like dimensions, measures, and filters.
This project demonstrates how to:
- Load and normalize source data using dlt
- Enrich tables and columns with semantic metadata
- Transform and prepare data for analysis
- Auto-generate a semantic model using an LLM
- Build a fully operational semantic layer (dimensions, measures, joins)
- Reuse that layer across multiple downstream applications, including:
- A Streamlit data explorer
- A FastAPI query interface
- A chatbot
- An ER-diagram generator
- A materializer for creating derived tables in your warehouse
```
bsl/
│
├── pipeline.py             # Full ETL + semantic inference pipeline
├── transformation.py       # Custom transformations (remove PII, time columns)
├── semantics/              # Semantic modeling components
│   ├── llm.py              # Generates semantic_model.json with OpenAI
│   ├── model_llm.py        # Builds BSL semantic model using LLM metadata
│   ├── model.py            # Alternative semantic model (non-LLM) <-- do not use
│   ├── graph_generator.py  # Renders ERD from semantic model
│   ├── query_builder.py    # Turns API requests into ibis queries
│   └── table_references.py # Manual table relationships
│
├── sakila-mod/             # Files needed to build the DB in a Docker container
│
├── sources/                # Data extraction from the Sakila DB
│   ├── sources.py
│   └── rental.py
│
├── downstream_apps/
│   ├── kpi_explorer.py     # Streamlit semantic explorer UI
│   ├── materializer.py     # Materializes semantic queries into warehouse tables
│   ├── chatbot.py          # MCP-based semantic chatbot server
│   └── api/
│       ├── models.py       # Pydantic models for API requests/responses
│       ├── server.py       # FastAPI semantic query server
│       └── test_client.py  # Simple client script to test the API endpoints
│
└── open_image.py           # Helper to open PNG diagrams safely
```
```
uv sync
```

You will also need Docker.

```
cp .dlt/secrets.example.toml .dlt/secrets.toml
```

Fill in your OpenAI API key. You don't need to configure any database credentials; those are already set in compose.yaml and the default values.

The pipeline name, dataset name, and destination type are defined in constants.py. This has been tested with DuckDB, ClickHouse, and Snowflake destinations.
The entrypoint is:

```
uv run python pipeline.py
```

| Flag | Meaning |
|---|---|
| `--nested-rental` | Loads a deeply nested version of rentals <-- do not use |
| `--transform`, `-t` | Runs custom transformations (PII removal, date handling) |
| `--infer-schema`, `-i` | Uses OpenAI to infer a semantic model + generate ERD |
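For reference, the flag handling above can be sketched with `argparse` — a hypothetical reconstruction, not the actual `pipeline.py` code, which may wire these flags up differently:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the CLI flags documented above;
    # the real pipeline.py may define them differently.
    parser = argparse.ArgumentParser(description="Sakila ETL + semantic pipeline")
    parser.add_argument("--nested-rental", action="store_true",
                        help="Load a deeply nested version of rentals (not recommended)")
    parser.add_argument("--transform", "-t", action="store_true",
                        help="Run custom transformations (PII removal, date handling)")
    parser.add_argument("--infer-schema", "-i", action="store_true",
                        help="Use OpenAI to infer a semantic model and generate an ERD")
    return parser

args = build_parser().parse_args(["-t", "-i"])
print(args.transform, args.infer_schema, args.nested_rental)
```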
```
uv run python pipeline.py -t -i
```

This will:

- Load all data into DuckDB
- Apply transformations
- Infer a semantic model with the LLM
- Generate `semantic_model.json`
- Render an ER diagram (`diagram.png`)
- Let you choose whether to apply the model
There are two options for building the semantic layer:
Files involved:
- `semantics/llm.py`
- `semantics/model_llm.py`
- `semantics/semantic_model.json`
This mode:
- Reads the loaded dlt schema, including any database metadata and column hints it contains
- Sends metadata to OpenAI
- Receives back structured semantic metadata
- Builds a full semantic model compatible with Boring Semantic Layer
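Conceptually, the "sends metadata to OpenAI" step boils down to serializing the schema metadata into a compact payload for the prompt. A minimal sketch of that serialization — the table/column shapes here are illustrative stand-ins, not the actual dlt schema format used by `semantics/llm.py`:

```python
import json

def schema_to_payload(tables: dict) -> str:
    # Flatten table/column metadata into a JSON payload suitable for
    # sending to an LLM. The input shape is a simplified stand-in for
    # the real dlt schema.
    payload = []
    for table_name, columns in tables.items():
        payload.append({
            "table": table_name,
            "columns": [
                {"name": col,
                 "type": hints.get("data_type"),
                 "hints": {k: v for k, v in hints.items()
                           if k.startswith("x-annotation")}}
                for col, hints in columns.items()
            ],
        })
    return json.dumps(payload, indent=2)

tables = {
    "rental": {
        "rental_id": {"data_type": "bigint"},
        "rental_date": {"data_type": "timestamp"},
    }
}
print(schema_to_payload(tables))
```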
Files involved:
- `semantics/model.py`
- `semantics/columns.py`
- manual `x-annotation-...` hints
This mode derives semantics purely from schema annotations.
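The idea of "deriving semantics from annotations" can be pictured as sorting columns into dimensions and measures based on their hints. A rough sketch with hypothetical hint names (`x-annotation-dimension`, `x-annotation-measure`) — the real hint names and logic live in `semantics/columns.py` and `semantics/model.py`:

```python
def derive_semantics(columns: dict) -> dict:
    # Split annotated columns into dimensions and measures.
    # The hint names here are illustrative, not the project's actual ones.
    model = {"dimensions": [], "measures": []}
    for name, hints in columns.items():
        if hints.get("x-annotation-dimension"):
            model["dimensions"].append(name)
        if hints.get("x-annotation-measure"):
            model["measures"].append(name)
    return model

columns = {
    "customer_id": {"x-annotation-dimension": True},
    "amount": {"x-annotation-measure": True},
    "payment_id": {},
}
print(derive_semantics(columns))
```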
`transformation.py` contains reusable transformations:

- PII removal using `x-annotation-pii`
- Time intelligence columns (year, rental_date, return_date)
- Filtering of internal customers

These transformations run when `--transform` is passed.
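The PII step can be pictured as a row-level filter that drops every column whose schema hint carries the `x-annotation-pii` flag. A simplified sketch of that idea, not the actual `transformation.py` code:

```python
def strip_pii(rows, column_hints):
    # Drop columns whose hints carry a truthy "x-annotation-pii" flag.
    pii_cols = {c for c, h in column_hints.items() if h.get("x-annotation-pii")}
    for row in rows:
        yield {k: v for k, v in row.items() if k not in pii_cols}

hints = {"email": {"x-annotation-pii": True}, "customer_id": {}}
rows = [{"customer_id": 1, "email": "a@example.com"}]
print(list(strip_pii(rows, hints)))
```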
They can also be run standalone:
```
uv run python transformation.py
```

Launch the UI:

```
uv run python -m streamlit run downstream_apps/kpi_explorer.py
```

You can:
- Select dimensions & measures
- Add filters
- Run queries through the semantic layer
- Materialize results back into the dlt destination
Start the API server:
```
uv run uvicorn downstream_apps.api.server:app --reload
```

Endpoints:

- `GET /dimensions`
- `GET /measures`
- `POST /query` (returns JSON or Parquet)

Interactive docs are served at http://localhost:8000/docs (adjust host and port to your setup).
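A `POST /query` body presumably bundles dimensions, measures, and filters. A hypothetical sketch of such a request payload, using a plain dataclass for illustration — the authoritative request schema is defined with Pydantic in `downstream_apps/api/models.py` and may differ:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class QueryRequest:
    # Hypothetical request shape; the real Pydantic models live in
    # downstream_apps/api/models.py.
    dimensions: list = field(default_factory=list)
    measures: list = field(default_factory=list)
    filters: list = field(default_factory=list)

req = QueryRequest(
    dimensions=["customer.country"],
    measures=["rental.count"],
    filters=[{"field": "rental.rental_date", "op": ">=", "value": "2005-01-01"}],
)
print(json.dumps(asdict(req)))
```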
```
uv run python -m downstream_apps.chatbot
```

This exposes the semantic model via the MCP protocol so it can be used in LLM chat environments.
When running with `--infer-schema`, the pipeline:

- Generates `semantic_model.json`
- Builds `semantic_model.er`
- Renders a PNG ER diagram at `diagram.png`
To generate manually:
```
uv run python -m semantics.graph_generator
```

Any semantic query can be written back into the destination database. This CLI interface is minimal; materialization should generally be triggered from the Streamlit report instead.
Example:
```
uv run python bsl/downstream_apps/materializer.py -t my_table
```

This executes:

```python
def materialize(pipeline, semantic_model, table_name):
    @dlt.resource(name=table_name, write_disposition="replace")
    def create():
        yield semantic_model.execute()

    pipeline.run(create())
```

Use the test client:
```
uv run python -m downstream_apps.api.test_client
```

It will:
- Fetch dimensions
- Fetch measures
- Build a random query
- Execute it
- Print sample output
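The "build a random query" step can be sketched as sampling one dimension and one measure from the fetched lists. An illustrative sketch only; see `downstream_apps/api/test_client.py` for the real logic:

```python
import random

def random_query(dimensions, measures, seed=42):
    # Pick one dimension and one measure at random to form a query.
    # A fixed seed makes the choice reproducible for demo purposes.
    rng = random.Random(seed)
    return {
        "dimensions": [rng.choice(dimensions)],
        "measures": [rng.choice(measures)],
    }

q = random_query(["customer.country", "film.rating"], ["rental.count"])
print(q)
```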
`open_image.safe_open_image(path)` safely opens diagrams across macOS, Windows, and Linux.
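A cross-platform opener typically just dispatches on the operating system. A sketch of the idea — the function below only builds the command rather than running it, and the actual helper in `open_image.py` may work differently:

```python
import sys

def open_command(path: str) -> list[str]:
    # Return a platform-appropriate command to open an image file.
    # Illustrative only; see open_image.py for the project's real helper.
    if sys.platform == "darwin":          # macOS
        return ["open", path]
    if sys.platform.startswith("win"):    # Windows
        return ["cmd", "/c", "start", "", path]
    return ["xdg-open", path]             # Linux / other POSIX

print(open_command("diagram.png"))
```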