A collection of reference implementations for lakehouse patterns with Bauplan and Prefect 3.0
This repository contains reference implementations of common data engineering patterns using Bauplan as the programmable lakehouse and Prefect for orchestration and scheduling.
Bauplan is the programmable lakehouse: you can load, transform, and query data, all from your code (CLI or Python). You can learn more here, read the docs or explore its architecture and ergonomics.
To use Bauplan, you need an API key for our preview environment: you can request one here.
Note: the current SDK version is 0.0.3a492 but it is subject to change as the platform evolves - ping us if you need help with any of the APIs used in this project.
We use uv to manage the required dependencies - you can synchronize the environment from the root of the repo with:
uv sync

Check the README in src/transformation for a reference implementation of ETL with Bauplan models, Prefect for orchestration and scheduling, and Streamlit for visualization.
Check the README in src/wap for a reference implementation of the Write-Audit-Publish pattern with Bauplan and Prefect.
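For readers new to Write-Audit-Publish, the control flow can be sketched in plain Python. This is a minimal illustration only: it does not use the Bauplan SDK or Prefect, and the in-memory `catalog` dict, the `"staging"` branch name, and every function name below are hypothetical stand-ins for the branch, data quality check, and merge operations that the real implementation in src/wap performs.

```python
from copy import deepcopy


class AuditError(Exception):
    """Raised when staged data fails a quality check."""


def write(catalog, staging_branch, table, rows):
    # Write: land new rows on an isolated staging branch, never on "main".
    # (Hypothetical: the branch is modeled as a deep copy of "main".)
    catalog[staging_branch] = deepcopy(catalog["main"])
    catalog[staging_branch].setdefault(table, []).extend(rows)


def audit(catalog, staging_branch, table, required_column):
    # Audit: run a data quality check against the staged table only.
    staged = catalog[staging_branch][table]
    bad = [row for row in staged if row.get(required_column) is None]
    if bad:
        raise AuditError(f"{len(bad)} row(s) have a null {required_column!r}")


def publish(catalog, staging_branch):
    # Publish: promote the audited branch to "main" in a single step.
    catalog["main"] = catalog.pop(staging_branch)


def write_audit_publish(catalog, table, rows, required_column):
    staging = "staging"  # hypothetical branch name
    write(catalog, staging, table, rows)
    try:
        audit(catalog, staging, table, required_column)
    except AuditError:
        # Discard the staging branch so bad data never reaches "main".
        catalog.pop(staging, None)
        raise
    publish(catalog, staging)
```

The point of the pattern is that consumers reading from `"main"` only ever see data that has passed the audit: a failed check drops the staging branch and leaves `"main"` untouched. In an orchestrated setup, each of the three steps would typically be its own task so that failures and retries are visible per step.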
The code in this project is licensed under the MIT License. Prefect and Bauplan belong to their respective owners and are covered by their own licenses.