datonic/datadex

D A T A D E X

The Open Data Platform for your community's Open Data

Open-source, serverless, and local-first data platform for your community. Datadex helps communities collaborate on Open Data. It improves coordination and shared understanding by making it easy to build and publish data products, by your community, for your community.

Note

The previous version of Datadex, which used Dagster and DuckDB, can be found at this commit.

🚀 Implementations

Datadex is a pattern, not only a project. Check out these real-world production Open Data Portals based on Datadex:

  • Datania. Open Data Platform that unifies and harmonizes relevant Spanish datasets from different sources.
  • Filecoin Data Portal. The main open data portal around the Filecoin ecosystem.
  • LUNG-SARG. Open Data Platform for Sustainable, Accessible Lung Radiogenomics.
  • Gitcoin Grants Data Portal. A data hub for Gitcoin Grants data and related models.

💡 Principles

Make working with open data easy and accessible by using modern tooling and approaches.

  • Open: Code, standards, infrastructure, and data, all public and open source. Rely on open source tools, standards, public infrastructure, and accessible data formats.
  • Modular and Interoperable: Easy to replace, extend or remove components of the pattern. Flexible environments, both for running (laptop, cluster, browser) and for deploying (S3 + GH Pages, IPFS, Hugging Face).
  • Permissionless: Any improvement is one Pull Request away. Update pipelines, add datasets, or improve documentation. No API limits, just plain open files.
  • Simple: Static assets, batch jobs.
  • Data as Code: Reproducible datasets with declarative stateless transformations tracked in git. Data is versioned alongside the code.
  • Glue: Be a bridge between tools and approaches. Follow UNIX philosophy.
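
As an illustration of the "Data as Code" and "Simple" principles above, here is a minimal sketch of a stateless transformation that turns raw rows into a plain CSV file. It uses only the Python standard library. The dataset, the column names, and the function names (`load_rows`, `latest_value`, `to_csv`) are hypothetical and are not taken from the Datadex codebase.

```python
import csv
import io

# Hypothetical toy input. A real pipeline would read a file tracked next to the code.
RAW_CSV = """region,year,value
a,2020,10
a,2021,12
b,2020,7
"""


def load_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def latest_value(rows: list[dict[str, str]]) -> list[dict[str, object]]:
    """Stateless transformation: keep the most recent row per region.

    The output depends only on the input rows, so rerunning it on the
    same inputs yields the same dataset.
    """
    latest: dict[str, dict[str, str]] = {}
    for row in rows:
        current = latest.get(row["region"])
        if current is None or int(row["year"]) > int(current["year"]):
            latest[row["region"]] = row
    # Sort by region so the output order is deterministic and diffs stay small.
    return [
        {"region": region, "year": int(r["year"]), "value": int(r["value"])}
        for region, r in sorted(latest.items())
    ]


def to_csv(rows: list[dict[str, object]]) -> str:
    """Serialize rows to CSV text, a plain format that any tool can read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["region", "year", "value"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(latest_value(load_rows(RAW_CSV))), end="")
```

Because the function has no hidden state, the published file can be regenerated from any commit, and the result can be served as a static asset.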

⚙️ Setup

You can get started easily by setting up a Python virtual environment. If you run into any problem, please open an issue!

🐍 Python Virtual Environment

Install uv and let it manage the Python environment. The following command installs the dependencies.

make setup

Alternatively, you can rely on your system's Python installation to create a virtual environment and install the dependencies.

# Create a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install the package and dependencies
pip install -e .
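
To confirm that the virtual environment is active before running anything else, a small check like the following can help. It is a sketch that relies only on the standard library: inside an environment created with `venv`, `sys.prefix` differs from `sys.base_prefix`. The function name `in_virtual_environment` is an illustrative choice, not part of Datadex.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # In a virtual environment, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("interpreter prefix:", sys.prefix)
```

If the check reports that no environment is active, activating `.venv` again and rerunning the install step is usually the first thing to try.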

🐳 Docker / Dev Containers

You can also use VS Code Dev Containers to get started with Datadex. If you have Docker installed and running, open the project in VS Code and click the prompt in the bottom-right corner to reopen it in a container.

The development environment can also run in your browser thanks to GitHub Codespaces!

📜 License

Datadex is licensed under the MIT License. See the LICENSE file for details.
