GitHub - openlobbying/muckrake: A data framework for creating Follow The Money data.

A framework for creating and storing FollowTheMoney entities, used by OpenLobbying.

Warning

This is a work in progress. Expect breaking changes and incomplete features.

Muckrake

Muckrake is the data pipeline behind OpenLobbying. It is partially inspired by zavod and other FollowTheMoney tools.

Run uv run muckrake --help for a full list of available commands.

Crawlers

You can find crawlers for various datasets in datasets/. At a minimum, each dataset consists of a config.yml with metadata and a crawl.py script that outputs FollowTheMoney statements in CSV format.

To crawl a dataset, run uv run muckrake crawl {dataset_name}. Run uv run muckrake list to see available datasets.
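
As a rough illustration of what "statements in CSV format" can look like, the sketch below flattens one entity into one row per property value and writes the rows with the standard library. The column names, the helper names (`make_id`, `entity_to_statements`, `write_statements`) and the ID scheme are assumptions made for this example; Muckrake's actual crawl.py interface and statement columns may differ.

```python
import csv
import hashlib
from pathlib import Path

# Hypothetical column set; the real statement format may carry more fields.
COLUMNS = ["entity_id", "schema", "prop", "value", "dataset"]


def make_id(dataset: str, *parts: str) -> str:
    """Derive a stable entity ID from the dataset name and key fields."""
    digest = hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()
    return f"{dataset}-{digest[:16]}"


def entity_to_statements(
    dataset: str, schema: str, key: str, props: dict[str, list[str]]
) -> list[dict[str, str]]:
    """Flatten one entity into one statement row per property value."""
    entity_id = make_id(dataset, schema, key)
    return [
        {
            "entity_id": entity_id,
            "schema": schema,
            "prop": prop,
            "value": value,
            "dataset": dataset,
        }
        for prop, values in props.items()
        for value in values
    ]


def write_statements(path: Path, rows: list[dict[str, str]]) -> None:
    """Write statement rows to a CSV file with a header line."""
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Hashing the source key keeps IDs stable between crawls, so re-running a crawler produces the same entity IDs for the same source records.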

AI-based NER

Many data sources have composite fields that contain multiple entities. We use LLMs to extract unique entities and relationships from these fields, and store them as candidates in the database for review and approval. See NER docs for details.

# Create extraction candidates for one dataset
uv run muckrake ner-extract open_access --extractor llm --limit 50

# Review candidates in a terminal UI
uv run muckrake ner-review
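
The candidate workflow described above might be modelled roughly as follows. This is only a sketch: the `Candidate` record, its `status` values, and `split_extractor` are hypothetical, and the delimiter-based splitter is a deliberately naive stand-in for the LLM extractor, not how Muckrake performs extraction.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    source_value: str  # the raw composite field
    name: str  # one entity name extracted from it
    status: str = "pending"  # hypothetical states: pending, approved, rejected


def split_extractor(value: str) -> list[str]:
    """Naive stand-in for an LLM call: split on semicolons and the word 'and'.

    A real extractor has to cope with names that contain 'and' themselves,
    which is one reason a simple splitter is not enough in practice.
    """
    parts = re.split(r";|\band\b", value)
    return [p.strip() for p in parts if p.strip()]


def extract_candidates(value: str, extractor=split_extractor) -> list[Candidate]:
    """Return one pending candidate per unique name found in the field."""
    seen: set[str] = set()
    candidates: list[Candidate] = []
    for name in extractor(value):
        key = name.casefold()
        if key not in seen:  # skip case-insensitive repeats within one field
            seen.add(key)
            candidates.append(Candidate(source_value=value, name=name))
    return candidates
```

Keeping candidates in a pending state until review means a wrong extraction never reaches the materialised entities.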

Dedupe

Our goal is to link entities across datasets to provide a unified view of lobbying and political finance for any given person, company, or organisation.

# Create dedupe candidates across all datasets
uv run muckrake xref

# Review candidates in a terminal UI
uv run muckrake dedupe
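
To make the idea of cross-dataset candidates concrete, here is a minimal sketch that pairs entities from different datasets when their normalised names match exactly. It is a simplified illustration, not the matching logic behind `muckrake xref`; the suffix list, the function names and the `(entity_id, dataset, name)` input shape are assumptions.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical list of legal-form tokens to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "llp", "plc", "inc"}


def normalise(name: str) -> str:
    """Lowercase, drop punctuation, and remove common legal-form tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.casefold())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def xref_candidates(entities: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """Take (entity_id, dataset, name) rows and return cross-dataset ID pairs."""
    blocks: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for entity_id, dataset, name in entities:
        key = normalise(name)
        if key:
            blocks[key].append((entity_id, dataset))

    pairs: set[tuple[str, str]] = set()
    for members in blocks.values():
        for (id_a, ds_a), (id_b, ds_b) in combinations(members, 2):
            if ds_a != ds_b:  # only link across datasets
                pairs.add((min(id_a, id_b), max(id_a, id_b)))
    return sorted(pairs)
```

Each pair is only a candidate: a human decision in the review step determines whether the two records describe the same person, company, or organisation.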

We also want to collapse duplicate relationship edges across datasets, especially for ORCL and PRCA. This runs automatically, with no review step required.

uv run muckrake dedupe-edges
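
One way to picture edge collapsing: once the entities at both ends have been deduplicated, two edges with the same schema, source, and target describe the same relationship. The sketch below maps each edge to the first edge seen with that key. The `Edge` record and the choice of key are assumptions for illustration; the actual `dedupe-edges` command may compare more fields, such as dates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    edge_id: str
    schema: str
    source: str  # canonical ID of the source entity
    target: str  # canonical ID of the target entity
    dataset: str


def collapse_edges(edges: list[Edge]) -> dict[str, str]:
    """Map every edge ID to the ID of the first edge seen with the same key."""
    canonical: dict[tuple[str, str, str], str] = {}
    mapping: dict[str, str] = {}
    for edge in edges:
        key = (edge.schema, edge.source, edge.target)
        # setdefault stores this edge as canonical only if the key is new.
        mapping[edge.edge_id] = canonical.setdefault(key, edge.edge_id)
    return mapping
```

Because the key is deterministic, no review step is needed: the same input always collapses to the same result.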

Loading

Statements are loaded into a working store (SQLite for local development, Postgres in production) with uv run muckrake load. This reads the statements CSV files and applies any approved NER candidates before materialising entities and relationships.

OpenLobbying

The primary user of Muckrake data is OpenLobbying, an open database of lobbying and political finance data.

Start the API server:

uv run muckrake server

Start the Svelte frontend:

cd openlobbying
npm run dev

In development, frontend requests to /api/* are proxied to http://127.0.0.1:8000 via Vite.

Database configuration

Local default: SQLite at data/muckrake.db.
Production: set MUCKRAKE_DATABASE_URL to a SQLAlchemy-compatible Postgres URL, for example:

export MUCKRAKE_DATABASE_URL="postgresql+psycopg://muckrake:PASSWORD@HOST:5432/muckrake"
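
The fallback behaviour described above could be expressed along these lines. This is a sketch under assumptions: the function name and the exact SQLite URL form are illustrative, and Muckrake's own configuration code may resolve the setting differently.

```python
import os

# SQLAlchemy-style URL for a SQLite file at the relative path data/muckrake.db.
DEFAULT_SQLITE_URL = "sqlite:///data/muckrake.db"


def database_url(env=None) -> str:
    """Return MUCKRAKE_DATABASE_URL if set and non-empty, else the SQLite default."""
    env = os.environ if env is None else env
    url = env.get("MUCKRAKE_DATABASE_URL", "").strip()
    return url or DEFAULT_SQLITE_URL
```

Passing the environment as an argument keeps the function easy to test without changing the real process environment.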

Deployment docs

VPS guide and templates: docs/deploy/README.md
MVP deploy model: promote one curated DB artifact (includes dedupe + NER state), not dataset files alone.
One-command deploy (code + data): ./scripts/deploy_to_vps.sh {ip_address}
