ThousandWorlds

The search for life beyond Earth depends on the molecular signatures that life leaves behind in the atmosphere of its host planet. Correctly interpreting these signatures requires understanding the climates of potential host planets. ThousandWorlds is a benchmark for emulating these exoplanet climates: 1760 simulations across 5 general circulation models (GCMs), 8 planet parameters, and atmospheric variables on a 32 x 64 x 10 latitude-longitude-pressure grid. It includes three nested benchmark subsets, two evaluation protocols, and eight released baseline methods.

Explore the dataset + discovered exoplanets online with the ThousandWorlds Explorer! Built by Hamza Ali Shahjahan!

Quickstart

pip install -e .

import numpy as np
import thousandworlds as tw

tw.download_dataset()
bundle = tw.load("single-complete", data_dir="dataset")

pred = np.broadcast_to(bundle.Y_train.mean(axis=0), bundle.Y_test.shape)
scores = tw.evaluate.rmse(pred, bundle.Y_test, bundle.field_mask_test, bundle.field_names)
scores["per_variable"]

See notebooks/quickstart.ipynb for a short walkthrough.
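
To make the train-mean baseline above concrete without downloading anything, here is a minimal NumPy sketch of a masked, latitude-weighted RMSE on synthetic data. It is not the implementation behind tw.evaluate.rmse: the array layout (sample, variable, lat, lon), the boolean mask shape (sample, variable), the equally spaced latitude grid, and the helper names latitude_weights and weighted_rmse are all assumptions made for illustration.

```python
import numpy as np


def latitude_weights(n_lat: int) -> np.ndarray:
    """Cosine-latitude weights, normalised to mean 1 (assumes equally spaced cell centres)."""
    lats = np.linspace(-90.0, 90.0, n_lat, endpoint=False) + 90.0 / n_lat
    w = np.cos(np.deg2rad(lats))
    return w / w.mean()


def weighted_rmse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Per-variable RMSE for arrays shaped (sample, variable, lat, lon).

    mask is boolean with shape (sample, variable); False entries are skipped.
    """
    w = latitude_weights(pred.shape[2])[None, None, :, None]
    # Area-weighted mean squared error over the horizontal grid.
    mse = (w * (pred - target) ** 2).mean(axis=(2, 3))  # (sample, variable)
    m = mask.astype(float)
    # Average over valid samples only, then take the root.
    return np.sqrt((mse * m).sum(axis=0) / m.sum(axis=0))


# Synthetic stand-in for bundle.Y_train / bundle.Y_test (hypothetical shapes).
rng = np.random.default_rng(0)
Y_train = rng.normal(size=(20, 3, 32, 64))
Y_test = rng.normal(size=(5, 3, 32, 64))
mask = np.ones((5, 3), dtype=bool)

# Train-mean baseline: predict the training average for every test world.
pred = np.broadcast_to(Y_train.mean(axis=0), Y_test.shape)
scores = weighted_rmse(pred, Y_test, mask)
print(scores.shape)  # (3,), one score per variable
```

The cosine weighting stops the many small grid cells near the poles from dominating the score. The packaged latitude weights in assets/ may differ from this simple approximation.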

Installation

pip install -e .              # core: data loading + evaluation
pip install -e '.[models]'    # baseline model dependencies
pip install -e '.[notebooks]' # notebook dependencies

Dataset

The benchmark dataset is hosted on Hugging Face. The repository already contains the metadata and directory layout; the command below downloads the large array files into it:

python -c "import thousandworlds as tw; tw.download_dataset()"

Once downloaded, notebooks/explore_trappist1e.ipynb tours an example world's climate.

Baselines

Published baseline prediction results are distributed as separate archives:

python -c "import thousandworlds as tw; tw.download_baselines()"

To run baselines yourself:

python -m thousandworlds.run_model train_mean single-complete
python -m thousandworlds.run_model --config results/models/multi-partial/pca_mlp/config.json

The first form runs a method on a subset with default hyperparameters (override with flags); the second reproduces a published baseline from its checked-in config.json. Each run writes predictions, metrics, and the resolved config to results/models/<subset>/<method>/, overwriting the checked-in results by default (use --out-dir to redirect).

See notebooks/pca_mlp.ipynb for a quick example that trains a baseline in-notebook and compares its predictions to the targets.
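
For readers who want the general shape of a PCA-based emulator before opening the notebook, the sketch below compresses output fields with PCA and regresses the PCA coefficients on the input parameters. It is an illustration only: the structure is inferred from the method name pca_mlp, the data are synthetic, and ordinary least squares is swapped in for the MLP so the example needs nothing beyond NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 8 input parameters per world, flattened 32 x 64 output field.
n_train, n_test, n_params, n_grid = 200, 50, 8, 32 * 64
W_true = rng.normal(size=(n_params, n_grid))
X_train = rng.normal(size=(n_train, n_params))
X_test = rng.normal(size=(n_test, n_params))
Y_train = X_train @ W_true + 0.01 * rng.normal(size=(n_train, n_grid))
Y_test = X_test @ W_true + 0.01 * rng.normal(size=(n_test, n_grid))

# 1. PCA of the training outputs via SVD of the centred data matrix.
y_mean = Y_train.mean(axis=0)
_, _, Vt = np.linalg.svd(Y_train - y_mean, full_matrices=False)
n_components = 8
components = Vt[:n_components]               # (n_components, n_grid)
Z_train = (Y_train - y_mean) @ components.T  # PCA coefficients per training world

# 2. Regress coefficients on inputs (least squares here; an MLP would replace this step).
A_train = np.hstack([X_train, np.ones((n_train, 1))])  # append a bias column
coef, *_ = np.linalg.lstsq(A_train, Z_train, rcond=None)

# 3. Predict coefficients for the test inputs and map back to grid space.
A_test = np.hstack([X_test, np.ones((n_test, 1))])
Y_pred = (A_test @ coef) @ components + y_mean

rmse_model = np.sqrt(np.mean((Y_pred - Y_test) ** 2))
rmse_mean = np.sqrt(np.mean((y_mean - Y_test) ** 2))
print(f"PCA + linear RMSE: {rmse_model:.3f}, train-mean RMSE: {rmse_mean:.3f}")
```

Regressing onto a handful of PCA coefficients rather than every grid cell keeps the regression target small, which is the usual motivation for this kind of design.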

Repo Structure

thousandworlds/
  data.py               # download + load
  preprocessing.py      # input/output transforms, normalization
  spectral.py           # spectral coefficients <-> gridded fields
  evaluate.py           # RMSE, ACC, energy score, spread-skill ratio, etc.
  run_model.py          # CLI entry point
  make_model_tables.py  # regenerate result tables
  models/               # baseline implementations
  assets/               # precomputed SHT matrix, latitude weights

dataset/                # inputs.csv, subset CSVs, arrays after download
results/                # configs, metrics, published tables
notebooks/              # quickstart, explore_trappist1e, pca_mlp
tests/                  # test suite
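
As a rough guide to one of the metrics listed for evaluate.py, here is a sketch of a latitude-weighted anomaly correlation coefficient (ACC) using the standard textbook definition. The function name acc, its signature, the choice of climatology, and the field layout (..., lat, lon) are assumptions for illustration and may not match the package.

```python
import numpy as np


def acc(pred, target, climatology, lat_weights):
    """Latitude-weighted anomaly correlation for fields shaped (..., lat, lon)."""
    w = lat_weights[:, None]  # broadcast the weights over longitude
    a_pred = pred - climatology
    a_true = target - climatology
    num = (w * a_pred * a_true).sum(axis=(-2, -1))
    den = np.sqrt(
        (w * a_pred**2).sum(axis=(-2, -1)) * (w * a_true**2).sum(axis=(-2, -1))
    )
    return num / den


# Synthetic example on a 32 x 64 grid with cosine-latitude weights.
rng = np.random.default_rng(0)
lat = np.linspace(-87.1875, 87.1875, 32)
w = np.cos(np.deg2rad(lat))
clim = rng.normal(size=(32, 64))  # stand-in climatology, e.g. a training mean
target = clim + rng.normal(size=(5, 32, 64))
pred = target + 0.1 * rng.normal(size=(5, 32, 64))

scores = acc(pred, target, clim, w)
print(scores.shape)  # (5,), one ACC value per sample
```

An ACC of 1 means the predicted anomalies match the target pattern up to a positive scale factor, and -1 means the pattern is exactly inverted.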

Citation

If you use ThousandWorlds, please cite the paper:

@article{thousandworlds2026,
  title = {ThousandWorlds: A benchmark for climate emulation of potentially habitable exoplanets},
  author = {Stevenson, Edward T. and Mak, Mei Ting and Wolf, Eric and Sergeev, Denis E. and Hammond, Tobi and Mayne, N. J. and Cranmer, Miles},
  year = {2026},
  eprint = {2606.18338},
  archivePrefix = {arXiv},
  doi = {10.48550/arXiv.2606.18338}
}

