The search for life beyond Earth depends on the molecular signatures that life leaves behind in the atmosphere of its host planet. Correctly interpreting these signatures requires understanding the climates of potential host planets. ThousandWorlds is a benchmark for emulating these exoplanet climates: 1760 simulations across 5 GCMs, 8 planet parameters, and atmospheric variables on a 32 x 64 x 10 latitude-longitude-pressure grid. It includes three nested benchmark subsets, two evaluation protocols, and eight released baseline methods.
Explore the dataset + discovered exoplanets online with the ThousandWorlds Explorer! Built by Hamza Ali Shahjahan!
pip install -e .

import numpy as np
import thousandworlds as tw

# Fetch the large array files, then load one benchmark subset.
tw.download_dataset()
bundle = tw.load("single-complete", data_dir="dataset")

# Baseline: predict the training-set mean for every test case.
pred = np.broadcast_to(bundle.Y_train.mean(axis=0), bundle.Y_test.shape)

# Score the prediction and inspect the per-variable RMSE.
scores = tw.evaluate.rmse(pred, bundle.Y_test, bundle.field_mask_test, bundle.field_names)
scores["per_variable"]

See notebooks/quickstart.ipynb for a short walkthrough.
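To make the quickstart's train-mean baseline and RMSE score concrete without downloading anything, here is a minimal NumPy sketch on synthetic data. It is not the implementation in thousandworlds.evaluate: the helper names latitude_weights and weighted_rmse are hypothetical, the cosine-latitude weighting on an evenly spaced grid is an assumption (the repository ships its own precomputed latitude weights), and field masking is omitted.

```python
import numpy as np


def latitude_weights(n_lat: int) -> np.ndarray:
    """Cosine-latitude weights for an evenly spaced grid, normalized to mean 1.

    Assumption: the real benchmark uses its own precomputed weights instead.
    """
    edges = np.linspace(-90.0, 90.0, n_lat + 1)
    centers = np.deg2rad(0.5 * (edges[:-1] + edges[1:]))
    w = np.cos(centers)
    return w / w.mean()


def weighted_rmse(pred: np.ndarray, target: np.ndarray, weights: np.ndarray) -> float:
    """Area-weighted RMSE over arrays shaped (n_samples, n_lat, n_lon)."""
    sq_err = (pred - target) ** 2
    # Broadcast the per-latitude weights over samples and longitudes.
    return float(np.sqrt((sq_err * weights[None, :, None]).mean()))


# Synthetic stand-ins for one variable on a 32 x 64 latitude-longitude grid.
rng = np.random.default_rng(0)
y_train = rng.normal(size=(20, 32, 64))
y_test = rng.normal(size=(5, 32, 64))

# Train-mean baseline: the same mean field is predicted for every test case.
pred = np.broadcast_to(y_train.mean(axis=0), y_test.shape)

w = latitude_weights(32)
score = weighted_rmse(pred, y_test, w)
print(f"weighted RMSE of the train-mean baseline: {score:.3f}")
```

Because the weights are normalized to mean 1, a uniform error of size e gives a weighted RMSE of exactly e, which keeps the score in the units of the variable.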
pip install -e .              # core: data loading + evaluation
pip install -e '.[models]'    # baseline model dependencies
pip install -e '.[notebooks]' # notebook dependencies

The benchmark dataset is hosted on Hugging Face. The repository already contains metadata and directory layout; this fills in the large array files:
python -c "import thousandworlds as tw; tw.download_dataset()"

Once downloaded, notebooks/explore_trappist1e.ipynb tours an example world's climate.
Published baseline prediction results are distributed as separate archives:
python -c "import thousandworlds as tw; tw.download_baselines()"

To run baselines yourself:
python -m thousandworlds.run_model train_mean single-complete
python -m thousandworlds.run_model --config results/models/multi-partial/pca_mlp/config.json

The first form runs a method on a subset with default hyperparameters (override with flags); the second reproduces a published baseline from its checked-in config.json. Each run writes predictions, metrics, and the resolved config to results/models/<subset>/<method>/, overwriting the checked-in results by default (use --out-dir to redirect).

See notebooks/pca_mlp.ipynb for a quick example that trains a baseline in-notebook and compares its predictions to the targets.
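When scripting several runs, it can help to build the two command forms programmatically and always redirect output so the checked-in results are not overwritten. The sketch below only assembles argument lists; the helper run_model_command is hypothetical, and placing --out-dir after the other arguments is an assumption about how the CLI parses its flags.

```python
import sys
from typing import List, Optional


def run_model_command(
    method: Optional[str] = None,
    subset: Optional[str] = None,
    config: Optional[str] = None,
    out_dir: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `python -m thousandworlds.run_model`.

    Pass either `config` (reproduce a published baseline) or both
    `method` and `subset` (run with default hyperparameters).
    """
    cmd = [sys.executable, "-m", "thousandworlds.run_model"]
    if config is not None:
        cmd += ["--config", config]
    elif method is not None and subset is not None:
        cmd += [method, subset]
    else:
        raise ValueError("provide either config, or both method and subset")
    if out_dir is not None:
        # Redirect outputs so results/models/<subset>/<method>/ stays untouched.
        cmd += ["--out-dir", out_dir]
    return cmd


default_run = run_model_command("train_mean", "single-complete", out_dir="scratch/train_mean")
published_run = run_model_command(
    config="results/models/multi-partial/pca_mlp/config.json",
    out_dir="scratch/pca_mlp",
)
print(" ".join(default_run[1:]))
print(" ".join(published_run[1:]))
```

Either list can be handed to subprocess.run(cmd, check=True) once the package is installed.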
thousandworlds/
  data.py               # download + load
  preprocessing.py      # input/output transforms, normalization
  spectral.py           # spectral coefficients <-> gridded fields
  evaluate.py           # RMSE, ACC, energy score, spread-skill ratio, etc.
  run_model.py          # CLI entry point
  make_model_tables.py  # regenerate result tables
  models/               # baseline implementations
  assets/               # precomputed SHT matrix, latitude weights
dataset/                # inputs.csv, subset CSVs, arrays after download
results/                # configs, metrics, published tables
notebooks/              # quickstart, explore_trappist1e, pca_mlp
tests/                  # test suite
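Of the metrics listed for evaluate.py, the energy score is probably the least familiar. As a rough illustration, the sketch below computes the standard ensemble estimator, the mean distance from each member to the target minus half the mean pairwise distance between members. The function energy_score here is a hypothetical stand-alone helper, not the library's API, and it assumes flattened, unweighted fields.

```python
import numpy as np


def energy_score(ensemble: np.ndarray, target: np.ndarray) -> float:
    """Energy score of an ensemble against one target (lower is better).

    ensemble: array shaped (n_members, n_features)
    target:   array shaped (n_features,)
    """
    m = ensemble.shape[0]
    # Mean Euclidean distance from each member to the target.
    accuracy = np.linalg.norm(ensemble - target, axis=1).mean()
    # Mean pairwise distance between members (includes the zero i == j pairs).
    diffs = ensemble[:, None, :] - ensemble[None, :, :]
    spread = np.linalg.norm(diffs, axis=-1).sum() / (m * m)
    return float(accuracy - 0.5 * spread)


# Two members that bracket the target score better than two identical misses.
bracketing = energy_score(np.array([[0.0], [2.0]]), np.array([1.0]))
collapsed = energy_score(np.array([[0.0], [0.0]]), np.array([1.0]))
print(bracketing, collapsed)
```

With a single member the spread term vanishes and the score reduces to the Euclidean distance between prediction and target, so deterministic and ensemble predictions can be compared on the same scale.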
If you use ThousandWorlds, please cite the paper:
@article{thousandworlds2026,
title = {ThousandWorlds: A benchmark for climate emulation of potentially habitable exoplanets},
author = {Stevenson, Edward T. and Mak, Mei Ting and Wolf, Eric and Sergeev, Denis E. and Hammond, Tobi and Mayne, N. J. and Cranmer, Miles},
year = {2026},
eprint = {2606.18338},
archivePrefix = {arXiv},
doi = {10.48550/arXiv.2606.18338}
}
