AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise

Link to our NeurIPS 2025 paper: AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise

Installation

Create the environment with:

conda env create -f environment.yml
conda activate autods

Set environment variables:

# (for Linux/MacOS/Bash/Cygwin)
export PYTHONPATH=$(pwd):$PYTHONPATH;

# (for Windows CMD)
set PYTHONPATH=%cd%;%PYTHONPATH%

# (if OPENAI_API_KEY is not already set)
export OPENAI_API_KEY=<key>

Datasets

DiscoveryBench

git clone https://github.com/allenai/discoverybench.git temp_db
cp -r temp_db/discoverybench discoverybench
rm -rf temp_db

Blade

git clone https://github.com/behavioral-data/BLADE.git temp_db
cp -r temp_db/blade_bench/datasets blade
rm -rf temp_db

BYO-Datasets!

You can also use your own datasets. To do this, pass in a dataset metadata JSON file containing descriptions of the paths of datasets (relative to the metadata file) and their column descriptions in natural language. You can have a look at the metadata files in the DiscoveryBench directory from above as examples.

Run AutoDS (MCTS-based hypothesis search and verification)

For example, to explore the DiscoveryBench NLS SES dataset, the following command can be used:

python src/run.py \
    --work_dir="work" \
    --out_dir="outputs" \
    --dataset_metadata="discoverybench/real/test/nls_ses/metadata_0.json" \
    --n_experiments=16 \
    --model="gpt-4o" \
    --belief_model="gpt-4o"

To resume a previous exploration, use the --continue_from_dir flag to specify the directory containing the previous exploration logs. This will allow the script to continue from where it left off, using the MCTS nodes it had generated so far.

✍️ Get in touch!

Please reach out to us on email or open a GitHub issue in case of any issues running the code: [email protected] (Dhruv Agarwal), [email protected] (Bodhisattwa Prasad Majumder).

📄 Citation

If you find our work useful, please cite our paper:

@inproceedings{
agarwal2025autodiscovery,
title={AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise},
author={Dhruv Agarwal and Bodhisattwa Prasad Majumder and Reece Adamson and Megha Chakravorty and Satvika Reddy Gavireddy and Aditya Parashar and Harshit Surana and Bhavana Dalvi Mishra and Andrew McCallum and Ashish Sabharwal and Peter Clark},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=kJqTkj2HhF}
}

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
artifacts		artifacts
src		src
.gitignore		.gitignore
NOTICES.txt		NOTICES.txt
README.md		README.md
environment.yml		environment.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise

Installation

Datasets

DiscoveryBench

Blade

BYO-Datasets!

Run AutoDS (MCTS-based hypothesis search and verification)

✍️ Get in touch!

📄 Citation

About

Uh oh!

Releases

Packages

Contributors 3

Languages

allenai/autodiscovery

Folders and files

Latest commit

History

Repository files navigation

AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise

Installation

Datasets

DiscoveryBench

Blade

BYO-Datasets!

Run AutoDS (MCTS-based hypothesis search and verification)

✍️ Get in touch!

📄 Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages