This repository provides the implementation accompanying the paper “MULTI-LINGUAL ML BENCHMARK FOR AUTOML”.
It includes the code for dataset construction, the evaluation framework, and the agents assessed within this benchmark.
We use `uv` for environment management. Install uv once, then run `uv sync` (or `uv pip install -r requirements.txt`) inside the project to create the virtual environment.
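If uv is not installed yet, its standalone installer is one option (a sketch following uv's documented install command; check the uv docs for Windows or package-manager installs):

```bash
# Install uv via its standalone installer (Linux/macOS), then create the project environment.
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync  # resolves dependencies and creates .venv/ for this project
```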
- Install dependencies:

  ```bash
  uv sync
  ```

- Activate the virtual environment:

  ```bash
  source .venv/bin/activate
  ```
- Build the agent runtime:

  ```bash
  python run.py build-runtime -i aide --agent-dir agents/aide
  ```

  (If you use another agent, keep the same file structure and command; see `python run.py build-runtime --help` for details. A sketch for a custom agent follows this list.)
- Download and prepare the dataset:

  ```bash
  python run.py prepare-data huggingface  # preferred; pass --remove-cache to delete the cache
  # OR
  python run.py prepare-data gdrive
  ```

  (The dataset can also be downloaded manually from the Hugging Face Hub by placing the `data` and `tasks` directories into `competitions/`; a sketch follows this list. If you download from GDrive and encounter an error with `gdown`, download the data manually from Google Drive; you will also need to download the task descriptions manually.)
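For the custom-agent case mentioned in the build step, a minimal sketch, assuming a hypothetical `agents/my-agent` directory that mirrors the layout of `agents/aide`:

```bash
# Hypothetical agent: "my-agent" and agents/my-agent are placeholders, not part of the repo.
# The directory should mirror agents/aide (including its config.yaml).
python run.py build-runtime -i my-agent --agent-dir agents/my-agent
```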
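For the manual Hugging Face fallback, a sketch using the `huggingface-cli` tool from `huggingface_hub`; the repository id is a placeholder, since the actual dataset repo is not named here:

```bash
# <org>/<dataset> is a placeholder: substitute the benchmark's actual dataset repository.
huggingface-cli download <org>/<dataset> --repo-type dataset --local-dir competitions
# competitions/ should then contain the data/ and tasks/ directories shown below.
```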
After these steps, you should see the following structure:

```
.
├── run.py
└── competitions/
    ├── data/
    ├── competitions.json
    └── tasks/
```
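A quick sanity check of that layout (a sketch using only the paths shown above):

```bash
# Fails loudly if data preparation did not produce the expected directories/files.
ls competitions/data competitions/tasks
test -f competitions/competitions.json && echo "competitions.json present"
```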
- Configure agent parameters in the corresponding directory (e.g. `agents/aide/config.yaml`). Make sure environment variables such as `$OPENAI_API_KEY` are exported in your shell (a sketch follows these steps).
- Run the benchmark (see `python run.py bench --help` for more options):

  ```bash
  python run.py bench -i aide -w 4 --agent-dir agents/aide --seed 42 --args-variant extended --code-variant extended
  ```
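For the environment variables mentioned in the configuration step above, a typical shell setup looks like this; the key value is a placeholder, and which variables you need depends on the model provider your agent's config points at:

```bash
# Placeholder value: substitute your real key (and add analogous exports for other providers).
export OPENAI_API_KEY="sk-..."
```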
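The same invocation pattern should carry over to other agents; a sketch with a hypothetical agent name, reusing only the flags shown above:

```bash
# Hypothetical: "my-agent" is a placeholder; flags mirror the aide example above.
python run.py bench -i my-agent -w 4 --agent-dir agents/my-agent --seed 42 --args-variant extended --code-variant extended
```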