This repository provides the implementation accompanying the paper “MULTI-LINGUAL ML BENCHMARK FOR AUTOML”.
It includes the code for dataset construction, the evaluation framework, and the agents assessed within this benchmark.
We use `uv` for environment management. Install uv once, then run `uv sync` (or `uv pip install -r requirements.txt`) inside the project to create the virtual environment.
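If uv is not installed yet, its standalone installer is one option (a sketch following uv's documented install command; check the uv docs for Windows or package-manager installs):

```bash
# Install uv via its standalone installer (Linux/macOS), then create the project environment.
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync  # resolves dependencies and creates .venv/ for this project
```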
- Install dependencies:

  ```bash
  uv sync
  ```

- Activate the virtual environment:

  ```bash
  source .venv/bin/activate
  ```
- Build the agent runtime:

  ```bash
  python run.py build-runtime -i aide --agent-dir agents/aide
  ```

  (If you use another agent, keep the same file structure and command; see `python run.py build-runtime --help` for details. A sketch for a custom agent follows this list.)
- Download and prepare the dataset:

  ```bash
  python run.py prepare-data huggingface  # preferred; pass --remove-cache to delete the cache
  # OR
  python run.py prepare-data gdrive
  ```

  (The dataset can also be downloaded manually from the Hugging Face Hub by placing the `data` and `tasks` directories into `competitions/`; a sketch follows this list. If you download from GDrive and encounter an error with `gdown`, download the data manually from Google Drive; you will also need to download the task descriptions manually.)
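For the custom-agent case mentioned in the build step, a minimal sketch, assuming a hypothetical `agents/my-agent` directory that mirrors the layout of `agents/aide`:

```bash
# Hypothetical agent: "my-agent" and agents/my-agent are placeholders, not part of the repo.
# The directory should mirror agents/aide (including its config.yaml).
python run.py build-runtime -i my-agent --agent-dir agents/my-agent
```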
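For the manual Hugging Face fallback, a sketch using the `huggingface-cli` tool from `huggingface_hub`; the repository id is a placeholder, since the actual dataset repo is not named here:

```bash
# <org>/<dataset> is a placeholder: substitute the benchmark's actual dataset repository.
huggingface-cli download <org>/<dataset> --repo-type dataset --local-dir competitions
# competitions/ should then contain the data/ and tasks/ directories shown below.
```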
After these steps, you should see the following structure:

```
.
├── run.py
└── competitions/
    ├── data/
    ├── competitions.json
    └── tasks/
```
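A quick sanity check of that layout (a sketch using only the paths shown above):

```bash
# Fails loudly if data preparation did not produce the expected directories/files.
ls competitions/data competitions/tasks
test -f competitions/competitions.json && echo "competitions.json present"
```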
- Configure agent parameters in the corresponding directory (e.g. `agents/aide/config.yaml`). Make sure environment variables such as `$OPENAI_API_KEY` are exported in your shell (a sketch follows these steps).
- Run the benchmark (see `python run.py bench --help` for more options):

  ```bash
  python run.py bench -i aide -w 4 --agent-dir agents/aide --seed 42 --args-variant extended --code-variant extended
  ```
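For the environment variables mentioned in the configuration step above, a typical shell setup looks like this; the key value is a placeholder, and which variables you need depends on the model provider your agent's config points at:

```bash
# Placeholder value: substitute your real key (and add analogous exports for other providers).
export OPENAI_API_KEY="sk-..."
```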
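The same invocation pattern should carry over to other agents; a sketch with a hypothetical agent name, reusing only the flags shown above:

```bash
# Hypothetical: "my-agent" is a placeholder; flags mirror the aide example above.
python run.py bench -i my-agent -w 4 --agent-dir agents/my-agent --seed 42 --args-variant extended --code-variant extended
```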