This is the official repository for the paper, "ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork", released for reproducing the experiments and figures in the paper. If you're interested in using this code for your own research, please check out our JaxAHT repository instead, which builds on top of this codebase and is meant for general use.
If you find this project useful, please cite the paper:
@misc{wang2025rotate,
title={ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork},
author={Caroline Wang, Arrasy Rahman, Jiaxun Cui, Yoonchang Sung, Peter Stone},
archivePrefix={arXiv},
primaryClass={cs.AI},
url = {http://arxiv.org/abs/2505.23686},
eprint={2505.23686},
year={2025}
}- π Installation Guide
βΆοΈ Getting Started: Reproducing ROTATE Results- π Code Overview
- πΊοΈ Project Structure
- π License
- π See Also
Follow instructions at install_instructions.md to install the necessary libraries.
Evaluating trained agents against the heldout evaluation set (referred to as
python download_eval_data.pyTo run the experiments for the ROTATE paper, use the provided bash scripts at teammate_generation/experiments.sh and open_ended_training/experiments.sh.
For teammate generation methods (FCP, BRDiv, CoMeDi), please select the algorithm by modifying teammate_generation/experiments.sh. Then, to run the method for all tasks, run bash teammate_generation/experiments.sh.
Similarly, for the open-ended methods (ROTATE, PAIRED, Minimax Return), please select the algorithm by modifying open_ended_training/experiments.sh. To run the method for all tasks, run bash open_ended_training/experiments.sh.
As a warning, ROTATE training runs take a large amount of disk space (~5GB for a training run with 3 seeds), primarily due to the number of checkpointed policies saved. You can decrease the number of checkpoints to reduce the amount of space used.
Code to reproduce the experimental figures in the ROTATE paper is provided at vis/, and
a general workflow to generate the paper figures is provided at vis/make_paper_plots.sh.
The instructions here assume that you have downloaded the evaluation data already, as specified in the Installation Guide.
- Specify experiment paths at
vis/plot_globals.py - Run
bash vis/make_paper_plots.shto generate and save the paper figures. Figures are stored atresults/neurips_figuresby default.
Note: The first time that the code is run, it may take a while to generate the metrics and create the plots---around 5 minutes for each bar chart, and 30 min for each learning curve chart (since all checkpoints' evaluation results must be processed). The first time that a particular experimental result is processed, a cache file is automatically generated and stored within each experimental result directory, which makes subsequent runs of the visualization scripts much faster.
Cache files can be cleared by running, python vis/clean_cache_files.py.
JaxMARL follows a single-script training paradigm, which enables jit-compiling the entire RL training loop and makes it simple for researchers to modify algorithms. We follow a similar paradigm, but use agent and population interfaces, along with some common utility functions to avoid code duplication.
The code makes the following assumptions:
- Agent policies are assumed to handle "done" signals and reset internally.
- Environments are assumed to "auto-reset", i.e. when the episode is done, the step function should check for this and reset the environment if needed.
The project structure is described here. Additional notes about some folders are provided.
agents/: Contains agent related implementations.common/: Shared utilities and common code.envs/: Environment implementations and wrappers.evaluation/: Evaluation and visualization scripts.ego_agent_training/: All ego agent learning implementations. Currently only supports PPO.marl/: MARL algorithm implementations. Currently only supports IPPO.open_ended_training/: Open-ended learning methods (ROTATE, PAIRED, Minimax Return).vis/: Code to generate the plots shown in the paper.teammate_generation/: Teammate generation algorithms (BRDiv, FCP, CoMeDi).tests/: Test scripts used during development.
The algorithms in this codebase are divided into four categories, and each is stored in its own directory:
- MARL algorithms, located at
marl/ - AHT (Ad Hoc Teamwork) algorithms
- Ego agent training methods, located at
ego_agent_training/ - Two-stage teammate generation methods, located at
teammate_generation/ - Open-ended AHT methods, located at
open_ended_training/
- Ego agent training methods, located at
Note that algorithms from the marl/ and ego_agent_training/ categories are called as subroutines in the other two categories.
For example:
- FCP uses the
marl/ippoimplementation as the teammate generation subroutine. - Two-stage teammate generation methods use
ego_agent_training/ppo_ego.pyas the ego agent training routine.
Within each directory, there is a run.py which serves as the entry point for
all algorithms implemented within the directory.
We use Hydra to manage algorithm and task configurations.
In each directory above, there is a configs/ directory with the
following subdirectories:
configs/algorithm/: Contains algorithm configs, for each algorithm and task combination.configs/hydra/: Contains Hydra settings.configs/task/: Contains environment configs necessary to specify a task.
Given an algorithm and task, Hydra retrieves the appropriate configs from the subdirectories above
and merges them into the master config found in configs/base_config_<method_type>.yaml (e.g., configs/base_config_teammate_generation.yaml).
The algorithm and task may be manually specified by modifying the master config, or by using
Hydra's command line argument support.
For example, the following command runs Fictitious Co-Play on the Level-Based Foraging (LBF) task:
python teammate_generation/run.py task=lbf algorithm=fcp/lbfBy default, results are logged to a local results/ directory, as specified within the configs/hydra/hydra_simple.yaml file for each method type, and to the Weights & Biases (wandb) project specified in the master config.
All metrics are logged using wandb and can be viewed using the wandb web interface.
Please see the wandb documentation for general information about wandb.
Logging settings in each master config allow the user to control whether logging is enabled/disabled.
Before running the evaluation, please ensure that the paths to the ego agent set
point to the desired saved policy paths in the config file, configs/human_proxy_eval.yaml.
To run the human proxy evaluation on Cramped Room, use the following command:
python evaluation/run.py --config-name human_proxy_eval task=overcooked-v1/cramped_room.
The agents/ directory contains:
- Heuristic agents for Overcooked and LBF environments.
- Various actor-critic architectures.
- Population and agent interfaces for RL agents.
You can test Overcooked heuristic agents by running, python tests/test_overcooked_agents.py,
and the LBF heuristic agents by running, python tests/test_lbf_agents.py.
The marl/ directory stores our IPPO implementation.
To run it with wandb logging and using the configs, run:
python marl/run.py task=lbf algorithm=ippo/lbfResults are logged via wandb, but can also be viewed locally in the results/ directory.
The wrapper for the Jumanji LBF environment is stored in the envs/ directory, at envs/jumanji_jaxmarl_wrapper.py. A corresponding test script is stored at tests/test_jumanji_jaxmarl_wrapper.py.
We made some modifications to the JaxMARL Overcooked environment to improve the functionality and ensure environments are solvable.
- Initialization randomization: Previously, setting
random_resetwould lead to random initial agent positions, and randomized initial object states (e.g. pot might be initialized with onions already in it, agents might be initialized holding plates, etc.). We separate the functionality of the argumentrandom_resetinto two arguments:random_resetandrandom_obj_state, whererandom_resetonly controls the initial positions of the two agents. - Agent initial positions: Previously, in a map with disconnected components, it was possible for two agents to be spawned in the same component, making it impossible to solve the task. The Overcooked-v2 environment initializes agents such that one is always spawned on each side of the map.
As described in the "Getting Started" section, the code in this directory is used to generate the plots from the ROTATE paper. Here, we comment on how the code in this folder computes normalization bounds.
You can compute your own normalization upper bounds using the vis/compute_best_returns.py script to walk your results/ directory to recompute the best seen returns for each evaluation partner.
For usage, see the bash script, vis/get_best_returns.sh.
Alternatively, if you do not wish to recompute the normalization upper bounds or download the provided normalization bounds, you can use the development performance bounds provided directly in evaluation/configs/global_heldout_settings.yaml to normalize the results by setting the renormalize_metrics argument of the load_results_for_task() function to False.
Note that the development upper performance bounds are not as high as the normalization upper bounds downloaded by download_eval_data.py as they were computed earlier in the project.
This project is licensed under the MIT License - see the LICENSE file for details.
This project was inspired by the following Jax-based RL repositories. Please check them out!
