This repo contains the MuJoCo-based ballbot simulation as well as the RL code from the paper: Salehi, Achkan. "Reinforcement Learning for Ballbot Navigation in Uneven Terrain." arXiv preprint arXiv:2505.18417 (2025).
Here are some navigation examples from a trained policy on four different, randomly sampled uneven terrains:
Note that in the above, evaluation episodes 1, 2, and 3 are 4000 timesteps long, while episode 4 is 10000 timesteps long. The policy was trained with a maximum of 4000 steps per episode, so this last evaluation can be seen as a demonstration of generalization capability.
Omniwheels are simulated as capsules with anisotropic friction. This requires a fix that is not (yet) part of the official MuJoCo release, so you must apply the provided patch to your clone of MuJoCo and build both MuJoCo and its Python bindings from source.
For more information, see this discussion.
Make sure you have CMake and a C++17 compiler installed.
1. Clone the MuJoCo repository: `git clone https://github.com/deepmind/mujoco.git` and cd into it.
2. This step is optional but **recommended** due to the patching issue mentioned above: `git checkout 99490163df46f65a0cabcf8efef61b3164faa620`
3. Copy the patch `mujoco_fix.patch` provided in our repository to `<your_mujoco_repo>`, then `cd` into the latter and apply the patch: `patch -p1 < mujoco_fix.patch`
The rest of the instructions are identical to the official MuJoCo guide for building from source:
4. Create a new build directory and cd into it.
5. Run `cmake $PATH_TO_CLONED_REPO` to configure the build.
6. Run `cmake --build .` to build.
7. Configure the installation directory: `cmake $PATH_TO_CLONED_REPO -DCMAKE_INSTALL_PREFIX=<my_install_dir>`
8. After building, install with `cmake --install .`
Once you have built the patched MuJoCo version from above, the steps for building the python bindings are almost identical to those from the official MuJoCo documentation:
- Change to this directory:
```shell
cd <the_mujoco_repo_from_above>/mujoco/python
```
- Create a virtual environment and activate it (I use conda, but whatever floats your boat)
- Generate a source distribution tarball:
```shell
bash make_sdist.sh
```
This will generate many files in `<repo_clone_path>/mujoco/python/`, among which you'll find `mujoco-x.y.z.tar.gz`.
- Install the tarball:
```shell
cd dist
export MUJOCO_PATH=/PATH/TO/MUJOCO
export MUJOCO_PLUGIN_PATH=/PATH/TO/MUJOCO_PLUGIN
pip install mujoco-x.y.z.tar.gz  # replace x.y.z with the appropriate version numbers
```
NOTE: If you're using conda, you might need `conda install -c conda-forge libstdcxx` to avoid some gxx-related issues.
Make sure you have recent versions of PyTorch and `stable_baselines3` installed. This code has been tested with torch version `2.7.0+cu126`.
Other requirements can be found in requirements.txt.
```shell
cd OpenBallbot-RL/ballbotgym/
pip install -e .
```
To test that everything works, run
```shell
cd OpenBallbot-RL/scripts
python3 test_pid.py
```
This uses a simple PID controller to balance the robot on flat terrain.
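The PID controller in `test_pid.py` is not reproduced here, but the general idea can be sketched with a generic discrete PID update driving a toy 1-D plant toward zero (a minimal illustration; the gains, state names, and plant below are made up for this example and are not the repo's actual controller):

```python
class PID:
    """Minimal discrete PID controller (illustrative gains and names)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Toy double-integrator plant: drive state x toward zero
# (x stands in for something like a tilt angle).
pid = PID(kp=2.0, ki=0.1, kd=0.5, dt=0.01)
x, v = 0.3, 0.0  # initial "tilt" and rate
for _ in range(2000):
    u = pid.update(0.0, x)  # control output
    v += u * 0.01           # acceleration from control
    x += v * 0.01
print(abs(x) < 0.02)
```

On the real robot the controller runs per actuated degree of freedom and its output is mapped to wheel torques; the structure of the update step is the same.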
Edit the `OpenBallbot-RL/config/train_ppo_directional.yaml` file if necessary, and then
```shell
cd scripts
python3 train.py --config ../config/train_ppo_directional.yaml
```
To see the progress of your training, you can use
```shell
python3 ../utils/plotting_tools.py --csv log/progress.csv --config log/config.yaml --plot_train
```
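If you prefer to inspect `log/progress.csv` directly, it is a plain CSV written by the stable-baselines3 logger and can be read with the standard library. A minimal sketch, assuming the usual SB3 column names (`time/total_timesteps`, `rollout/ep_rew_mean`), which may differ from what this repo actually logs; the sample rows below are fabricated for illustration:

```python
import csv
import io

# Two fake rows in the stable-baselines3 CSV logger format (illustrative values).
sample = """time/total_timesteps,rollout/ep_rew_mean
4096,-12.5
8192,-3.1
"""

steps, rewards = [], []
for row in csv.DictReader(io.StringIO(sample)):
    # SB3 can leave some columns empty on a given row; skip those.
    if row["rollout/ep_rew_mean"]:
        steps.append(int(row["time/total_timesteps"]))
        rewards.append(float(row["rollout/ep_rew_mean"]))

print(steps, rewards)
```

To read the real log, replace `io.StringIO(sample)` with `open("log/progress.csv")`.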
The default yaml config file should result in something that looks like this:
Note: The training process uses a pretrained depth encoder, which is provided in `<root>/encoder_frozen/encoder_epoch_53`. If for some reason you prefer to train your own, you can use the `scripts/gather_data.py` and `scripts/pretrain_encoder.py` scripts.
You can see how the agent behaves using the `OpenBallbot-RL/scripts/test.py` script.
```shell
python3 test.py --algo ppo --n_test=<number_of_tests_to_perform> --path <path_to_your_model>
```
A trained policy is provided in the `OpenBallbot-RL/trained_agents/` directory, and can be tested using the line above.
If you use this code or refer to our work, please cite:
```bibtex
@misc{salehi2025reinforcementlearningballbotnavigation,
  title={Reinforcement Learning for Ballbot Navigation in Uneven Terrain},
  author={Achkan Salehi},
  year={2025},
  eprint={2505.18417},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2505.18417},
}
```