This repo contains the MuJoCo-based ballbot simulation as well as the RL code from the paper: Salehi, Achkan. "Reinforcement Learning for Ballbot Navigation in Uneven Terrain." arXiv preprint arXiv:2505.18417 (2025).
Here are some navigation examples from a trained policy on four different, randomly sampled uneven terrains:
Note that in the above, evaluation episodes 1, 2, and 3 are 4000 timesteps long, while episode 4 is 10000 timesteps long. The policy was trained with a maximum of 4000 steps per episode, so this last evaluation can be seen as a demonstration of generalization capability.
Omniwheels are simulated as capsules with anisotropic friction. This requires a fix that is not (yet) part of the official MuJoCo release, so you must apply the provided patch to your clone of MuJoCo and build both MuJoCo and its Python bindings from source.
For more information, see this discussion.
Make sure you have CMake and a C++17 compiler installed.
1. Clone the MuJoCo repository: `git clone https://github.com/deepmind/mujoco.git` and cd into it.
2. This step is optional but **recommended** due to the patching issue mentioned above: `git checkout 99490163df46f65a0cabcf8efef61b3164faa620`
3. Copy the patch `mujoco_fix.patch` provided in our repository to `<your_mujoco_repo>`, then `cd` into the latter and apply the patch: `patch -p1 < mujoco_fix.patch`
The rest of the instructions are identical to the official MuJoCo guide for building from source:
4. Create a new build directory and cd into it.
5. Run `cmake $PATH_TO_CLONED_REPO` to configure the build.
6. Run `cmake --build .` to build.
7. Configure the installation directory: `cmake $PATH_TO_CLONED_REPO -DCMAKE_INSTALL_PREFIX=<my_install_dir>`
8. After building, install with `cmake --install .`
Once you have built the patched MuJoCo version from above, the steps for building the python bindings are almost identical to those from the official MuJoCo documentation:
- Change to this directory:
```shell
cd <the_mujoco_repo_from_above>/mujoco/python
```
- Create a virtual environment and activate it (I use conda, but whatever floats your boat)
- Generate a source distribution tarball:
```shell
bash make_sdist.sh
```
This will generate many files in `<repo_clone_path>/mujoco/python/`, among which you'll find `mujoco-x.y.z.tar.gz`.
- Install the tarball:
```shell
cd dist
export MUJOCO_PATH=/PATH/TO/MUJOCO
export MUJOCO_PLUGIN_PATH=/PATH/TO/MUJOCO_PLUGIN
pip install mujoco-x.y.z.tar.gz  # replace x.y.z with the appropriate version numbers
```
NOTE: If you're using conda, you might need `conda install -c conda-forge libstdcxx` to avoid some gxx-related issues.
Make sure you have recent versions of PyTorch and `stable_baselines3` installed. This code has been tested with torch version `2.7.0+cu126`.
Other requirements can be found in requirements.txt.
```shell
cd OpenBallbot-RL/ballbotgym/
pip install -e .
```
To test that everything works, run
```shell
cd OpenBallbot-RL/scripts
python3 test_pid.py
```
This uses a simple PID controller to balance the robot on flat terrain.
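The PID controller in `test_pid.py` is not reproduced here, but the general idea can be sketched with a generic discrete PID update driving a toy 1-D plant toward zero (a minimal illustration; the gains, state names, and plant below are made up for this example and are not the repo's actual controller):

```python
class PID:
    """Minimal discrete PID controller (illustrative gains and names)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Toy double-integrator plant: drive state x toward zero
# (x stands in for something like a tilt angle).
pid = PID(kp=2.0, ki=0.1, kd=0.5, dt=0.01)
x, v = 0.3, 0.0  # initial "tilt" and rate
for _ in range(2000):
    u = pid.update(0.0, x)  # control output
    v += u * 0.01           # acceleration from control
    x += v * 0.01
print(abs(x) < 0.02)
```

On the real robot the controller runs per actuated degree of freedom and its output is mapped to wheel torques; the structure of the update step is the same.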
Edit the `OpenBallbot-RL/config/train_ppo_directional.yaml` file if necessary, and then
```shell
cd scripts
python3 train.py --config ../config/train_ppo_directional.yaml
```
To see the progress of your training, you can use
```shell
python3 ../utils/plotting_tools.py --csv log/progress.csv --config log/config.yaml --plot_train
```
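If you prefer to inspect `log/progress.csv` directly, it is a plain CSV written by the stable-baselines3 logger and can be read with the standard library. A minimal sketch, assuming the usual SB3 column names (`time/total_timesteps`, `rollout/ep_rew_mean`), which may differ from what this repo actually logs; the sample rows below are fabricated for illustration:

```python
import csv
import io

# Two fake rows in the stable-baselines3 CSV logger format (illustrative values).
sample = """time/total_timesteps,rollout/ep_rew_mean
4096,-12.5
8192,-3.1
"""

steps, rewards = [], []
for row in csv.DictReader(io.StringIO(sample)):
    # SB3 can leave some columns empty on a given row; skip those.
    if row["rollout/ep_rew_mean"]:
        steps.append(int(row["time/total_timesteps"]))
        rewards.append(float(row["rollout/ep_rew_mean"]))

print(steps, rewards)
```

To read the real log, replace `io.StringIO(sample)` with `open("log/progress.csv")`.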
The default yaml config file should result in something that looks like this:
Note: The training process uses a pretrained depth encoder, which is provided in `<root>/encoder_frozen/encoder_epoch_53`. If for some reason you prefer to train your own, you can use the `scripts/gather_data.py` and `scripts/pretrain_encoder.py` scripts.
You can see how the agent behaves using the `OpenBallbot-RL/scripts/test.py` script.
```shell
python3 test.py --algo ppo --n_test=<number_of_tests_to_perform> --path <path_to_your_model>
```
A trained policy is provided in the `OpenBallbot-RL/trained_agents/` directory, and can be tested using the line above.
If you use this code or refer to our work, please cite:
```bibtex
@misc{salehi2025reinforcementlearningballbotnavigation,
  title={Reinforcement Learning for Ballbot Navigation in Uneven Terrain},
  author={Achkan Salehi},
  year={2025},
  eprint={2505.18417},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2505.18417},
}
```