SO101 Cube Grasping Reference

This repository provides a reference implementation of an SO101 robot arm performing a cube grasping task using ManiSkill. It includes both reinforcement learning and motion planning based solutions to obtain successful trajectories.

Note: This project is only tested on Linux. On non-Linux systems, basic GUI visualization and scene preview may still work, but some features will be limited:

Motion Planning: The current motion planning library is not compatible with other platforms. You may need to replace parts of the code with a motion planning library that supports your OS.

RL Parallelism: On non-Linux systems, Python multiprocessing may not work as expected. You may need to adjust the parallelization strategy yourself. The reference code uses TDMPC2, which typically runs with a relatively small number of parallel environments (e.g. 32), so it is possible to train purely on CPU.

Important Update

New

To accelerate the real-robot deployment process, we provide an example implementation of a server-client structure as well as a FakeLeRobotEnv that replays dataset trajectories. The following section explains the real-robot setup.

For convenient testing later, we provide a Docker packaging guide. Please refer to this guide when submitting the policy for final real-robot testing.

Earlier

Add real-world deployment code samples. The simulation eval workflow is similar to the real-world eval code. The reference deployment policy uses the same ACT version as the submodule.

If your own LeRobot policy fails in simulation, check the grasp_cube/real/act_policy.py implementation. Crucially, LeRobot requires loading data pre- and post-processors in addition to the policy itself.

It is recommended to run uv sync each time you pull updates from GitHub.

Since the LeRobot version has undergone significant changes, we provide a LeRobot branch used by the TAs and have updated the reference for using the LeRobot dataset.

We provide an updated URDF under grasp_cube/assets/robots/so101/so101_new to address several issues:

Fixes mirroring issues from the earlier so101_old, making it consistent with the real robot.
Improves the gripper collision meshes to make grasping easier.
Refines the inertia of some links to make the simulation more stable.
Adds two auxiliary links near the gripper fingertips for computing the TCP (Tool Center Point), which is useful for motion planning.

Installation

From the repo root:

uv sync

If you want to run RL training, install the RL-specific dependencies:

uv pip install -r rl/tdmpc2/requirements.txt

If you want to run LeRobot example, install the dependencies:

git submodule update --init --recursive
cd external/lerobot
uv pip install -e .

Real Robot

We will test the model on the real robot using a server-client setup. The advantage of this approach is that it fully decouples your model from the environment, preventing risks caused by tight coupling.

Client

The environment acts as the client. In the current code repository, we provide a packaged real-device environment for final testing and a simulated real-device environment for debugging. Before testing on the real device, you can use the simulated environment to verify that the model-side packaging is correct.

On the client side, it is recommended to create a new Python environment and install the LeRobot library provided in the submodule before running. And you also need to install env_client in this repo.

uv pip install -e packages/env-client

The simulated real-device environment can be run as follows:

uv run grasp_cube/real/run_fake_env_client.py --env.dataset-path datasets/lift

Here, dataset-path refers to the LeRobot Dataset we provide.

After successful execution, you will see:

[MonitorWrapper] Panel: http://0.0.0.0:9000
[EvalRecordWrapper] Output dir: outputs/eval_records/20251226_124302
Waiting for server at ws://0.0.0.0:8000...
Connection refused, retrying in 5 seconds...

You can open the webpage http://0.0.0.0:9000 to view the interactive interface. Clicking Stop sends a termination command directly to the environment. When you click Stop, you will be prompted to indicate whether this evaluation was successful. After confirming, the environment will enter a waiting for Reset state. Clicking Reset will start the next evaluation.

For the simulated environment, by default, the outputs folder also contains a comparison between the policy output actions and the ground truth actions from the dataset. This can be used to verify whether your I/O is consistent.

Server

On the server side, you can use your own dependencies without worrying about how the environment is implemented. Only you need to do is to install env_client by

pip install -e packages/env-client

Please refer to grasp_cube/real/act_policy.py and grasp_cube/real/serve_act_policy.py to wrap your own policy into a runnable server.

When submitting, please upload a Docker image. Refer to Packaging Your Policy Server with Docker.

Getting Started

Robot Model

The SO101 robot is registered and imported in grasp_cube/agents/robots/so101/so_101.py. For details on adding custom robots to ManiSkill, see the official tutorial: Custom Robots.

To preview the SO101 robot imported into ManiSkill:

uv run hello_robot.py

Expected preview:

The green and red spheres attached to the end-effector are the auxiliary links mentioned above for TCP computation. You can remove them by commenting out the corresponding links in the URDF.

Task Definition

The cube picking task for SO101 is defined in grasp_cube/envs/tasks/pick_cube_so101.py. For how tasks are structured in ManiSkill, see the Tasks tutorial.

To preview the PickCubeSO101 task:

uv run hello_pick_cube.py

Expected preview:

The blue thin box shows the cube spawn region.
The green opaque sphere is the goal position.
The red cube and the green goal are randomly sampled inside the blue region.
The goal of the task is for the robot to move the red cube to the green goal position, while maintaining a stable grasp and keeping the arm static.

The size and visualization of the blue region (and other task parameters) can be adjusted or disabled directly in grasp_cube/envs/tasks/pick_cube_so101.py.

Motion Planning

We implement a basic motion planner in grasp_cube/motionplanning/base_motionplanner. It takes as input a desired end-effector pose and outputs a sequence of waypoints (a path) for the end-effector.

A concrete solution for the PickCubeSO101 environment is implemented in grasp_cube/motionplanning/so101/solutions/pick_cube.py.

To run motion planning for 100 episodes:

uv run -m grasp_cube.motionplanning.so101.run -n 100

You should see output similar to:

proc_id: 0: 100%|█| 100/100 [00:27<00:00,  3.62it/s, success_rate=0.56, failed_motion_plan_rate=0.284, avg_episode_length=97.7...

The corresponding trajectories will be saved as HDF5 files under the demos directory.

To visualize the motion planning process during data collection, add the --vis flag:

uv run -m grasp_cube.motionplanning.so101.run -n 100 --vis

Example visualization:

This includes visualizations of the target gripper poses. For more configuration options, see grasp_cube/motionplanning/so101/run.py.

Reinforcement Learning

TDMPC2

We use TDMPC2, a model-based RL algorithm with high sample efficiency.

To train a state-based RL policy on PickCubeSO101-v1:

cd rl/tdmpc2
uv run train.py env_id=PickCubeSO101-v1

A typical final training log might look like:

eval    E: 19,968       I: 1,000,000    R: 30.16        S: 1.00         T: 6:55:02 
train   E: 19,968       I: 1,000,000    R: 30.19        S: 0.94         T: 6:55:02

Example learned behavior:

These results were obtained on an RTX 4090 Laptop GPU, with GPU memory usage under 5 GB.

Some rough time milestones (for reference, may vary by machine):

First non-zero success rate around E ≈ 400, I ≈ 20k, about 10 minutes of training.
Success rate stabilizing above 80% around E ≈ 4000, I ≈ 200k, about 80 minutes of training.

LeRobot Dataset

Download the dataset from Web Learning to your local machine and run the official LeRobot dataset visualization tool:

lerobot-dataset-viz \
    --repo-id eai/lift \
    --root ./datasets/lift \
    --mode local \
    --episode-index 0

An example output is shown below:

Running hello_real_robot.py allows you to replay real-world trajectories in simulation and outputs the differences between the simulation and the real world. You can use --help to see all available commands:

uv run hello_real_robot.py --root ./datasets/lift --episode-index 0

Example result:

Acknowledgement

Most of this codebase is adapted from or inspired by the official ManiSkill repository.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
assets		assets
external		external
grasp_cube		grasp_cube
packages/env-client		packages/env-client
rl/tdmpc2		rl/tdmpc2
scripts		scripts
.gitignore		.gitignore
.gitmodules		.gitmodules
.python-version		.python-version
README.md		README.md
README_zh.md		README_zh.md
docker_tutorial.md		docker_tutorial.md
hello_pick_cube.py		hello_pick_cube.py
hello_real_robot.py		hello_real_robot.py
hello_robot.py		hello_robot.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SO101 Cube Grasping Reference

Important Update

New

Earlier

Installation

Real Robot

Client

Server

Getting Started

Robot Model

Task Definition

Motion Planning

Reinforcement Learning

TDMPC2

LeRobot Dataset

Acknowledgement

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SO101 Cube Grasping Reference

Important Update

New

Earlier

Installation

Real Robot

Client

Server

Getting Started

Robot Model

Task Definition

Motion Planning

Reinforcement Learning

TDMPC2

LeRobot Dataset

Acknowledgement

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages