This repository provides a reference implementation of an SO101 robot arm performing a cube grasping task using ManiSkill. It includes both reinforcement learning and motion planning based solutions to obtain successful trajectories.
Note: This project is only tested on Linux. On non-Linux systems, basic GUI visualization and scene preview may still work, but some features will be limited:
- Motion Planning: The current motion planning library is not compatible with other platforms. You may need to replace parts of the code with a motion planning library that supports your OS.
- RL Parallelism: On non-Linux systems, Python multiprocessing may not work as expected. You may need to adjust the parallelization strategy yourself. The reference code uses TDMPC2, which typically runs with a relatively small number of parallel environments (e.g. 32), so it is possible to train purely on CPU.
To accelerate the real-robot deployment process, we provide an example implementation of a server-client structure as well as a FakeLeRobotEnv that replays dataset trajectories. The following section explains the real-robot setup.
For convenient testing later, we provide a Docker packaging guide. Please refer to this guide when submitting the policy for final real-robot testing.
Add real-world deployment code samples. The simulation eval workflow is similar to the real-world eval code. The reference deployment policy uses the same ACT version as the submodule.
If your own LeRobot policy fails in simulation, check the grasp_cube/real/act_policy.py implementation. Crucially, LeRobot requires loading data pre- and post-processors in addition to the policy itself.
It is recommended to run uv sync each time you pull updates from GitHub.
Since the LeRobot version has undergone significant changes, we provide a LeRobot branch used by the TAs and have updated the reference for using the LeRobot dataset.
We provide an updated URDF under grasp_cube/assets/robots/so101/so101_new to address several issues:
- Fixes mirroring issues from the earlier
so101_old, making it consistent with the real robot. - Improves the gripper collision meshes to make grasping easier.
- Refines the inertia of some links to make the simulation more stable.
- Adds two auxiliary links near the gripper fingertips for computing the TCP (Tool Center Point), which is useful for motion planning.
From the repo root:
uv syncIf you want to run RL training, install the RL-specific dependencies:
uv pip install -r rl/tdmpc2/requirements.txtIf you want to run LeRobot example, install the dependencies:
git submodule update --init --recursive
cd external/lerobot
uv pip install -e .
We will test the model on the real robot using a server-client setup. The advantage of this approach is that it fully decouples your model from the environment, preventing risks caused by tight coupling.
The environment acts as the client. In the current code repository, we provide a packaged real-device environment for final testing and a simulated real-device environment for debugging. Before testing on the real device, you can use the simulated environment to verify that the model-side packaging is correct.
On the client side, it is recommended to create a new Python environment and install the LeRobot library provided in the submodule before running.
And you also need to install env_client in this repo.
uv pip install -e packages/env-client
The simulated real-device environment can be run as follows:
uv run grasp_cube/real/run_fake_env_client.py --env.dataset-path datasets/lift
Here, dataset-path refers to the LeRobot Dataset we provide.
After successful execution, you will see:
[MonitorWrapper] Panel: http://0.0.0.0:9000
[EvalRecordWrapper] Output dir: outputs/eval_records/20251226_124302
Waiting for server at ws://0.0.0.0:8000...
Connection refused, retrying in 5 seconds...
You can open the webpage http://0.0.0.0:9000 to view the interactive interface. Clicking Stop sends a termination command directly to the environment. When you click Stop, you will be prompted to indicate whether this evaluation was successful. After confirming, the environment will enter a waiting for Reset state. Clicking Reset will start the next evaluation.
For the simulated environment, by default, the outputs folder also contains a comparison between the policy output actions and the ground truth actions from the dataset. This can be used to verify whether your I/O is consistent.
On the server side, you can use your own dependencies without worrying about how the environment is implemented. Only you need to do is to install env_client by
pip install -e packages/env-client
Please refer to grasp_cube/real/act_policy.py and grasp_cube/real/serve_act_policy.py to wrap your own policy into a runnable server.
When submitting, please upload a Docker image. Refer to Packaging Your Policy Server with Docker.
The SO101 robot is registered and imported in grasp_cube/agents/robots/so101/so_101.py. For details on adding custom robots to ManiSkill, see the official tutorial: Custom Robots.
To preview the SO101 robot imported into ManiSkill:
uv run hello_robot.pyExpected preview:
The green and red spheres attached to the end-effector are the auxiliary links mentioned above for TCP computation. You can remove them by commenting out the corresponding links in the URDF.
The cube picking task for SO101 is defined in grasp_cube/envs/tasks/pick_cube_so101.py. For how tasks are structured in ManiSkill, see the Tasks tutorial.
To preview the PickCubeSO101 task:
uv run hello_pick_cube.pyExpected preview:
- The blue thin box shows the cube spawn region.
- The green opaque sphere is the goal position.
- The red cube and the green goal are randomly sampled inside the blue region.
- The goal of the task is for the robot to move the red cube to the green goal position, while maintaining a stable grasp and keeping the arm static.
The size and visualization of the blue region (and other task parameters) can be adjusted or disabled directly in grasp_cube/envs/tasks/pick_cube_so101.py.
We implement a basic motion planner in grasp_cube/motionplanning/base_motionplanner. It takes as input a desired end-effector pose and outputs a sequence of waypoints (a path) for the end-effector.
A concrete solution for the PickCubeSO101 environment is implemented in grasp_cube/motionplanning/so101/solutions/pick_cube.py.
To run motion planning for 100 episodes:
uv run -m grasp_cube.motionplanning.so101.run -n 100You should see output similar to:
proc_id: 0: 100%|█| 100/100 [00:27<00:00, 3.62it/s, success_rate=0.56, failed_motion_plan_rate=0.284, avg_episode_length=97.7...
The corresponding trajectories will be saved as HDF5 files under the demos directory.
To visualize the motion planning process during data collection, add the --vis flag:
uv run -m grasp_cube.motionplanning.so101.run -n 100 --visExample visualization:
This includes visualizations of the target gripper poses. For more configuration options, see grasp_cube/motionplanning/so101/run.py.
We use TDMPC2, a model-based RL algorithm with high sample efficiency.
To train a state-based RL policy on PickCubeSO101-v1:
cd rl/tdmpc2
uv run train.py env_id=PickCubeSO101-v1A typical final training log might look like:
eval E: 19,968 I: 1,000,000 R: 30.16 S: 1.00 T: 6:55:02
train E: 19,968 I: 1,000,000 R: 30.19 S: 0.94 T: 6:55:02
Example learned behavior:
These results were obtained on an RTX 4090 Laptop GPU, with GPU memory usage under 5 GB.
Some rough time milestones (for reference, may vary by machine):
- First non-zero success rate around
E ≈ 400,I ≈ 20k, about 10 minutes of training. - Success rate stabilizing above 80% around
E ≈ 4000,I ≈ 200k, about 80 minutes of training.
Download the dataset from Web Learning to your local machine and run the official LeRobot dataset visualization tool:
lerobot-dataset-viz \
--repo-id eai/lift \
--root ./datasets/lift \
--mode local \
--episode-index 0
An example output is shown below:
Running hello_real_robot.py allows you to replay real-world trajectories in simulation and outputs the differences between the simulation and the real world. You can use --help to see all available commands:
uv run hello_real_robot.py --root ./datasets/lift --episode-index 0
Example result:
Most of this codebase is adapted from or inspired by the official ManiSkill repository.






