
OWMM-Agent

Accepted at NeurIPS 2025

This repo maintains an overview of the OWMM-Agent project, as introduced in paper "OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis".

📖 Paper | 🗂️ Dataset | 🤗 Models

OWMM-Agent Banner

Table of Contents

- Introduction
- Installation
- Usage
- Citation
- Credit

Introduction

The rapid progress of navigation, manipulation, and vision models has made mobile manipulators capable of many specialized tasks. However, the open-world mobile manipulation (OWMM) task remains a challenge: it requires generalization to open-ended instructions and environments, and it entails the systemic complexity of integrating high-level decision making with low-level robot control based on both global scene understanding and the current agent state. To address this complexity, we propose a novel multi-modal agent architecture that maintains multi-view scene frames and agent states for decision making and controls the robot through function calling.

A second challenge is hallucination caused by domain shift. To improve agent performance, we further introduce an agentic data synthesis pipeline for the OWMM task that adapts the VLM to our task domain via instruction fine-tuning. We highlight our fine-tuned OWMM-VLM as the first dedicated foundation model for mobile manipulators with global scene understanding, robot state tracking, and multi-modal action generation in a unified model. Our experiments show that the model achieves SOTA performance compared to other foundation models, including GPT-4o, and strong zero-shot generalization in the real world.

This repository provides the complete pipeline code for data collection and data annotation, as well as the code for step evaluation and simulator evaluation.
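The agent loop described above — multi-view scene frames plus agent state feeding a decision step that controls the robot by function calling — can be sketched conceptually as follows. This is a minimal illustration, not the project's actual code: every name here (`SceneMemory`, `decide`, the skill functions) is hypothetical, and the stubbed `decide` stands in for a query to the fine-tuned OWMM-VLM.

```python
# Conceptual sketch of an OWMM-style agent loop. All names are hypothetical;
# the real project wires the fine-tuned OWMM-VLM into the decision step.
from dataclasses import dataclass, field


@dataclass
class SceneMemory:
    """Multi-view scene frames plus the current agent state."""
    frames: list = field(default_factory=list)   # global multi-view observations
    state: dict = field(default_factory=dict)    # e.g. pose, gripper status

    def update(self, frame, state):
        self.frames.append(frame)
        self.state = state


# Low-level skills exposed to the model as callable functions.
def navigate_to(target):
    return f"navigated to {target}"


def pick(obj):
    return f"picked {obj}"


SKILLS = {"navigate_to": navigate_to, "pick": pick}


def decide(memory, instruction):
    """Stand-in for the VLM: maps memory + instruction to a function call."""
    if not memory.state.get("at_target"):
        return ("navigate_to", ["table"])
    return ("pick", ["mug"])


def step(memory, instruction):
    name, args = decide(memory, instruction)
    return SKILLS[name](*args)


memory = SceneMemory()
memory.update(frame="rgb_0", state={"at_target": False})
print(step(memory, "bring me the mug"))   # -> navigated to table
memory.update(frame="rgb_1", state={"at_target": True})
print(step(memory, "bring me the mug"))   # -> picked mug
```

The key design point is that the model never emits raw motor commands; it selects from a fixed set of skills, which keeps the high-level/low-level boundary explicit.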

Installation

First, clone the repo:

```shell
git clone https://github.com/HHYHRHY/OWMM-Agent.git
```

Install Habitat environment and datasets

Please follow the instructions in Install Habitat Environment to install the Habitat environment. Refer to the official Meta repository habitat-lab for troubleshooting and further information.

For the extra dependencies in Habitat and the original datasets used by OWMM-VLM, please follow the installation and dataset-download instructions in the Habitat-MAS Package. We currently do not use the MP3D dataset, so there is no need to download it.

Install VLM dependencies

For the dependencies required for model fine-tuning and deployment, please refer to InternVL2.5. For the dependencies of the baselines, please refer to the installation instructions of PIVOT, Robopoint, and GPT.
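Once the model is deployed, one common setup for InternVL-family models is to serve them behind an OpenAI-compatible chat endpoint (e.g. with an inference server such as lmdeploy). The sketch below shows what a client-side request payload might look like under that assumption; the model name `owmm-vlm` and the exact message schema are illustrative, not this project's documented API.

```python
# Hypothetical payload for querying a deployed VLM behind an
# OpenAI-compatible chat endpoint. The model name and schema
# are assumptions for illustration only.
import base64
import json


def build_chat_payload(image_bytes, question, model="owmm-vlm"):
    """Pack one image and one question into a chat-completions request body."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": question},
            ],
        }],
    }


payload = build_chat_payload(b"\x89PNG...", "Where should the robot navigate next?")
print(json.dumps(payload)[:60])
```

Sending this body to the server's `/v1/chat/completions` route (and reading the reply) is left out, since the endpoint details depend on how you deploy.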

Usage

For dataset generation and simulator evaluation, please follow the instructions in sim. After sampling a dataset with the generation pipeline, refer to the instructions in dataset_annotation to obtain annotated datasets. For step evaluation, follow the instructions in step_evaluation.
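To make the generation-then-annotation flow concrete, the sketch below shows what a single annotated instruction-tuning record could look like after the annotation stage: multi-view frames paired with a conversation whose assistant turn is a function call. The field names and schema here are hypothetical, not the project's actual dataset format.

```python
# Hypothetical shape of one annotated instruction-tuning record
# (illustrative schema, not the project's actual format).
import json


def make_record(images, instruction, action, args):
    """Pair multi-view frames with an instruction and a function-call answer."""
    return {
        "images": images,  # multi-view scene frames sampled in sim
        "conversation": [
            {"role": "user", "content": instruction},
            {"role": "assistant",
             "content": json.dumps({"function": action, "arguments": args})},
        ],
    }


record = make_record(
    images=["frame_head.png", "frame_arm.png"],
    instruction="Move to the kitchen counter and pick up the red cup.",
    action="navigate_to",
    args={"target": "kitchen counter"},
)
print(json.dumps(record, indent=2))
```

Step evaluation can then check the predicted function name and arguments against such ground-truth records, one decision step at a time.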

Citation

If you find our work helpful, please cite us:

```bibtex
@article{chen2025owmm,
  title={OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis},
  author={Chen, Junting and Liang, Haotian and Du, Lingxiao and Wang, Weiyun and Hu, Mengkang and Mu, Yao and Wang, Wenhai and Dai, Jifeng and Luo, Ping and Shao, Wenqi and others},
  journal={arXiv preprint arXiv:2506.04217},
  year={2025}
}
```

Credit

This repo is built upon EMOS, which in turn builds on the Habitat Project and Habitat Lab by Meta. We thank the authors of EMOS and the original Habitat project for their contributions to the community.

Special thanks: the real-world deployment and experiments are based on Robi Butler, which provides multi-modal communication interfaces for perception and action between the human/agent and the real robot in the environment.
