56 changes: 56 additions & 0 deletions recipe/infigui-g1/README.md
@@ -0,0 +1,56 @@
# Recipe for InfiGUI-G1

This directory contains the official implementation for the paper [InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization](https://arxiv.org/abs/2508.05731).

This work introduces Adaptive Exploration Policy Optimization (AEPO), a policy optimization framework designed to enhance GUI grounding in Multimodal Large Language Models (MLLMs). AEPO improves exploration efficiency by employing a multi-answer generation strategy and a theoretically grounded Adaptive Exploration Reward (AER) function. This approach effectively addresses the challenge of semantic alignment in complex GUI grounding tasks.

We provide training scripts for both 3B and 7B models, configured for a single machine with 8 GPUs by default.

## Environment Setup

Please follow the main environment setup guide for `verl`.

The provided scripts use the following Docker image: `verlai/verl:app-verl0.5-transformers4.55.4-sglang0.4.10.post2-mcore0.13.0-te2.2`

## Data Preparation

Before training, download the example dataset. It is a filtered version of [omniact](https://huggingface.co/datasets/Writer/omniact) that contains only grounding tasks and excludes easy samples.

The data is hosted on Hugging Face. You can download it with `huggingface-cli`:

```bash
huggingface-cli download --repo-type dataset --resume-download InfiX-ai/omniact_grounding_filtered --local-dir data/omniact_grounding_filtered
```

This command will download the training and validation parquet files into the `data/omniact_grounding_filtered` directory, which is the default path used by the scripts.
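
As a quick sanity check after the download, the sketch below counts the parquet files under the target directory. The helper name `count_parquet` is hypothetical and not part of `verl`, and no particular file names inside the dataset are assumed.

```bash
# Hypothetical helper (not part of verl): count the .parquet files found
# anywhere under the given directory.
count_parquet() {
    local dir="$1"
    # find prints one line per matching file; wc -l counts the lines.
    # tr strips the padding that some wc implementations add.
    find "$dir" -type f -name '*.parquet' | wc -l | tr -d ' '
}

# Default data path used by the scripts, as stated in this README.
DATA_DIR="data/omniact_grounding_filtered"
if [ -d "$DATA_DIR" ]; then
    echo "parquet files found: $(count_parquet "$DATA_DIR")"
else
    echo "directory not found: $DATA_DIR" >&2
fi
```

A count of zero suggests the download did not complete or was written to a different directory.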

## Training

We provide scripts to train the 3B and 7B models. Please run them from the root directory of `verl`.

- **Train the 3B model:**

```bash
bash recipe/infigui-g1/run_3b.sh
```

- **Train the 7B model:**

```bash
bash recipe/infigui-g1/run_7b.sh
```

## Using Custom Data

If you wish to train on your own dataset, please format your data to match the structure of the example files located in `data/omniact_grounding_filtered`.

Once your data is ready, you need to update the data path arguments in the training script.

In `run_3b.sh` or `run_7b.sh`, modify the following lines:

```bash
data.train_files=./path/to/your/train_data.parquet \
data.val_files=./path/to/your/val_data.parquet \
```

Replace the paths with the location of your custom data files.
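
Before launching a run with custom data, it can help to confirm that both files actually exist at the paths you entered. The following is a minimal pre-flight sketch; the function name `check_data_files` and the variable names are hypothetical and are not provided by the training scripts.

```bash
# Hypothetical pre-flight check (not part of verl): succeed only if every
# given path is an existing, non-empty regular file.
check_data_files() {
    local f
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -s "$f" ]; then
            echo "missing or empty data file: $f" >&2
            return 1
        fi
    done
    return 0
}

# Placeholder paths copied from the snippet above; replace with your own.
TRAIN_FILE="./path/to/your/train_data.parquet"
VAL_FILE="./path/to/your/val_data.parquet"

if check_data_files "$TRAIN_FILE" "$VAL_FILE"; then
    echo "data files look present"
else
    echo "fix the paths before running the training script" >&2
fi
```

This only checks that the files are present and non-empty; it does not validate that their columns match the example dataset.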