[recipe] feat: Add InfiGUI-G1 recipe for MLLM GUI grounding #3242
Code Review
This pull request introduces a new recipe for GUI grounding. The implementation of the custom reward function in reward_fn.py is mostly solid, but I've identified two high-severity issues that could affect the correctness of the reward calculation. One issue is related to the robustness of JSON parsing from the model's output, and the other concerns the use of direct equality comparison for floating-point numbers when checking for collinear points. I've provided suggestions to fix both. The rest of the changes, including the run scripts and documentation, look good.
### What does this PR do?

This PR introduces a new recipe, `infigui-g1`, for training Multimodal Large Language Models (MLLMs) on GUI grounding tasks. The recipe implements a reinforcement learning approach that significantly improves the model's ability to understand and interact with graphical user interfaces.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: https://github.com/search?q=repo%3Avolcengine%2Fverl+gui&type=pullrequests
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

The effectiveness of this recipe has been validated through experiments. Key results are as follows:

- The training curves for reward, validation accuracy, and exploration success rate all show an upward trend.
- After 156 steps of training on sample data, the 3B model achieves a score of **41.2** on the `screenspot-pro` benchmark, a substantial improvement over the base model's score of **18.2**.

<img width="345" height="291" alt="Screenshot 2025-08-27 172010" src="https://github.com/user-attachments/assets/9ecd93d5-4f9b-4c40-831c-79a50fd197c4" />
<img width="347" height="292" alt="Screenshot 2025-08-27 171902" src="https://github.com/user-attachments/assets/2e437c1f-9eb0-4106-a6c3-b22125026a79" />
<img width="346" height="293" alt="Screenshot 2025-08-27 171928" src="https://github.com/user-attachments/assets/9c94515d-1501-40f4-979c-95e2f819dc62" />

### API and Usage Example

The recipe is self-contained and can be run using the provided scripts. For example, to run training with the 3B-parameter model:

```bash
# In the verl repository root
bash recipe/infigui-g1/run_3b.sh
```

### Design & Code Changes

This PR adds a new, independent recipe located in `recipe/infigui-g1/`. The changes are fully encapsulated within this directory and do not affect any other part of the codebase. The new files include:

- `recipe/infigui-g1/README.md`: An introduction to the recipe.
- `recipe/infigui-g1/run_3b.sh`, `run_7b.sh`: Scripts to launch training.
- `recipe/infigui-g1/reward_fn.py`: Custom reward function implementation (a hedged sketch of its expected shape follows the checklist below).

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
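Since the PR text does not quote `reward_fn.py` itself, the following is a hedged sketch of the general shape a verl custom reward function takes for a grounding task, assuming the ground truth is a bounding box `[x1, y1, x2, y2]` and the model earns full reward for predicting a point inside it. The JSON parsing and the bounding-box format are illustrative assumptions, not the recipe's actual implementation:

```python
import json
import re


def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    """Illustrative grounding reward: 1.0 if the predicted point falls inside
    the ground-truth bounding box [x1, y1, x2, y2], else 0.0.

    The signature mirrors what verl expects from a custom reward function;
    the body is a simplified assumption, not the recipe's reward_fn.py.
    """
    # Extract the first JSON object from the response; tolerate surrounding prose.
    match = re.search(r"\{.*\}", solution_str, re.DOTALL)
    if match is None:
        return 0.0
    try:
        pred = json.loads(match.group(0))
        x, y = float(pred["x"]), float(pred["y"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return 0.0
    x1, y1, x2, y2 = ground_truth  # assumed bbox format for this sketch
    return 1.0 if (x1 <= x <= x2 and y1 <= y <= y2) else 0.0
```

A function like this would presumably be wired in through the run scripts via verl's `custom_reward_function.path` / `custom_reward_function.name` options, which is how user-defined rewards are usually picked up.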