[rollout] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset #2740

vllbc · 2025-07-24T13:17:14Z

… in dataset

What does this PR do?

tool_agent_loop did not pass in the call tool's' creat_kwargs', resulting in a missing ground_truth

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: ...
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)

issue

In the previous implementation, the parameters for tool calls in the dataset were not passed in, resulting in the absence of ground_truth in the gsm8k task. Like:

On this basis, passing tool_kwargs can solve this problem.

    async def _call_tool(self, tool_call: FunctionCall, tools_kwargs: dict[str, Any]) -> dict[str, str]:
        """Call tool and return tool response."""
        tool, instance_id = None, None
        try:
            # TODO: append malformed tool_call to the prompt: invalid function name or arguments
            tool_name = tool_call.name
            tool_args = json.loads(tool_call.arguments)
            tool = self.tools[tool_name]
            kwargs = tools_kwargs.get(tool_name, {})
            instance_id = await tool.create(create_kwargs=kwargs.get("create_kwargs", {}))
            tool_response, _, _ = await tool.execute(instance_id, tool_args)

So the ground_truth can be used in Tool:

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)

… in dataset

gemini-code-assist

Code Review

This pull request correctly addresses the issue of ground_truth not being passed to the gsm8k tool by plumbing tools_kwargs through the agent loop. The changes look good in principle, but I've identified two critical issues that need to be addressed. One is a duplicated line of code that calls sleep twice on the servers, and the other is a logical error in a conditional check that will lead to a runtime exception. Please see my detailed comments below.

verl/experimental/agent_loop/agent_loop.py

verl/tools/gsm8k_tool.py

vllbc · 2025-07-25T03:24:57Z

If there is no problem with this writing method, I will modify all existing tool implementations to pass ground_truth from create_kwargs.

verl/experimental/agent_loop/agent_loop.py

wuxibin89 · 2025-07-28T02:18:58Z

verl/experimental/agent_loop/agent_loop.py

                tokenizer=self.tokenizer,
            )
-            output = await agent_loop.run(messages, sampling_params)
+            output = await agent_loop.run(messages, sampling_params, tools_kwargs)


This will break all subclasses of AgentLoopBase, we should make tools_kwargs an optional argument.
https://github.com/volcengine/verl/blob/main/verl/experimental/agent_loop/agent_loop.py#L171

Perhaps I should add **kwargs to the run method of the AgentLoopBase class?

vllbc · 2025-07-31T06:32:43Z

I have resolved all conflicts, can this PR be merged? @wuxibin89

wuxibin89 · 2025-07-31T06:50:22Z

@vllbc Please follow the instruction to format code:
https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting

vllbc · 2025-07-31T07:06:31Z

@vllbc Please follow the instruction to format code: https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting

Done. Should I modify all tool implementations that require passing ground_truth when creating? I found that SandboxFusionTool and Geo3kTool also fail to handle the logic of ground_truth correctly. Alternatively, a new PR can be created to address this issue.

wuxibin89 · 2025-07-31T07:12:21Z

@vllbc Please follow the instruction to format code: https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting

Done. Should I modify all tool implementations that require passing ground_truth when creating? I found that SandboxFusionTool and Geo3kTool also fail to handle the logic of ground_truth correctly. Alternatively, a new PR can be created to address this issue.

After we switch to agent loop, these tools should not be maintained in verl. We should move them to recipe, and clean them up eventually. #2618

… dataset (volcengine#2740) … in dataset ### What does this PR do? > tool_agent_loop did not pass in the call tool's' creat_kwargs', resulting in a missing ground_truth ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) ### issue In the previous implementation, the parameters for tool calls in the dataset were not passed in, resulting in the absence of ground_truth in the gsm8k task. Like: <img width="2022" height="186" alt="80084dd040d1a105c12403928ba36d08" src="https://github.com/user-attachments/assets/51ed35c6-3cab-4feb-a560-5cf6f64feced" /> On this basis, passing tool_kwargs can solve this problem. ```python async def _call_tool(self, tool_call: FunctionCall, tools_kwargs: dict[str, Any]) -> dict[str, str]: """Call tool and return tool response.""" tool, instance_id = None, None try: # TODO: append malformed tool_call to the prompt: invalid function name or arguments tool_name = tool_call.name tool_args = json.loads(tool_call.arguments) tool = self.tools[tool_name] kwargs = tools_kwargs.get(tool_name, {}) instance_id = await tool.create(create_kwargs=kwargs.get("create_kwargs", {})) tool_response, _, _ = await tool.execute(instance_id, tool_args) ``` So the `ground_truth` can be used in Tool: <img width="1984" height="188" alt="image" src="https://github.com/user-attachments/assets/08f75753-4bcb-42f9-a878-5d455e8ed552" /> ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

[fix] agent_loop: fix tool_agent_loop gsm8k task not use ground_truth…

20f3251

… in dataset

gemini-code-assist bot reviewed Jul 24, 2025

View reviewed changes

verl/experimental/agent_loop/agent_loop.py Outdated Show resolved Hide resolved

verl/tools/gsm8k_tool.py Outdated Show resolved Hide resolved

vllbc added 2 commits July 24, 2025 21:21

fix some bugs

3202fb0

Do not modify any parameters of the gsm8k tool

2e3432b

chenhaiq requested a review from wuxibin89 July 25, 2025 03:48

JoostvDoorn reviewed Jul 25, 2025

View reviewed changes

verl/experimental/agent_loop/agent_loop.py Outdated Show resolved Hide resolved

rename tool_kwargs to tool_kwargs_batch and use ruff format

ef90aec

wuxibin89 reviewed Jul 28, 2025

View reviewed changes

add kwargs to avoid break all subclasses of AgentLoopBase

82a9542

vllbc requested a review from wuxibin89 July 28, 2025 07:05

vllbc and others added 2 commits July 31, 2025 14:12

Merge branch 'main' into fix_tool_agent_loop

8a6d421

remove some blank after fix conflict

632caae

wuxibin89 changed the title ~~[agent] fix: fix tool_agent_loop gsm8k task not use ground_truth…~~ [agent] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset Jul 31, 2025

wuxibin89 changed the title ~~[agent] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset~~ [rollout] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset Jul 31, 2025

using pre-commit to format code

21892ad

wuxibin89 approved these changes Jul 31, 2025

View reviewed changes

wuxibin89 merged commit f5bc3ca into volcengine:main Jul 31, 2025
45 of 48 checks passed

vllbc deleted the fix_tool_agent_loop branch August 1, 2025 02:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[rollout] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset #2740

[rollout] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset #2740

Uh oh!

vllbc commented Jul 24, 2025 •

edited

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

vllbc commented Jul 25, 2025 •

edited

Loading

Uh oh!

Uh oh!

wuxibin89 Jul 28, 2025

Uh oh!

vllbc Jul 28, 2025

Uh oh!

vllbc commented Jul 31, 2025

Uh oh!

wuxibin89 commented Jul 31, 2025

Uh oh!

vllbc commented Jul 31, 2025

Uh oh!

wuxibin89 commented Jul 31, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[rollout] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset #2740

[rollout] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset #2740

Uh oh!

Conversation

vllbc commented Jul 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Checklist Before Starting

issue

Checklist Before Submitting

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

vllbc commented Jul 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

wuxibin89 Jul 28, 2025

Choose a reason for hiding this comment

Uh oh!

vllbc Jul 28, 2025

Choose a reason for hiding this comment

Uh oh!

vllbc commented Jul 31, 2025

Uh oh!

wuxibin89 commented Jul 31, 2025

Uh oh!

vllbc commented Jul 31, 2025

Uh oh!

wuxibin89 commented Jul 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vllbc commented Jul 24, 2025 •

edited

Loading

vllbc commented Jul 25, 2025 •

edited

Loading

wuxibin89 commented Jul 31, 2025 •

edited

Loading