Skip to content

Conversation

@vllbc
Copy link
Contributor

@vllbc vllbc commented Jul 24, 2025

… in dataset

What does this PR do?

tool_agent_loop did not pass in the call tool's' creat_kwargs', resulting in a missing ground_truth

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)

issue

In the previous implementation, the parameters for tool calls in the dataset were not passed in, resulting in the absence of ground_truth in the gsm8k task. Like:

80084dd040d1a105c12403928ba36d08

On this basis, passing tool_kwargs can solve this problem.

    async def _call_tool(self, tool_call: FunctionCall, tools_kwargs: dict[str, Any]) -> dict[str, str]:
        """Call tool and return tool response."""
        tool, instance_id = None, None
        try:
            # TODO: append malformed tool_call to the prompt: invalid function name or arguments
            tool_name = tool_call.name
            tool_args = json.loads(tool_call.arguments)
            tool = self.tools[tool_name]
            kwargs = tools_kwargs.get(tool_name, {})
            instance_id = await tool.create(create_kwargs=kwargs.get("create_kwargs", {}))
            tool_response, _, _ = await tool.execute(instance_id, tool_args)

So the ground_truth can be used in Tool:

image

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly addresses the issue of ground_truth not being passed to the gsm8k tool by plumbing tools_kwargs through the agent loop. The changes look good in principle, but I've identified two critical issues that need to be addressed. One is a duplicated line of code that calls sleep twice on the servers, and the other is a logical error in a conditional check that will lead to a runtime exception. Please see my detailed comments below.

@vllbc
Copy link
Contributor Author

vllbc commented Jul 25, 2025

If there is no problem with this writing method, I will modify all existing tool implementations to pass ground_truth from create_kwargs.

@chenhaiq chenhaiq requested a review from wuxibin89 July 25, 2025 03:48
tokenizer=self.tokenizer,
)
output = await agent_loop.run(messages, sampling_params)
output = await agent_loop.run(messages, sampling_params, tools_kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break all subclasses of AgentLoopBase, we should make tools_kwargs an optional argument.
https://github.com/volcengine/verl/blob/main/verl/experimental/agent_loop/agent_loop.py#L171

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I should add **kwargs to the run method of the AgentLoopBase class?

@vllbc vllbc requested a review from wuxibin89 July 28, 2025 07:05
@vllbc
Copy link
Contributor Author

vllbc commented Jul 31, 2025

I have resolved all conflicts, can this PR be merged? @wuxibin89

@wuxibin89 wuxibin89 changed the title [agent] fix: fix tool_agent_loop gsm8k task not use ground_truth… [agent] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset Jul 31, 2025
@wuxibin89 wuxibin89 changed the title [agent] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset [rollout] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset Jul 31, 2025
@wuxibin89
Copy link
Collaborator

@vllbc
Copy link
Contributor Author

vllbc commented Jul 31, 2025

@vllbc Please follow the instruction to format code: https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting

Done. Should I modify all tool implementations that require passing ground_truth when creating? I found that SandboxFusionTool and Geo3kTool also fail to handle the logic of ground_truth correctly. Alternatively, a new PR can be created to address this issue.

@wuxibin89
Copy link
Collaborator

wuxibin89 commented Jul 31, 2025

@vllbc Please follow the instruction to format code: https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting

Done. Should I modify all tool implementations that require passing ground_truth when creating? I found that SandboxFusionTool and Geo3kTool also fail to handle the logic of ground_truth correctly. Alternatively, a new PR can be created to address this issue.

After we switch to agent loop, these tools should not be maintained in verl. We should move them to recipe, and clean them up eventually. #2618

@wuxibin89 wuxibin89 merged commit f5bc3ca into volcengine:main Jul 31, 2025
45 of 48 checks passed
@vllbc vllbc deleted the fix_tool_agent_loop branch August 1, 2025 02:29
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Aug 2, 2025
… dataset (volcengine#2740)

… in dataset

### What does this PR do?

> tool_agent_loop did not pass in the call tool's' creat_kwargs',
resulting in a missing ground_truth

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)


### issue
In the previous implementation, the parameters for tool calls in the
dataset were not passed in, resulting in the absence of ground_truth in
the gsm8k task. Like:

<img width="2022" height="186" alt="80084dd040d1a105c12403928ba36d08"
src="https://github.com/user-attachments/assets/51ed35c6-3cab-4feb-a560-5cf6f64feced"
/>

On this basis, passing tool_kwargs can solve this problem.
```python
    async def _call_tool(self, tool_call: FunctionCall, tools_kwargs: dict[str, Any]) -> dict[str, str]:
        """Call tool and return tool response."""
        tool, instance_id = None, None
        try:
            # TODO: append malformed tool_call to the prompt: invalid function name or arguments
            tool_name = tool_call.name
            tool_args = json.loads(tool_call.arguments)
            tool = self.tools[tool_name]
            kwargs = tools_kwargs.get(tool_name, {})
            instance_id = await tool.create(create_kwargs=kwargs.get("create_kwargs", {}))
            tool_response, _, _ = await tool.execute(instance_id, tool_args)
```

So the `ground_truth` can be used in Tool:

<img width="1984" height="188" alt="image"
src="https://github.com/user-attachments/assets/08f75753-4bcb-42f9-a878-5d455e8ed552"
/>


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
Juniper1021 pushed a commit to Juniper1021/verl that referenced this pull request Aug 7, 2025
… dataset (volcengine#2740)

… in dataset

### What does this PR do?

> tool_agent_loop did not pass in the call tool's' creat_kwargs',
resulting in a missing ground_truth

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)


### issue
In the previous implementation, the parameters for tool calls in the
dataset were not passed in, resulting in the absence of ground_truth in
the gsm8k task. Like:

<img width="2022" height="186" alt="80084dd040d1a105c12403928ba36d08"
src="https://github.com/user-attachments/assets/51ed35c6-3cab-4feb-a560-5cf6f64feced"
/>

On this basis, passing tool_kwargs can solve this problem.
```python
    async def _call_tool(self, tool_call: FunctionCall, tools_kwargs: dict[str, Any]) -> dict[str, str]:
        """Call tool and return tool response."""
        tool, instance_id = None, None
        try:
            # TODO: append malformed tool_call to the prompt: invalid function name or arguments
            tool_name = tool_call.name
            tool_args = json.loads(tool_call.arguments)
            tool = self.tools[tool_name]
            kwargs = tools_kwargs.get(tool_name, {})
            instance_id = await tool.create(create_kwargs=kwargs.get("create_kwargs", {}))
            tool_response, _, _ = await tool.execute(instance_id, tool_args)
```

So the `ground_truth` can be used in Tool:

<img width="1984" height="188" alt="image"
src="https://github.com/user-attachments/assets/08f75753-4bcb-42f9-a878-5d455e8ed552"
/>


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
… dataset (volcengine#2740)

… in dataset

### What does this PR do?

> tool_agent_loop did not pass in the call tool's' creat_kwargs',
resulting in a missing ground_truth

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)


### issue
In the previous implementation, the parameters for tool calls in the
dataset were not passed in, resulting in the absence of ground_truth in
the gsm8k task. Like:

<img width="2022" height="186" alt="80084dd040d1a105c12403928ba36d08"
src="https://github.com/user-attachments/assets/51ed35c6-3cab-4feb-a560-5cf6f64feced"
/>

On this basis, passing tool_kwargs can solve this problem.
```python
    async def _call_tool(self, tool_call: FunctionCall, tools_kwargs: dict[str, Any]) -> dict[str, str]:
        """Call tool and return tool response."""
        tool, instance_id = None, None
        try:
            # TODO: append malformed tool_call to the prompt: invalid function name or arguments
            tool_name = tool_call.name
            tool_args = json.loads(tool_call.arguments)
            tool = self.tools[tool_name]
            kwargs = tools_kwargs.get(tool_name, {})
            instance_id = await tool.create(create_kwargs=kwargs.get("create_kwargs", {}))
            tool_response, _, _ = await tool.execute(instance_id, tool_args)
```

So the `ground_truth` can be used in Tool:

<img width="1984" height="188" alt="image"
src="https://github.com/user-attachments/assets/08f75753-4bcb-42f9-a878-5d455e8ed552"
/>


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants