-
Notifications
You must be signed in to change notification settings - Fork 2.4k
[rollout] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset #2740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly addresses the issue of ground_truth not being passed to the gsm8k tool by plumbing tools_kwargs through the agent loop. The changes look good in principle, but I've identified two critical issues that need to be addressed. One is a duplicated line of code that calls sleep twice on the servers, and the other is a logical error in a conditional check that will lead to a runtime exception. Please see my detailed comments below.
|
If there is no problem with this writing method, I will modify all existing tool implementations to pass ground_truth from create_kwargs. |
| tokenizer=self.tokenizer, | ||
| ) | ||
| output = await agent_loop.run(messages, sampling_params) | ||
| output = await agent_loop.run(messages, sampling_params, tools_kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will break all subclasses of AgentLoopBase, we should make tools_kwargs an optional argument.
https://github.com/volcengine/verl/blob/main/verl/experimental/agent_loop/agent_loop.py#L171
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I should add **kwargs to the run method of the AgentLoopBase class?
|
I have resolved all conflicts, can this PR be merged? @wuxibin89 |
|
@vllbc Please follow the instruction to format code: |
Done. Should I modify all tool implementations that require passing ground_truth when creating? I found that SandboxFusionTool and Geo3kTool also fail to handle the logic of ground_truth correctly. Alternatively, a new PR can be created to address this issue. |
After we switch to agent loop, these tools should not be maintained in verl. We should move them to recipe, and clean them up eventually. #2618 |
… dataset (volcengine#2740) … in dataset ### What does this PR do? > tool_agent_loop did not pass in the call tool's' creat_kwargs', resulting in a missing ground_truth ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) ### issue In the previous implementation, the parameters for tool calls in the dataset were not passed in, resulting in the absence of ground_truth in the gsm8k task. Like: <img width="2022" height="186" alt="80084dd040d1a105c12403928ba36d08" src="https://github.com/user-attachments/assets/51ed35c6-3cab-4feb-a560-5cf6f64feced" /> On this basis, passing tool_kwargs can solve this problem. ```python async def _call_tool(self, tool_call: FunctionCall, tools_kwargs: dict[str, Any]) -> dict[str, str]: """Call tool and return tool response.""" tool, instance_id = None, None try: # TODO: append malformed tool_call to the prompt: invalid function name or arguments tool_name = tool_call.name tool_args = json.loads(tool_call.arguments) tool = self.tools[tool_name] kwargs = tools_kwargs.get(tool_name, {}) instance_id = await tool.create(create_kwargs=kwargs.get("create_kwargs", {})) tool_response, _, _ = await tool.execute(instance_id, tool_args) ``` So the `ground_truth` can be used in Tool: <img width="1984" height="188" alt="image" src="https://github.com/user-attachments/assets/08f75753-4bcb-42f9-a878-5d455e8ed552" /> ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
… dataset (volcengine#2740) … in dataset ### What does this PR do? > tool_agent_loop did not pass in the call tool's' creat_kwargs', resulting in a missing ground_truth ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) ### issue In the previous implementation, the parameters for tool calls in the dataset were not passed in, resulting in the absence of ground_truth in the gsm8k task. Like: <img width="2022" height="186" alt="80084dd040d1a105c12403928ba36d08" src="https://github.com/user-attachments/assets/51ed35c6-3cab-4feb-a560-5cf6f64feced" /> On this basis, passing tool_kwargs can solve this problem. ```python async def _call_tool(self, tool_call: FunctionCall, tools_kwargs: dict[str, Any]) -> dict[str, str]: """Call tool and return tool response.""" tool, instance_id = None, None try: # TODO: append malformed tool_call to the prompt: invalid function name or arguments tool_name = tool_call.name tool_args = json.loads(tool_call.arguments) tool = self.tools[tool_name] kwargs = tools_kwargs.get(tool_name, {}) instance_id = await tool.create(create_kwargs=kwargs.get("create_kwargs", {})) tool_response, _, _ = await tool.execute(instance_id, tool_args) ``` So the `ground_truth` can be used in Tool: <img width="1984" height="188" alt="image" src="https://github.com/user-attachments/assets/08f75753-4bcb-42f9-a878-5d455e8ed552" /> ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
… dataset (volcengine#2740) … in dataset ### What does this PR do? > tool_agent_loop did not pass in the call tool's' creat_kwargs', resulting in a missing ground_truth ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) ### issue In the previous implementation, the parameters for tool calls in the dataset were not passed in, resulting in the absence of ground_truth in the gsm8k task. Like: <img width="2022" height="186" alt="80084dd040d1a105c12403928ba36d08" src="https://github.com/user-attachments/assets/51ed35c6-3cab-4feb-a560-5cf6f64feced" /> On this basis, passing tool_kwargs can solve this problem. ```python async def _call_tool(self, tool_call: FunctionCall, tools_kwargs: dict[str, Any]) -> dict[str, str]: """Call tool and return tool response.""" tool, instance_id = None, None try: # TODO: append malformed tool_call to the prompt: invalid function name or arguments tool_name = tool_call.name tool_args = json.loads(tool_call.arguments) tool = self.tools[tool_name] kwargs = tools_kwargs.get(tool_name, {}) instance_id = await tool.create(create_kwargs=kwargs.get("create_kwargs", {})) tool_response, _, _ = await tool.execute(instance_id, tool_args) ``` So the `ground_truth` can be used in Tool: <img width="1984" height="188" alt="image" src="https://github.com/user-attachments/assets/08f75753-4bcb-42f9-a878-5d455e8ed552" /> ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
… in dataset
What does this PR do?
Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI)issue
In the previous implementation, the parameters for tool calls in the dataset were not passed in, resulting in the absence of ground_truth in the gsm8k task. Like:
On this basis, passing tool_kwargs can solve this problem.
So the
ground_truthcan be used in Tool:Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)