-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Fix and refactor final answer checks #1448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| yield action_step | ||
| yield FinalAnswerStep(handle_agent_output_types(final_answer)) | ||
|
|
||
| def _execute_step(self, memory_step: ActionStep) -> Generator[ChatMessageStreamDelta | FinalOutput]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think porting this logic into a separate function obfuscates the logic more than it clarifies.
| tools: list[Tool], | ||
| model: Model, | ||
| prompt_templates: PromptTemplates | None = None, | ||
| instructions: str | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is from #1442, will not appear after merging pr 1442.
a2e464e to
fc7cc89
Compare
b5d3d18 to
54dbde6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the final answer handling in agents and updates related tests to use PIL.Image instances and the renamed step_number field.
- Refactored
ActionStep.dict()to serialize images, addis_final_answer, and replace"step"with"step_number". - Replaced the old
FinalOutputtype withActionOutputand updated streaming logic inagents.py. - Updated tests to use
Image.new(...), renamed keys in dict assertions, and added scenarios for final answer checks.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/test_memory.py | Switched from string paths to PIL.Image objects, renamed "step" to "step_number", added checks for observations_images. |
| tests/test_agents.py | Added a test case for final_answer_checks, updated max_steps usage, and renamed planning step variables. |
| src/smolagents/memory.py | Updated ActionStep.dict() to output step_number, serialize observations_images, and include is_final_answer. |
| src/smolagents/agents.py | Removed FinalOutput, introduced ActionOutput, and refactored the streaming loop to validate and flag final answers. |
Comments suppressed due to low confidence (2)
tests/test_memory.py:101
- Consider adding an assertion to verify that
action_step_dict["observations_images"]contains the expected byte data (e.g., matchingImage.new(...).tobytes()).
assert "observations_images" in action_step_dict
tests/test_memory.py:203
- Add assertions in
test_task_step_to_messagesto confirm that the generated messages include an image entry (e.g., check for a content dict with"type": "image").
task_step = TaskStep(task="This is a task.", task_images=[Image.new("RGB", (100, 100))])
| @dataclass | ||
| class FinalOutput: | ||
| output: Any | None | ||
| class ActionOutput: |
Copilot
AI
Jun 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ActionOutput class is missing the @dataclass decorator, so instances won’t accept constructor arguments. Add @dataclass above this class definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it isn't, go home copilot you're drunk
albertvillanova
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing these issues.
Not sure about the need to replace FinalOutput with ActionOutput and the need to add the is_final_answer...
tests/test_agents.py
Outdated
| agent = CodeAgent( | ||
| model=FakeCodeModel(), | ||
| tools=[], | ||
| final_answer_checks=[lambda x, y: x == 7.2904], | ||
| verbosity_level=1000, | ||
| ) | ||
| output = agent.run("Dummy task.") | ||
| assert output == 7.2904 # Check that output is correct | ||
| assert len([step for step in agent.memory.steps if isinstance(step, ActionStep)]) == 2 | ||
| assert "Error raised in check" not in str(agent.write_memory_to_messages()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test passes before the fixes introduced in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, just fixed it to not be passing before!
e0b2e39 to
ca5a539
Compare
albertvillanova
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, the ActionOutput class will be:
- Either
ActionOutput(output: Any, is_final_answer=True)for final answer - Or
ActionOutput(output=None, is_final_answer=False)for non final answer
IMHO it would be simpler to return:
- Either
FinalOutput(output: Any)for final answer - Or
Nonefor non final answer
I find the ActionOutput is non-optimal because it adds complexity without adding relevant information, but feel free to ignore it!
|
@Albert pasting my slack message here for visibility:
|
|
@aymeric-roucher I think when streaming, users are already consuming The alternative approach just adds a None: |
This refactor fixes #1440 and refactors the logic to address underlying issues such as "agent cannot return None in final_answer"